Volume 2015-January, 2015, Pages 1086-1090

Modeling speaker variability using long short-term memory networks for speech recognition

Author keywords

Deep neural networks; Dvector; I vector; Long short term memory; Speaker adaptation; Speech recognition

Indexed keywords

BRAIN; RECURRENT NEURAL NETWORKS; SPEECH; SPEECH COMMUNICATION; VECTORS;

EID: 84959173377     PISSN: 2308457X     EISSN: 19909772     Source Type: Conference Proceeding    
DOI: None     Document Type: Conference Paper
Times cited: 26

References (43)
  • 1
    • 0029288633
    • Maximum likelihood linear regression for speaker adaptation of continuous density hidden Markov models
    • C. Leggetter and P. Woodland, "Maximum likelihood linear regression for speaker adaptation of continuous density hidden Markov models," Computer Speech and Language, vol. 9, pp. 171-185, 1995.
    • (1995) Computer Speech and Language , vol.9 , pp. 171-185
    • Leggetter, C.1    Woodland, P.2
  • 2
    • 0028419019
    • Maximum a posteriori estimation for multivariate Gaussian mixture observations of Markov chains
    • J. Gauvain and C.-H. Lee, "Maximum a posteriori estimation for multivariate Gaussian mixture observations of Markov chains," IEEE Trans. Audio Speech Lang. Processing, vol. 2, pp. 291-298, 1994.
    • (1994) IEEE Trans. Audio Speech Lang. Processing , vol.2 , pp. 291-298
    • Gauvain, J.1    Lee, C.-H.2
  • 3
    • 0032050110
    • Maximum likelihood linear transformations for HMM-based speech recognition
    • M. Gales, "Maximum likelihood linear transformations for HMM-based speech recognition," Computer Speech and Language, vol. 12, pp. 75-98, 1998.
    • (1998) Computer Speech and Language , vol.12 , pp. 75-98
    • Gales, M.1
  • 4
    • 0029747183
    • Speaker normalization using efficient frequency warping procedures
    • L. Lee and R. Rose, "Speaker normalization using efficient frequency warping procedures," in ICASSP, 1996.
    • (1996) ICASSP
    • Lee, L.1    Rose, R.2
  • 5
    • 84890542079
    • KL-divergence regularized deep neural network adaptation for improved large vocabulary speech recognition
    • D. Yu, K. Yao, H. Su, G. Li, and F. Seide, "KL-divergence regularized deep neural network adaptation for improved large vocabulary speech recognition," in ICASSP, 2013.
    • (2013) ICASSP
    • Yu, D.1    Yao, K.2    Su, H.3    Li, G.4    Seide, F.5
  • 6
    • 33947635130
    • Regularized adaptation of discriminative classifiers
    • J. Li and J. Bilmes, "Regularized adaptation of discriminative classifiers," in ICASSP, 2006.
    • (2006) ICASSP
    • Li, J.1    Bilmes, J.2
  • 8
    • 34548012893
    • Linear hidden transformations for adaptation of hybrid ANN/HMM models
    • R. Gemello, F. Mana, S. Scanzio, P. Laface, and R. DeMori, "Linear hidden transformations for adaptation of hybrid ANN/HMM models," Speech Communication, vol. 49, pp. 827-835, 2007.
    • (2007) Speech Communication , vol.49 , pp. 827-835
    • Gemello, R.1    Mana, F.2    Scanzio, S.3    Laface, P.4    DeMori, R.5
  • 9
    • 79959849500
    • Comparison of discriminative input and output transformations for speaker adaptation in the hybrid NN/HMM system
    • B. Li and K. C. Sim, "Comparison of discriminative input and output transformations for speaker adaptation in the hybrid NN/HMM system," in Interspeech, 2010.
    • (2010) Interspeech
    • Li, B.1    Sim, K.C.2
  • 10
    • 84858976070
    • Feature engineering in context-dependent deep neural networks for conversational speech transcription
    • F. Seide, G. Li, X. Chen, and D. Yu, "Feature engineering in context-dependent deep neural networks for conversational speech transcription," in ASRU, 2011, pp. 24-29.
    • (2011) ASRU , pp. 24-29
    • Seide, F.1    Li, G.2    Chen, X.3    Yu, D.4
  • 12
    • 84890452886
    • Fast speaker adaptation of hybrid NN/HMM model for speech recognition based on discriminative learning of speaker code
    • O. Abdel-Hamid and H. Jiang, "Fast speaker adaptation of hybrid NN/HMM model for speech recognition based on discriminative learning of speaker code," in ICASSP, 2013, pp. 7942-7946.
    • (2013) ICASSP , pp. 7942-7946
    • Abdel-Hamid, O.1    Jiang, H.2
  • 13
    • 84921731072
    • Fast adaptation of deep neural network based on discriminative codes for speech recognition
    • S. Xue, O. Abdel-Hamid, H. Jiang, L. Dai, and Q. Liu, "Fast adaptation of deep neural network based on discriminative codes for speech recognition," IEEE/ACM Trans. Audio Speech Lang. Processing, vol. 22, pp. 1713-1725, 2014.
    • (2014) IEEE/ACM Trans. Audio Speech Lang. Processing , vol.22 , pp. 1713-1725
    • Xue, S.1    Abdel-Hamid, O.2    Jiang, H.3    Dai, L.4    Liu, Q.5
  • 14
    • 84893691530
    • Speaker adaptation of neural network acoustic models using i-vectors
    • G. Saon, H. Soltau, D. Nahamoo, and M. Picheny, "Speaker adaptation of neural network acoustic models using i-vectors," in ASRU, 2013, pp. 55-59.
    • (2013) ASRU , pp. 55-59
    • Saon, G.1    Soltau, H.2    Nahamoo, D.3    Picheny, M.4
  • 15
    • 84905259138
    • Improving DNN speaker independence with i-vector inputs
    • A. Senior and I. Lopez-Moreno, "Improving DNN speaker independence with i-vector inputs," in ICASSP, 2014, pp. 225-229.
    • (2014) ICASSP , pp. 225-229
    • Senior, A.1    Lopez-Moreno, I.2
  • 17
    • 84905281466
    • Text-dependent speaker verification using deep neural networks
    • E. Variani, X. Lei, E. McDermott, and I. Lopez-Moreno, "Text-dependent speaker verification using deep neural networks," in ICASSP, 2014.
    • (2014) ICASSP
    • Variani, E.1    Lei, X.2    McDermott, E.3    Lopez-Moreno, I.4
  • 20
    • 84910047819
    • TTS synthesis with bidirectional LSTM based recurrent neural networks
    • Y. Fan, Y. Qian, F. Xie, and F. Soong, "TTS synthesis with bidirectional LSTM based recurrent neural networks," in Interspeech, 2014, pp. 1964-1948.
    • (2014) Interspeech , pp. 1964-1948
    • Fan, Y.1    Qian, Y.2    Xie, F.3    Soong, F.4
  • 21
    • 84910072596
    • Automatic language identification using long short-term memory recurrent neural networks
    • J. Gonzalez-Dominguez, I. Lopez-Moreno, H. Sak, et al., "Automatic language identification using long short-term memory recurrent neural networks," in Interspeech, 2014, pp. 2155-2159.
    • (2014) Interspeech , pp. 2155-2159
    • Gonzalez-Dominguez, J.1    Lopez-Moreno, I.2    Sak, H.3
  • 22
    • 74949119753
    • How the human brain recognizes speech in the context of changing speakers
    • K. Kriegstein, D. Smith, R. Patterson, K. S. J., and T. Griffiths, "How the human brain recognizes speech in the context of changing speakers," The Journal of Neuroscience, vol. 30, pp. 629-638, 2010.
    • (2010) The Journal of Neuroscience , vol.30 , pp. 629-638
    • Kriegstein, K.1    Smith, D.2    Patterson, R.3    Griffiths, T.4
  • 23
    • 84863380535
    • Unsupervised feature learning for audio classification using convolutional deep belief networks
    • H. Lee, Y. Largman, P. Pham, and A. Ng, "Unsupervised feature learning for audio classification using convolutional deep belief networks," in NIPS, 2009.
    • (2009) NIPS
    • Lee, H.1    Largman, Y.2    Pham, P.3    Ng, A.4
  • 26
    • 0034293152
    • Learning to forget: Continual prediction with LSTM
    • F. Gers, J. Schmidhuber, and F. Cummins, "Learning to forget: Continual prediction with LSTM," Neural Computation, vol. 12, pp. 2451-2471, 2000.
    • (2000) Neural Computation , vol.12 , pp. 2451-2471
    • Gers, F.1    Schmidhuber, J.2    Cummins, F.3
  • 28
    • 84893701254
    • Hybrid speech recognition with deep bidirectional LSTM
    • A. Graves, N. Jaitly, and A. Mohamed, "Hybrid speech recognition with deep bidirectional LSTM," in ASRU, 2013, pp. 273-278.
    • (2013) ASRU , pp. 273-278
    • Graves, A.1    Jaitly, N.2    Mohamed, A.3
  • 29
    • 84910056633
    • Robust speech recognition using long short-term memory recurrent neural networks for hybrid acoustic modelling
    • J. Geiger, Z. Zhang, F. Weninger, B. Schuller, and G. Rigoll, "Robust speech recognition using long short-term memory recurrent neural networks for hybrid acoustic modelling," in Interspeech, 2014, pp. 631-635.
    • (2014) Interspeech , pp. 631-635
    • Geiger, J.1    Zhang, Z.2    Weninger, F.3    Schuller, B.4    Rigoll, G.5
  • 30
    • 84976236057
    • Preliminary investigation of Boltzmann machine classifiers for speaker recognition
    • J. Xue, J. Li, and Y. Gong, "Preliminary investigation of Boltzmann machine classifiers for speaker recognition," in Interspeech, 2013.
    • (2013) Interspeech
    • Xue, J.1    Li, J.2    Gong, Y.3
  • 31
    • 84910031119
    • Towards speaker adaptive training of deep neural network acoustic models
    • Y. Miao, H. Zhang, and F. Metze, "Towards speaker adaptive training of deep neural network acoustic models," in Interspeech, 2014.
    • (2014) Interspeech
    • Miao, Y.1    Zhang, H.2    Metze, F.3
  • 32
    • 0031189914
    • Multitask learning: A knowledge-based source of inductive bias
    • R. Caruana, "Multitask learning: A knowledge-based source of inductive bias," Machine Learning, vol. 28, pp. 41-75, 1997.
    • (1997) Machine Learning , vol.28 , pp. 41-75
    • Caruana, R.1
  • 33
    • 84890545600
    • Multi-task learning in deep neural networks for improved phoneme recognition
    • M. Seltzer and J. Droppo, "Multi-task learning in deep neural networks for improved phoneme recognition," in ICASSP, 2013, pp. 6965-6969.
    • (2013) ICASSP , pp. 6965-6969
    • Seltzer, M.1    Droppo, J.2
  • 34
    • 77249131724
    • HKUST/MTS: A very large scale Mandarin telephone speech corpus
    • Y. Liu, P. Fung, Y. Yang, C. Cieri, S. Huang, and D. Graff, "HKUST/MTS: A very large scale Mandarin telephone speech corpus," in ISCSLP, 2006, pp. 724-735.
    • (2006) ISCSLP , pp. 724-735
    • Liu, Y.1    Fung, P.2    Yang, Y.3    Cieri, C.4    Huang, S.5    Graff, D.6
  • 35
    • 84946062764
    • Margin-based discriminative pronunciation modeling for large vocabulary Mandarin speech recognition
    • Y. Liu, X. Li, and X. Wu, "Margin-based discriminative pronunciation modeling for large vocabulary Mandarin speech recognition," in SLT, 2014.
    • (2014) SLT
    • Liu, Y.1    Li, X.2    Wu, X.3
  • 36
    • 84946083498
    • Constructing long short-term memory based deep recurrent neural network for large vocabulary speech recognition
    • X. Li and X. Wu, "Constructing long short-term memory based deep recurrent neural network for large vocabulary speech recognition," in ICASSP, 2015.
    • (2015) ICASSP
    • Li, X.1    Wu, X.2
  • 37
    • 84055222005
    • Context-dependent pre-trained deep neural networks for large-vocabulary speech recognition
    • G. Dahl, D. Yu, L. Deng, and A. Acero, "Context-dependent pre-trained deep neural networks for large-vocabulary speech recognition," IEEE Trans. Audio Speech Lang. Processing, vol. 20, pp. 30-42, 2012.
    • (2012) IEEE Trans. Audio Speech Lang. Processing , vol.20 , pp. 30-42
    • Dahl, G.1    Yu, D.2    Deng, L.3    Acero, A.4
  • 38
  • 39
    • 84890512601
    • Asynchronous stochastic gradient descent for DNN training
    • S. Zhang, C. Zhang, Z. You, R. Zheng, and B. Xu, "Asynchronous stochastic gradient descent for DNN training," in ICASSP, 2013, pp. 6660-6663.
    • (2013) ICASSP , pp. 6660-6663
    • Zhang, S.1    Zhang, C.2    You, Z.3    Zheng, R.4    Xu, B.5
  • 40
    • 0001609567
    • An efficient gradient-based algorithm for online training of recurrent neural network trajectories
    • R. Williams and J. Peng, "An efficient gradient-based algorithm for online training of recurrent neural network trajectories," Neural Computation, vol. 2, pp. 490-501, 1990.
    • (1990) Neural Computation , vol.2 , pp. 490-501
    • Williams, R.1    Peng, J.2
  • 43
    • 84946042568
    • Speech recognition with prediction-adaptation-correction recurrent neural networks
    • Y. Zhang, D. Yu, M. Seltzer, and J. Droppo, "Speech recognition with prediction-adaptation-correction recurrent neural networks," in ICASSP, 2015.
    • (2015) ICASSP
    • Zhang, Y.1    Yu, D.2    Seltzer, M.3    Droppo, J.4


* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.