SCOPUS Information Retrieval Platform

Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH

Volume 2015-January, 2015, Pages 1086-1090

Modeling speaker variability using long short-term memory networks for speech recognition

(2) Li, Xiangang a Wu, Xihong a

a Peking University (China)

Author keywords

Deep neural networks; Dvector; I vector; Long short term memory; Speaker adaptation; Speech recognition

Indexed keywords

BRAIN; RECURRENT NEURAL NETWORKS; SPEECH; SPEECH COMMUNICATION; VECTORS;

DEEP NEURAL NETWORKS; DVECTOR; I VECTORS; LONG SHORT TERM MEMORY; SPEAKER ADAPTATION;

SPEECH RECOGNITION;

EID: 84959173377 PISSN: 2308457X EISSN: 19909772 Source Type: Conference Proceeding
DOI: None Document Type: Conference Paper

Times cited : (26)

References (43)

1
- 0029288633
- Maximum likelihood linear regression for speaker adaptation of continuous density hidden Markov models
- C. Leggetter and P. Woodland, "Maximum likelihood linear regression for speaker adaptation of continuous density hidden Markov models, " Computer Speech and Language, vol. 9, pp. 171-185, 1995.
- (1995) Computer Speech and Language , vol.9 , pp. 171-185
- Leggetter, C.¹ Woodland, P.²

2
- 0028419019
- Maximum a posteriori estimation for multivariate Gaussian mixture observations of Markov chains
- J. Gauvain and C.-H. Lee, "Maximum a posteriori estimation for multivariate Gaussian mixture observations of Markov chains, " IEEE Trans. Audio Speech Lang. Processing, vol. 2, pp. 291-298, 1994.
- (1994) IEEE Trans. Audio Speech Lang. Processing , vol.2 , pp. 291-298
- Gauvain, J.¹ Lee, C.-H.²

3
- 0032050110
- Maximum likelihood linear transformations for HMM-based speech recognition
- M. Gales, "Maximum likelihood linear transformations for HMM-based speech recognition, " Computer Speech and Language, vol. 12, pp. 75-98, 1998.
- (1998) Computer Speech and Language , vol.12 , pp. 75-98
- Gales, M.¹

4
- 0029747183
- Speaker normalization using efficient frequency warping procedures
- L. Lee and R. Rose, "Speaker normalization using efficient frequency warping procedures, " in ICASSP, 1996.
- (1996) ICASSP
- Lee, L.¹ Rose, R.²

5
- 84890542079
- KL-divergence regularized deep neural network adaptation for improved large vocabulary speech recognition
- D. Yu, K. Yao, H. Su, G. Li, and F. Seide, "KL-divergence regularized deep neural network adaptation for improved large vocabulary speech recognition, " in ICASSP, 2013.
- (2013) ICASSP
- Yu, D.¹ Yao, K.² Su, H.³ Li, G.⁴ Seide, F.⁵

6
- 33947635130
- Regularized adaptation of discriminative classifiers
- J. Li and J. Bilmes, "Regularized adaptation of discriminative classifiers, " in ICASSP, 2006.
- (2006) ICASSP
- Li, J.¹ Bilmes, J.²

7
- 84937854847
- Speaker-adaptation for hybrid HMM-ANN continuous speech recognition system
- J. Neto, L. Almeida, M. Hochberg, C. Martins, L. Nunes, S. Renals, and T. Robinson, "Speaker-adaptation for hybrid HMM-ANN continuous speech recognition system, " in Eurospeech, 1995.
- (1995) Eurospeech
- Neto, J.¹ Almeida, L.² Hochberg, M.³ Martins, C.⁴ Nunes, L.⁵ Renals, S.⁶ Robinson, T.⁷

8
- 34548012893
- Linear hidden transformations for adaptation of hybrid ANN/HMM models
- R. Gemello, F. Mana, S. Scanzio, P. Laface, and R. DeMori, "Linear hidden transformations for adaptation of hybrid ANN/HMM models, " Speech Communication, vol. 49, pp. 827-835, 2007.
- (2007) Speech Communication , vol.49 , pp. 827-835
- Gemello, R.¹ Mana, F.² Scanzio, S.³ Laface, P.⁴ DeMori, R.⁵

9
- 79959849500
- Comparison of discriminative input and output transformations for speaker adaptation in the hybrid NN/HMM system
- B. Li and K. C. Sim, "Comparison of discriminative input and output transformations for speaker adaptation in the hybrid NN/HMM system, " in Interspeech, 2010.
- (2010) Interspeech
- Li, B.¹ Sim, K.C.²

10
- 84858976070
- Feature engineering in context-dependent deep neural networks for conversational speech transcription
- F. Seide, G. Li, X. Chen, and D. Yu, "Feature engineering in context-dependent deep neural networks for conversational speech transcription, " in ASRU, 2011, pp. 24-29.
- (2011) ASRU , pp. 24-29
- Seide, F.¹ Li, G.² Chen, X.³ Yu, D.⁴

11
- 84923929378
- Springer
- D. Yu and L. Deng, Automatic Speech Recognition: A Deep Learning Approach. Springer, 2014.
- (2014) Automatic Speech Recognition: A Deep Learning Approach
- Yu, D.¹ Deng, L.²

12
- 84890452886
- Fast speaker adaptation of hybrid NN/HMM model for speech recognition based on discriminative learning of speaker code
- O. Abdel-Hamid and H. Jiang, "Fast speaker adaptation of hybrid NN/HMM model for speech recognition based on discriminative learning of speaker code, " in ICASSP, 2013, pp. 7942-7946.
- (2013) ICASSP , pp. 7942-7946
- Abdel-Hamid, O.¹ Jiang, H.²

13
- 84921731072
- Fast adaptation of deep neural network based on discriminative codes for speech recognition
- S. Xue, O. Abdel-Hamid, H. Jiang, L. Dai, and Q. Liu, "Fast adaptation of deep neural network based on discriminative codes for speech recognition, " IEEE/ACM Trans. Audio Speech Lang. Processing, vol. 22, pp. 1713-1725, 2014.
- (2014) IEEE/ACM Trans. Audio Speech Lang. Processing , vol.22 , pp. 1713-1725
- Xue, S.¹ Abdel-Hamid, O.² Jiang, H.³ Dai, L.⁴ Liu, Q.⁵

14
- 84893691530
- Speaker adaptation of neural network acoustic models using i-vectors
- G. Saon, H. Soltau, D. Nahamoo, and M. Picheny, "Speaker adaptation of neural network acoustic models using i-vectors, " in ASRU, 2013, pp. 55-59.
- (2013) ASRU , pp. 55-59
- Saon, G.¹ Soltau, H.² Nahamoo, D.³ Picheny, M.⁴

15
- 84905259138
- Improving DNN speaker independence with i-vector inputs
- A. Senior and I. Lopez-Moreno, "Improving DNN speaker independence with i-vector inputs, " in ICASSP, 2014, pp. 225-229.
- (2014) ICASSP , pp. 225-229
- Senior, A.¹ Lopez-Moreno, I.²

16
- 79951609039
- Front-end factor analysis for speaker verification
- N. Dehak, P. Kenny, R. Dehak, P. Dumouchel, and P. Ouellet, "Front-end factor analysis for speaker verification, " IEEE Trans. Audio, Speech Lang. Process, vol. 19, pp. 788-798, 2011.
- (2011) IEEE Trans. Audio, Speech Lang. Process , vol.19 , pp. 788-798
- Dehak, N.¹ Kenny, P.² Dehak, R.³ Dumouchel, P.⁴ Ouellet, P.⁵

17
- 84905281466
- Text-dependent speaker verification using deep neural networks
- E. Variani, X. Lei, E. McDermott, and I. Lopez-Moreno, "Text-dependent speaker verification using deep neural networks, " in ICASSP, 2014.
- (2014) ICASSP
- Variani, E.¹ Lei, X.² McDermott, E.³ Lopez-Moreno, I.⁴

18
- 64849110608
- A novel connectionist system for unconstrained handwriting recognition
- A. Graves, M. Liwicki, S. Fernández, et al., "A novel connectionist system for unconstrained handwriting recognition, " IEEE Trans. on Pattern Analysis and Machine Intelligence, vol. 31, pp. 855-868, 2009.
- (2009) IEEE Trans. on Pattern Analysis and Machine Intelligence , vol.31 , pp. 855-868
- Graves, A.¹ Liwicki, M.² Fernández, S.³

19
- 84908677215
- arXiv:1402.1128
- H. Sak, A. Senior, and F. Beaufays, "Long short-term memory based recurrent neural network architectures for large vocabulary speech recognition, " 2014, arXiv:1402.1128.
- (2014) Long Short-term Memory Based Recurrent Neural Network Architectures for Large Vocabulary Speech Recognition
- Sak, H.¹ Senior, A.² Beaufays, F.³

20
- 84910047819
- TTS synthesis with bidirectional LSTM based recurrent neural networks
- Y. Fan, Y. Qian, F. Xie, and F. Soong, "TTS synthesis with bidirectional LSTM based recurrent neural networks, " in Interspeech, 2014, pp. 1964-1968.
- (2014) Interspeech , pp. 1964-1968
- Fan, Y.¹ Qian, Y.² Xie, F.³ Soong, F.⁴

21
- 84910072596
- Automatic language identification using long short-term memory recurrent neural networks
- J. Gonzalez-Dominguez, I. Lopez-Moreno, H. Sak, et al., "Automatic language identification using long short-term memory recurrent neural networks, " in Interspeech, 2014, pp. 2155-2159.
- (2014) Interspeech , pp. 2155-2159
- Gonzalez-Dominguez, J.¹ Lopez-Moreno, I.² Sak, H.³

22
- 74949119753
- How the human brain recognizes speech in the context of changing speakers
- K. Kriegstein, D. Smith, R. Patterson, K. S. J., and T. Griffiths, "How the human brain recognizes speech in the context of changing speakers, " The Journal of Neuroscience, vol. 30, pp. 629-638, 2010.
- (2010) The Journal of Neuroscience , vol.30 , pp. 629-638
- Kriegstein, K.¹ Smith, D.² Patterson, R.³ Griffiths, T.⁴

23
- 84863380535
- Unsupervised feature learning for audio classification using convolutional deep belief networks
- H. Lee, Y. Largman, P. Pham, and A. Ng, "Unsupervised feature learning for audio classification using convolutional deep belief networks, " in NIPS, 2009.
- (2009) NIPS
- Lee, H.¹ Largman, Y.² Pham, P.³ Ng, A.⁴

24
- 85073226083
- Preliminary investigation of Boltzmann machine classifiers for speaker recognition
- T. Stafylakis, P. Kenny, M. Senoussaoui, and P. Dumouchel, "Preliminary investigation of Boltzmann machine classifiers for speaker recognition, " in Odyssey Speaker and Language Recognition Workshop, 2012.
- (2012) Odyssey Speaker and Language Recognition Workshop
- Stafylakis, T.¹ Kenny, P.² Senoussaoui, M.³ Dumouchel, P.⁴

25
- 0031573117
- Long short-term memory
- S. Hochreiter and J. Schmidhuber, "Long short-term memory, " Neural Computation, vol. 9, pp. 1735-1780, 1997.
- (1997) Neural Computation , vol.9 , pp. 1735-1780
- Hochreiter, S.¹ Schmidhuber, J.²

26
- 0034293152
- Learning to forget: Continual prediction with LSTM
- F. Gers, J. Schmidhuber, and F. Cummins, "Learning to forget: Continual prediction with LSTM, " Neural Computation, vol. 12, pp. 2451-2471, 2000.
- (2000) Neural Computation , vol.12 , pp. 2451-2471
- Gers, F.¹ Schmidhuber, J.² Cummins, F.³

27
- 0041965934
- Learning precise timing with LSTM recurrent networks
- F. Gers, N. Schraudolph, and J. Schmidhuber, "Learning precise timing with LSTM recurrent networks, " Journal of Machine Learning Research, vol. 3, pp. 115-143, 2003.
- (2003) Journal of Machine Learning Research , vol.3 , pp. 115-143
- Gers, F.¹ Schraudolph, N.² Schmidhuber, J.³

28
- 84893701254
- Hybrid speech recognition with deep bidirectional LSTM
- A. Graves, N. Jaitly, and A. Mohamed, "Hybrid speech recognition with deep bidirectional LSTM, " in ASRU, 2013, pp. 273-278.
- (2013) ASRU , pp. 273-278
- Graves, A.¹ Jaitly, N.² Mohamed, A.³

29
- 84910056633
- Robust speech recognition using long short-term memory recurrent neural networks for hybrid acoustic modelling
- J. Geiger, Z. Zhang, F. Weninger, B. Schuller, and G. Rigoll, "Robust speech recognition using long short-term memory recurrent neural networks for hybrid acoustic modelling, " in Interspeech, 2014, pp. 631-635.
- (2014) Interspeech , pp. 631-635
- Geiger, J.¹ Zhang, Z.² Weninger, F.³ Schuller, B.⁴ Rigoll, G.⁵

30
- 84976236057
- Preliminary investigation of Boltzmann machine classifiers for speaker recognition
- J. Xue, J. Li, and Y. Gong, "Preliminary investigation of Boltzmann machine classifiers for speaker recognition, " in Interspeech, 2013.
- (2013) Interspeech
- Xue, J.¹ Li, J.² Gong, Y.³

31
- 84910031119
- Towards speaker adaptive training of deep neural network acoustic models
- Y. Miao, H. Zhang, and F. Metze, "Towards speaker adaptive training of deep neural network acoustic models, " in Interspeech, 2014.
- (2014) Interspeech
- Miao, Y.¹ Zhang, H.² Metze, F.³

32
- 0031189914
- Multitask learning: A knowledge-based source of inductive bias
- R. Caruana, "Multitask learning: A knowledge-based source of inductive bias, " Machine Learning, vol. 28, pp. 41-75, 1997.
- (1997) Machine Learning , vol.28 , pp. 41-75
- Caruana, R.¹

33
- 84890545600
- Multi-task learning in deep neural networks for improved phoneme recognition
- M. Seltzer and J. Droppo, "Multi-task learning in deep neural networks for improved phoneme recognition, " in ICASSP, 2013, pp. 6965-6969.
- (2013) ICASSP , pp. 6965-6969
- Seltzer, M.¹ Droppo, J.²

34
- 77249131724
- HKUST/MTS: A very large scale Mandarin telephone speech corpus
- Y. Liu, P. Fung, Y. Yang, C. Cieri, S. Huang, and D. Graff, "HKUST/MTS: A very large scale Mandarin telephone speech corpus, " in ISCSLP, 2006, pp. 724-735.
- (2006) ISCSLP , pp. 724-735
- Liu, Y.¹ Fung, P.² Yang, Y.³ Cieri, C.⁴ Huang, S.⁵ Graff, D.⁶

35
- 84946062764
- Margin-based discriminative pronunciation modeling for large vocabulary Mandarin speech recognition
- Y. Liu, X. Li, and X. Wu, "Margin-based discriminative pronunciation modeling for large vocabulary Mandarin speech recognition, " in SLT, 2014.
- (2014) SLT
- Liu, Y.¹ Li, X.² Wu, X.³

36
- 84946083498
- Constructing long short-term memory based deep recurrent neural network for large vocabulary speech recognition
- X. Li and X. Wu, "Constructing long short-term memory based deep recurrent neural network for large vocabulary speech recognition, " in ICASSP, 2015.
- (2015) ICASSP
- Li, X.¹ Wu, X.²

37
- 84055222005
- Context-dependent pre-trained deep neural networks for large-vocabulary speech recognition
- G. Dahl, D. Yu, L. Deng, and A. Acero, "Context-dependent pre-trained deep neural networks for large-vocabulary speech recognition, " IEEE Trans. Audio Speech Lang. Processing, vol. 20, pp. 30-42, 2012.
- (2012) IEEE Trans. Audio Speech Lang. Processing , vol.20 , pp. 30-42
- Dahl, G.¹ Yu, D.² Deng, L.³ Acero, A.⁴

38
- 80052387251
- Asynchronous peer-to-peer data mining with stochastic gradient descent
- R. Ormándi, I. Hegedüs, and M. Jelasity, "Asynchronous peer-to-peer data mining with stochastic gradient descent, " Lecture Notes in Computer Science, pp. 528-540, 2011.
- (2011) Lecture Notes in Computer Science , pp. 528-540
- Ormándi, R.¹ Hegedüs, I.² Jelasity, M.³

39
- 84890512601
- Asynchronous stochastic gradient descent for DNN training
- S. Zhang, C. Zhang, Z. You, R. Zheng, and B. Xu, "Asynchronous stochastic gradient descent for DNN training, " in ICASSP, 2013, pp. 6660-6663.
- (2013) ICASSP , pp. 6660-6663
- Zhang, S.¹ Zhang, C.² You, Z.³ Zheng, R.⁴ Xu, B.⁵

40
- 0001609567
- An efficient gradient-based algorithm for online training of recurrent neural network trajectories
- R. Williams and J. Peng, "An efficient gradient-based algorithm for online training of recurrent neural network trajectories, " Neural Computation, vol. 2, pp. 490-501, 1990.
- (1990) Neural Computation , vol.2 , pp. 490-501
- Williams, R.¹ Peng, J.²

41
- 84883153580
- arXiv:1211.5063
- R. Pascanu and Y. Bengio, "On the difficulty of training recurrent neural networks, " 2012, arXiv:1211.5063.
- (2012) On the Difficulty of Training Recurrent Neural Networks
- Pascanu, R.¹ Bengio, Y.²

42
- 84890471125
- On rectified linear units for speech processing
- M. Zeiler, M. Ranzato, R. Monga, M. Mao, K. Yang, Q. Le, P. Nguyen, A. Senior, V. Vanhoucke, J. Dean, and G. Hinton, "On rectified linear units for speech processing, " in ICASSP, 2013, pp. 3517-3521.
- (2013) ICASSP , pp. 3517-3521
- Zeiler, M.¹ Ranzato, M.² Monga, R.³ Mao, M.⁴ Yang, K.⁵ Le, Q.⁶ Nguyen, P.⁷ Senior, A.⁸ Vanhoucke, V.⁹ Dean, J.¹⁰ Hinton, G.¹¹

43
- 84946042568
- Speech recognition with prediction-adaptation-correction recurrent neural networks
- Y. Zhang, D. Yu, M. Seltzer, and J. Droppo, "Speech recognition with prediction-adaptation-correction recurrent neural networks, " in ICASSP, 2015.
- (2015) ICASSP
- Zhang, Y.¹ Yu, D.² Seltzer, M.³ Droppo, J.⁴

* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.