SCOPUS Information Search Platform

Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH

Volume , Issue , 2014, Pages 2199-2203

Speaker dependent bottleneck layer training for speaker adaptation in automatic speech recognition

(3) Doddipatla, Rama (a); Hasan, Madina (a); Hain, Thomas (a)

(a) University of Sheffield (United Kingdom)

Author keywords

Automatic speech recognition; Bottleneck features; Deep neural networks; Speaker adaptation

Indexed keywords

LINEAR TRANSFORMATIONS; SPEECH COMMUNICATION; SPEECH PROCESSING;

AUTOMATIC SPEECH RECOGNITION; BOTTLENECK FEATURES; DEEP NEURAL NETWORKS; DISCRIMINATIVE FEATURES; GLOBAL TRANSFORMATION; MATRIX TRANSFORMATION; SPEAKER ADAPTATION; SPEAKER ADAPTIVE TRAININGS;

SPEECH RECOGNITION;

EID: 84910028538
PISSN: 2308457X
EISSN: 19909772
Source Type: Conference Proceeding
DOI: None
Document Type: Conference Paper

Times cited: 19

References (28)

1
- 78649297301
- Deep belief networks for phone recognition
- A. Mohamed, G. Dahl, and G. Hinton, "Deep belief networks for phone recognition," in Proc. NIPS Workshop Deep Learning for Speech Recognition and Related Applications, 2009.
- (2009) Proc. NIPS Workshop Deep Learning for Speech Recognition and Related Applications
- Mohamed, A.¹ Dahl, G.² Hinton, G.³

2
- 80051654263
- Deep belief networks using discriminative features for phone recognition
- A. Mohamed, T. N. Sainath, G. E. Dahl, B. Ramabhadran, G. E. Hinton, and M. Picheny, "Deep belief networks using discriminative features for phone recognition," in Proc. ICASSP, 2011.
- (2011) Proc. ICASSP
- Mohamed, A.¹ Sainath, T.N.² Dahl, G.E.³ Ramabhadran, B.⁴ Hinton, G.E.⁵ Picheny, M.⁶

3
- 84055211743
- Acoustic modeling using deep belief networks
- A. Mohamed, G. Dahl, and G. Hinton, "Acoustic modeling using deep belief networks," IEEE Trans. Audio Speech Lang. Processing, vol. 20, no. 1, pp. 14-22, Jan. 2012.
- (2012) IEEE Trans. Audio Speech Lang. Processing , vol.20 , Issue.1
- Mohamed, A.¹ Dahl, G.² Hinton, G.³

4
- 84865801985
- Conversational speech transcription using context-dependent deep neural networks
- F. Seide, G. Li, and D. Yu, "Conversational Speech Transcription Using Context-Dependent Deep Neural Networks," in Interspeech, 2011.
- (2011) Interspeech
- Seide, F.¹ Li, G.² Yu, D.³

5
- 84055222005
- Context-dependent pre-trained deep neural networks for large vocabulary speech recognition
- G. E. Dahl, D. Yu, L. Deng, and A. Acero, "Context-dependent pre-trained deep neural networks for large vocabulary speech recognition," IEEE Trans. on Audio, Speech, and Language Processing, vol. 20, no. 1, pp. 30-42, 2012.
- (2012) IEEE Trans. on Audio, Speech, and Language Processing , vol.20 , Issue.1
- Dahl, G.E.¹ Yu, D.² Deng, L.³ Acero, A.⁴

6
- 85032751458
- Deep neural networks for acoustic modeling in speech recognition
- G. Hinton, L. Deng, D. Yu, G. Dahl, A.-R. Mohamed, N. Jaitly, A. Senior, V. Vanhoucke, P. Nguyen, T. Sainath, and B. Kingsbury, "Deep neural networks for acoustic modeling in speech recognition," IEEE Signal Processing Magazine, 2012.
- (2012) IEEE Signal Processing Magazine
- Hinton, G.¹ Deng, L.² Yu, D.³ Dahl, G.⁴ Mohamed, A.-R.⁵ Jaitly, N.⁶ Senior, A.⁷ Vanhoucke, V.⁸ Nguyen, P.⁹ Sainath, T.¹⁰ Kingsbury, B.¹¹

7
- 0029288633
- Maximum likelihood linear regression for speaker adaptation of continuous density hidden Markov models
- C. J. Leggetter and P. C. Woodland, "Maximum Likelihood Linear Regression for Speaker Adaptation of Continuous Density Hidden Markov Models," Computer Speech & Language, vol. 9, no. 2, pp. 171-185, 1995.
- (1995) Computer Speech & Language , vol.9 , Issue.2 , pp. 171-185
- Leggetter, C.J.¹ Woodland, P.C.²

8
- 0029375590
- Speaker adaptation using constrained estimation of Gaussian mixtures
- V. V. Digalakis, D. Rtischev, and L. G. Neumeyer, "Speaker Adaptation using Constrained Estimation of Gaussian Mixtures," IEEE Trans. on Speech and Audio Processing, vol. 3, no. 5, pp. 357-366, Sep. 1995.
- (1995) IEEE Trans. on Speech and Audio Processing , vol.3 , Issue.5 , pp. 357-366
- Digalakis, V.V.¹ Rtischev, D.² Neumeyer, L.G.³

9
- 0032050110
- Maximum likelihood linear transformations for HMM-based speech recognition
- M. J. F. Gales, "Maximum Likelihood Linear Transformations for HMM-Based Speech Recognition," Computer Speech and Language, vol. 12, no. 2, pp. 75-98, 1998.
- (1998) Computer Speech and Language , vol.12 , Issue.2 , pp. 75-98
- Gales, M.J.F.¹

10
- 0030362995
- A compact model for speaker-adaptive training
- T. Anastasakos, J. McDonough, R. Schwartz, and J. Makhoul, "A Compact Model for Speaker-Adaptive Training," in Proc. of ICSLP, pp. 1137-1140, Oct. 1996.
- (1996) Proc. of ICSLP , pp. 1137-1140
- Anastasakos, T.¹ McDonough, J.² Schwartz, R.³ Makhoul, J.⁴

11
- 84910030053
- RecNorm: Simultaneous normalisation and classification applied to speech recognition
- J. S. Bridle and S. Cox, "RecNorm: Simultaneous Normalisation and Classification Applied to Speech Recognition," in NIPS, pp. 234-240, 1990.
- (1990) NIPS , pp. 234-240
- Bridle, J.S.¹ Cox, S.²

12
- 84937854847
- Speaker-adaptation for hybrid HMM-ANN continuous speech recognition system
- J. P. Neto, L. B. Almeida, M. Hochberg, C. Martins, L. Nunes, S. Renals, and T. Robinson, "Speaker-adaptation for hybrid HMM-ANN continuous speech recognition system," in EUROSPEECH, 1995.
- (1995) Eurospeech
- Neto, J.P.¹ Almeida, L.B.² Hochberg, M.³ Martins, C.⁴ Nunes, L.⁵ Renals, S.⁶ Robinson, T.⁷

13
- 79959849500
- Comparison of discriminative input and output transformations for speaker adaptation in the hybrid NN/HMM systems
- B. Li and K. C. Sim, "Comparison of discriminative input and output transformations for speaker adaptation in the hybrid NN/HMM systems," in INTERSPEECH, 2010.
- (2010) Interspeech
- Li, B.¹ Sim, K.C.²

14
- 33947703156
- Adaptation of hybrid ANN/HMM models using linear hidden transformations and conservative training
- R. Gemello, F. Mana, S. Scanzio, P. Laface, and R. D. Mori, "Adaptation of hybrid ANN/HMM models using linear hidden transformations and conservative training," in ICASSP, 2006.
- (2006) ICASSP
- Gemello, R.¹ Mana, F.² Scanzio, S.³ Laface, P.⁴ Mori, R.D.⁵

15
- 84858976070
- Feature engineering in context-dependent deep neural networks for conversational speech transcription
- F. Seide, G. Li, X. Chen, and D. Yu, "Feature engineering in context-dependent deep neural networks for conversational speech transcription," in ASRU, 2011.
- (2011) ASRU
- Seide, F.¹ Li, G.² Chen, X.³ Yu, D.⁴

16
- 84890478625
- Adaptation of context-dependent deep neural networks for automatic speech recognition
- K. Yao, D. Yu, F. Seide, H. Su, L. Deng, and Y. Gong, "Adaptation of context-dependent deep neural networks for automatic speech recognition," in IEEE SLT, 2012.
- (2012) IEEE SLT
- Yao, K.¹ Yu, D.² Seide, F.³ Su, H.⁴ Deng, L.⁵ Gong, Y.⁶

17
- 84890509526
- MLP-based factor analysis for tandem speech recognition
- M. Ferras and H. Bourlard, "MLP-based factor analysis for tandem speech recognition," in ICASSP, 2013.
- (2013) ICASSP
- Ferras, M.¹ Bourlard, H.²

18
- 84890452886
- Fast speaker adaptation of hybrid NN/HMM model for speech recognition based on discriminative learning of speaker code
- O. Abdel-Hamid and H. Jiang, "Fast speaker adaptation of hybrid NN/HMM model for speech recognition based on discriminative learning of speaker code," in ICASSP, 2013.
- (2013) ICASSP
- Abdel-Hamid, O.¹ Jiang, H.²

19
- 84906225505
- Rapid and effective speaker adaptation of convolutional neural network based models for speech recognition
- O. Abdel-Hamid and H. Jiang, "Rapid and effective speaker adaptation of convolutional neural network based models for speech recognition," in INTERSPEECH, 2013.
- (2013) Interspeech
- Abdel-Hamid, O.¹ Jiang, H.²

20
- 84893691530
- Speaker adaptation of neural network acoustic models using i-vectors
- G. Saon, H. Soltau, D. Nahamoo, and M. Picheny, "Speaker adaptation of neural network acoustic models using i-vectors," in ASRU, 2013.
- (2013) ASRU
- Saon, G.¹ Soltau, H.² Nahamoo, D.³ Picheny, M.⁴

21
- 84905269643
- Using neural network front-ends on far field multiple microphones based speech recognition
- Y. Liu, P. Zhang, and T. Hain, "Using neural network front-ends on far field multiple microphones based speech recognition," to appear in ICASSP, 2014.
- (2014) ICASSP
- Liu, Y.¹ Zhang, P.² Hain, T.³

22
- 33846265193
- The AMI meeting corpus
- J. Carletta, S. Ashby, S. Bourban, M. Guillemot, M. Kronenthal, G. Lathoud, M. Lincoln, I. McCowan, T. Hain, W. Kraaij, W. Post, J. Kadlec, P. Wellner, M. Flynn, and D. Reidsma, "The AMI meeting corpus," in Proc. MLMI05, Edinburgh, 2005.
- (2005) Proc. MLMI05
- Carletta, J.¹ Ashby, S.² Bourban, S.³ Guillemot, M.⁴ Kronenthal, M.⁵ Lathoud, G.⁶ Lincoln, M.⁷ McCowan, I.⁸ Hain, T.⁹ Kraaij, W.¹⁰ Post, W.¹¹ Kadlec, J.¹² Wellner, P.¹³ Flynn, M.¹⁴ Reidsma, D.¹⁵

23
- 47749084324
- The 2007 AMI(DA) system for meeting transcription
- T. Hain, L. Burget, J. Dines, G. Garau, M. Karafiát, D. A. van Leeuwen, M. Lincoln, and V. Wan, "The 2007 AMI(DA) System for Meeting Transcription," in CLEAR, pp. 414-428, 2007.
- (2007) CLEAR , pp. 414-428
- Hain, T.¹ Burget, L.² Dines, J.³ Garau, G.⁴ Karafiát, M.⁵ Van Leeuwen, D.A.⁶ Lincoln, M.⁷ Wan, V.⁸

24
- 0141814662
- The ICSI meeting corpus
- A. Janin, D. Baron, J. Edwards, D. Ellis, D. Gelbart, N. Morgan, B. Peskin, T. Pfau, E. Shriberg, A. Stolcke, and C. Wooters, "The ICSI meeting corpus," in ICASSP, 2003.
- (2003) ICASSP
- Janin, A.¹ Baron, D.² Edwards, J.³ Ellis, D.⁴ Gelbart, D.⁵ Morgan, N.⁶ Peskin, B.⁷ Pfau, T.⁸ Shriberg, E.⁹ Stolcke, A.¹⁰ Wooters, C.¹¹

25
- 47749107522
- NIST: Rich transcription evaluations (2007-2009), http://www.itl.nist.gov/iad/mig//tests/rt/.
- (2007) NIST: Rich Transcription Evaluations

26
- 0032289099
- Heteroscedastic discriminant analysis and reduced rank HMMs for improved speech recognition
- N. Kumar and A. G. Andreou, "Heteroscedastic discriminant analysis and reduced rank HMMs for improved speech recognition," Speech Communication, vol. 26, no. 4, pp. 283-297, 1998.
- (1998) Speech Communication , vol.26 , Issue.4 , pp. 283-297
- Kumar, N.¹ Andreou, A.G.²

27
- 84910039988
- TNet: Neural Network Trainer. http://speech.fit.vutbr.cz/software/neural-network-trainertnet.
- TNet: Neural Network Trainer

28
- 84873944678
- NIST, "Speech recognition scoring toolkit (SCTK)," http://www.nist.gov/speech/tools/.
- Speech Recognition Scoring Toolkit (SCTK)

* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.