SCOPUS Information Retrieval Platform

Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH

Volume , Issue , 2014, Pages 2189-2193

Towards speaker adaptive training of deep neural network acoustic models

(3) Miao, Yajie (a); Zhang, Hao (a); Metze, Florian (a)

(a) Carnegie Mellon University (United States)

Author keywords

Automatic speech recognition; Deep neural networks; Speaker adaptive training

Indexed keywords

SPEECH COMMUNICATION; VECTOR SPACES;

ACOUSTIC MODEL; AUTOMATIC SPEECH RECOGNITION; DEEP NEURAL NETWORKS; FEATURE MAPPING; FEATURE SPACE; SPEAKER ADAPTATION; SPEAKER ADAPTIVE TRAININGS; WORD ERROR RATE REDUCTIONS;

SPEECH RECOGNITION;

EID: 84910031119 PISSN: 2308457X EISSN: 19909772 Source Type: Conference Proceeding
DOI: None Document Type: Conference Paper

Times cited : (87)

References (32)

1
- 84055222005
- Context dependent pre-trained deep neural networks for large vocabulary speech recognition
- G. Dahl, D. Yu, L. Deng, and A. Acero, "Context dependent pre-trained deep neural networks for large vocabulary speech recognition," IEEE Transactions on Audio, Speech and Language Processing, vol. 20(1), pp. 30-42, 2012.
- (2012) IEEE Transactions on Audio, Speech and Language Processing , vol.20 , Issue.1 , pp. 30-42
- Dahl, G.¹ Yu, D.² Deng, L.³ Acero, A.⁴

2
- 84858976070
- Feature engineering in context-dependent deep neural networks for conversational speech transcription
- F. Seide, G. Li, X. Chen, and D. Yu, "Feature engineering in context-dependent deep neural networks for conversational speech transcription," in Proc. ASRU, pp. 24-29, 2011.
- (2011) Proc. ASRU , pp. 24-29
- Seide, F.¹ Li, G.² Chen, X.³ Yu, D.⁴

3
- 0032050110
- Maximum likelihood linear transformations for HMM-based speech recognition
- M. Gales, "Maximum likelihood linear transformations for HMM-based speech recognition," Computer Speech and Language, vol. 12, pp. 75-98, 1998.
- (1998) Computer Speech and Language , vol.12 , pp. 75-98
- Gales, M.¹

4
- 79959849500
- Comparison of discriminative input and output transformations for speaker adaptation in the hybrid NN/HMM systems
- B. Li and K. C. Sim, "Comparison of discriminative input and output transformations for speaker adaptation in the hybrid NN/HMM systems," in Proc. Interspeech, pp. 526-529, 2010.
- (2010) Proc. Interspeech , pp. 526-529
- Li, B.¹ Sim, K.C.²

5
- 84874226579
- Adaptation of context-dependent deep neural networks for automatic speech recognition
- K. Yao, D. Yu, F. Seide, H. Su, L. Deng, and Y. Gong, "Adaptation of context-dependent deep neural networks for automatic speech recognition," in Proc. IEEE Spoken Language Technology Workshop, pp. 366-369, 2012.
- (2012) Proc. IEEE Spoken Language Technology Workshop , pp. 366-369
- Yao, K.¹ Yu, D.² Seide, F.³ Su, H.⁴ Deng, L.⁵ Gong, Y.⁶

6
- 84878606732
- Hermitian-based hidden activation functions for adaptation of hybrid HMM/ANN models
- S. M. Siniscalchi, J. Li, and C.-H. Lee, "Hermitian-based hidden activation functions for adaptation of hybrid HMM/ANN models," in Proc. Interspeech, pp. 526-529, 2012.
- (2012) Proc. Interspeech , pp. 526-529
- Siniscalchi, S.M.¹ Li, J.² Lee, C.-H.³

7
- 84906241049
- Improved feature processing for deep neural networks
- S. P. Rath, D. Povey, K. Vesely, and J. Cernocky, "Improved feature processing for deep neural networks," in Proc. Interspeech, 2013.
- (2013) Proc. Interspeech
- Rath, S.P.¹ Povey, D.² Vesely, K.³ Cernocky, J.⁴

8
- 84893691530
- Speaker adaptation of neural network acoustic models using i-vectors
- G. Saon, H. Soltau, D. Nahamoo, and M. Picheny, "Speaker adaptation of neural network acoustic models using i-vectors," in Proc. ASRU, pp. 55-59, 2013.
- (2013) Proc. ASRU , pp. 55-59
- Saon, G.¹ Soltau, H.² Nahamoo, D.³ Picheny, M.⁴

9
- 70450180849
- Support vector machines versus fast scoring in the low-dimensional total variability space for speaker verification
- N. Dehak, R. Dehak, P. Kenny, N. Brummer, P. Ouellet, and P. Dumouchel, "Support vector machines versus fast scoring in the low-dimensional total variability space for speaker verification," in Proc. Interspeech, pp. 1559-1562, 2009.
- (2009) Proc. Interspeech , pp. 1559-1562
- Dehak, N.¹ Dehak, R.² Kenny, P.³ Brummer, N.⁴ Ouellet, P.⁵ Dumouchel, P.⁶

10
- 79951609039
- Front-end factor analysis for speaker verification
- N. Dehak, P. J. Kenny, R. Dehak, P. Dumouchel, and P. Ouellet, "Front-end factor analysis for speaker verification," IEEE Transactions on Audio, Speech and Language Processing, vol. 19, no. 4, pp. 788-798, 2011.
- (2011) IEEE Transactions on Audio, Speech and Language Processing , vol.19 , Issue.4 , pp. 788-798
- Dehak, N.¹ Kenny, P.J.² Dehak, R.³ Dumouchel, P.⁴ Ouellet, P.⁵

11
- 80051634401
- Simplification and optimization of i-vector extraction
- O. Glembek, L. Burget, P. Matejka, M. Karafiat, and P. Kenny, "Simplification and optimization of i-vector extraction," in Proc. ICASSP, pp. 4516-4519, 2011.
- (2011) Proc. ICASSP , pp. 4516-4519
- Glembek, O.¹ Burget, L.² Matejka, P.³ Karafiat, M.⁴ Kenny, P.⁵

12
- 0030677475
- Speaker adaptive training: A maximum likelihood approach to speaker normalization
- T. Anastasakos, J. McDonough, and J. Makhoul, "Speaker adaptive training: A maximum likelihood approach to speaker normalization," in Proc. ICASSP, pp. 1043-1046, 1997.
- (1997) Proc. ICASSP , pp. 1043-1046
- Anastasakos, T.¹ McDonough, J.² Makhoul, J.³

13
- 84890452886
- Fast speaker adaptation of hybrid NN/HMM model for speech recognition based on discriminative learning of speaker code
- O. Abdel-Hamid and H. Jiang, "Fast speaker adaptation of hybrid NN/HMM model for speech recognition based on discriminative learning of speaker code," in Proc. ICASSP, pp. 7942-7946, 2013.
- (2013) Proc. ICASSP , pp. 7942-7946
- Abdel-Hamid, O.¹ Jiang, H.²

14
- 84906225505
- Rapid and effective speaker adaptation of convolutional neural network based models for speech recognition
- O. Abdel-Hamid and H. Jiang, "Rapid and effective speaker adaptation of convolutional neural network based models for speech recognition," in Proc. Interspeech, 2013.
- (2013) Proc. Interspeech
- Abdel-Hamid, O.¹ Jiang, H.²

15
- 58349106697
- A study of interspeaker variability in speaker verification
- P. Kenny, P. Ouellet, N. Dehak, V. Gupta, and P. Dumouchel, "A study of interspeaker variability in speaker verification," IEEE Transactions on Audio, Speech and Language Processing, vol. 16, no. 5, pp. 980-988, 2008.
- (2008) IEEE Transactions on Audio, Speech and Language Processing , vol.16 , Issue.5 , pp. 980-988
- Kenny, P.¹ Ouellet, P.² Dehak, N.³ Gupta, V.⁴ Dumouchel, P.⁵

16
- 84858984756
- iVector-based discriminative adaptation for automatic speech recognition
- M. Karafiat, L. Burget, P. Matejka, O. Glembek, and J. Cernocky, "iVector-based discriminative adaptation for automatic speech recognition," in Proc. ASRU, pp. 152-157, 2011.
- (2011) Proc. ASRU , pp. 152-157
- Karafiat, M.¹ Burget, L.² Matejka, P.³ Glembek, O.⁴ Cernocky, J.⁵

17
- 33646788786
- fMPE: Discriminatively trained features for speech recognition
- D. Povey, B. Kingsbury, L. Mangu, G. Saon, H. Soltau, and G. Zweig, "fMPE: discriminatively trained features for speech recognition," in Proc. ICASSP, pp. 961-964, 2005.
- (2005) Proc. ICASSP , pp. 961-964
- Povey, D.¹ Kingsbury, B.² Mangu, L.³ Saon, G.⁴ Soltau, H.⁵ Zweig, G.⁶

18
- 84890483211
- Learning discriminative basis coefficients for eigenspace MLLR unsupervised adaptation
- Y. Miao, F. Metze, and A. Waibel, "Learning discriminative basis coefficients for eigenspace MLLR unsupervised adaptation," in Proc. ICASSP, pp. 7927-7931, 2013.
- (2013) Proc. ICASSP , pp. 7927-7931
- Miao, Y.¹ Metze, F.² Waibel, A.³

19
- 44949102463
- Recent progress on the discriminative region-dependent transform for speech feature extraction
- B. Zhang, S. Matsoukas, and R. Schwartz, "Recent progress on the discriminative region-dependent transform for speech feature extraction," in Proc. Interspeech, 2006.
- (2006) Proc. Interspeech
- Zhang, B.¹ Matsoukas, S.² Schwartz, R.³

20
- 84867605836
- Applying convolutional neural networks concepts to hybrid NN-HMM model for speech recognition
- O. Abdel-Hamid, A. Mohamed, H. Jiang, and G. Penn, "Applying convolutional neural networks concepts to hybrid NN-HMM model for speech recognition," in Proc. ICASSP, pp. 4277-4280, 2012.
- (2012) Proc. ICASSP , pp. 4277-4280
- Abdel-Hamid, O.¹ Mohamed, A.² Jiang, H.³ Penn, G.⁴

21
- 84890525984
- Deep convolutional neural networks for LVCSR
- T. N. Sainath, A. Mohamed, B. Kingsbury, and B. Ramabhadran, "Deep convolutional neural networks for LVCSR," in Proc. ICASSP, pp. 8614-8618, 2013.
- (2013) Proc. ICASSP , pp. 8614-8618
- Sainath, T.N.¹ Mohamed, A.² Kingsbury, B.³ Ramabhadran, B.⁴

22
- 84858953642
- The Kaldi speech recognition toolkit
- D. Povey, A. Ghoshal, et al., "The Kaldi speech recognition toolkit," in Proc. ASRU, 2011.
- (2011) Proc. ASRU
- Povey, D.¹ Ghoshal, A.²

23
- 85084012167
- ALIZE/SpkDet: A state-of-the-art open source software for speaker recognition
- J.-F. Bonastre, N. Scheffer, D. Matrouf, C. Fredouille, A. Larcher, A. Preti, G. Pouchoulin, N. Evans, B. Fauve, and J. Mason, "ALIZE/SpkDet: A state-of-the-art open source software for speaker recognition," in Proc. ISCA/IEEE Speaker Odyssey, 2008.
- (2008) Proc. ISCA/IEEE Speaker Odyssey
- Bonastre, J.-F.¹ Scheffer, N.² Matrouf, D.³ Fredouille, C.⁴ Larcher, A.⁵ Preti, A.⁶ Pouchoulin, G.⁷ Evans, N.⁸ Fauve, B.⁹ Mason, J.¹⁰

24
- 84910038371
- arXiv:1401.6984
- Y. Miao, "Kaldi+PDNN: building DNN-based ASR systems with Kaldi and PDNN," arXiv:1401.6984, 2014.
- (2014) Kaldi+PDNN: Building DNN-based ASR Systems with Kaldi and PDNN
- Miao, Y.¹

25
- 79551480483
- Stacked denoising autoencoders: Learning useful representations in a deep network with a local denoising criterion
- P. Vincent, H. Larochelle, I. Lajoie, Y. Bengio, and P. Manzagol, "Stacked denoising autoencoders: learning useful representations in a deep network with a local denoising criterion," Journal of Machine Learning Research, vol. 11, pp. 3371-3408, 2010.
- (2010) Journal of Machine Learning Research , vol.11 , pp. 3371-3408
- Vincent, P.¹ Larochelle, H.² Lajoie, I.³ Bengio, Y.⁴ Manzagol, P.⁵

26
- 84890482429
- Extracting deep bottleneck features using stacked auto-encoders
- J. Gehring, Y. Miao, F. Metze, and A. Waibel, "Extracting deep bottleneck features using stacked auto-encoders," in Proc. ICASSP, 2013.
- (2013) Proc. ICASSP
- Gehring, J.¹ Miao, Y.² Metze, F.³ Waibel, A.⁴

27
- 84906273176
- Modular combination of deep neural networks for acoustic modeling
- J. Gehring, W. Lee, K. Kilgour, I. Lane, Y. Miao, and A. Waibel, "Modular combination of deep neural networks for acoustic modeling," in Proc. Interspeech, pp. 94-98, 2013.
- (2013) Proc. Interspeech , pp. 94-98
- Gehring, J.¹ Lee, W.² Kilgour, K.³ Lane, I.⁴ Miao, Y.⁵ Waibel, A.⁶

28
- 84893701756
- Deep maxout networks for low-resource speech recognition
- Y. Miao, F. Metze, and S. Rawat, "Deep maxout networks for low-resource speech recognition," in Proc. ASRU, pp. 398-403, 2013.
- (2013) Proc. ASRU , pp. 398-403
- Miao, Y.¹ Metze, F.² Rawat, S.³

29
- 84906283232
- Using conversational word bursts in spoken term detection
- J. Chiu and A. Rudnicky, "Using conversational word bursts in spoken term detection," in Proc. Interspeech, 2013.
- (2013) Proc. Interspeech
- Chiu, J.¹ Rudnicky, A.²

30
- 84906273501
- Improving low-resource CD-DNN-HMM using dropout and multilingual DNN training
- Y. Miao and F. Metze, "Improving low-resource CD-DNN-HMM using dropout and multilingual DNN training," in Proc. Interspeech, pp. 2237-2241, 2013.
- (2013) Proc. Interspeech , pp. 2237-2241
- Miao, Y.¹ Metze, F.²

31
- 84910068044
- Distributed learning of multilingual DNN feature extractors using GPUs
- to appear
- Y. Miao, H. Zhang, and F. Metze, "Distributed learning of multilingual DNN feature extractors using GPUs," to appear in Proc. Interspeech, 2014.
- (2014) Proc. Interspeech
- Miao, Y.¹ Zhang, H.² Metze, F.³

32
- 84910028405
- Improving language-universal feature extraction with deep maxout and convolutional neural networks
- to appear
- Y. Miao and F. Metze, "Improving language-universal feature extraction with deep maxout and convolutional neural networks," to appear in Proc. Interspeech, 2014.
- (2014) Proc. Interspeech
- Miao, Y.¹ Metze, F.²

* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS DB.