SCOPUS Information Search Platform

2015 IEEE Workshop on Automatic Speech Recognition and Understanding, ASRU 2015 - Proceedings

Volume , Issue , 2016, Pages 145-152

Learning factorized feature transforms for speaker normalization

(2) Samarakoon, Lahiru (a); Sim, Khe Chai (a)

(a) National University of Singapore (Singapore)

Author keywords

Automatic speech recognition; deep neural networks; speaker normalization

Indexed keywords

MAXIMUM LIKELIHOOD; MAXIMUM LIKELIHOOD ESTIMATION; ACOUSTIC FEATURES; AUTOMATIC SPEECH RECOGNITION; CONSTRAINED MAXIMUM LIKELIHOOD LINEAR REGRESSIONS (CMLLR); DEEP NEURAL NETWORKS; FEATURE TRANSFORM; SPEAKER DEPENDENTS; SPEAKER NORMALIZATION; SPEAKER VARIABILITY; SPEECH RECOGNITION

EID: 84964489805
PISSN: None
EISSN: None
Source Type: Conference Proceeding
DOI: 10.1109/ASRU.2015.7404787
Document Type: Conference Paper

Times cited : (7)

References (31)

1
- 85032751458
- Deep neural networks for acoustic modeling in speech recognition: The shared views of four research groups
- G. Hinton, L. Deng, D. Yu, G.E. Dahl, A. Mohamed, N. Jaitly, A. Senior, V. Vanhoucke, P. Nguyen, T.N. Sainath, and B. Kingsbury, "Deep neural networks for acoustic modeling in speech recognition: The shared views of four research groups," IEEE Signal Processing Magazine, vol. 29, no. 6, pp. 82-97, 2012
- (2012) IEEE Signal Processing Magazine , vol.29 , Issue.6 , pp. 82-97
- Hinton, G.¹ Deng, L.² Yu, D.³ Dahl, G.E.⁴ Mohamed, A.⁵ Jaitly, N.⁶ Senior, A.⁷ Vanhoucke, V.⁸ Nguyen, P.⁹ Sainath, T.N.¹⁰ Kingsbury, B.¹¹

2
- 0028419019
- Maximum a posteriori estimation for multivariate Gaussian mixture observations of Markov chains
- J. Gauvain and C.H. Lee, "Maximum a posteriori estimation for multivariate Gaussian mixture observations of Markov chains," IEEE Transactions on Speech and Audio Processing, vol. 2, no. 2, pp. 291-298, 1994
- (1994) IEEE Transactions on Speech and Audio Processing , vol.2 , Issue.2 , pp. 291-298
- Gauvain, J.¹ Lee, C.H.²

3
- 0029288633
- Maximum likelihood linear regression for speaker adaptation of continuous density hidden Markov models
- C. J. Leggetter and P. C. Woodland, "Maximum likelihood linear regression for speaker adaptation of continuous density hidden Markov models," Computer Speech and Language, vol. 9, no. 2, pp. 171-186, 1995
- (1995) Computer Speech and Language , vol.9 , Issue.2 , pp. 171-186
- Leggetter, C.J.¹ Woodland, P.C.²

4
- 0033709098
- Tandem connectionist feature extraction for conventional HMM systems
- H. Hermansky, D.P.W. Ellis, and S. Sharma, "Tandem connectionist feature extraction for conventional HMM systems," in Proc. ICASSP, 2000, pp. 1635-1638
- (2000) Proc. ICASSP , pp. 1635-1638
- Hermansky, H.¹ Ellis, D.P.W.² Sharma, S.³

5
- 84890537527
- Multi-level adaptive networks in tandem and hybrid ASR systems
- P. Bell, P. Swietojanski, and S. Renals, "Multi-level adaptive networks in tandem and hybrid ASR systems," in Proc. ICASSP, 2013, pp. 6975-6979
- (2013) Proc. ICASSP , pp. 6975-6979
- Bell, P.¹ Swietojanski, P.² Renals, S.³

6
- 0025659256
- Continuous speech recognition using multilayer perceptrons with hidden Markov models
- N. Morgan and H. Bourlard, "Continuous speech recognition using multilayer perceptrons with hidden Markov models," in Proc. ICASSP, 1990, pp. 413-416
- (1990) Proc. ICASSP , pp. 413-416
- Morgan, N.¹ Bourlard, H.²

7
- 84890542079
- KL-divergence regularized deep neural network adaptation for improved large vocabulary speech recognition
- D. Yu, K. Yao, H. Su, G. Li, and F. Seide, "KL-divergence regularized deep neural network adaptation for improved large vocabulary speech recognition," in Proc. ICASSP, 2013, pp. 7893-7897
- (2013) Proc. ICASSP , pp. 7893-7897
- Yu, D.¹ Yao, K.² Su, H.³ Li, G.⁴ Seide, F.⁵

8
- 84905259145
- I-vector-based speaker adaptation of deep neural networks for French broadcast audio transcription
- V. Gupta, P. Kenny, P. Ouellet, and T. Stafylakis, "I-vector-based speaker adaptation of deep neural networks for French broadcast audio transcription," in Proc. ICASSP, 2014, pp. 6334-6338
- (2014) Proc. ICASSP , pp. 6334-6338
- Gupta, V.¹ Kenny, P.² Ouellet, P.³ Stafylakis, T.⁴

9
- 84893691530
- Speaker adaptation of neural network acoustic models using i-vectors
- G. Saon, H. Soltau, D. Nahamoo, and M. Picheny, "Speaker adaptation of neural network acoustic models using i-vectors," in Proc. ASRU, 2013, pp. 55-59
- (2013) Proc. ASRU , pp. 55-59
- Saon, G.¹ Soltau, H.² Nahamoo, D.³ Picheny, M.⁴

10
- 84890452886
- Fast speaker adaptation of hybrid NN/HMM model for speech recognition based on discriminative learning of speaker code
- O. Abdel-Hamid and H. Jiang, "Fast speaker adaptation of hybrid NN/HMM model for speech recognition based on discriminative learning of speaker code," in Proc. ICASSP, 2013, pp. 7942-7946
- (2013) Proc. ICASSP , pp. 7942-7946
- Abdel-Hamid, O.¹ Jiang, H.²

11
- 80051639925
- Front-end factor analysis for speaker verification
- N. Dehak, P. J. Kenny, R. Dehak, P. Dumouchel, and P. Ouellet, "Front-end factor analysis for speaker verification," IEEE Transactions on Audio, Speech and Language Processing, 2010
- (2010) IEEE Transactions on Audio, Speech and Language Processing
- Dehak, N.¹ Kenny, P.J.² Dehak, R.³ Dumouchel, P.⁴ Ouellet, P.⁵

12
- 80051634401
- Simplification and optimization of i-vector extraction
- O. Glembek, L. Burget, P. Matejka, M. Karafiat, and P. Kenny, "Simplification and optimization of i-vector extraction," in Proc. ICASSP, 2011, pp. 4516-4519
- (2011) Proc. ICASSP , pp. 4516-4519
- Glembek, O.¹ Burget, L.² Matejka, P.³ Karafiat, M.⁴ Kenny, P.⁵

13
- 84937880519
- Connectionist speaker normalization and adaptation
- V. Abrash, H. Franco, A. Sankar, and M. Cohen, "Connectionist speaker normalization and adaptation," in Proc. Eurospeech, 1995, pp. 2183-2186
- (1995) Proc. Eurospeech , pp. 2183-2186
- Abrash, V.¹ Franco, H.² Sankar, A.³ Cohen, M.⁴

14
- 79959849500
- Comparison of discriminative input and output transformation for speaker adaptation in the hybrid NN/HMM systems
- B. Li and K. C. Sim, "Comparison of discriminative input and output transformation for speaker adaptation in the hybrid NN/HMM systems," in Proc. Interspeech, 2010, pp. 526-529
- (2010) Proc. Interspeech , pp. 526-529
- Li, B.¹ Sim, K.C.²

15
- 84858976070
- Feature engineering in context-dependent deep neural networks for conversational speech transcription
- F. Seide, G. Li, X. Chen, and D. Yu, "Feature engineering in context-dependent deep neural networks for conversational speech transcription," in Proc. ASRU, 2011, pp. 24-29
- (2011) Proc. ASRU , pp. 24-29
- Seide, F.¹ Li, G.² Chen, X.³ Yu, D.⁴

16
- 33947703156
- Adaptation of hybrid ANN/HMM models using linear hidden transformations and conservative training
- R. Gemello, F. Mana, S. Scanzio, P. Laface, and R. De Mori, "Adaptation of hybrid ANN/HMM models using linear hidden transformations and conservative training," in Proc. ICASSP, 2006, pp. 1189-1192
- (2006) Proc. ICASSP , pp. 1189-1192
- Gemello, R.¹ Mana, F.² Scanzio, S.³ Laface, P.⁴ De Mori, R.⁵

17
- 33947635130
- Regularized adaptation of discriminative classifiers
- X. Li and J. Bilmes, "Regularized adaptation of discriminative classifiers," in Proc. ICASSP, 2006, vol. 1, pp. 1-1
- (2006) Proc. ICASSP , vol.1 , pp. 1
- Li, X.¹ Bilmes, J.²

18
- 84905229915
- Singular value decomposition based low-footprint speaker adaptation and personalization for deep neural network
- J. Xue, J. Li, D. Yu, M. Seltzer, and Y. Gong, "Singular value decomposition based low-footprint speaker adaptation and personalization for deep neural network," in Proc. ICASSP, 2014, pp. 6359-6363
- (2014) Proc. ICASSP , pp. 6359-6363
- Xue, J.¹ Li, J.² Yu, D.³ Seltzer, M.⁴ Gong, Y.⁵

19
- 33646794050
- Two-stage speaker adaptation of hybrid tied-posterior acoustic models
- J. Stadermann and G. Rigoll, "Two-stage speaker adaptation of hybrid tied-posterior acoustic models," in Proc. ICASSP, 2005, pp. 977-980
- (2005) Proc. ICASSP , pp. 977-980
- Stadermann, J.¹ Rigoll, G.²

20
- 84874226579
- Adaptation of context-dependent deep neural networks for automatic speech recognition
- K. Yao, D. Yu, F. Seide, H. Su, L. Deng, and Y. Gong, "Adaptation of context-dependent deep neural networks for automatic speech recognition," in Proc. SLT, 2012, pp. 366-369
- (2012) Proc. SLT , pp. 366-369
- Yao, K.¹ Yu, D.² Seide, F.³ Su, H.⁴ Deng, L.⁵ Gong, Y.⁶

21
- 84983119674
- Learning hidden unit contributions for unsupervised speaker adaptation of neural network acoustic models
- P. Swietojanski and S. Renals, "Learning hidden unit contributions for unsupervised speaker adaptation of neural network acoustic models," in Proc. SLT, 2014, pp. 171-176
- (2014) Proc. SLT , pp. 171-176
- Swietojanski, P.¹ Renals, S.²

22
- 84946032695
- Differentiable pooling for unsupervised speaker adaptation
- P. Swietojanski and S. Renals, "Differentiable pooling for unsupervised speaker adaptation," in Proc. ICASSP, 2015, pp. 4305-4309
- (2015) Proc. ICASSP , pp. 4305-4309
- Swietojanski, P.¹ Renals, S.²

23
- 0033677005
- Fast speaker adaptation of artificial neural networks for automatic speech recognition
- S. Dupont and L. Cheboub, "Fast speaker adaptation of artificial neural networks for automatic speech recognition," in Proc. ICASSP, 2000, pp. 1795-1798
- (2000) Proc. ICASSP , pp. 1795-1798
- Dupont, S.¹ Cheboub, L.²

24
- 84946083667
- Cluster adaptive training for deep neural network
- T. Tan, Y. Qian, M. Yin, Y. Zhuang, and K. Yu, "Cluster adaptive training for deep neural network," in Proc. ICASSP, 2015, pp. 4325-4329
- (2015) Proc. ICASSP , pp. 4325-4329
- Tan, T.¹ Qian, Y.² Yin, M.³ Zhuang, Y.⁴ Yu, K.⁵

25
- 84946054484
- Multi-basis adaptive neural network for rapid adaptation in speech recognition
- C. Wu and M. J. F. Gales, "Multi-basis adaptive neural network for rapid adaptation in speech recognition," in Proc. ICASSP, 2015, pp. 4315-4319
- (2015) Proc. ICASSP , pp. 4315-4319
- Wu, C.¹ Gales, M.J.F.²

26
- 84905259138
- Improving DNN speaker independence with i-vector inputs
- A. Senior and I. Lopez-Moreno, "Improving DNN speaker independence with i-vector inputs," in Proc. ICASSP, 2014, pp. 225-229
- (2014) Proc. ICASSP , pp. 225-229
- Senior, A.¹ Lopez-Moreno, I.²

27
- 84946035423
- An investigation of augmenting speaker representations to improve speaker normalization for DNN-based speech recognition
- H. Huang and K. C. Sim, "An investigation of augmenting speaker representations to improve speaker normalization for DNN-based speech recognition," in Proc. ICASSP, 2015, pp. 4610-4613
- (2015) Proc. ICASSP , pp. 4610-4613
- Huang, H.¹ Sim, K.C.²

28
- 0032050110
- Maximum likelihood linear transformations for HMM-based speech recognition
- M.J.F. Gales, "Maximum likelihood linear transformations for HMM-based speech recognition," Computer Speech & Language, vol. 12, no. 2, pp. 75-98, 1998
- (1998) Computer Speech & Language , vol.12 , Issue.2 , pp. 75-98
- Gales, M.J.F.¹

29
- 0032638856
- Semi-tied covariance matrices for hidden Markov models
- M.J.F. Gales, "Semi-tied covariance matrices for hidden Markov models," IEEE Transactions on Speech and Audio Processing, vol. 7, no. 3, pp. 272-281, 1999
- (1999) IEEE Transactions on Speech and Audio Processing , vol.7 , Issue.3 , pp. 272-281
- Gales, M.J.F.¹

30
- 84928146953
- An introduction to computational networks and the computational network toolkit
- D. Yu, A. Eversole, M. Seltzer, K. Yao, Z. Huang, B. Guenter, O. Kuchaiev, Y. Zhang, F. Seide, H. Wang, et al., "An introduction to computational networks and the computational network toolkit," Tech. Rep. MSR, Microsoft Research, 2014, http://codebox/cntk
- (2014) Tech. Rep. MSR, Microsoft Research
- Yu, D.¹ Eversole, A.² Seltzer, M.³ Yao, K.⁴ Huang, Z.⁵ Guenter, B.⁶ Kuchaiev, O.⁷ Zhang, Y.⁸ Seide, F.⁹ Wang, H.¹⁰

31
- 84858953642
- The Kaldi speech recognition toolkit
- D. Povey, A. Ghoshal, G. Boulianne, N. Goel, M. Hannemann, Y. Qian, P. Schwarz, and G. Stemmer, "The Kaldi speech recognition toolkit," in Proc. ASRU, 2011
- (2011) Proc. ASRU
- Povey, D.¹ Ghoshal, A.² Boulianne, G.³ Goel, N.⁴ Hannemann, M.⁵ Qian, Y.⁶ Schwarz, P.⁷ Stemmer, G.⁸

* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.