SCOPUS Information Retrieval Platform

ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings

Volume 2016-May, 2016, Pages 5275-5279

On combining i-vectors and discriminative adaptation methods for unsupervised speaker normalization in DNN acoustic models

(2) Samarakoon, Lahiru a; Sim, Khe Chai a

a NATIONAL UNIVERSITY OF SINGAPORE (Singapore)

Author keywords

Automatic speech recognition; deep neural networks; speaker normalization

EID: 84973352080 PISSN: 15206149 EISSN: None Source Type: Conference Proceeding
DOI: 10.1109/ICASSP.2016.7472684 Document Type: Conference Paper

Times cited : (20)

References (30)

1
- 85032751458
- Deep neural networks for acoustic modeling in speech recognition: The shared views of four research groups
- G. Hinton, L. Deng, D. Yu, G. E. Dahl, A. Mohamed, N. Jaitly, A. Senior, V. Vanhoucke, P. Nguyen, T. N. Sainath, and B. Kingsbury, "Deep neural networks for acoustic modeling in speech recognition: The shared views of four research groups," IEEE Signal Processing Magazine, vol. 29, no. 6, pp. 82-97, 2012.
- (2012) IEEE Signal Processing Magazine , vol.29 , Issue.6 , pp. 82-97
- Hinton, G.¹ Deng, L.² Yu, D.³ Dahl, G.E.⁴ Mohamed, A.⁵ Jaitly, N.⁶ Senior, A.⁷ Vanhoucke, V.⁸ Nguyen, P.⁹ Sainath, T.N.¹⁰ Kingsbury, B.¹¹

2
- 0028419019
- Maximum a posteriori estimation for multivariate Gaussian mixture observations of Markov chains
- J. Gauvain and C. H. Lee, "Maximum a posteriori estimation for multivariate Gaussian mixture observations of Markov chains," IEEE Transactions on Speech and Audio Processing, vol. 2, no. 2, pp. 291-298, 1994.
- (1994) IEEE Transactions on Speech and Audio Processing , vol.2 , Issue.2 , pp. 291-298
- Gauvain, J.¹ Lee, C.H.²

3
- 0029288633
- Maximum likelihood linear regression for speaker adaptation of continuous density hidden Markov models
- C. J. Leggetter and P. C. Woodland, "Maximum likelihood linear regression for speaker adaptation of continuous density hidden Markov models," Computer Speech & Language, vol. 9, no. 2, pp. 171-185, 1995.
- (1995) Computer Speech & Language , vol.9 , Issue.2 , pp. 171-185
- Leggetter, C.J.¹ Woodland, P.C.²

4
- 0033709098
- Tandem connectionist feature extraction for conventional HMM systems
- H. Hermansky, D. P. W. Ellis, and S. Sharma, "Tandem connectionist feature extraction for conventional HMM systems," in ICASSP. IEEE, 2000, pp. 1635-1638.
- (2000) ICASSP. IEEE , pp. 1635-1638
- Hermansky, H.¹ Ellis, D.P.W.² Sharma, S.³

5
- 84890537527
- Multi-level adaptive networks in tandem and hybrid ASR systems
- P. Bell, P. Swietojanski, and S. Renals, "Multi-level adaptive networks in tandem and hybrid ASR systems," in ICASSP. IEEE, 2013, pp. 6975-6979.
- (2013) ICASSP. IEEE , pp. 6975-6979
- Bell, P.¹ Swietojanski, P.² Renals, S.³

6
- 84973343516
- Learning factorized transforms for speaker normalization
- L. T. Samarakoon and K. C. Sim, "Learning factorized transforms for speaker normalization," in ASRU. IEEE, 2015.
- (2015) ASRU. IEEE
- Samarakoon, L.T.¹ Sim, K.C.²

7
- 84890542079
- KL-divergence regularized deep neural network adaptation for improved large vocabulary speech recognition
- D. Yu, K. Yao, H. Su, G. Li, and F. Seide, "KL-divergence regularized deep neural network adaptation for improved large vocabulary speech recognition," in ICASSP. IEEE, 2013, pp. 7893-7897.
- (2013) ICASSP. IEEE , pp. 7893-7897
- Yu, D.¹ Yao, K.² Su, H.³ Li, G.⁴ Seide, F.⁵

8
- 84905259145
- I-vector-based speaker adaptation of deep neural networks for French broadcast audio transcription
- V. Gupta, P. Kenny, P. Ouellet, and T. Stafylakis, "I-vector-based speaker adaptation of deep neural networks for French broadcast audio transcription," in ICASSP. IEEE, 2014, pp. 6334-6338.
- (2014) ICASSP. IEEE , pp. 6334-6338
- Gupta, V.¹ Kenny, P.² Ouellet, P.³ Stafylakis, T.⁴

9
- 84893691530
- Speaker adaptation of neural network acoustic models using i-vectors
- G. Saon, H. Soltau, D. Nahamoo, and M. Picheny, "Speaker adaptation of neural network acoustic models using i-vectors," in ASRU. IEEE, 2013, pp. 55-59.
- (2013) ASRU. IEEE , pp. 55-59
- Saon, G.¹ Soltau, H.² Nahamoo, D.³ Picheny, M.⁴

10
- 84890452886
- Fast speaker adaptation of hybrid NN/HMM model for speech recognition based on discriminative learning of speaker code
- O. Abdel-Hamid and H. Jiang, "Fast speaker adaptation of hybrid NN/HMM model for speech recognition based on discriminative learning of speaker code," in ICASSP. IEEE, 2013, pp. 7942-7946.
- (2013) ICASSP. IEEE , pp. 7942-7946
- Abdel-Hamid, O.¹ Jiang, H.²

11
- 84983119674
- Learning hidden unit contributions for unsupervised speaker adaptation of neural network acoustic models
- P. Swietojanski and S. Renals, "Learning hidden unit contributions for unsupervised speaker adaptation of neural network acoustic models," in SLT. IEEE, 2014, pp. 171-176.
- (2014) SLT. IEEE , pp. 171-176
- Swietojanski, P.¹ Renals, S.²

12
- 84937880519
- Connectionist speaker normalization and adaptation
- V. Abrash, H. Franco, A. Sankar, and M. Cohen, "Connectionist speaker normalization and adaptation," in Eurospeech. ISCA, 1995, pp. 2183-2186.
- (1995) Eurospeech. ISCA , pp. 2183-2186
- Abrash, V.¹ Franco, H.² Sankar, A.³ Cohen, M.⁴

13
- 79959849500
- Comparison of discriminative input and output transformation for speaker adaptation in the hybrid NN/HMM systems
- B. Li and K. C. Sim, "Comparison of discriminative input and output transformation for speaker adaptation in the hybrid NN/HMM systems," in INTERSPEECH. ISCA, 2010, pp. 526-529.
- (2010) Interspeech. ISCA , pp. 526-529
- Li, B.¹ Sim, K.C.²

14
- 33947703156
- Adaptation of hybrid ANN/HMM models using linear hidden transformations and conservative training
- R. Gemello, F. Mana, S. Scanzio, P. Laface, and R. D. Mori, "Adaptation of hybrid ANN/HMM models using linear hidden transformations and conservative training," in ICASSP. IEEE, 2006, pp. 1189-1192.
- (2006) ICASSP. IEEE , pp. 1189-1192
- Gemello, R.¹ Mana, F.² Scanzio, S.³ Laface, P.⁴ Mori, R.D.⁵

15
- 44849101939
- Regularized adaptation of discriminative classifiers
- X. Li and J. Bilmes, "Regularized adaptation of discriminative classifiers," in ICASSP. IEEE, 2006, vol. 1, pp. I-I.
- (2006) ICASSP. IEEE , vol.1 , pp. I-I
- Li, X.¹ Bilmes, J.²

16
- 33646794050
- Two-stage speaker adaptation of hybrid tied-posterior acoustic models
- J. Stadermann and G. Rigoll, "Two-stage speaker adaptation of hybrid tied-posterior acoustic models," in ICASSP. IEEE, 2005, pp. 977-980.
- (2005) ICASSP. IEEE , pp. 977-980
- Stadermann, J.¹ Rigoll, G.²

17
- 84874226579
- Adaptation of context-dependent deep neural networks for automatic speech recognition
- K. Yao, D. Yu, F. Seide, H. Su, L. Deng, and Y. Gong, "Adaptation of context-dependent deep neural networks for automatic speech recognition," in SLT. IEEE, 2012, pp. 366-369.
- (2012) SLT. IEEE , pp. 366-369
- Yao, K.¹ Yu, D.² Seide, F.³ Su, H.⁴ Deng, L.⁵ Gong, Y.⁶

18
- 84946032695
- Differentiable pooling for unsupervised speaker adaptation
- P. Swietojanski and S. Renals, "Differentiable pooling for unsupervised speaker adaptation," in ICASSP. IEEE, 2015, pp. 4305-4309.
- (2015) ICASSP. IEEE , pp. 4305-4309
- Swietojanski, P.¹ Renals, S.²

19
- 84905259138
- Improving DNN speaker independence with i-vector inputs
- A. Senior and I. Lopez-Moreno, "Improving DNN speaker independence with i-vector inputs," in ICASSP. IEEE, 2014, pp. 225-229.
- (2014) ICASSP. IEEE , pp. 225-229
- Senior, A.¹ Lopez-Moreno, I.²

20
- 84946035423
- An investigation of augmenting speaker representations to improve speaker normalization for DNN-based speech recognition
- H. Huang and K. C. Sim, "An investigation of augmenting speaker representations to improve speaker normalization for DNN-based speech recognition," in ICASSP. IEEE, 2015, pp. 4610-4613.
- (2015) ICASSP. IEEE , pp. 4610-4613
- Huang, H.¹ Sim, K.C.²

21
- 84946083667
- Cluster adaptive training for deep neural network
- T. Tian, Q. Yanmin, Y. Maofan, Z. Yimeng, and K. Yu, "Cluster adaptive training for deep neural network," in ICASSP. IEEE, 2015, pp. 4325-4329.
- (2015) ICASSP. IEEE , pp. 4325-4329
- Tian, T.¹ Yanmin, Q.² Maofan, Y.³ Yimeng, Z.⁴ Yu, K.⁵

22
- 84946054484
- Multi-basis adaptive neural network for rapid adaptation in speech recognition
- C. Wu and M. J. F. Gales, "Multi-basis adaptive neural network for rapid adaptation in speech recognition," in ICASSP. IEEE, 2015, pp. 4315-4319.
- (2015) ICASSP. IEEE , pp. 4315-4319
- Wu, C.¹ Gales, M.J.F.²

23
- 84973382376
- Towards speaker adaptive training of deep neural network acoustic models
- Yajie M., Hao Z., and Florian M., "Towards speaker adaptive training of deep neural network acoustic models," in INTERSPEECH. ISCA, 2014.
- (2014) Interspeech. ISCA
- Yajie, M.¹ Hao, Z.² Florian, M.³

24
- 84905216195
- Speaker adaptive training using deep neural networks
- O. Toshihiko, M. Shodai, L. Xugang, H. Chiori, and K. Souichi, "Speaker adaptive training using deep neural networks," in ICASSP. IEEE, 2014, pp. 6349-6353.
- (2014) ICASSP. IEEE , pp. 6349-6353
- Toshihiko, O.¹ Shodai, M.² Xugang, L.³ Chiori, H.⁴ Souichi, K.⁵

25
- 84946076428
- TED-LIUM: An automatic speech recognition dedicated corpus
- A. Rousseau, P. Deléglise, and Y. Esteve, "TED-LIUM: An automatic speech recognition dedicated corpus," in LREC. ELRA, 2012, pp. 125-129.
- (2012) LREC. ELRA , pp. 125-129
- Rousseau, A.¹ Deléglise, P.² Esteve, Y.³

26
- 0032638856
- Semi-tied covariance matrices for hidden Markov models
- M. J. F. Gales, "Semi-tied covariance matrices for hidden Markov models," IEEE Transactions on Speech and Audio Processing, vol. 7, no. 3, pp. 272-281, 1999.
- (1999) IEEE Transactions on Speech and Audio Processing , vol.7 , Issue.3 , pp. 272-281
- Gales, M.J.F.¹

27
- 84928146953
- An introduction to computational networks and the computational network toolkit
- D. Yu, A. Eversole, M. Seltzer, K. Yao, Z. Huang, B. Guenter, O. Kuchaiev, Y. Zhang, F. Seide, H. Wang, et al., "An introduction to computational networks and the computational network toolkit," Tech. Rep., Tech. Rep. MSR, Microsoft Research, 2014, http://codebox/cntk, 2014.
- (2014) Tech. Rep., Tech. Rep. MSR, Microsoft Research
- Yu, D.¹ Eversole, A.² Seltzer, M.³ Yao, K.⁴ Huang, Z.⁵ Guenter, B.⁶ Kuchaiev, O.⁷ Zhang, Y.⁸ Seide, F.⁹ Wang, H.¹⁰

28
- 84946091011
- Scaling recurrent neural network language models
- W. Williams, N. Prasad, D. Mrva, T. Ash, and T. Robinson, "Scaling recurrent neural network language models," in ICASSP. IEEE, 2015, pp. 5391-5395.
- (2015) ICASSP. IEEE , pp. 5391-5395
- Williams, W.¹ Prasad, N.² Mrva, D.³ Ash, T.⁴ Robinson, T.⁵

29
- 84858953642
- The Kaldi speech recognition toolkit
- D. Povey, A. Ghoshal, G. Boulianne, N. Goel, M. Hannemann, Y. Qian, P. Schwarz, and G. Stemmer, "The Kaldi speech recognition toolkit," in ASRU. IEEE, 2011.
- (2011) ASRU. IEEE
- Povey, D.¹ Ghoshal, A.² Boulianne, G.³ Goel, N.⁴ Hannemann, M.⁵ Qian, Y.⁶ Schwarz, P.⁷ Stemmer, G.⁸

30
- 84867616340
- Generating exact lattices in the WFST framework
- D. Povey, M. Hannemann, G. Boulianne, L. Burget, A. Ghoshal, M. Janda, M. Karafiát, S. Kombrink, P. Motlicek, Y. Qian, et al., "Generating exact lattices in the WFST framework," in ICASSP. IEEE, 2012, pp. 4213-4216.
- (2012) ICASSP. IEEE , pp. 4213-4216
- Povey, D.¹ Hannemann, M.² Boulianne, G.³ Burget, L.⁴ Ghoshal, A.⁵ Janda, M.⁶ Karafiát, M.⁷ Kombrink, S.⁸ Motlicek, P.⁹ Qian, Y.¹⁰

* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.