SCOPUS Information Retrieval Platform

IEEE Transactions on Audio, Speech and Language Processing

Volume 22, Issue 1, 2014, Pages 17-27

Cross-lingual subspace Gaussian mixture models for low-resource speech recognition

(3) Lu, Liang (a); Ghoshal, Arnab (a); Renals, Steve (a)

(a) UNIVERSITY OF EDINBURGH (United Kingdom)

Author keywords

Acoustic modeling; Adaptation; Cross lingual speech recognition; Regularization; Subspace Gaussian mixture model

Indexed keywords

COMMUNICATION CHANNELS (INFORMATION THEORY); HIDDEN MARKOV MODELS; OBJECT RECOGNITION; SPEECH RECOGNITION; AUDIO ACOUSTICS; COMPUTATIONAL LINGUISTICS; GAUSSIAN DISTRIBUTION; MARKOV PROCESSES; TRELLIS CODES;

ACOUSTIC MODEL; ADAPTATION; CROSS-LINGUAL SPEECH RECOGNITION; REGULARIZATION; SUBSPACE GAUSSIAN MIXTURE MODELS;

AUDIO ACOUSTICS; SPEECH RECOGNITION;

EID: 84897937578 PISSN: 15587916 EISSN: None Source Type: Journal
DOI: 10.1109/TASL.2013.2281575 Document Type: Article

Times cited : (25)

References (40)

1
- 85135166225
- Fast bootstrapping of LVCSR systems with multilingual phoneme sets
- T. Schultz and A. Waibel, "Fast bootstrapping of LVCSR systems with multilingual phoneme sets," in Proc. Eurospeech, 1997, pp. 371-374.
- Proc. Eurospeech, 1997 , pp. 371-374
- Schultz, T.¹ Waibel, A.²

2
- 0004694838
- Multilingual and crosslingual speech recognition
- T. Schultz and A. Waibel, "Multilingual and crosslingual speech recognition," in Proc. DARPA Workshop Broadcast News Transcript. Understand., 1998.
- Proc. DARPA Workshop Broadcast News Transcript. Understand., 1998
- Schultz, T.¹ Waibel, A.²

3
- 0033690885
- Towards language independent acoustic modeling
- W. Byrne, P. Beyerlein, J. M. Huerta, S. Khudanpur, B. Marthi, J. Morgan, N. Peterek, J. Picone, D. Vergyri, and W. Wang, "Towards language independent acoustic modeling," in Proc. ICASSP, 2000, pp. 1029-1032.
- Proc. ICASSP, 2000 , pp. 1029-1032
- Byrne, W.¹ Beyerlein, P.² Huerta, J.M.³ Khudanpur, S.⁴ Marthi, B.⁵ Morgan, J.⁶ Peterek, N.⁷ Picone, J.⁸ Vergyri, D.⁹ Wang, W.¹⁰

4
- 33646764228
- First steps in fast acoustic modeling for a new target language: Application to Vietnamese
- V. B. Le and L. Besacier, "First steps in fast acoustic modeling for a new target language: Application to Vietnamese," in Proc. ICASSP, 2005, pp. 821-824.
- Proc. ICASSP, 2005 , pp. 821-824
- Le, V.B.¹ Besacier, L.²

5
- 79959819891
- Cross-lingual and multi-stream posterior features for low resource LVCSR systems
- S. Thomas, S. Ganapathy, and H. Hermansky, "Cross-lingual and multi-stream posterior features for low resource LVCSR systems," in Proc. INTERSPEECH, 2010, pp. 877-880.
- Proc. INTERSPEECH, 2010 , pp. 877-880
- Thomas, S.¹ Ganapathy, S.² Hermansky, H.³

6
- 0030363039
- Dictionary learning for spontaneous speech recognition
- T. Sloboda and A. Waibel, "Dictionary learning for spontaneous speech recognition," in Proc. ICSLP, 1996, pp. 2328-2331.
- Proc. ICSLP, 1996 , pp. 2328-2331
- Sloboda, T.¹ Waibel, A.²

7
- 0033708114
- Automatic generation of phone sets and lexical transcriptions
- R. Singh, B. Raj, and R. M. Stern, "Automatic generation of phone sets and lexical transcriptions," in Proc. ICASSP, 2000, pp. 1691-1694.
- Proc. ICASSP, 2000 , pp. 1691-1694
- Singh, R.¹ Raj, B.² Stern, R.M.³

8
- 78049369273
- Approaches to automatic lexicon learning with limited training examples
- N. Goel, S. Thomas, M. Agarwal, P. Akyazi, L. Burget, K. Feng, A. Ghoshal, O. Glembek, M. Karafiát, D. Povey, A. Rastrow, R. C. Rose, and P. Schwarz, "Approaches to automatic lexicon learning with limited training examples," in Proc. ICASSP, 2010, pp. 5094-5097.
- Proc. ICASSP, 2010 , pp. 5094-5097
- Goel, N.¹ Thomas, S.² Agarwal, M.³ Akyazi, P.⁴ Burget, L.⁵ Feng, K.⁶ Ghoshal, A.⁷ Glembek, O.⁸ Karafiát, M.⁹ Povey, D.¹⁰ Rastrow, A.¹¹ Rose, R.C.¹² Schwarz, P.¹³

9
- 0035426931
- Language-independent and language-adaptive acoustic modeling for speech recognition
- T. Schultz and A. Waibel, "Language-independent and language-adaptive acoustic modeling for speech recognition," Speech Commun., vol. 35, no. 1, pp. 31-52, 2001.
- (2001) Speech Commun. , vol.35 , Issue.1 , pp. 31-52
- Schultz, T.¹ Waibel, A.²

10
- 85009274666
- GlobalPhone: A multilingual speech and text database developed at Karlsruhe University
- T. Schultz, "GlobalPhone: A multilingual speech and text database developed at Karlsruhe University," in Proc. ICSLP, 2002, pp. 345-348.
- Proc. ICSLP, 2002 , pp. 345-348
- Schultz, T.¹

11
- 0030371812
- Multi-lingual phoneme recognition exploiting acoustic-phonetic similarities of sounds
- J. Kohler, "Multi-lingual phoneme recognition exploiting acoustic-phonetic similarities of sounds," in Proc. ICSLP, 1996, pp. 2195-2198.
- Proc. ICSLP, 1996 , pp. 2195-2198
- Kohler, J.¹

12
- 84862931515
- Experiments on cross-language attribute detection and phone recognition with minimal target-specific training data
- Mar.
- S. M. Siniscalchi, D. C. Lyu, T. Svendsen, and C. H. Lee, "Experiments on cross-language attribute detection and phone recognition with minimal target-specific training data," IEEE Trans. Audio, Speech, Lang. Process., vol. 20, no. 3, pp. 875-887, Mar. 2012.
- (2012) IEEE Trans. Audio, Speech, Lang. Process. , vol.20 , Issue.3 , pp. 875-887
- Siniscalchi, S.M.¹ Lyu, D.C.² Svendsen, T.³ Lee, C.H.⁴

13
- 51449101990
- Robust phone set mapping using decision tree clustering for cross-lingual phone recognition
- K. C. Sim and H. Li, "Robust phone set mapping using decision tree clustering for cross-lingual phone recognition," in Proc. ICASSP, 2008, pp. 4309-4312.
- Proc. ICASSP, 2008 , pp. 4309-4312
- Sim, K.C.¹ Li, H.²

14
- 77949342620
- Discriminative product-of-expert acoustic mapping for cross-lingual phone recognition
- K. C. Sim, "Discriminative product-of-expert acoustic mapping for cross-lingual phone recognition," in Proc. ASRU, 2009, pp. 546-551.
- Proc. ASRU, 2009 , pp. 546-551
- Sim, K.C.¹

15
- 0033709098
- Tandem connectionist feature extraction for conventional HMM systems
- H. Hermansky, D. P. W. Ellis, and S. Sharma, "Tandem connectionist feature extraction for conventional HMM systems," in Proc. ICASSP, 2000, pp. 1635-1638.
- Proc. ICASSP, 2000 , pp. 1635-1638
- Hermansky, H.¹ Ellis, D.P.W.² Sharma, S.³

16
- 33947619591
- Cross-domain and cross-language portability of acoustic features estimated by multilayer perceptrons
- A. Stolcke, F. Grézl, M. Y. Hwang, X. Lei, N. Morgan, and D. Vergyri, "Cross-domain and cross-language portability of acoustic features estimated by multilayer perceptrons," in Proc. ICASSP, 2006, pp. 321-324.
- Proc. ICASSP, 2006 , pp. 321-324
- Stolcke, A.¹ Grézl, F.² Hwang, M.Y.³ Lei, X.⁴ Morgan, N.⁵ Vergyri, D.⁶

17
- 44849132075
- Monolingual and crosslingual comparison of tandem features derived from articulatory and phone MLPs
- O. Çetin, M. Magimai-Doss, K. Livescu, A. Kantor, S. King, C. Bartels, and J. Frankel, "Monolingual and crosslingual comparison of tandem features derived from articulatory and phone MLPs," in Proc. ASRU, 2007, pp. 36-41.
- Proc. ASRU, 2007 , pp. 36-41
- Çetin, O.¹ Magimai-Doss, M.² Livescu, K.³ Kantor, A.⁴ King, S.⁵ Bartels, C.⁶ Frankel, J.⁷

18
- 84858971854
- Strategies for using MLP based features with limited target-language training data
- Y. Qian, J. Xu, D. Povey, and L. Jia, "Strategies for using MLP based features with limited target-language training data," in Proc. ASRU, 2011, pp. 354-358.
- Proc. ASRU, 2011 , pp. 354-358
- Qian, Y.¹ Xu, J.² Povey, D.³ Jia, L.⁴

19
- 84858976609
- Cross-lingual portability of Chinese and English neural network features for French and German LVCSR
- C. Plahl, R. Schluter, and H. Ney, "Cross-lingual portability of Chinese and English neural network features for French and German LVCSR," in Proc. ASRU, 2011, pp. 371-376.
- Proc. ASRU, 2011 , pp. 371-376
- Plahl, C.¹ Schluter, R.² Ney, H.³

20
- 84858965424
- Ph.D. thesis, The Univ. of Edinburgh, Edinburgh, U.K.
- P. Lal, "Cross-lingual automatic speech recognition using tandem features," Ph.D. thesis, The Univ. of Edinburgh, Edinburgh, U.K., 2011.
- (2011) Cross-lingual Automatic Speech Recognition Using Tandem Features
- Lal, P.¹

21
- 84055222005
- Context-dependent pretrained deep neural networks for large-vocabulary speech recognition
- Jan.
- G. E. Dahl, D. Yu, L. Deng, and A. Acero, "Context-dependent pretrained deep neural networks for large-vocabulary speech recognition," IEEE Trans. Audio, Speech, Lang. Process., vol. 20, no. 1, pp. 30-42, Jan. 2012.
- (2012) IEEE Trans. Audio, Speech, Lang. Process. , vol.20 , Issue.1 , pp. 30-42
- Dahl, G.E.¹ Yu, D.² Deng, L.³ Acero, A.⁴

22
- 84874278045
- Unsupervised cross-lingual knowledge transfer in DNN-based LVCSR
- P. Swietojanski, A. Ghoshal, and S. Renals, "Unsupervised cross-lingual knowledge transfer in DNN-based LVCSR," in Proc. IEEE SLT, 2012, pp. 246-251.
- Proc. IEEE SLT, 2012 , pp. 246-251
- Swietojanski, P.¹ Ghoshal, A.² Renals, S.³

23
- 84867207576
- Using KL-based acoustic models in a large vocabulary recognition task
- G. Aradilla, H. Bourlard, and M. Magimai-Doss, "Using KL-based acoustic models in a large vocabulary recognition task," in Proc. INTERSPEECH, 2008.
- Proc. INTERSPEECH, 2008
- Aradilla, G.¹ Bourlard, H.² Magimai-Doss, M.³

24
- 84858976352
- Fast and flexible Kullback-Leibler divergence based acoustic modelling for non-native speech recognition
- D. Imseng, R. Rasipuram, and M. Magimai-Doss, "Fast and flexible Kullback-Leibler divergence based acoustic modelling for non-native speech recognition," in Proc. ASRU, 2011, pp. 348-353.
- Proc. ASRU, 2011 , pp. 348-353
- Imseng, D.¹ Rasipuram, R.² Magimai-Doss, M.³

25
- 84867616349
- Using KL-divergence and multilingual information to improve ASR for under-resourced languages
- D. Imseng, H. Bourlard, and P. N. Garner, "Using KL-divergence and multilingual information to improve ASR for under-resourced languages," in Proc. ICASSP, 2012, pp. 4869-4872.
- Proc. ICASSP, 2012 , pp. 4869-4872
- Imseng, D.¹ Bourlard, H.² Garner, P.N.³

26
- 78049502526
- The subspace Gaussian mixture model - A structured model for speech recognition
- D. Povey, L. Burget, M. Agarwal, P. Akyazi, F. Kai, A. Ghoshal, O. Glembek, N. Goel, M. Karafiát, A. Rastrow, R. C. Rose, P. Schwarz, and S. Thomas, "The subspace Gaussian mixture model - A structured model for speech recognition," Comput. Speech Lang., vol. 25, no. 2, pp. 404-439, 2011.
- (2011) Comput. Speech Lang. , vol.25 , Issue.2 , pp. 404-439
- Povey, D.¹ Burget, L.² Agarwal, M.³ Akyazi, P.⁴ Kai, F.⁵ Ghoshal, A.⁶ Glembek, O.⁷ Goel, N.⁸ Karafiát, M.⁹ Rastrow, A.¹⁰ Rose, R.C.¹¹ Schwarz, P.¹² Thomas, S.¹³

27
- 78049394188
- Multilingual acoustic modeling for speech recognition based on subspace Gaussian mixture models
- L. Burget, P. Schwarz, M. Agarwal, P. Akyazi, K. Feng, A. Ghoshal, O. Glembek, N. Goel, M. Karafiát, D. Povey, A. Rastrow, R. Rose, and S. Thomas, "Multilingual acoustic modeling for speech recognition based on subspace Gaussian mixture models," in Proc. IEEE ICASSP, 2010, pp. 4334-4337.
- Proc. IEEE ICASSP, 2010 , pp. 4334-4337
- Burget, L.¹ Schwarz, P.² Agarwal, M.³ Akyazi, P.⁴ Feng, K.⁵ Ghoshal, A.⁶ Glembek, O.⁷ Goel, N.⁸ Karafiát, M.⁹ Povey, D.¹⁰ Rastrow, A.¹¹ Rose, R.¹² Thomas, S.¹³

28
- 84858952433
- Regularized subspace Gaussian mixture models for cross-lingual speech recognition
- L. Lu, A. Ghoshal, and S. Renals, "Regularized subspace Gaussian mixture models for cross-lingual speech recognition," in Proc. IEEE ASRU, 2011, pp. 922-932.
- Proc. IEEE ASRU, 2011 , pp. 922-932
- Lu, L.¹ Ghoshal, A.² Renals, S.³

29
- 84867597584
- Maximum a posteriori adaptation of subspace Gaussian mixture models for cross-lingual speech recognition
- L. Lu, A. Ghoshal, and S. Renals, "Maximum a posteriori adaptation of subspace Gaussian mixture models for cross-lingual speech recognition," in Proc. ICASSP, 2012, pp. 4877-4880.
- Proc. ICASSP, 2012
- Lu, L.¹ Ghoshal, A.² Renals, S.³

30
- 79958067294
- Regularized subspace Gaussian mixture models for speech recognition
- L. Lu, A. Ghoshal, and S. Renals, "Regularized subspace Gaussian mixture models for speech recognition," IEEE Signal Process. Lett., vol. 18, no. 7, pp. 419-422, 2011.
- (2011) IEEE Signal Process. Lett. , vol.18 , Issue.7 , pp. 419-422
- Lu, L.¹ Ghoshal, A.² Renals, S.³

31
- 0003684449
- New York, NY, USA: Springer
- T. Hastie, R. Tibshirani, and J. Friedman, The Elements of Statistical Learning: Data Mining, Inference and Prediction. New York, NY, USA: Springer, 2005.
- (2005) The Elements of Statistical Learning: Data Mining, Inference and Prediction
- Hastie, T.¹ Tibshirani, R.² Friedman, J.³

32
- 78049398611
- Sparse coding for speech recognition
- G. Sivaram, S. K. Nemala, M. Elhilali, T. D. Tran, and H. Hermansky, "Sparse coding for speech recognition," in Proc. ICASSP, 2010, pp. 4346-4349.
- Proc. ICASSP, 2010 , pp. 4346-4349
- Sivaram, G.¹ Nemala, S.K.² Elhilali, M.³ Tran, T.D.⁴ Hermansky, H.⁵

33
- 78049392891
- Bayesian compressive sensing for phonetic classification
- T. N. Sainath, A. Carmi, D. Kanevsky, and B. Ramabhadran, "Bayesian compressive sensing for phonetic classification," in Proc. ICASSP, 2010, pp. 4370-4373.
- Proc. ICASSP, 2010 , pp. 4370-4373
- Sainath, T.N.¹ Carmi, A.² Kanevsky, D.³ Ramabhadran, B.⁴

34
- 33645712892
- Compressed sensing
- Apr.
- D. L. Donoho, "Compressed sensing," IEEE Trans. Inf. Theory, vol. 52, no. 4, pp. 1289-1306, Apr. 2006.
- (2006) IEEE Trans. Inf. Theory , vol.52 , Issue.4 , pp. 1289-1306
- Donoho, D.L.¹

35
- 39449126969
- Gradient projection for sparse reconstruction: Application to compressed sensing and other inverse problems
- Dec.
- M. A. T. Figueiredo, R. D. Nowak, and S. J. Wright, "Gradient projection for sparse reconstruction: Application to compressed sensing and other inverse problems," IEEE J. Sel. Topics Signal Process., vol. 1, no. 4, pp. 586-597, Dec. 2007.
- (2007) IEEE J. Sel. Topics Signal Process. , vol.1 , Issue.4 , pp. 586-597
- Figueiredo, M.A.T.¹ Nowak, R.D.² Wright, S.J.³

36
- 0004192423
- Matrix Variate Distributions
- ser. Boca Raton, FL, USA: Chapman & Hall/CRC
- A. K. Gupta and D. K. Nagar, Matrix Variate Distributions, ser. Monographs and Surveys in Pure and Applied Mathematics. Boca Raton, FL, USA: Chapman & Hall/CRC, 1999, vol. 104.
- (1999) Monographs and Surveys in Pure and Applied Mathematics , vol.104
- Gupta, A.K.¹ Nagar, D.K.²

37
- 0035341086
- Joint Maximum a Posteriori adaptation of transformation and HMM parameters
- DOI 10.1109/89.917687, PII S1063667601027419
- O. Siohan, C. Chesta, and C. H. Lee, "Joint maximum a posteriori adaptation of transformation and HMM parameters," IEEE Trans. Speech Audio Process., vol. 9, no. 4, pp. 417-428, May 2001. (Pubitemid 32372183)
- (2001) IEEE Transactions on Speech and Audio Processing , vol.9 , Issue.4 , pp. 417-428
- Siohan, O.¹ Chesta, C.² Lee, C.-H.³

38
- 0033236298
- The MLE algorithm for the matrix normal distribution
- P. Dutilleul, "The MLE algorithm for the matrix normal distribution," J. Statist. Comput. Simulat., vol. 64, no. 2, pp. 105-123, 1999.
- (1999) J. Statist. Comput. Simulat. , vol.64 , Issue.2 , pp. 105-123
- Dutilleul, P.¹

39
- 78049393226
- Microsoft Research, MSR-TR-2009-111, Tech. Rep.
- D. Povey, "A tutorial-style introduction to subspace Gaussian mixture models for speech recognition," Microsoft Research, MSR-TR-2009-111, 2009, Tech. Rep.
- (2009) A Tutorial-style Introduction to Subspace Gaussian Mixture Models for Speech Recognition
- Povey, D.¹

40
- 84858953642
- The Kaldi speech recognition toolkit
- D. Povey, A. Ghoshal, G. Boulianne, L. Burget, O. Glembek, N. Goel, M. Hannemann, P. Motlicek, Y. Qian, P. Schwarz, J. Silovský, G. Stemmer, and K. Veselý, "The Kaldi speech recognition toolkit," in Proc. ASRU, 2011.
- Proc. ASRU, 2011
- Povey, D.¹ Ghoshal, A.² Boulianne, G.³ Burget, L.⁴ Glembek, O.⁵ Goel, N.⁶ Hannemann, M.⁷ Motlicek, P.⁸ Qian, Y.⁹ Schwarz, P.¹⁰ Silovský, J.¹¹ Stemmer, G.¹² Veselý, K.¹³

* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS DB.