Volume 29, Issue 6, 2012, Pages 18-33

Large-vocabulary continuous speech recognition systems: A look at some recent advances

Author keywords

[No Author keywords available]

Indexed keywords

AUTOMATIC TELEPHONE SYSTEMS; AUTOMATION; CONTINUOUS SPEECH RECOGNITION; DEEP NEURAL NETWORKS; SPEECH; SPEECH PROCESSING; SPEECH TRANSMISSION; TRANSCRIPTION; VOCABULARY CONTROL;

EID: 85032751472     PISSN: 10535888     EISSN: None     Source Type: Journal    
DOI: 10.1109/MSP.2012.2197156     Document Type: Article
Times cited: 95

References (118)
  • 14
    • 0030244826
    • A review of large-vocabulary continuous-speech recognition
    • S. Young, "A review of large-vocabulary continuous-speech recognition," IEEE Signal Processing Mag., vol. 13, no. 5, pp. 45-57, 1996.
    • (1996) IEEE Signal Processing Mag. , vol.13 , Issue.5 , pp. 45-57
    • Young, S.1
  • 15
    • 77956781007
    • Advances in large vocabulary continuous speech recognition
    • G. Zweig and M. Picheny, "Advances in large vocabulary continuous speech recognition," Adv. Comput., vol. 60, pp. 249-291, 2004.
    • (2004) Adv. Comput. , vol.60 , pp. 249-291
    • Zweig, G.1    Picheny, M.2
  • 18
    • 0025041264
    • Perceptual linear predictive (PLP) analysis of speech
    • DOI 10.1121/1.399423
    • H. Hermansky, "Perceptual linear predictive (PLP) analysis of speech," J. Acoust. Soc. Am., vol. 87, no. 4, pp. 1738-1752, 1990. (Pubitemid 20256470)
    • (1990) Journal of the Acoustical Society of America , vol.87 , Issue.4 , pp. 1738-1752
    • Hermansky, H.1
  • 19
    • 0022667694
    • Speaker-independent isolated word recognition using dynamic features of speech spectrum
    • S. Furui, "Speaker-independent isolated word recognition using dynamic features of speech spectrum," IEEE Trans. Acoust., Speech, Signal Processing, vol. 34, no. 1, pp. 52-59, 1986. (Pubitemid 16575387)
    • (1986) IEEE Transactions on Acoustics, Speech, and Signal Processing , vol.ASSP-34 , Issue.1 , pp. 52-59
    • Furui, S.1
  • 21
    • 0036475982
    • Maximum likelihood multiple subspace projections for hidden Markov models
    • DOI 10.1109/89.985541, PII S1063667602015213
    • M. J. F. Gales, "Maximum likelihood multiple subspace projections for hidden Markov models," IEEE Trans. Speech Audio Processing, vol. 10, no. 2, pp. 37-47, 2002. (Pubitemid 34295263)
    • (2002) IEEE Transactions on Speech and Audio Processing , vol.10 , Issue.2 , pp. 37-47
    • Gales, M.J.F.1
  • 22
    • 0032289099
    • Heteroscedastic discriminant analysis and reduced rank HMMs for improved speech recognition
    • PII S0167639398000612
    • N. Kumar and A. G. Andreou, "Heteroscedastic discriminant analysis and reduced rank HMMs for improved speech recognition," Speech Commun., vol. 26, no. 4, pp. 283-297, 1998. (Pubitemid 128425471)
    • (1998) Speech Communication , vol.26 , Issue.4 , pp. 283-297
    • Kumar, N.1    Andreou, A.G.2
  • 23
    • 0032050110
    • Maximum likelihood linear transformations for HMM-based speech recognition
    • M. J. F. Gales, "Maximum likelihood linear transformations for HMM-based speech recognition," Comput. Speech Lang., vol. 12, no. 2, pp. 75-98, 1998. (Pubitemid 128383747)
    • (1998) Computer Speech and Language , vol.12 , Issue.2 , pp. 75-98
    • Gales, M.J.F.1
  • 26
    • 40249103761
    • Issues with uncertainty decoding for noise robust automatic speech recognition
    • H. Liao and M. J. F. Gales, "Issues with uncertainty decoding for noise robust automatic speech recognition," Speech Commun., vol. 50, no. 4, pp. 265-277, 2008.
    • (2008) Speech Commun. , vol.50 , Issue.4 , pp. 265-277
    • Liao, H.1    Gales, M.J.F.2
  • 27
    • 34047249084
    • Quantile based histogram equalization for noise robust large vocabulary speech recognition
    • DOI 10.1109/TSA.2005.857792
    • F. Hilger and H. Ney, "Quantile based histogram equalization for noise robust large vocabulary speech recognition," IEEE Trans. Audio Speech Lang. Processing, vol. 14, no. 3, pp. 845-854, 2006. (Pubitemid 46547647)
    • (2006) IEEE Transactions on Audio, Speech and Language Processing , vol.14 , Issue.3 , pp. 845-854
    • Hilger, F.1    Ney, H.2
  • 28
    • 0031647824
    • A frequency warping approach to speaker normalization
    • PII S1063667698000960
    • L. Lee and R. Rose, "A frequency warping approach to speaker normalization," IEEE Trans. Speech Audio Processing, vol. 6, no. 1, pp. 49-60, 1998. (Pubitemid 128720631)
    • (1998) IEEE Transactions on Speech and Audio Processing , vol.6 , Issue.1 , pp. 49-60
    • Lee, L.1    Rose, R.2
  • 37
    • 0024610919
    • A tutorial on hidden Markov models and selected applications in speech recognition
    • L. R. Rabiner, "A tutorial on hidden Markov models and selected applications in speech recognition," Proc. IEEE, vol. 77, no. 2, pp. 257-286, 1989.
    • (1989) Proc. IEEE , vol.77 , Issue.2 , pp. 257-286
    • Rabiner, L.R.1
  • 39
    • 0002629270
    • Maximum likelihood from incomplete data via the EM algorithm
    • A. P. Dempster, N. M. Laird, and D. B. Rubin, "Maximum likelihood from incomplete data via the EM algorithm," J. R. Stat. Soc. B, vol. 39, no. 1, pp. 1-38, 1977.
    • (1977) J. R. Stat. Soc. B , vol.39 , Issue.1 , pp. 1-38
    • Dempster, A.P.1    Laird, N.M.2    Rubin, D.B.3
  • 40
    • 0031139839
    • Minimum classification error rate methods for speech recognition
    • PII S1063667697035937
    • B.-H. Juang, W. Chou, and C.-H. Lee, "Minimum classification error rate methods for speech recognition," IEEE Trans. Speech Audio Processing, vol. 5, no. 3, pp. 257-265, 1997. (Pubitemid 127745998)
    • (1997) IEEE Transactions on Speech and Audio Processing , vol.5 , Issue.3 , pp. 257-265
    • Juang, B.-H.1    Chou, W.2    Lee, C.-H.3
  • 42
    • 0036460908
    • Lightly supervised and unsupervised acoustic model training
    • L. Lamel, J.-L. Gauvain, and G. Adda, "Lightly supervised and unsupervised acoustic model training," Comput. Speech Lang., vol. 16, no. 1, pp. 115-129, 2002.
    • (2002) Comput. Speech Lang. , vol.16 , Issue.1 , pp. 115-129
    • Lamel, L.1    Gauvain, J.-L.2    Adda, G.3
  • 45
    • 33745186926
    • Anatomy of an extremely fast LVCSR decoder
    • 9th European Conference on Speech Communication and Technology, Eurospeech Interspeech
    • G. Saon, D. Povey, and G. Zweig, "Anatomy of an extremely fast LVCSR decoder," in Proc. Annu. Conf. Int. Speech Communication Association (INTERSPEECH), 2005, pp. 549-552. (Pubitemid 43908121)
    • (2005) 9th European Conference on Speech Communication and Technology , pp. 549-552
    • Saon, G.1    Povey, D.2    Zweig, G.3
  • 46
    • 33745203493
    • Improved discriminative training using phone lattices
    • 9th European Conference on Speech Communication and Technology, Eurospeech Interspeech
    • J. Zheng and A. Stolcke, "Improved discriminative training using phone lattices," in Proc. Annu. Conf. Int. Speech Communication Association (INTERSPEECH), 2005, pp. 2125-2128. (Pubitemid 43908513)
    • (2005) 9th European Conference on Speech Communication and Technology , pp. 2125-2128
    • Zheng, J.1    Stolcke, A.2
  • 50
    • 34547522370
    • Comparison of large margin training to other discriminative methods for phonetic recognition by hidden Markov models
    • F. Sha and L. K. Saul, "Comparison of large margin training to other discriminative methods for phonetic recognition by hidden Markov models," in Proc. Int. Conf. Acoustics, Speech, and Signal Processing (ICASSP), 2007, pp. 313-316.
    • (2007) Proc. Int. Conf. Acoustics, Speech, and Signal Processing (ICASSP) , pp. 313-316
    • Sha, F.1    Saul, L.K.2
  • 52
    • 64149098818
    • Approximate test risk bound minimization through soft margin estimation
    • J. Li, M. Yuan, and C.-H. Lee, "Approximate test risk bound minimization through soft margin estimation," IEEE Trans. Audio Speech Lang. Processing, vol. 15, no. 8, pp. 2393-2404, 2007.
    • (2007) IEEE Trans. Audio Speech Lang. Processing , vol.15 , Issue.8 , pp. 2393-2404
    • Li, J.1    Yuan, M.2    Lee, C.-H.3
  • 53
    • 34047115134
    • Large margin hidden Markov models for speech recognition
    • DOI 10.1109/TASL.2006.879805
    • H. Jiang, X. Li, and C. Liu, "Large margin hidden Markov models for speech recognition," IEEE Trans. Audio Speech Lang. Processing, vol. 14, no. 5, pp. 1584-1595, 2006. (Pubitemid 46552926)
    • (2006) IEEE Transactions on Audio, Speech and Language Processing , vol.14 , Issue.5 , pp. 1584-1595
    • Jiang, H.1    Li, X.2    Liu, C.3
  • 55
    • 85032750905
    • Discriminative learning in sequential pattern recognition-A unifying review for optimization-oriented speech recognition
    • X. He, L. Deng, and W. Chou, "Discriminative learning in sequential pattern recognition-A unifying review for optimization-oriented speech recognition," IEEE Signal Processing Mag., vol. 25, no. 5, pp. 14-36, 2008.
    • (2008) IEEE Signal Processing Mag. , vol.25 , Issue.5 , pp. 14-36
    • He, X.1    Deng, L.2    Chou, W.3
  • 56
    • 0029288633
    • Maximum likelihood linear regression for speaker adaptation of continuous density hidden Markov models
    • C. J. Leggetter and P. C. Woodland, "Maximum likelihood linear regression for speaker adaptation of continuous density hidden Markov models," Comput. Speech Lang., vol. 9, no. 2, pp. 171-185, 1995.
    • (1995) Comput. Speech Lang. , vol.9 , Issue.2 , pp. 171-185
    • Leggetter, C.J.1    Woodland, P.C.2
  • 65
    • 40149091397
    • MPE-based discriminative linear transforms for speaker adaptation
    • DOI 10.1016/j.csl.2007.09.001, PII S0885230807000563
    • L. Wang and P. C. Woodland, "MPE-based discriminative linear transforms for speaker adaptation," Comput. Speech Lang., vol. 22, no. 3, pp. 256-272, 2008. (Pubitemid 351329452)
    • (2008) Computer Speech and Language , vol.22 , Issue.3 , pp. 256-272
    • Wang, L.1    Woodland, P.C.2
  • 66
    • 58349123022
    • A study of minimum classification error (MCE) linear regression for supervised adaptation of MCE-trained continuous-density hidden Markov models
    • J. Wu and Q. Huo, "A study of minimum classification error (MCE) linear regression for supervised adaptation of MCE-trained continuous-density hidden Markov models," IEEE Trans. Audio Speech Lang. Processing, vol. 15, no. 2, pp. 478-488, 2007.
    • (2007) IEEE Trans. Audio Speech Lang. Processing , vol.15 , Issue.2 , pp. 478-488
    • Wu, J.1    Huo, Q.2
  • 67
    • 0030245128
    • Robust continuous speech recognition using parallel model combination
    • PII S1063667696067120
    • M. J. F. Gales and S. J. Young, "Robust continuous speech recognition using parallel model combination," IEEE Trans. Speech Audio Processing, vol. 4, no. 5, pp. 352-359, 1996. (Pubitemid 126753023)
    • (1996) IEEE Transactions on Speech and Audio Processing , vol.4 , Issue.5 , pp. 352-359
    • Gales, M.J.F.1    Young, S.J.2
  • 69
    • 84055222005
    • Context-dependent pre-trained deep neural networks for large-vocabulary speech recognition
    • G. E. Dahl, D. Yu, L. Deng, and A. Acero, "Context-dependent pre-trained deep neural networks for large-vocabulary speech recognition," IEEE Trans. Audio Speech Lang. Processing, vol. 20, no. 1, pp. 30-42, 2012.
    • (2012) IEEE Trans. Audio Speech Lang. Processing , vol.20 , Issue.1 , pp. 30-42
    • Dahl, G.E.1    Yu, D.2    Deng, L.3    Acero, A.4
  • 71
    • 33745805403
    • A fast learning algorithm for deep belief nets
    • DOI 10.1162/neco.2006.18.7.1527
    • G. Hinton, S. Osindero, and Y. Teh, "A fast learning algorithm for deep belief nets," Neural Comput., vol. 18, no. 7, pp. 1527-1554, 2006. (Pubitemid 44024729)
    • (2006) Neural Computation , vol.18 , Issue.7 , pp. 1527-1554
    • Hinton, G.E.1    Osindero, S.2    Teh, Y.-W.3
  • 74
    • 0033329799
    • An empirical study of smoothing techniques for language modeling
    • DOI 10.1006/csla.1999.0128
    • S. F. Chen and J. Goodman, "An empirical study of smoothing techniques for language modeling," Comput. Speech Lang., vol. 13, no. 4, pp. 359-394, 1999. (Pubitemid 30518216)
    • (1999) Computer Speech and Language , vol.13 , Issue.4 , pp. 359-394
    • Chen, S.F.1    Goodman, J.2
  • 77
    • 77956280276
    • Hierarchical Bayesian language models for conversational speech recognition
    • S. Huang and S. Renals, "Hierarchical Bayesian language models for conversational speech recognition," IEEE Trans. Audio Speech Lang. Processing, vol. 18, no. 8, pp. 1941-1954, 2010.
    • (2010) IEEE Trans. Audio Speech Lang. Processing , vol.18 , Issue.8 , pp. 1941-1954
    • Huang, S.1    Renals, S.2
  • 78
    • 0000274403
    • Exploiting latent semantic information in statistical language modeling
    • J. Bellegarda, "Exploiting latent semantic information in statistical language modeling," Proc. IEEE, vol. 88, no. 8, pp. 1279-1296, 2000.
    • (2000) Proc. IEEE , vol.88 , Issue.8 , pp. 1279-1296
    • Bellegarda, J.1
  • 80
    • 0141607824
    • Latent Dirichlet allocation
    • D. M. Blei, A. Y. Ng, and M. I. Jordan, "Latent Dirichlet allocation," J. Mach. Learn. Res., vol. 3, no. 1, pp. 993-1022, 2003.
    • (2003) J. Mach. Learn. Res. , vol.3 , Issue.1 , pp. 993-1022
    • Blei, D.M.1    Ng, A.Y.2    Jordan, M.I.3
  • 81
    • 33745203547
    • Dynamic language model adaptation using variational Bayes inference
    • 9th European Conference on Speech Communication and Technology, Eurospeech Interspeech
    • Y. C. Tam and T. Schultz, "Dynamic language model adaptation using variational Bayes inference," in Proc. Annu. Conf. Int. Speech Communication Association (INTERSPEECH), 2005, pp. 5-8. (Pubitemid 43907987)
    • (2005) 9th European Conference on Speech Communication and Technology , pp. 5-8
    • Tam, Y.-C.1    Schultz, T.2
  • 82
    • 78149256857
    • Dirichlet class language models for speech recognition
    • J.-T. Chien and C.-H. Chueh, "Dirichlet class language models for speech recognition," IEEE Trans. Audio Speech Lang. Processing, vol. 19, no. 3, pp. 482-495, 2011.
    • (2011) IEEE Trans. Audio Speech Lang. Processing , vol.19 , Issue.3 , pp. 482-495
    • Chien, J.-T.1    Chueh, C.-H.2
  • 84
    • 0030181951
    • A maximum entropy approach to adaptive statistical language modeling
    • R. Rosenfeld, "A maximum entropy approach to adaptive statistical language modeling," Comput. Speech Lang., vol. 10, no. 3, pp. 187-228, 1996.
    • (1996) Comput. Speech Lang. , vol.10 , Issue.3 , pp. 187-228
    • Rosenfeld, R.1
  • 86
    • 73649117102
    • Joint acoustic and language modeling for speech recognition
    • J.-T. Chien and C.-H. Chueh, "Joint acoustic and language modeling for speech recognition," Speech Commun., vol. 52, no. 3, pp. 223-235, 2010.
    • (2010) Speech Commun. , vol.52 , Issue.3 , pp. 223-235
    • Chien, J.-T.1    Chueh, C.-H.2
  • 89
    • 33847610331
    • Continuous space language models
    • DOI 10.1016/j.csl.2006.09.003, PII S0885230806000325
    • H. Schwenk, "Continuous space language models," Comput. Speech Lang., vol. 21, no. 3, pp. 492-518, 2007. (Pubitemid 46367510)
    • (2007) Computer Speech and Language , vol.21 , Issue.3 , pp. 492-518
    • Schwenk, H.1
  • 91
    • 0034295822
    • Structured language modeling
    • C. Chelba and F. Jelinek, "Structured language modeling," Comput. Speech Lang., vol. 14, no. 4, pp. 283-332, 2000.
    • (2000) Comput. Speech Lang. , vol.14 , Issue.4 , pp. 283-332
    • Chelba, C.1    Jelinek, F.2
  • 94
    • 0036460907
    • Weighted finite state transducers in speech recognition
    • M. Mohri, F. Pereira, and M. Riley, "Weighted finite state transducers in speech recognition," Comput. Speech Lang., vol. 16, no. 1, pp. 69-88, 2002.
    • (2002) Comput. Speech Lang. , vol.16 , Issue.1 , pp. 69-88
    • Mohri, M.1    Pereira, F.2    Riley, M.3
  • 98
    • 0030638031
    • A post-processing system to yield reduced word error rates: Recognizer output voting error reduction (ROVER)
    • J. Fiscus, "A post-processing system to yield reduced word error rates: Recognizer output voting error reduction (ROVER)," in Proc. IEEE Automatic Speech Recognition and Understanding Workshop (ASRU), 1997, pp. 347-354.
    • (1997) Proc. IEEE Automatic Speech Recognition and Understanding Workshop (ASRU) , pp. 347-354
    • Fiscus, J.1
  • 99
    • 4544253834
    • Posterior probability decoding, confidence estimation and system combination
    • G. Evermann and P. Woodland, "Posterior probability decoding, confidence estimation and system combination," in Proc. Speech Transcription Workshop, 2000.
    • (2000) Proc. Speech Transcription Workshop
    • Evermann, G.1    Woodland, P.2
  • 100
    • 0034296009
    • Finding consensus in speech recognition: Word error minimization and other applications of confusion networks
    • L. Mangu, E. Brill, and A. Stolcke, "Finding consensus in speech recognition: Word error minimization and other applications of confusion networks," Comput. Speech Lang., vol. 14, no. 4, pp. 373-400, 2000.
    • (2000) Comput. Speech Lang. , vol.14 , Issue.4 , pp. 373-400
    • Mangu, L.1    Brill, E.2    Stolcke, A.3
  • 104
    • 80055092534
    • Boosting systems for large vocabulary continuous speech recognition
    • G. Saon and H. Soltau, "Boosting systems for large vocabulary continuous speech recognition," Speech Commun., vol. 54, no. 2, pp. 212-128, 2012.
    • (2012) Speech Commun. , vol.54 , Issue.2 , pp. 212-128
    • Saon, G.1    Soltau, H.2
  • 107
    • 0742272654
    • Modeling inverse covariance matrices by basis expansion
    • P. A. Olsen and R. A. Gopinath, "Modeling inverse covariance matrices by basis expansion," IEEE Trans. Speech Audio Processing, vol. 12, no. 1, pp. 37-46, 2004.
    • (2004) IEEE Trans. Speech Audio Processing , vol.12 , Issue.1 , pp. 37-46
    • Olsen, P.A.1    Gopinath, R.A.2
  • 110
    • 0001224048
    • Sparse Bayesian learning and the relevance vector machine
    • DOI 10.1162/15324430152748236
    • M. E. Tipping, "Sparse Bayesian learning and the relevance vector machine," J. Mach. Learn. Res., vol. 1, no. 6, pp. 211-244, 2001. (Pubitemid 33687203)
    • (2001) Journal of Machine Learning Research , vol.1 , Issue.3 , pp. 211-244
    • Tipping, M.E.1
  • 114
    • 3042741069
    • Variational Bayesian estimation and clustering for speech recognition
    • S. Watanabe, Y. Minami, A. Nakamura, and N. Ueda, "Variational Bayesian estimation and clustering for speech recognition," IEEE Trans. Speech Audio Processing, vol. 12, no. 4, pp. 365-381, 2004.
    • (2004) IEEE Trans. Speech Audio Processing , vol.12 , Issue.4 , pp. 365-381
    • Watanabe, S.1    Minami, Y.2    Nakamura, A.3    Ueda, N.4
  • 116
    • 18744376902
    • Predictive hidden Markov model selection for speech recognition
    • DOI 10.1109/TSA.2005.845810
    • J.-T. Chien and S. Furui, "Predictive hidden Markov model selection for speech recognition," IEEE Trans. Speech Audio Processing, vol. 13, no. 3, pp. 377-387, 2005. (Pubitemid 40666172)
    • (2005) IEEE Transactions on Speech and Audio Processing , vol.13 , Issue.3 , pp. 377-387
    • Chien, J.-T.1    Furui, S.2
  • 117
    • 76849117578
    • The nested Chinese restaurant process and Bayesian nonparametric inference of topic hierarchies
    • article 7
    • D. M. Blei, T. L. Griffiths, and M. I. Jordan, "The nested Chinese restaurant process and Bayesian nonparametric inference of topic hierarchies," J. ACM, vol. 57, no. 2, article 7, 2010.
    • (2010) J. ACM , vol.57 , Issue.2
    • Blei, D.M.1    Griffiths, T.L.2    Jordan, M.I.3
  • 118
    • 85032752250
    • Bayesian nonparametric methods for learning Markov switching processes
    • E. Fox, E. Sudderth, M. I. Jordan, and A. Willsky, "Bayesian nonparametric methods for learning Markov switching processes," IEEE Signal Processing Mag., vol. 27, no. 6, pp. 43-54, 2010.
    • (2010) IEEE Signal Processing Mag. , vol.27 , Issue.6 , pp. 43-54
    • Fox, E.1    Sudderth, E.2    Jordan, M.I.3    Willsky, A.4


* This information was extracted and analyzed by KISTI from Elsevier's SCOPUS database.