Volume 18, Issue 6, 2010, Pages 1158-1169

A study on the generalization capability of acoustic models for robust speech recognition

Author keywords

Aurora task; discriminative training; large margin; robust speech recognition

Indexed keywords

ACOUSTIC MODEL; AURORA TASK; COMPETING MODELS; CONNECTED DIGITS; DISCRIMINATIVE TRAINING; FEATURE NORMALIZATION; GENERAL MODEL; GENERALIZATION CAPABILITY; LANGUAGE MODEL; LARGE MARGIN; LEARNING FRAMEWORKS; MACHINE LEARNING APPLICATIONS; MEAN AND VARIANCE NORMALIZATIONS; MODEL TRAINING; NOISY SPEECH RECOGNITION; ROBUST SPEECH RECOGNITION; ROBUSTNESS ISSUES; SPEECH RECOGNITION PERFORMANCE; SPEECH RECOGNITION ROBUSTNESS; STATISTICAL LEARNING THEORY; TESTING CASE; TESTING DATA; TRAINING DATA;

EID: 77955810460     PISSN: 15587916     EISSN: None     Source Type: Journal    
DOI: 10.1109/TASL.2009.2031236     Document Type: Article
Times cited: 21

References (43)
  • 1
    • 0029288202
    • Speech recognition in noisy environments: A survey
    • Y. Gong, "Speech recognition in noisy environments: A survey," Speech Commun., vol.16, no.3, pp. 261-291, 1995.
    • (1995) Speech Commun. , vol.16 , Issue.3 , pp. 261-291
    • Gong, Y.1
  • 2
    • 0021645331
    • Speech enhancement using a minimum mean square error short time spectral amplitude estimator
    • Dec.
    • Y. Ephraim and D. Malah, "Speech enhancement using a minimum mean square error short time spectral amplitude estimator," IEEE Trans. Acoust., Speech, Signal Process., vol.ASSP-32, no.6, pp. 1109-1121, Dec. 1984.
    • (1984) IEEE Trans. Acoust., Speech, Signal Process. , vol.ASSP-32 , Issue.6 , pp. 1109-1121
    • Ephraim, Y.1    Malah, D.2
  • 3
    • 0021892216
    • Speech enhancement using a minimum mean square error log-spectral amplitude estimator
    • Apr.
    • Y. Ephraim and D. Malah, "Speech enhancement using a minimum mean square error log-spectral amplitude estimator," IEEE Trans. Acoust., Speech, Signal Process., vol.33, no.2, pp. 443-445, Apr. 1985.
    • (1985) IEEE Trans. Acoust., Speech, Signal Process. , vol.33 , Issue.2 , pp. 443-445
    • Ephraim, Y.1    Malah, D.2
  • 4
    • 4644336054
    • Reconstruction of missing features for robust speech recognition
    • B. Raj, M. L. Seltzer, and R. M. Stern, "Reconstruction of missing features for robust speech recognition," Speech Commun., vol.43, no.4, pp. 275-296, 2004.
    • (2004) Speech Commun. , vol.43 , Issue.4 , pp. 275-296
    • Raj, B.1    Seltzer, M.L.2    Stern, R.M.3
  • 5
    • 18744396687
    • Accurate compensation in the log-spectral domain for noisy speech recognition
    • May
    • M. Afify, "Accurate compensation in the log-spectral domain for noisy speech recognition," IEEE Trans. Speech Audio Process., vol.13, no.3, pp. 388-398, May 2005.
    • (2005) IEEE Trans. Speech Audio Process. , vol.13 , Issue.3 , pp. 388-398
    • Afify, M.1
  • 6
    • 2142756950
    • Enhancement of log mel power spectra of speech using a phase-sensitive model of the acoustic environment and sequential estimation of the corrupting noise
    • Mar.
    • L. Deng, J. Droppo, and A. Acero, "Enhancement of log mel power spectra of speech using a phase-sensitive model of the acoustic environment and sequential estimation of the corrupting noise," IEEE Trans. Speech Audio Process., vol.12, no.2, pp. 133-143, Mar. 2004.
    • (2004) IEEE Trans. Speech Audio Process. , vol.12 , Issue.2 , pp. 133-143
    • Deng, L.1    Droppo, J.2    Acero, A.3
  • 7
    • 2442551863
    • Estimating cepstrum of speech under the presence of noise using a joint prior of static and dynamic features
    • May
    • L. Deng, J. Droppo, and A. Acero, "Estimating cepstrum of speech under the presence of noise using a joint prior of static and dynamic features," IEEE Trans. Speech Audio Process., vol.12, no.3, pp. 218-223, May 2004.
    • (2004) IEEE Trans. Speech Audio Process. , vol.12 , Issue.3 , pp. 218-223
    • Deng, L.1    Droppo, J.2    Acero, A.3
  • 8
    • 85009070292
    • Large-vocabulary speech recognition under adverse acoustic environment
    • Beijing, China Oct.
    • L. Deng, A. Acero, M. Plumpe, and X. D. Huang, "Large-vocabulary speech recognition under adverse acoustic environment," in Proc. ICSLP'00, Beijing, China, Oct. 2000, pp. 806-809.
    • (2000) Proc. ICSLP'00 , pp. 806-809
    • Deng, L.1    Acero, A.2    Plumpe, M.3    Huang, X.D.4
  • 9
    • 0019555090
    • Cepstral analysis technique for automatic speaker verification
    • Apr.
    • S. Furui, "Cepstral analysis technique for automatic speaker verification," IEEE Trans. Acoust., Speech, Signal Process., vol.ASSP-29, no.2, pp. 254-272, Apr. 1981.
    • (1981) IEEE Trans. Acoust., Speech, Signal Process. , vol.ASSP-29 , Issue.2 , pp. 254-272
    • Furui, S.1
  • 10
    • 0032141206
    • Cepstral domain segmental feature vector normalization for noise robust speech recognition
    • O. Viikki and K. Laurila, "Cepstral domain segmental feature vector normalization for noise robust speech recognition," Speech Commun., vol.25, pp. 133-147, 1998.
    • (1998) Speech Commun. , vol.25 , pp. 133-147
    • Viikki, O.1    Laurila, K.2
  • 11
    • 34047249084
    • Quantile based histogram equalization for noise robust large vocabulary speech recognition
    • May
    • F. Hilger and H. Ney, "Quantile based histogram equalization for noise robust large vocabulary speech recognition," IEEE Trans. Audio, Speech, Lang. Process., vol.14, no.3, pp. 845-854, May 2006.
    • (2006) IEEE Trans. Audio, Speech, Lang. Process. , vol.14 , Issue.3 , pp. 845-854
    • Hilger, F.1    Ney, H.2
  • 13
    • 34147106901
    • Probabilistic class histogram equalization for robust speech recognition
    • Apr.
    • Y. Suh, M. Ji, and H. Kim, "Probabilistic class histogram equalization for robust speech recognition," IEEE Signal Process. Lett., vol.14, no.4, pp. 287-290, Apr. 2007.
    • (2007) IEEE Signal Process. Lett. , vol.14 , Issue.4 , pp. 287-290
    • Suh, Y.1    Ji, M.2    Kim, H.3
  • 14
    • 2442509974
    • Cepstral domain segmental nonlinear feature transformations for robust speech recognition
    • May
    • J. C. Segura, C. Benítez, A. De La Torre, A. J. Rubio, and J. Ramírez, "Cepstral domain segmental nonlinear feature transformations for robust speech recognition," IEEE Signal Process. Lett., vol.11, no.5, pp. 517-520, May 2004.
    • (2004) IEEE Signal Process. Lett. , vol.11 , Issue.5 , pp. 517-520
    • Segura, J.C.1    Benítez, C.2    De La Torre, A.3    Rubio, A.J.4    Ramírez, J.5
  • 17
    • 34347376319
    • Temporal structure normalization of speech feature for robust speech recognition
    • Jul.
    • X. Xiao, E. S. Chng, and H. Li, "Temporal structure normalization of speech feature for robust speech recognition," IEEE Signal Process. Lett., vol.14, no.7, pp. 500-503, Jul. 2007.
    • (2007) IEEE Signal Process. Lett. , vol.14 , Issue.7 , pp. 500-503
    • Xiao, X.1    Chng, E.S.2    Li, H.3
  • 18
    • 70350016998
    • Normalization of the speech modulation spectra for robust speech recognition
    • Nov.
    • X. Xiao, E. S. Chng, and H. Li, "Normalization of the speech modulation spectra for robust speech recognition," IEEE Trans. Audio, Speech, Lang. Process., vol.16, no.8, pp. 1662-1674, Nov. 2008.
    • (2008) IEEE Trans. Audio, Speech, Lang. Process. , vol.16 , Issue.8 , pp. 1662-1674
    • Xiao, X.1    Chng, E.S.2    Li, H.3
  • 19
    • 0029288633
    • Maximum likelihood linear regression for speaker adaptation of continuous density hidden Markov models
    • Apr.
    • C. J. Leggetter and P. C. Woodland, "Maximum likelihood linear regression for speaker adaptation of continuous density hidden Markov models," Comput. Speech Lang., vol.9, pp. 171-185, Apr. 1995.
    • (1995) Comput. Speech Lang. , vol.9 , pp. 171-185
    • Leggetter, C.J.1    Woodland, P.C.2
  • 20
    • 0028419019
    • Maximum a posteriori estimation for multivariate Gaussian mixture observations of Markov chains
    • Apr.
    • J. L. Gauvain and C. H. Lee, "Maximum a posteriori estimation for multivariate Gaussian mixture observations of Markov chains," IEEE Trans. Speech Audio Process., vol.2, no.2, pp. 291-298, Apr. 1994.
    • (1994) IEEE Trans. Speech Audio Process. , vol.2 , Issue.2 , pp. 291-298
    • Gauvain, J.L.1    Lee, C.H.2
  • 21
    • 0027622731
    • Cepstral parameter compensation for HMM recognition
    • Jul.
    • M. J. F. Gales and S. J. Young, "Cepstral parameter compensation for HMM recognition," Speech Commun., vol.12, pp. 231-239, Jul. 1993.
    • (1993) Speech Commun. , vol.12 , pp. 231-239
    • Gales, M.J.F.1    Young, S.J.2
  • 22
    • 44849133030
    • An ensemble modeling approach to joint characterization of speaker and speaking environments
    • Antwerp, Belgium, Sep.
    • Y. Tsao and C.-H. Lee, "An ensemble modeling approach to joint characterization of speaker and speaking environments," in Proc. Eurospeech '07, Antwerp, Belgium, Sep. 2007, pp. 1050-1053.
    • (2007) Proc. Eurospeech '07 , pp. 1050-1053
    • Tsao, Y.1    Lee, C.-H.2
  • 23
    • 44849130443
    • Two extensions to ensemble speaker and speaking environment modeling for robust automatic speech recognition
    • Kyoto, Japan Dec.
    • Y. Tsao and C.-H. Lee, "Two extensions to ensemble speaker and speaking environment modeling for robust automatic speech recognition," in Proc. ASRU'07, Kyoto, Japan, Dec. 2007, pp. 77-80.
    • (2007) Proc. ASRU'07 , pp. 77-80
    • Tsao, Y.1    Lee, C.-H.2
  • 24
    • 27644486095
    • A method of joint compensation of additive and convolutive distortions for speaker-independent speech recognition
    • Oct.
    • Y. Gong, "A method of joint compensation of additive and convolutive distortions for speaker-independent speech recognition," IEEE Trans. Speech Audio Process., vol.13, no.5, pp. 975-983, Oct. 2005.
    • (2005) IEEE Trans. Speech Audio Process. , vol.13 , Issue.5 , pp. 975-983
    • Gong, Y.1
  • 25
    • 44849125798
    • High-performance HMM adaptation with joint compensation of additive and convolutive distortions via vector Taylor series
    • Kyoto, Japan Dec.
    • J. Li, L. Deng, D. Yu, Y. Gong, and A. Acero, "High-performance HMM adaptation with joint compensation of additive and convolutive distortions via vector Taylor series," in Proc. ASRU'07, Kyoto, Japan, Dec. 2007, pp. 65-70.
    • (2007) Proc. ASRU'07 , pp. 65-70
    • Li, J.1    Deng, L.2    Yu, D.3    Gong, Y.4    Acero, A.5
  • 27
    • 0026982122
    • Discriminative learning for minimum error classification
    • Dec.
    • B.-H. Juang and S. Katagiri, "Discriminative learning for minimum error classification," IEEE Trans. Signal Process., vol.40, no.12, pp. 3043-3054, Dec. 1992.
    • (1992) IEEE Trans. Signal Process. , vol.40 , Issue.12 , pp. 3043-3054
    • Juang, B.-H.1    Katagiri, S.2
  • 28
    • 0031139839
    • Minimum classification error rate methods for speech recognition
    • May
    • B.-H. Juang, W. Chou, and C.-H. Lee, "Minimum classification error rate methods for speech recognition," IEEE Trans. Speech Audio Process., vol.5, no.3, pp. 257-265, May 1997.
    • (1997) IEEE Trans. Speech Audio Process. , vol.5 , Issue.3 , pp. 257-265
    • Juang, B.-H.1    Chou, W.2    Lee, C.-H.3
  • 29
    • 0022890536
    • Maximum mutual information estimation of hidden Markov model parameters for speech recognition
    • Tokyo, Japan Apr.
    • L. R. Bahl, P. F. Brown, P. V. De Souza, and R. L. Mercer, "Maximum mutual information estimation of hidden Markov model parameters for speech recognition," in Proc. ICASSP'86, Tokyo, Japan, Apr. 1986, pp. 49-52.
    • (1986) Proc. ICASSP'86 , pp. 49-52
    • Bahl, L.R.1    Brown, P.F.2    De Souza, P.V.3    Mercer, R.L.4
  • 30
    • 0031222490
    • MMIE training of large vocabulary recognition systems
    • V. Valtchev, J. J. Odell, P. C. Woodland, and S. J. Young, "MMIE training of large vocabulary recognition systems," Speech Commun., vol.22, no.4, pp. 303-314, 1997.
    • (1997) Speech Commun. , vol.22 , Issue.4 , pp. 303-314
    • Valtchev, V.1    Odell, J.J.2    Woodland, P.C.3    Young, S.J.4
  • 32
    • 44849090158
    • An environment compensated minimum classification error training approach based on stochastic vector mapping
    • Nov.
    • J. Wu and Q. Huo, "An environment compensated minimum classification error training approach based on stochastic vector mapping," IEEE Trans. Audio, Speech, Lang. Process., vol.14, no.6, pp. 2147-2155, Nov. 2006.
    • (2006) IEEE Trans. Audio, Speech, Lang. Process. , vol.14 , Issue.6 , pp. 2147-2155
    • Wu, J.1    Huo, Q.2
  • 33
    • 0742272653
    • Discriminative auditory-based features for robust speech recognition
    • Jan.
    • B.-W. Mak, Y.-C. Tam, and P. Li, "Discriminative auditory-based features for robust speech recognition," IEEE Trans. Speech Audio Process., vol.12, no.1, pp. 27-36, Jan. 2004.
    • (2004) IEEE Trans. Speech Audio Process. , vol.12 , Issue.1 , pp. 27-36
    • Mak, B.-W.1    Tam, Y.-C.2    Li, P.3
  • 34
    • 33745216251
    • Maximum mutual information SPLICE transform for seen and unseen conditions
    • Lisbon, Portugal, Sep.
    • J. Droppo and A. Acero, "Maximum mutual information SPLICE transform for seen and unseen conditions," in Proc. Interspeech'05, Lisbon, Portugal, Sep. 2005, pp. 989-992.
    • (2005) Proc. Interspeech'05 , pp. 989-992
    • Droppo, J.1    Acero, A.2
  • 35
    • 33947686745
    • Large margin Gaussian mixture modeling for phonetic classification and recognition
    • May
    • F. Sha and L. Saul, "Large margin Gaussian mixture modeling for phonetic classification and recognition," in Proc. ICASSP'06, May 2006, vol.1, pp. I-I.
    • (2006) Proc. ICASSP'06 , vol.1
    • Sha, F.1    Saul, L.2
  • 36
    • 84864038630
    • Large margin hidden Markov models for automatic speech recognition
    • B. Schölkopf, J. Platt, and T. Hofmann, Eds. Cambridge, MA: MIT Press
    • F. Sha and L. K. Saul, "Large margin hidden Markov models for automatic speech recognition," in Advances in Neural Information Processing Systems, B. Schölkopf, J. Platt, and T. Hofmann, Eds. Cambridge, MA: MIT Press, 2007, vol.19.
    • (2007) Advances in Neural Information Processing Systems , vol.19
    • Sha, F.1    Saul, L.K.2
  • 37
    • 34047115134
    • Large margin hidden Markov models for speech recognition
    • Oct.
    • H. Jiang, X. Li, and C. Liu, "Large margin hidden Markov models for speech recognition," IEEE Trans. Audio, Speech, Lang. Process., vol.14, no.5, pp. 1584-1595, Oct. 2006.
    • (2006) IEEE Trans. Audio, Speech, Lang. Process. , vol.14 , Issue.5 , pp. 1584-1595
    • Jiang, H.1    Li, X.2    Liu, C.3
  • 38
    • 64149098818
    • Approximate test risk bound minimization through soft margin estimation
    • Nov.
    • J. Li, M. Yuan, and C.-H. Lee, "Approximate test risk bound minimization through soft margin estimation," IEEE Trans. Audio, Speech, Lang. Process., vol.15, no.8, pp. 2393-2404, Nov. 2007.
    • (2007) IEEE Trans. Audio, Speech, Lang. Process. , vol.15 , Issue.8 , pp. 2393-2404
    • Li, J.1    Yuan, M.2    Lee, C.-H.3
  • 39
    • 84867216710
    • On a generalization of margin-based discriminative training to robust speech recognition
    • Brisbane, Australia, Sep.
    • J. Li and C.-H. Lee, "On a generalization of margin-based discriminative training to robust speech recognition," in Proc. Interspeech'08, Brisbane, Australia, Sep. 2008.
    • (2008) Proc. Interspeech'08
    • Li, J.1    Lee, C.-H.2
  • 40
    • 84987702417
    • The AURORA experimental framework for the performance evaluation of speech recognition systems under noisy conditions
    • Beijing, China Oct.
    • D. Pearce and H.-G. Hirsch, "The AURORA experimental framework for the performance evaluation of speech recognition systems under noisy conditions," in Proc. ICSLP'00, Beijing, China, Oct. 2000, vol.4, pp. 29-32.
    • (2000) Proc. ICSLP'00 , vol.4 , pp. 29-32
    • Pearce, D.1    Hirsch, H.-G.2
  • 42
    • 65549153550
    • Ph.D. dissertation, Dept. Elec. Comput. Eng., Carnegie Mellon Univ., Pittsburgh, PA
    • P. J. Moreno, "Speech Recognition in Noisy Environments," Ph.D. dissertation, Dept. Elec. Comput. Eng., Carnegie Mellon Univ., Pittsburgh, PA, 1996.
    • (1996) Speech Recognition in Noisy Environments
    • Moreno, P.J.1


* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.