Volume 17, Issue 5, 2009, Pages 1025-1037

An ensemble speaker and speaking environment modeling approach to robust speech recognition

Author keywords

Environment modeling; Noise robustness

Indexed keywords

AUTOMATIC SPEECH RECOGNITION SYSTEM; BASELINE SYSTEMS; CONNECTED DIGITS; ENVIRONMENT MODELING; ENVIRONMENTAL COMPENSATION; GAUSSIAN MIXTURES; MINIMUM CLASSIFICATION ERROR TRAINING; NOISE ROBUSTNESS; OFFLINE; PARTITIONING ALGORITHMS; PERFORMANCE ROBUSTNESS; ROBUST SPEECH RECOGNITION; STOCHASTIC MATCHING; STRUCTURED ENVIRONMENT; TESTING ENVIRONMENT; TWO PHASES

EID: 67651177785     PISSN: 15587916     EISSN: None     Source Type: Journal    
DOI: 10.1109/TASL.2009.2016231     Document Type: Article
Times cited : (38)

References (52)
  • 1
    • 0024610919
    • A tutorial on hidden Markov models and selected applications in speech recognition
    • Feb
    • L. R. Rabiner, "A tutorial on hidden Markov models and selected applications in speech recognition," Proc. IEEE, vol. 77, no. 2, pp. 257-286, Feb. 1989.
    • (1989) Proc. IEEE , vol.77 , Issue.2 , pp. 257-286
    • Rabiner, L.R.1
  • 2
    • 0025493667
    • The segmental K-means algorithm for estimating parameters of hidden Markov models
    • Sep
    • B.-H. Juang and L. R. Rabiner, "The segmental K-means algorithm for estimating parameters of hidden Markov models," IEEE Trans. Acoust., Speech, Signal Process., vol. 38, no. 9, pp. 1639-1641, Sep. 1990.
    • (1990) IEEE Trans. Acoust., Speech, Signal Process , vol.38 , Issue.9 , pp. 1639-1641
    • Juang, B.-H.1    Rabiner, L.R.2
  • 4
    • 0030149866
    • A maximum-likelihood approach to stochastic matching for robust speech recognition
    • May
    • A. Sankar and C.-H. Lee, "A maximum-likelihood approach to stochastic matching for robust speech recognition," IEEE Trans. Speech Audio Process., vol. 4, no. 3, pp. 190-202, May 1996.
    • (1996) IEEE Trans. Speech Audio Process , vol.4 , Issue.3 , pp. 190-202
    • Sankar, A.1    Lee, C.-H.2
  • 5
    • 0001583797
    • Nonlinear compensation for stochastic matching
    • Nov
    • A. C. Surendran, C.-H. Lee, and M. Rahim, "Nonlinear compensation for stochastic matching," IEEE Trans. Speech Audio Process., vol. 7, no. 6, pp. 643-655, Nov. 1999.
    • (1999) IEEE Trans. Speech Audio Process , vol.7 , Issue.6 , pp. 643-655
    • Surendran, A.C.1    Lee, C.-H.2    Rahim, M.3
  • 6
    • 0006923547
    • Noise adaptation in a hidden Markov model speech recognition system
    • D. V. Compernolle, "Noise adaptation in a hidden Markov model speech recognition system," Comput. Speech Lang., vol. 3, pp. 151-167, 1989.
    • (1989) Comput. Speech Lang , vol.3 , pp. 151-167
    • Compernolle, D.V.1
  • 7
    • 0018455310
    • Suppression of acoustic noise in speech using spectral subtraction
    • Apr
    • S. F. Boll, "Suppression of acoustic noise in speech using spectral subtraction," IEEE Trans. Acoust., Speech, Signal Process., vol. ASSP-27, no. 2, pp. 113-120, Apr. 1979.
    • (1979) IEEE Trans. Acoust., Speech, Signal Process , vol.ASSP-27 , Issue.2 , pp. 113-120
    • Boll, S.F.1
  • 8
    • 0042362207
    • Cepstrum-domain acoustic feature compensation based on decomposition of speech and noise for ASR in noisy environments
    • Sep
    • H. Kim and R. C. Rose, "Cepstrum-domain acoustic feature compensation based on decomposition of speech and noise for ASR in noisy environments," IEEE Trans. Speech Audio Process., vol. 11, no. 5, pp. 435-446, Sep. 2003.
    • (2003) IEEE Trans. Speech Audio Process , vol.11 , Issue.5 , pp. 435-446
    • Kim, H.1    Rose, R.C.2
  • 9
    • 0016067897
    • Effectiveness of linear prediction characteristics of the speech wave for automatic speaker identification and verification
    • Jun
    • B. Atal, "Effectiveness of linear prediction characteristics of the speech wave for automatic speaker identification and verification," J. Acoust. Soc. Amer., vol. 55, pp. 1304-1312, Jun. 1974.
    • (1974) J. Acoust. Soc. Amer , vol.55 , pp. 1304-1312
    • Atal, B.1
  • 10
    • 0019555090
    • Cepstral analysis technique for automatic speaker verification
    • Apr
    • S. Furui, "Cepstral analysis technique for automatic speaker verification," IEEE Trans. Acoust., Speech, Signal Process., vol. ASSP-29, no. 2, pp. 254-272, Apr. 1981.
    • (1981) IEEE Trans. Acoust., Speech, Signal Process , vol.ASSP-29 , Issue.2 , pp. 254-272
    • Furui, S.1
  • 11
    • 4544345512
    • Higher order cepstral moment normalization (HOCMN) for robust speech recognition
    • C.-W. Hsu and L.-S. Lee, "Higher order cepstral moment normalization (HOCMN) for robust speech recognition," in Proc. ICASSP'04, 2004, pp. 197-200.
    • (2004) Proc. ICASSP'04 , pp. 197-200
    • Hsu, C.-W.1    Lee, L.-S.2
  • 12
    • 0032641790
    • Cepstrum third-order normalization method for noisy speech recognition
    • Apr
    • Y. H. Suk, S. H. Choi, and H. S. Lee, "Cepstrum third-order normalization method for noisy speech recognition," Electron. Lett., vol. 35, no. 7, pp. 527-528, Apr. 1999.
    • (1999) Electron. Lett , vol.35 , Issue.7 , pp. 527-528
    • Suk, Y.H.1    Choi, S.H.2    Lee, H.S.3
  • 14
    • 0004319970
    • Acoustical and environmental robustness in automatic speech recognition
    • Ph.D. dissertation, Elect. Comput. Eng. Dept., Carnegie Mellon Univ., Pittsburgh, PA
    • A. Acero, "Acoustical and environmental robustness in automatic speech recognition," Ph.D. dissertation, Elect. Comput. Eng. Dept., Carnegie Mellon Univ., Pittsburgh, PA, 1990.
    • (1990)
    • Acero, A.1
  • 15
    • 0347968277
    • Recursive estimation of nonstationary noise using iterative stochastic approximation for robust speech recognition
    • Nov
    • L. Deng, J. Droppo, and A. Acero, "Recursive estimation of nonstationary noise using iterative stochastic approximation for robust speech recognition," IEEE Trans. Speech Audio Process., vol. 11, no. 6, pp. 568-580, Nov. 2003.
    • (2003) IEEE Trans. Speech Audio Process , vol.11 , Issue.6 , pp. 568-580
    • Deng, L.1    Droppo, J.2    Acero, A.3
  • 16
    • 67651169560
    • Maximum likelihood linear transformations for HMM-based speech recognition
    • M. J. F. Gales, "Maximum likelihood linear transformations for HMM-based speech recognition," Cambridge Univ., Cambridge, U.K., 1997, Tech. Rep. TR 291.
  • 17
    • 0036298126
    • Structuring linear transforms for adaptation using training time information
    • K. Visweswariah, V. Goel, and R. Gopinath, "Structuring linear transforms for adaptation using training time information," in Proc. ICASSP'02, 2002, pp. 585-588.
    • (2002) Proc. ICASSP'02 , pp. 585-588
    • Visweswariah, K.1    Goel, V.2    Gopinath, R.3
  • 18
    • 0000159105
    • On adaptive decision rules and decision parameter adaptation for automatic speech recognition
    • Aug
    • C.-H. Lee and Q. Huo, "On adaptive decision rules and decision parameter adaptation for automatic speech recognition," Proc. IEEE, vol. 88, no. 8, pp. 1241-1269, Aug. 2000.
    • (2000) Proc. IEEE , vol.88 , Issue.8 , pp. 1241-1269
    • Lee, C.-H.1    Huo, Q.2
  • 19
    • 0028419019
    • Maximum a posteriori estimation for multivariate Gaussian mixture observations of Markov chains
    • Apr
    • J.-L. Gauvain and C.-H. Lee, "Maximum a posteriori estimation for multivariate Gaussian mixture observations of Markov chains," IEEE Trans. Speech Audio Process., vol. 2, no. 2, pp. 291-99, Apr. 1994.
    • (1994) IEEE Trans. Speech Audio Process , vol.2 , Issue.2 , pp. 291-299
    • Gauvain, J.-L.1    Lee, C.-H.2
  • 20
    • 0029288633
    • Maximum likelihood linear regression for speaker adaptation of continuous density hidden Markov models
    • C. Leggetter and P. Woodland, "Maximum likelihood linear regression for speaker adaptation of continuous density hidden Markov models," Comput. Speech Lang., vol. 9, pp. 171-185, 1995.
    • (1995) Comput. Speech Lang , vol.9 , pp. 171-185
    • Leggetter, C.1    Woodland, P.2
  • 22
    • 27644486095
    • A method of joint compensation of additive and convolutive distortions for speaker-independent speech recognition
    • Sep
    • Y. Gong, "A method of joint compensation of additive and convolutive distortions for speaker-independent speech recognition," IEEE Trans. Speech Audio Process., vol. 13, no. 5, pp. 975-983, Sep. 2005.
    • (2005) IEEE Trans. Speech Audio Process , vol.13 , Issue.5 , pp. 975-983
    • Gong, Y.1
  • 23
    • 85009113852
    • HMM adaptation using vector Taylor series for noisy speech recognition
    • A. Acero, L. Deng, T. Kristjansson, and J. Zhang, "HMM adaptation using vector Taylor series for noisy speech recognition," in Proc. ICSLP'02, 2000, pp. 869-872.
    • (2000) Proc. ICSLP'02 , pp. 869-872
    • Acero, A.1    Deng, L.2    Kristjansson, T.3    Zhang, J.4
  • 24
    • 0035279111
    • A structural Bayes approach to speaker adaptation
    • Mar
    • K. Shinoda and C.-H. Lee, "A structural Bayes approach to speaker adaptation," IEEE Trans. Speech Audio Process., vol. 9, no. 3, pp. 276-287, Mar. 2001.
    • (2001) IEEE Trans. Speech Audio Process , vol.9 , Issue.3 , pp. 276-287
    • Shinoda, K.1    Lee, C.-H.2
  • 25
    • 0035341086
    • Joint maximum a posteriori adaptation of transformation and HMM parameters
    • May
    • O. Siohan, C. Chesta, and C.-H. Lee, "Joint maximum a posteriori adaptation of transformation and HMM parameters," IEEE Trans. Speech Audio Process., vol. 9, no. 4, pp. 417-428, May 2001.
    • (2001) IEEE Trans. Speech Audio Process , vol.9 , Issue.4 , pp. 417-428
    • Siohan, O.1    Chesta, C.2    Lee, C.-H.3
  • 26
    • 0023263708
    • Multi-style training for robust isolated-word speech recognition
    • Dallas, TX, Apr
    • R. P. Lippmann, E. A. Martin, and D. B. Paul, "Multi-style training for robust isolated-word speech recognition," in Proc. ICASSP'87, Dallas, TX, Apr. 1987, pp. 705-708.
    • (1987) Proc. ICASSP'87 , pp. 705-708
    • Lippmann, R.P.1    Martin, E.A.2    Paul, D.B.3
  • 27
    • 44849090160
    • A vector space approach to environment modeling for robust speech recognition
    • Sep
    • Y. Tsao and C.-H. Lee, "A vector space approach to environment modeling for robust speech recognition," in Proc. ICSLP'06, Sep. 2006, pp. 785-788.
    • (2006) Proc. ICSLP'06 , pp. 785-788
    • Tsao, Y.1    Lee, C.-H.2
  • 28
    • 85009097035
    • Fast speaker adaptation using eigenspace-based maximum likelihood linear regression
    • K.-T. Chen, W.-W. Liau, H.-M. Wang, and L.-S. Lee, "Fast speaker adaptation using eigenspace-based maximum likelihood linear regression," in Proc. ICSLP'00, 2000, pp. 742-745.
    • (2000) Proc. ICSLP'00 , pp. 742-745
    • Chen, K.-T.1    Liau, W.-W.2    Wang, H.-M.3    Lee, L.-S.4
  • 29
    • 0002629270
    • Maximum likelihood from incomplete data via the EM algorithm
    • A. P. Dempster, N. M. Laird, and D. B. Rubin, "Maximum likelihood from incomplete data via the EM algorithm," J. Roy. Statist. Soc. B, vol. 39, pp. 1-38, 1977.
    • (1977) J. Roy. Statist. Soc. B , vol.39 , pp. 1-38
    • Dempster, A.P.1    Laird, N.M.2    Rubin, D.B.3
  • 30
    • 0031139839
    • Minimum classification error rate methods for speech recognition
    • May
    • B.-H. Juang, W. Chou, and C.-H. Lee, "Minimum classification error rate methods for speech recognition," IEEE Trans. Speech Audio Process., vol. 5, no. 3, pp. 257-265, May 1997.
    • (1997) IEEE Trans. Speech Audio Process , vol.5 , Issue.3 , pp. 257-265
    • Juang, B.-H.1    Chou, W.2    Lee, C.-H.3
  • 31
    • 44849130443
    • Two extensions to ensemble speaker and speaking environment modeling for robust automatic speech recognition
    • Dec
    • Y. Tsao and C.-H. Lee, "Two extensions to ensemble speaker and speaking environment modeling for robust automatic speech recognition," in Proc. ASRU, Dec. 2007, pp. 77-80.
    • (2007) Proc. ASRU , pp. 77-80
    • Tsao, Y.1    Lee, C.-H.2
  • 33
    • 84968470175
    • Algorithms for piecewise polynomials and splines with free knots
    • G. Meinardus, G. Nurnberger, M. Sommer, and H. Strauss, "Algorithms for piecewise polynomials and splines with free knots," Math. Comput., vol. 53, pp. 235-247, 1989.
    • (1989) Math. Comput , vol.53 , pp. 235-247
    • Meinardus, G.1    Nurnberger, G.2    Sommer, M.3    Strauss, H.4
  • 34
    • 18744411268
    • Segmental eigenvoice with delicate eigenspace for improved speaker adaptation
    • May
    • Y. Tsao, S.-M. Lee, and L.-S. Lee, "Segmental eigenvoice with delicate eigenspace for improved speaker adaptation," IEEE Trans. Speech Audio Process., vol. 13, no. 3, pp. 399-411, May 2005.
    • (2005) IEEE Trans. Speech Audio Process , vol.13 , Issue.3 , pp. 399-411
    • Tsao, Y.1    Lee, S.-M.2    Lee, L.-S.3
  • 36
    • 0031222490
    • MMIE training of large vocabulary recognition systems
    • V. Valtchev, J. Odell, P. C. Woodland, and S. Young, "MMIE training of large vocabulary recognition systems," Speech Commun., vol. 22, no. 4, pp. 303-314, 1997.
    • (1997) Speech Commun , vol.22 , Issue.4 , pp. 303-314
    • Valtchev, V.1    Odell, J.2    Woodland, P.C.3    Young, S.4
  • 37
    • 0036296863
    • Minimum phone error and i-smoothing for improved discriminative training
    • D. Povey and P. C. Woodland, "Minimum phone error and i-smoothing for improved discriminative training," in Proc. ICASSP'02, 2002, pp. I-105-I-108.
    • (2002) Proc. ICASSP'02 , pp. I-105-I-108
    • Povey, D.1    Woodland, P.C.2
  • 38
    • 67651157825
    • Soft margin estimation for automatic speech recognition
    • Ph.D. dissertation, School Elect. Comput. Eng., Georgia Inst. Technol., Atlanta, GA
    • J. Li, "Soft margin estimation for automatic speech recognition," Ph.D. dissertation, School Elect. Comput. Eng., Georgia Inst. Technol., Atlanta, GA, 2008.
    • (2008)
    • Li, J.1
  • 39
    • 0032203256
    • Pattern recognition using a family of design algorithms based upon the generalized probabilistic descent method
    • Nov
    • S. Katagiri, B.-H. Juang, and C.-H. Lee, "Pattern recognition using a family of design algorithms based upon the generalized probabilistic descent method," Proc. IEEE, vol. 86, no. 11, pp. 2345-2373, Nov. 1998.
    • (1998) Proc. IEEE , vol.86 , Issue.11 , pp. 2345-2373
    • Katagiri, S.1    Juang, B.-H.2    Lee, C.-H.3
  • 41
    • 0009625231
    • A comparison of novel techniques for rapid speaker adaptation
    • May
    • T. J. Hazen, "A comparison of novel techniques for rapid speaker adaptation," Speech Commun., vol. 31, pp. 15-33, May 2000.
    • (2000) Speech Commun , vol.31 , pp. 15-33
    • Hazen, T.J.1
  • 43
    • 44849133030
    • An ensemble modeling approach to joint characterization of speaker and speaking environments
    • Aug
    • Y. Tsao and C.-H. Lee, "An ensemble modeling approach to joint characterization of speaker and speaking environments," in Proc. Interspeech 2007, Aug. 2007, pp. 1050-1053.
    • (2007) Proc. Interspeech 2007 , pp. 1050-1053
    • Tsao, Y.1    Lee, C.-H.2
  • 44
    • 33947681802
    • Improving reference speaker weighting adaptation by the use of maximum-likelihood reference speakers
    • May
    • B. Mak and T.-C. Lai, "Improving reference speaker weighting adaptation by the use of maximum-likelihood reference speakers," in Proc. ICASSP'06, May 2006, vol. 1, pp. 229-232.
    • (2006) Proc. ICASSP'06 , vol.1 , pp. 229-232
    • Mak, B.1    Lai, T.-C.2
  • 45
    • 0038669544
    • The Aurora experimental framework for the performance evaluation of speech recognition systems under noisy conditions
    • Paris, France
    • H.-G. Hirsch and D. Pearce, "The Aurora experimental framework for the performance evaluation of speech recognition systems under noisy conditions," in Proc. ISCA ITRW ASR'2000, Paris, France, 2000, pp. 181-188.
    • (2000) Proc. ISCA ITRW ASR'2000 , pp. 181-188
    • Hirsch, H.-G.1    Pearce, D.2
  • 46
    • 84867201606
    • Improving the ensemble speaker and speaking environment modeling approach by enhancing the precision of the online estimation process
    • Y. Tsao and C.-H. Lee, "Improving the ensemble speaker and speaking environment modeling approach by enhancing the precision of the online estimation process," in Proc. Interspeech'08, 2008, pp. 1265-1268.
    • (2008) Proc. Interspeech'08 , pp. 1265-1268
    • Tsao, Y.1    Lee, C.-H.2
  • 47
    • 0034227757
    • Cluster adaptive training of hidden Markov models
    • M. J. F. Gales, "Cluster adaptive training of hidden Markov models," IEEE Trans. Speech Audio Process., vol. 8,no. , pp. 417-428, Jul. 2000.
  • 49
    • 33846860860
    • Statistics: The Art and Science of Learning from Data
    • Upper Saddle River, NJ: Prentice-Hall
    • A. Agresti and C. A. Franklin, Statistics: The Art and Science of Learning from Data (MyStatLab Series). Upper Saddle River, NJ: Prentice-Hall, 2008.
    • (2008) MyStatLab Series
    • Agresti, A.1    Franklin, C.A.2
  • 50
    • 0141813799
    • Speaker adaptation by hierarchical eigen-voice
    • Apr
    • Y. Onishi and K.-I. Iso, "Speaker adaptation by hierarchical eigen-voice," in Proc. ICASSP '03, Apr. 2003, vol. 1, pp. 576-579.
    • (2003) Proc. ICASSP '03 , vol.1 , pp. 576-579
    • Onishi, Y.1    Iso, K.-I.2
  • 51
    • 85009181040
    • Several HKU approaches for robust speech recognition and their evaluation on Aurora connected digit recognition tasks
    • J. Wu and Q. Huo, "Several HKU approaches for robust speech recognition and their evaluation on Aurora connected digit recognition tasks," in Proc. Eurospeech'03, 2003, pp. 21-24.
    • (2003) Proc. Eurospeech'03 , pp. 21-24
    • Wu, J.1    Huo, Q.2
  • 52
    • 33745199175
    • A study on separation between acoustic models and its applications
    • Y. Tsao, J. Li, and C.-H. Lee, "A study on separation between acoustic models and its applications," in Proc. Interspeech'05, 2005, pp. 1109-1112.
    • (2005) Proc. Interspeech'05 , pp. 1109-1112
    • Tsao, Y.1    Li, J.2    Lee, C.-H.3


* This information was extracted and analyzed by KISTI from Elsevier's SCOPUS database.