SCOPUS Information Search Platform

IEEE Transactions on Audio, Speech and Language Processing

Volume 18, Issue 6, 2010, Pages 1158-1169

A study on the generalization capability of acoustic models for robust speech recognition

Authors (5): Xiao, Xiong (a); Li, Jinyu (b); Chng, Eng Siong (a); Li, Haizhou (a, c); Lee, Chin Hui (d)

a NANYANG TECHNOLOGICAL UNIVERSITY (Singapore)

b MICROSOFT (United States)

c INSTITUTE FOR INFOCOMM RESEARCH (Singapore)

d GEORGIA INSTITUTE OF TECHNOLOGY (United States)

Author keywords

Aurora task; discriminative training; large margin; robust speech recognition

Indexed keywords

ACOUSTIC MODEL; AURORA TASK; COMPETING MODELS; CONNECTED DIGITS; DISCRIMINATIVE TRAINING; FEATURE NORMALIZATION; GENERAL MODEL; GENERALIZATION CAPABILITY; LANGUAGE MODEL; LARGE MARGIN; LEARNING FRAMEWORKS; MACHINE LEARNING APPLICATIONS; MEAN AND VARIANCE NORMALIZATIONS; MODEL TRAINING; NOISY SPEECH RECOGNITION; ROBUST SPEECH RECOGNITION; ROBUSTNESS ISSUES; SPEECH RECOGNITION PERFORMANCE; SPEECH RECOGNITION ROBUSTNESS; STATISTICAL LEARNING THEORY; TESTING CASE; TESTING DATA; TRAINING DATA;

ABILITY TESTING; COMPUTATIONAL LINGUISTICS; FACE RECOGNITION;

SPEECH RECOGNITION;

EID: 77955810460 PISSN: 15587916 EISSN: None Source Type: Journal
DOI: 10.1109/TASL.2009.2031236 Document Type: Article

Times cited: 21

References (43)

1
- 0029288202
- Y. Gong, "Speech recognition in noisy environments: A survey," Speech Commun., vol.16, no.3, pp. 261-291, 1995.

2
- 0021645331
- Y. Ephraim and D. Malah, "Speech enhancement using a minimum mean square error short time spectral amplitude estimator," IEEE Trans. Acoust., Speech, Signal Process., vol.ASSP-32, no.6, pp. 1109-1121, Dec. 1984.

3
- 0021892216
- Y. Ephraim and D. Malah, "Speech enhancement using a minimum mean square error log-spectral amplitude estimator," IEEE Trans. Acoust., Speech, Signal Process., vol.ASSP-33, no.2, pp. 443-445, Apr. 1985.

4
- 4644336054
- B. Raj, M. L. Seltzer, and R. M. Stern, "Reconstruction of missing features for robust speech recognition," Speech Commun., vol.43, no.4, pp. 275-296, 2004.

5
- 18744396687
- M. Afify, "Accurate compensation in the log-spectral domain for noisy speech recognition," IEEE Trans. Speech Audio Process., vol.13, no.3, pp. 388-398, May 2005.

6
- 2142756950
- L. Deng, J. Droppo, and A. Acero, "Enhancement of log mel power spectra of speech using a phase-sensitive model of the acoustic environment and sequential estimation of the corrupting noise," IEEE Trans. Speech Audio Process., vol.12, no.2, pp. 133-143, Mar. 2004.

7
- 2442551863
- L. Deng, J. Droppo, and A. Acero, "Estimating cepstrum of speech under the presence of noise using a joint prior of static and dynamic features," IEEE Trans. Speech Audio Process., vol.12, no.3, pp. 218-223, May 2004.

8
- 85009070292
- L. Deng, A. Acero, M. Plumpe, and X. D. Huang, "Large-vocabulary speech recognition under adverse acoustic environment," in Proc. ICSLP'00, Beijing, China, Oct. 2000, pp. 806-809.

9
- 0019555090
- S. Furui, "Cepstral analysis technique for automatic speaker verification," IEEE Trans. Acoust., Speech, Signal Process., vol.ASSP-29, no.2, pp. 254-272, Apr. 1981.

10
- 0032141206
- O. Viikki and K. Laurila, "Cepstral domain segmental feature vector normalization for noise robust speech recognition," Speech Commun., vol.25, pp. 133-147, 1998.

11
- 34047249084
- F. Hilger and H. Ney, "Quantile based histogram equalization for noise robust large vocabulary speech recognition," IEEE Trans. Audio, Speech, Lang. Process., vol.14, no.3, pp. 845-854, May 2006.

12
- 18744371585
- A. De La Torre, A. M. Peinado, J. C. Segura, J. L. Perez-Cordoba, M. C. Benitez, and A. J. Rubio, "Histogram equalization of speech representation for robust speech recognition," IEEE Trans. Speech Audio Process., vol.13, no.3, pp. 355-366, May 2005.

13
- 34147106901
- Y. Suh, M. Ji, and H. Kim, "Probabilistic class histogram equalization for robust speech recognition," IEEE Signal Process. Lett., vol.14, no.4, pp. 287-290, Apr. 2007.

14
- 2442509974
- J. C. Segura, C. Benítez, A. De La Torre, A. J. Rubio, and J. Ramírez, "Cepstral domain segmental nonlinear feature transformations for robust speech recognition," IEEE Signal Process. Lett., vol.11, no.5, pp. 517-520, May 2004.

15
- 0028517164
- H. Hermansky and N. Morgan, "RASTA processing of speech," IEEE Trans. Speech Audio Process., vol.2, no.4, pp. 578-589, Oct. 1994.

16
- 42549139762
- C.-P. Chen and J. A. Bilmes, "MVA processing of speech features," IEEE Trans. Audio, Speech, Lang. Process., vol.15, no.1, pp. 257-270, Jan. 2007.

17
- 34347376319
- X. Xiao, E. S. Chng, and H. Li, "Temporal structure normalization of speech feature for robust speech recognition," IEEE Signal Process. Lett., vol.14, no.7, pp. 500-503, Jul. 2007.

18
- 70350016998
- X. Xiao, E. S. Chng, and H. Li, "Normalization of the speech modulation spectra for robust speech recognition," IEEE Trans. Audio, Speech, Lang. Process., vol.16, no.8, pp. 1662-1674, Nov. 2008.

19
- 0029288633
- C. J. Leggetter and P. C. Woodland, "Maximum likelihood linear regression for speaker adaptation of continuous density hidden Markov models," Comput. Speech Lang., vol.9, pp. 171-185, Apr. 1995.

20
- 0028419019
- J.-L. Gauvain and C.-H. Lee, "Maximum a posteriori estimation for multivariate Gaussian mixture observations of Markov chains," IEEE Trans. Speech Audio Process., vol.2, no.2, pp. 291-298, Apr. 1994.

21
- 0027622731
- M. J. F. Gales and S. J. Young, "Cepstral parameter compensation for HMM recognition," Speech Commun., vol.12, pp. 231-239, Jul. 1993.

22
- 44849133030
- Y. Tsao and C.-H. Lee, "An ensemble modeling approach to joint characterization of speaker and speaking environments," in Proc. Eurospeech '07, Antwerp, Belgium, Sep. 2007, pp. 1050-1053.

23
- 44849130443
- Y. Tsao and C.-H. Lee, "Two extensions to ensemble speaker and speaking environment modeling for robust automatic speech recognition," in Proc. ASRU'07, Kyoto, Japan, Dec. 2007, pp. 77-80.

24
- 27644486095
- Y. Gong, "A method of joint compensation of additive and convolutive distortions for speaker-independent speech recognition," IEEE Trans. Speech Audio Process., vol.13, no.5, pp. 975-983, Oct. 2005.

25
- 44849125798
- J. Li, L. Deng, D. Yu, Y. Gong, and A. Acero, "High-performance HMM adaptation with joint compensation of additive and convolutive distortions via vector Taylor series," in Proc. ASRU'07, Kyoto, Japan, Dec. 2007, pp. 65-70.

26
- 0003450542
- V. N. Vapnik, The Nature of Statistical Learning Theory. New York: Springer Verlag, 1995.

27
- 0026982122
- B.-H. Juang and S. Katagiri, "Discriminative learning for minimum error classification," IEEE Trans. Signal Process., vol.40, no.12, pp. 3043-3054, Dec. 1992.

28
- 0031139839
- B.-H. Juang, W. Chou, and C.-H. Lee, "Minimum classification error rate methods for speech recognition," IEEE Trans. Speech Audio Process., vol.5, no.3, pp. 257-265, May 1997.

29
- 0022890536
- L. R. Bahl, P. F. Brown, P. V. De Souza, and R. L. Mercer, "Maximum mutual information estimation of hidden Markov model parameters for speech recognition," in Proc. ICASSP'86, Tokyo, Japan, Apr. 1986, pp. 49-52.

30
- 0031222490
- V. Valtchev, J. J. Odell, P. C. Woodland, and S. J. Young, "MMIE training of large vocabulary recognition systems," Speech Commun., vol.22, no.4, pp. 303-314, 1997.

31
- 4544265717
- D. Povey, "Discriminative training for large vocabulary speech recognition," Ph.D. dissertation, Univ. of Cambridge, Cambridge, U.K., 2003.

32
- 44849090158
- J. Wu and Q. Huo, "An environment compensated minimum classification error training approach based on stochastic vector mapping," IEEE Trans. Audio, Speech, Lang. Process., vol.14, no.6, pp. 2147-2155, Nov. 2006.

33
- 0742272653
- B.-W. Mak, Y.-C. Tam, and P. Li, "Discriminative auditory-based features for robust speech recognition," IEEE Trans. Speech Audio Process., vol.12, no.1, pp. 27-36, Jan. 2004.

34
- 33745216251
- J. Droppo and A. Acero, "Maximum mutual information SPLICE transform for seen and unseen conditions," in Proc. Interspeech'05, Lisbon, Portugal, Sep. 2005, pp. 989-992.

35
- 33947686745
- F. Sha and L. K. Saul, "Large margin Gaussian mixture modeling for phonetic classification and recognition," in Proc. ICASSP'06, May 2006, vol.1.

36
- 84864038630
- F. Sha and L. K. Saul, "Large margin hidden Markov models for automatic speech recognition," in Advances in Neural Information Processing Systems, B. Schölkopf, J. Platt, and T. Hofmann, Eds. Cambridge, MA: MIT Press, 2007, vol.19.

37
- 34047115134
- H. Jiang, X. Li, and C. Liu, "Large margin hidden Markov models for speech recognition," IEEE Trans. Audio, Speech, Lang. Process., vol.14, no.5, pp. 1584-1595, Oct. 2006.

38
- 64149098818
- J. Li, M. Yuan, and C.-H. Lee, "Approximate test risk bound minimization through soft margin estimation," IEEE Trans. Audio, Speech, Lang. Process., vol.15, no.8, pp. 2393-2404, Nov. 2007.

39
- 84867216710
- J. Li and C.-H. Lee, "On a generalization of margin-based discriminative training to robust speech recognition," in Proc. Interspeech'08, Brisbane, Australia, Sep. 2008.

40
- 84987702417
- D. Pearce and H.-G. Hirsch, "The AURORA experimental framework for the performance evaluation of speech recognition systems under noisy conditions," in Proc. ICSLP'00, Beijing, China, Oct. 2000, vol.4, pp. 29-32.

41
- 0004319970
- A. Acero, "Acoustical and environmental robustness in automatic speech recognition," Ph.D. dissertation, Dept. Elec. Comput. Eng., Carnegie Mellon Univ., Pittsburgh, PA, 1990.

42
- 65549153550
- P. J. Moreno, "Speech Recognition in Noisy Environments," Ph.D. dissertation, Dept. Elec. Comput. Eng., Carnegie Mellon Univ., Pittsburgh, PA, 1996.

43
- 0442270736
- Baseline Results for Subset of SpeechDat-Car Finnish Database for ETSI STQ WI008 Advanced Front End Evaluation, Nokia, 2000, Aurora document no. AU/255/00.

* This information was extracted from Elsevier's SCOPUS database and analyzed by KISTI.