SCOPUS Information Retrieval Platform

Volume 10, Issue 5, 2008, Pages 767-779

Robust audio-visual speech recognition based on late integration

(2) Lee, Jong Seok a,b; Park, Cheol Hoon a,b

a IEEE

b Korea Advanced Institute of Science and Technology (KAIST) (South Korea)

Author keywords

Audio visual speech recognition; Hidden Markov model; Interframe correlation; Late integration; Neural network; Robustness; Stochastic optimization

Indexed keywords

ACOUSTICS; ARTIFICIAL INTELLIGENCE; AUDIO ACOUSTICS; HIDDEN MARKOV MODELS; MARKOV PROCESSES; NEURAL NETWORKS; SPEECH; SPEECH ANALYSIS; STOCHASTIC MODELS;

AUDIO VISUAL SPEECH RECOGNITION (AVSR); CONVENTIONAL SYSTEMS; DYNAMIC CHARACTERISTICS; INTEGRATION SCHEMES; ISOLATED WORD RECOGNITION; MARKOV MODELLING; NOISE CONDITIONS; NOISY ENVIRONMENTS; PRIORI KNOWLEDGE; ROBUST RECOGNITION; SPEECH RECOGNIZERS; STOCHASTIC OPTIMIZATIONS; VISUAL SIGNALS;

SPEECH RECOGNITION;

EID: 47649103796 PISSN: 15209210 EISSN: None Source Type: Journal
DOI: 10.1109/TMM.2008.922789 Document Type: Article

Times cited : (52)

References (41)

1
- 0021541159
- Automatic lipreading to enhance speech recognition
- Atlanta, GA, Nov
- E. D. Petajan, "Automatic lipreading to enhance speech recognition," in Proc. Global Telecommunications Conf., Atlanta, GA, Nov. 1984, pp. 265-272.
- (1984) Proc. Global Telecommunications Conf , pp. 265-272
- Petajan, E.D.¹

2
- 0036502797
- A review of speech-based bimodal recognition
- Mar
- C. C. Chibelushi, F. Deravi, and J. S. D. Mason, "A review of speech-based bimodal recognition," IEEE Trans. Multimedia, vol. 4, no. 1, pp. 23-37, Mar. 2002.
- (2002) IEEE Trans. Multimedia , vol.4 , Issue.1 , pp. 23-37
- Chibelushi, C.C.¹ Deravi, F.² Mason, J.S.D.³

3
- 0030830419
- Sensor fusion potential exploitation: Innovative architectures and illustrative applications
- Jan
- B. V. Dasarathy, "Sensor fusion potential exploitation: Innovative architectures and illustrative applications," Proc. IEEE, vol. 85, pp. 24-38, Jan. 1997.
- (1997) Proc. IEEE , vol.85 , pp. 24-38
- Dasarathy, B.V.¹

4
- 0003966402
- Hillsdale, NJ: Lawrence Erlbaum
- D. W. Massaro, Speech Perception by Ear and Eye: A Paradigm for Psychological Inquiry. Hillsdale, NJ: Lawrence Erlbaum, 1987.
- (1987) Speech Perception by Ear and Eye: A Paradigm for Psychological Inquiry
- Massaro, D.W.¹

5
- 34548139784
- Training hidden Markov models by hybrid simulated annealing for visual speech recognition
- Taipei, Taiwan, R.O.C, Oct
- J.-S. Lee and C. H. Park, "Training hidden Markov models by hybrid simulated annealing for visual speech recognition," in Proc. IEEE Int. Conf. Systems, Man, Cybernetics, Taipei, Taiwan, R.O.C., Oct. 2006, pp. 198-202.
- (2006) Proc. IEEE Int. Conf. Systems, Man, Cybernetics , pp. 198-202
- Lee, J.-S.¹ Park, C.H.²

6
- 0004056285
- Upper Saddle River, NJ: Prentice Hall
- X. Huang, A. Acero, and H.-W. Hon, Spoken Language Processing: A Guide to Theory, Algorithm, and System Development. Upper Saddle River, NJ: Prentice Hall, 2001.
- (2001) Spoken Language Processing: A Guide to Theory, Algorithm, and System Development
- Huang, X.¹ Acero, A.² Hon, H.-W.³

7
- 0027957839
- Effect of temporal envelope smearing on speech reception
- Feb
- R. Drullman, J. M. Festen, and R. Plomp, "Effect of temporal envelope smearing on speech reception," J. Acoust. Soc. Amer., vol. 95, no. 2, pp. 1053-1064, Feb. 1994.
- (1994) J. Acoust. Soc. Amer , vol.95 , Issue.2 , pp. 1053-1064
- Drullman, R.¹ Festen, J.M.² Plomp, R.³

8
- 84892184580
- Speech intelligibility in the presence of cross-channel spectral asynchrony
- Seattle, WA
- T. Arai and S. Greenberg, "Speech intelligibility in the presence of cross-channel spectral asynchrony," in Proc. ICASSP, Seattle, WA, 1998, vol. 2, pp. 933-936.
- (1998) Proc. ICASSP , vol.2 , pp. 933-936
- Arai, T.¹ Greenberg, S.²

9
- 0022667694
- Speaker-independent isolated word recognition using dynamic features of speech spectrum
- Feb
- S. Furui, "Speaker-independent isolated word recognition using dynamic features of speech spectrum," IEEE Trans. Acoust., Speech, Signal Process., vol. 34, no. 1, pp. 52-59, Feb. 1986.
- (1986) IEEE Trans. Acoust., Speech, Signal Process , vol.34 , Issue.1 , pp. 52-59
- Furui, S.¹

10
- 0028517164
- RASTA processing of speech
- H. Hermansky and N. Morgan, "RASTA processing of speech," IEEE Trans. Speech Audio Processing, vol. 2, no. 4, pp. 578-589, 1994.
- (1994) IEEE Trans. Speech Audio Processing , vol.2 , Issue.4 , pp. 578-589
- Hermansky, H.¹ Morgan, N.²

11
- 33846242179
- Focused state transition information in ASR
- San Juan, PR, Nov
- C. Bartels and J. Bilmes, "Focused state transition information in ASR," in Proc. Workshop on Automatic Speech Recognition and Understanding, San Juan, PR, Nov. 2005, pp. 191-196.
- (2005) Proc. Workshop on Automatic Speech Recognition and Understanding , pp. 191-196
- Bartels, C.¹ Bilmes, J.²

12
- 45949121309
- Fast simulated annealing
- June
- H. H. Szu and R. L. Hartley, "Fast simulated annealing," Phys. Lett. A, vol. 122, no. 3-4, pp. 157-162, June 1987.
- (1987) Phys. Lett. A , vol.122 , Issue.3-4 , pp. 157-162
- Szu, H.H.¹ Hartley, R.L.²

13
- 0022227186
- Training of HMM recognizers by simulated annealing
- Tampa, FL, Mar
- D. Paul, "Training of HMM recognizers by simulated annealing," in Proc. ICASSP, Tampa, FL, Mar. 1985, pp. 13-16.
- (1985) Proc. ICASSP , pp. 13-16
- Paul, D.¹

14
- 0029174347
- Multiple alignment using hidden Markov models
- Menlo Park, CA
- S. R. Eddy, "Multiple alignment using hidden Markov models," in Proc. Int. Conf. Intelligent Systems for Molecular Biology, Menlo Park, CA, 1995, pp. 114-120.
- (1995) Proc. Int. Conf. Intelligent Systems for Molecular Biology , pp. 114-120
- Eddy, S.R.¹

15
- 10444288769
- n-dimensional Cauchy neighbor generation for the fast simulated annealing
- Nov
- D. Nam, J.-S. Lee, and C. H. Park, "n-dimensional Cauchy neighbor generation for the fast simulated annealing," IEICE Trans. Inf. Syst., vol. E87-D, no. 11, pp. 2499-2502, Nov. 2004.
- (2004) IEICE Trans. Inf. Syst , vol.E87-D , Issue.11 , pp. 2499-2502
- Nam, D.¹ Lee, J.-S.² Park, C.H.³

16
- 5744249209
- Equation of state calculations by fast computing machines
- N. Metropolis, A. W. Rosenbluth, M. N. Rosenbluth, A. H. Teller, and E. Teller, "Equation of state calculations by fast computing machines," J. Chem. Phys., vol. 21, no. 6, pp. 1087-1092, 1953.
- (1953) J. Chem. Phys , vol.21 , Issue.6 , pp. 1087-1092
- Metropolis, N.¹ Rosenbluth, A.W.² Rosenbluth, M.N.³ Teller, A.H.⁴ Teller, E.⁵

17
- 47649094767
- Audio-Visual Speech Recognition: Stochastic Optimization of Hidden Markov Models, Modeling of Interframe Correlations and Integration With Neural Networks
- Ph.D. dissertation, Dept. Elect. Eng. Comput. Science, KAIST, Daejeon, Korea
- J.-S. Lee, "Audio-Visual Speech Recognition: Stochastic Optimization of Hidden Markov Models, Modeling of Interframe Correlations and Integration With Neural Networks," Ph.D. dissertation, Dept. Elect. Eng. Comput. Science, KAIST, Daejeon, Korea, 2006.
- (2006)
- Lee, J.-S.¹

18
- 0041568115
- Schur complements and statistics
- Mar
- D. V. Ouellette, "Schur complements and statistics," Linear Algebra Appl., vol. 36, pp. 187-295, Mar. 1981.
- (1981) Linear Algebra Appl , vol.36 , pp. 187-295
- Ouellette, D.V.¹

19
- 0003663467
- 3rd ed. New York: McGraw-Hill
- A. Papoulis, Probability, Random Variables, and Stochastic Processes, 3rd ed. New York: McGraw-Hill, 1991.
- (1991) Probability, Random Variables, and Stochastic Processes
- Papoulis, A.¹

20
- 0026368826
- Regression features for recognition of speech in quiet and in noise
- Toronto, ON, Canada, Apr
- T. H. Applebaum and B. A. Hanson, "Regression features for recognition of speech in quiet and in noise," in Proc. ICASSP, Toronto, ON, Canada, Apr. 1991, vol. 2, pp. 985-988.
- (1991) Proc. ICASSP , vol.2 , pp. 985-988
- Applebaum, T.H.¹ Hanson, B.A.²

21
- 0003408774
- Natick, MA: The Mathworks, Inc.
- Optimization Toolbox User's Guide. Natick, MA: The Mathworks, Inc., 2005.
- (2005) Optimization Toolbox User's Guide

22
- 0003806707
- Upper Saddle River, NJ: Prentice-Hall
- A. D. Belegundu and T. R. Chandrupatla, Optimization Concepts and Applications in Engineering. Upper Saddle River, NJ: Prentice-Hall, 1999.
- (1999) Optimization Concepts and Applications in Engineering
- Belegundu, A.D.¹ Chandrupatla, T.R.²

23
- 34247172408
- Do you see what I am saying? Exploring visual enhancement of speech comprehension in noisy environments
- L. A. Ross, D. Saint-Amour, V. M. Leavitt, D. C. Javitt, and J. J. Foxe, "Do you see what I am saying? Exploring visual enhancement of speech comprehension in noisy environments," Cerebral Cortex, vol. 17, no. 5, pp. 1147-1153, 2007.
- (2007) Cerebral Cortex , vol.17 , Issue.5 , pp. 1147-1153
- Ross, L.A.¹ Saint-Amour, D.² Leavitt, V.M.³ Javitt, D.C.⁴ Foxe, J.J.⁵

24
- 0035347346
- Bisensory augmentation: A speechreading advantage when speech is clearly audible and intact
- P. Arnold and F. Hill, "Bisensory augmentation: A speechreading advantage when speech is clearly audible and intact," Brit. J. Psychol., vol. 92, pp. 339-355, 2001.
- (2001) Brit. J. Psychol , vol.92 , pp. 339-355
- Arnold, P.¹ Hill, F.²

25
- 34047262788
- The intrinsic bimodality of speech communication and the synthesis of talking faces
- M. M. Taylor, F. Nel, and D. Bouwhuis, Eds. Amsterdam, The Netherlands: John Benjamins
- C. Benoît, "The intrinsic bimodality of speech communication and the synthesis of talking faces," in The Structure of Multimodal Dialogue II, M. M. Taylor, F. Nel, and D. Bouwhuis, Eds. Amsterdam, The Netherlands: John Benjamins, 2000, pp. 485-502.
- (2000) The Structure of Multimodal Dialogue II , pp. 485-502

26
- 33745102745
- Auditory-visual speech perception and synchrony detection for speech and nonspeech signals
- June
- B. Conrey and D. B. Pisoni, "Auditory-visual speech perception and synchrony detection for speech and nonspeech signals," J. Acoust. Soc. Amer., vol. 119, no. 6, pp. 4065-4073, June 2006.
- (2006) J. Acoust. Soc. Amer , vol.119 , Issue.6 , pp. 4065-4073
- Conrey, B.¹ Pisoni, D.B.²

27
- 0036874527
- Noise adaptive stream weighting in audio-visual speech recognition
- M. Heckmann, F. Berthommier, and K. Kroschel, "Noise adaptive stream weighting in audio-visual speech recognition," EURASIP J. Appl. Signal Process., vol. 11, pp. 1260-1273, 2002.
- (2002) EURASIP J. Appl. Signal Process , vol.11 , pp. 1260-1273
- Heckmann, M.¹ Berthommier, F.² Kroschel, K.³

28
- 34547497793
- Dynamic stream weight modeling for audio-visual speech recognition
- Honolulu, HI, Apr
- E. Marcheret, V. Libal, and G. Potamianos, "Dynamic stream weight modeling for audio-visual speech recognition," in Proc. ICASSP, Honolulu, HI, Apr. 2007, vol. 4, pp. 945-948.
- (2007) Proc. ICASSP , vol.4 , pp. 945-948
- Marcheret, E.¹ Libal, V.² Potamianos, G.³

29
- 0032180188
- Adaptive fusion of acoustic and visual sources for automatic speech recognition
- Oct
- A. Rogozan and P. Deléglise, "Adaptive fusion of acoustic and visual sources for automatic speech recognition," Speech Commun., vol. 26, no. 1-2, pp. 149-161, Oct. 1998.
- (1998) Speech Commun , vol.26 , Issue.1-2 , pp. 149-161
- Rogozan, A.¹ Deléglise, P.²

30
- 28444493889
- Sensor fusion weighting measures in audio-visual speech recognition
- Dunedin, New Zealand
- T. W. Lewis and D. M. W. Powers, "Sensor fusion weighting measures in audio-visual speech recognition," in Proc. 27th Conf. Australasian Computer Science, Dunedin, New Zealand, 2004, pp. 305-314.
- (2004) Proc. 27th Conf. Australasian Computer Science , pp. 305-314
- Lewis, T.W.¹ Powers, D.M.W.²

31
- 34047263009
- Visual model structures and synchrony constraints for audio-visual speech recognition
- May
- T. J. Hazen, "Visual model structures and synchrony constraints for audio-visual speech recognition," IEEE Trans. Audio, Speech, Lang. Process., vol. 14, no. 3, pp. 1082-1089, May 2006.
- (2006) IEEE Trans. Audio, Speech, Lang. Process , vol.14 , Issue.3 , pp. 1082-1089
- Hazen, T.J.¹

32
- 0042954451
- Late integration in audiovisual continuous speech recognition
- Keystone, CO, Dec
- A. Verma, T. Faruquie, C. Neti, and S. Basu, "Late integration in audiovisual continuous speech recognition," in Proc. Workshop on Automatic Speech Recognition and Understanding, Keystone, CO, Dec. 1999, pp. 71-74.
- (1999) Proc. Workshop on Automatic Speech Recognition and Understanding , pp. 71-74
- Verma, A.¹ Faruquie, T.² Neti, C.³ Basu, S.⁴

33
- 1842854571
- Continuous audiovisual digit recognition using N-best decision fusion
- June
- G. F. Meyer, J. B. Mulligan, and S. M. Wuerger, "Continuous audiovisual digit recognition using N-best decision fusion," Inform. Fusion, vol. 5, no. 2, pp. 91-101, June 2004.
- (2004) Inform. Fusion , vol.5 , Issue.2 , pp. 91-101
- Meyer, G.F.¹ Mulligan, J.B.² Wuerger, S.M.³

34
- 33646814706
- A stream-weight optimization method for multi-stream HMMs based on likelihood value normalization
- Philadelphia, PA, Mar
- S. Tamura, K. Iwano, and S. Furui, "A stream-weight optimization method for multi-stream HMMs based on likelihood value normalization," in Proc. ICASSP, Philadelphia, PA, Mar. 2005, vol. 1, pp. 469-472.
- (2005) Proc. ICASSP , vol.1 , pp. 469-472
- Tamura, S.¹ Iwano, K.² Furui, S.³

35
- 0001432664
- On the integration of auditory and visual parameters in an HMM-based ASR
- D. G. Stork and M. E. Hennecke, Eds. Speechreading by Humans and Machines: Models, Systems and Applications. Berlin, Germany: Springer
- A. Adjoudani and C. Benoît, "On the integration of auditory and visual parameters in an HMM-based ASR," in Speechreading by Humans and Machines: Models, Systems and Applications, ser. NATO ASI Series, D. G. Stork and M. E. Hennecke, Eds. Berlin, Germany: Springer, 1996, pp. 461-472.
- (1996) ser. NATO ASI Series , pp. 461-472

36
- 0003487601
- New York: Oxford Univ. Press
- C. M. Bishop, Neural Networks for Pattern Recognition. New York: Oxford Univ. Press, 1995.
- (1995) Neural Networks for Pattern Recognition
- Bishop, C.M.¹

37
- 0003413187
- Upper Saddle River, NJ: Prentice-Hall
- S. Haykin, Neural Networks: A Comprehensive Foundation. Upper Saddle River, NJ: Prentice-Hall, 1999.
- (1999) Neural Networks: A Comprehensive Foundation
- Haykin, S.¹

38
- 0027623210
- Assessment for automatic speech recognition: II. NOISEX-92: A database and an experiment to study the effect of additive noise on speech recognition systems
- A. Varga and H. J. M. Steeneken, "Assessment for automatic speech recognition: II. NOISEX-92: A database and an experiment to study the effect of additive noise on speech recognition systems," Speech Commun., vol. 12, no. 3, pp. 247-251, 1993.
- (1993) Speech Commun , vol.12 , Issue.3 , pp. 247-251
- Varga, A.¹ Steeneken, H.J.M.²

39
- 0034270644
- Audio-visual speech modeling for continuous speech recognition
- Sep
- S. Dupont and J. Luettin, "Audio-visual speech modeling for continuous speech recognition," IEEE Trans. Multimedia, vol. 2, no. 3, pp. 141-151, Sep. 2000.
- (2000) IEEE Trans. Multimedia , vol.2 , Issue.3 , pp. 141-151
- Dupont, S.¹ Luettin, J.²

40
- 84880887921
- Multimodal integration - A biological view
- Seattle, WA
- M. H. Coen, "Multimodal integration - A biological view," in Proc. Int. Joint Conf. Artificial Intelligence, Seattle, WA, 2001, pp. 1417-1424.
- (2001) Proc. Int. Joint Conf. Artificial Intelligence , pp. 1417-1424
- Coen, M.H.¹

41
- 2342451199
- Multimedia content processing through cross-modal association
- Berkeley, CA, Nov
- D. Li, N. Dimitrova, M. Li, and I. K. Sethi, "Multimedia content processing through cross-modal association," in Proc. ACM Int. Conf. Multimedia, Berkeley, CA, Nov. 2003, pp. 604-611.
- (2003) Proc. ACM Int. Conf. Multimedia , pp. 604-611
- Li, D.¹ Dimitrova, N.² Li, M.³ Sethi, I.K.⁴

* This information was extracted and analyzed by KISTI from Elsevier's SCOPUS database.