Volume 10, Issue 5, 2008, Pages 767-779

Robust audio-visual speech recognition based on late integration

Author keywords

Audio visual speech recognition; Hidden Markov model; Interframe correlation; Late integration; Neural network; Robustness; Stochastic optimization

Indexed keywords

ACOUSTICS; ARTIFICIAL INTELLIGENCE; AUDIO ACOUSTICS; HIDDEN MARKOV MODELS; MARKOV PROCESSES; NEURAL NETWORKS; SPEECH; SPEECH ANALYSIS; STOCHASTIC MODELS;

EID: 47649103796     PISSN: 15209210     EISSN: None     Source Type: Journal    
DOI: 10.1109/TMM.2008.922789     Document Type: Article
Times cited: 52

References (41)
  • 1
    • 0021541159
    • Automatic lipreading to enhance speech recognition
    • Atlanta, GA, Nov
    • E. D. Petajan, "Automatic lipreading to enhance speech recognition," in Proc. Global Telecommunications Conf., Atlanta, GA, Nov. 1984, pp. 265-272.
    • (1984) Proc. Global Telecommunications Conf , pp. 265-272
    • Petajan, E.D.1
  • 2
    • 0036502797
    • A review of speech-based bimodal recognition
    • Mar
    • C. C. Chibelushi, F. Deravi, and J. S. D. Mason, "A review of speech-based bimodal recognition," IEEE Trans. Multimedia, vol. 4, no. 1, pp. 23-37, Mar. 2002.
    • (2002) IEEE Trans. Multimedia , vol.4 , Issue.1 , pp. 23-37
    • Chibelushi, C.C.1    Deravi, F.2    Mason, J.S.D.3
  • 3
    • 0030830419
    • Sensor fusion potential exploitation: Innovative architectures and illustrative applications
    • Jan
    • B. V. Dasarathy, "Sensor fusion potential exploitation: Innovative architectures and illustrative applications," Proc. IEEE, vol. 85, pp. 24-38, Jan. 1997.
    • (1997) Proc. IEEE , vol.85 , pp. 24-38
    • Dasarathy, B.V.1
  • 5
    • 34548139784
    • Training hidden Markov models by hybrid simulated annealing for visual speech recognition
    • Taipei, Taiwan, R.O.C, Oct
    • J.-S. Lee and C. H. Park, "Training hidden Markov models by hybrid simulated annealing for visual speech recognition," in Proc. IEEE Int. Conf. Systems, Man, Cybernetics, Taipei, Taiwan, R.O.C., Oct. 2006, pp. 198-202.
    • (2006) Proc. IEEE Int. Conf. Systems, Man, Cybernetics , pp. 198-202
    • Lee, J.-S.1    Park, C.H.2
  • 7
    • 0027957839
    • Effect of temporal envelope smearing on speech reception
    • Feb
    • R. Drullman, J. M. Festen, and R. Plomp, "Effect of temporal envelope smearing on speech reception," J. Acoust. Soc. Amer., vol. 95, no. 2, pp. 1053-1064, Feb. 1994.
    • (1994) J. Acoust. Soc. Amer , vol.95 , Issue.2 , pp. 1053-1064
    • Drullman, R.1    Festen, J.M.2    Plomp, R.3
  • 8
    • 84892184580
    • Speech intelligibility in the presence of cross-channel spectral asynchrony
    • Seattle, WA
    • T. Arai and S. Greenberg, "Speech intelligibility in the presence of cross-channel spectral asynchrony," in Proc. ICASSP, Seattle, WA, 1998, vol. 2, pp. 933-936.
    • (1998) Proc. ICASSP , vol.2 , pp. 933-936
    • Arai, T.1    Greenberg, S.2
  • 9
    • 0022667694
    • Speaker-independent isolated word recognition using dynamic features of speech spectrum
    • Feb
    • S. Furui, "Speaker-independent isolated word recognition using dynamic features of speech spectrum," IEEE Trans. Acoust., Speech, Signal Process., vol. 34, no. 1, pp. 52-59, Feb. 1986.
    • (1986) IEEE Trans. Acoust., Speech, Signal Process , vol.34 , Issue.1 , pp. 52-59
    • Furui, S.1
  • 12
    • 45949121309
    • Fast simulated annealing
    • June
    • H. H. Szu and R. L. Hartley, "Fast simulated annealing," Phys. Lett. A, vol. 122, no. 3-4, pp. 157-162, June 1987.
    • (1987) Phys. Lett. A , vol.122 , Issue.3-4 , pp. 157-162
    • Szu, H.H.1    Hartley, R.L.2
  • 13
    • 0022227186
    • Training of HMM recognizers by simulated annealing
    • Tampa, FL, Mar
    • D. Paul, "Training of HMM recognizers by simulated annealing," in Proc. ICASSP, Tampa, FL, Mar. 1985, pp. 13-16.
    • (1985) Proc. ICASSP , pp. 13-16
    • Paul, D.1
  • 15
    • 10444288769
    • n-dimensional Cauchy neighbor generation for the fast simulated annealing
    • Nov
    • D. Nam, J.-S. Lee, and C. H. Park, "n-dimensional Cauchy neighbor generation for the fast simulated annealing," IEICE Trans. Inf. Syst., vol. E87-D, no. 11, pp. 2499-2502, Nov. 2004.
    • (2004) IEICE Trans. Inf. Syst , vol.E87-D , Issue.11 , pp. 2499-2502
    • Nam, D.1    Lee, J.-S.2    Park, C.H.3
  • 17
    • 47649094767
    • Audio-Visual Speech Recognition: Stochastic Optimization of Hidden Markov Models, Modeling of Interframe Correlations and Integration With Neural Networks
    • Ph.D. dissertation, Dept. Elect. Eng. Comput. Science, KAIST, Daejeon, Korea
    • J.-S. Lee, "Audio-Visual Speech Recognition: Stochastic Optimization of Hidden Markov Models, Modeling of Interframe Correlations and Integration With Neural Networks," Ph.D. dissertation, Dept. Elect. Eng. Comput. Science, KAIST, Daejeon, Korea, 2006.
    • (2006)
    • Lee, J.-S.1
  • 18
    • 0041568115
    • Schur complements and statistics
    • Mar
    • D. V. Ouellette, "Schur complements and statistics," Linear Algebra Appl., vol. 36, pp. 187-295, Mar. 1981.
    • (1981) Linear Algebra Appl , vol.36 , pp. 187-295
    • Ouellette, D.V.1
  • 20
    • 0026368826
    • Regression features for recognition of speech in quiet and in noise
    • Toronto, ON, Canada, Apr
    • T. H. Applebaum and B. A. Hanson, "Regression features for recognition of speech in quiet and in noise," in Proc. ICASSP, Toronto, ON, Canada, Apr. 1991, vol. 2, pp. 985-988.
    • (1991) Proc. ICASSP , vol.2 , pp. 985-988
    • Applebaum, T.H.1    Hanson, B.A.2
  • 21
    • 0003408774
    • Natick, MA: The Mathworks, Inc, The Mathworks
    • Optimization Toolbox User's Guide. Natick, MA: The Mathworks, Inc., 2005, The Mathworks.
    • (2005) Optimization Toolbox User's Guide
  • 23
    • 34247172408
    • Do you see what I am saying? Exploring visual enhancement of speech comprehension in noisy environments
    • L. A. Ross, D. Saint-Amour, V. M. Leavitt, D. C. Javitt, and J. J. Foxe, "Do you see what I am saying? Exploring visual enhancement of speech comprehension in noisy environments," Cerebral Cortex, vol. 17, no. 5, pp. 1147-1153, 2007.
    • (2007) Cerebral Cortex , vol.17 , Issue.5 , pp. 1147-1153
    • Ross, L.A.1    Saint-Amour, D.2    Leavitt, V.M.3    Javitt, D.C.4    Foxe, J.J.5
  • 24
    • 0035347346
    • Bisensory augmentation: A speechreading advantage when speech is clearly audible and intact
    • P. Arnold and F. Hill, "Bisensory augmentation: A speechreading advantage when speech is clearly audible and intact," Brit. J. Psychol., vol. 92, pp. 339-355, 2001.
    • (2001) Brit. J. Psychol , vol.92 , pp. 339-355
    • Arnold, P.1    Hill, F.2
  • 25
    • 34047262788
    • The intrinsic bimodality of speech communication and the synthesis of talking faces
    • C. Benoît, M. M. Taylor, F. Nel, and D. Bouwhuis, Eds, Amsterdam, The Netherlands: John Benjamins
    • C. Benoît, "The intrinsic bimodality of speech communication and the synthesis of talking faces," in The Structure of Multimodal Dialogue II, M. M. Taylor, F. Nel, and D. Bouwhuis, Eds. Amsterdam, The Netherlands: John Benjamins, 2000, pp. 485-502.
    • (2000) The Structure of Multimodal Dialogue II , pp. 485-502
  • 26
    • 33745102745
    • Auditory-visual speech perception and synchrony detection for speech and nonspeech signals
    • June
    • B. Conrey and D. B. Pisoni, "Auditory-visual speech perception and synchrony detection for speech and nonspeech signals," J. Acoust. Soc. Amer., vol. 119, no. 6, pp. 4065-4073, June 2006.
    • (2006) J. Acoust. Soc. Amer , vol.119 , Issue.6 , pp. 4065-4073
    • Conrey, B.1    Pisoni, D.B.2
  • 27
    • 0036874527
    • Noise adaptive stream weighting in audio-visual speech recognition
    • M. Heckmann, F. Berthommier, and K. Kroschel, "Noise adaptive stream weighting in audio-visual speech recognition," EURASIP J. Appl. Signal Process., vol. 11, pp. 1260-1273, 2002.
    • (2002) EURASIP J. Appl. Signal Process , vol.11 , pp. 1260-1273
    • Heckmann, M.1    Berthommier, F.2    Kroschel, K.3
  • 28
    • 34547497793
    • Dynamic stream weight modeling for audio-visual speech recognition
    • Honolulu, HI, Apr
    • E. Marcheret, V. Libal, and G. Potamianos, "Dynamic stream weight modeling for audio-visual speech recognition," in Proc. ICASSP, Honolulu, HI, Apr. 2007, vol. 4, pp. 945-948.
    • (2007) Proc. ICASSP , vol.4 , pp. 945-948
    • Marcheret, E.1    Libal, V.2    Potamianos, G.3
  • 29
    • 0032180188
    • Adaptive fusion of acoustic and visual sources for automatic speech recognition
    • Oct
    • A. Rogozan and P. Deléglise, "Adaptive fusion of acoustic and visual sources for automatic speech recognition," Speech Commun., vol. 26, no. 1-2, pp. 149-161, Oct. 1998.
    • (1998) Speech Commun , vol.26 , Issue.1-2 , pp. 149-161
    • Rogozan, A.1    Deléglise, P.2
  • 30
    • 28444493889
    • Sensor fusion weighting measures in audio-visual speech recognition
    • Dunedin, New Zealand
    • T. W. Lewis and D. M. W. Powers, "Sensor fusion weighting measures in audio-visual speech recognition," in Proc. 27th Conf. Australasian Computer Science, Dunedin, New Zealand, 2004, pp. 305-314.
    • (2004) Proc. 27th Conf. Australasian Computer Science , pp. 305-314
    • Lewis, T.W.1    Powers, D.M.W.2
  • 31
    • 34047263009
    • Visual model structures and synchrony constraints for audio-visual speech recognition
    • May
    • T. J. Hazen, "Visual model structures and synchrony constraints for audio-visual speech recognition," IEEE Trans. Audio, Speech, Lang. Process., vol. 14, no. 3, pp. 1082-1089, May 2006.
    • (2006) IEEE Trans. Audio, Speech, Lang. Process , vol.14 , Issue.3 , pp. 1082-1089
    • Hazen, T.J.1
  • 33
    • 1842854571
    • Continuous audiovisual digit recognition using N-best decision fusion
    • June
    • G. F. Meyer, J. B. Mulligan, and S. M. Wuerger, "Continuous audiovisual digit recognition using N-best decision fusion," Inform. Fusion, vol. 5, no. 2, pp. 91-101, June 2004.
    • (2004) Inform. Fusion , vol.5 , Issue.2 , pp. 91-101
    • Meyer, G.F.1    Mulligan, J.B.2    Wuerger, S.M.3
  • 34
    • 33646814706
    • A stream-weight optimization method for multi-stream HMMs based on likelihood value normalization
    • Philadelphia, PA, Mar
    • S. Tamura, K. Iwano, and S. Furui, "A stream-weight optimization method for multi-stream HMMs based on likelihood value normalization," in Proc. ICASSP, Philadelphia, PA, Mar. 2005, vol. 1, pp. 469-472.
    • (2005) Proc. ICASSP , vol.1 , pp. 469-472
    • Tamura, S.1    Iwano, K.2    Furui, S.3
  • 35
    • 0001432664
    • On the integration of auditory and visual parameters in an HMM-based ASR
    • A. Adjoudani and C. Benoît, D. G. Stork and M. E. Hennecke, Eds, Speechreading by Humans and Machines: Models, Systems and Applications, Berlin, Germany: Springer
    • A. Adjoudani and C. Benoît, "On the integration of auditory and visual parameters in an HMM-based ASR," in Speechreading by Humans and Machines: Models, Systems and Applications, D. G. Stork and M. E. Hennecke, Eds., ser. NATO ASI Series. Berlin, Germany: Springer, 1996, pp. 461-472.
    • (1996) ser. NATO ASI Series , pp. 461-472
  • 38
    • 0027623210
    • Assessment for automatic speech recognition: II. NOISEX-92: A database and an experiment to study the effect of additive noise on speech recognition systems
    • A. Varga and H. J. M. Steeneken, "Assessment for automatic speech recognition: II. NOISEX-92: A database and an experiment to study the effect of additive noise on speech recognition systems," Speech Commun., vol. 12, no. 3, pp. 247-251, 1993.
    • (1993) Speech Commun , vol.12 , Issue.3 , pp. 247-251
    • Varga, A.1    Steeneken, H.J.M.2
  • 39
    • 0034270644
    • Audio-visual speech modeling for continuous speech recognition
    • Sep
    • S. Dupont and J. Luettin, "Audio-visual speech modeling for continuous speech recognition," IEEE Trans. Multimedia, vol. 2, no. 3, pp. 141-151, Sep. 2000.
    • (2000) IEEE Trans. Multimedia , vol.2 , Issue.3 , pp. 141-151
    • Dupont, S.1    Luettin, J.2
  • 40
    • 84880887921
    • Multimodal integration - A biological view
    • Seattle, WA
    • M. H. Coen, "Multimodal integration - A biological view," in Proc. Int. Joint Conf. Artificial Intelligence, Seattle, WA, 2001, pp. 1417-1424.
    • (2001) Proc. Int. Joint Conf. Artificial Intelligence , pp. 1417-1424
    • Coen, M.H.1
  • 41
    • 2342451199
    • Multimedia content processing through cross-modal association
    • Berkeley, CA, Nov
    • D. Li, N. Dimitrova, M. Li, and I. K. Sethi, "Multimedia content processing through cross-modal association," in Proc. ACM Int. Conf. Multimedia, Berkeley, CA, Nov. 2003, pp. 604-611.
    • (2003) Proc. ACM Int. Conf. Multimedia , pp. 604-611
    • Li, D.1    Dimitrova, N.2    Li, M.3    Sethi, I.K.4


* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS DB.