Volume E86-D, Issue 3, 2003, Pages 454-463

Audio-visual speech recognition based on optimized product HMMs and GMM based-MCE-GPD stream weight estimation

Author keywords

Audio visual speech recognition; Bi modal; Generalized probabilistic descent (GPD); Minimum classification error (MCE); Stream weight

Indexed keywords

ALGORITHMS; AUDITION; COMPUTATIONAL METHODS; DATABASE SYSTEMS; ERROR ANALYSIS; IMAGE PROCESSING; MARKOV PROCESSES; OPTIMIZATION; PROBABILITY; VISION;

EID: 0038381727     PISSN: 09168532     EISSN: None     Source Type: Journal    
DOI: None     Document Type: Article
Times cited : (2)

References (29)
  • 1
    • 0032134085
    • Eye movement of perceivers during audio-visual speech intelligibility in noise
    • E. Vatikiotis-Bateson, I.M. Eigsti, S. Yano, and K. Munhall, "Eye movement of perceivers during audio-visual speech intelligibility in noise," Perception and Psychophysics, vol.60, no.6, pp.926-940, 1998.
    • (1998) Perception and Psychophysics , vol.60 , Issue.6 , pp. 926-940
    • Vatikiotis-Bateson, E.1    Eigsti, I.M.2    Yano, S.3    Munhall, K.4
  • 2
    • 0001048664
    • Visual contribution to speech intelligibility in noise
    • March
    • W.H. Sumby and I. Pollack, "Visual contribution to speech intelligibility in noise," J. Acoust. Soc. Am., vol.26, pp.212-215, March 1954.
    • (1954) J. Acoust. Soc. Am. , vol.26 , pp. 212-215
    • Sumby, W.H.1    Pollack, I.2
  • 3
    • 0001055701
    • Which components of the face do humans and machines best speechread?
    • Speechreading by Humans and Machines: Models, Systems and Applications, Springer-Verlag
    • C. Benoit, T. Guiard Marigny, B. Le Goff, and A. Adjoudani, "Which components of the face do humans and machines best speechread?," in Speechreading by Humans and Machines: Models, Systems and Applications, NATO ASI Series, pp.315-328, Springer-Verlag, 1996.
    • (1996) NATO ASI Series , pp. 315-328
    • Benoit, C.1    Guiard Marigny, T.2    Le Goff, B.3    Adjoudani, A.4
  • 6
    • 33646906672
    • Improved bimodal speech recognition using tied-mixture HMMs and 5000 word audio-visual Synchronous database
    • S. Nakamura, R. Nagai, and K. Shikano, "Improved bimodal speech recognition using tied-mixture HMMs and 5000 word Audio-Visual Synchronous database," Proc. EUROSPEECH'97, pp.1623-1626, 1997.
    • (1997) Proc. EUROSPEECH'97 , pp. 1623-1626
    • Nakamura, S.1    Nagai, R.2    Shikano, K.3
  • 8
    • 84949458153
    • Using the multi-stream approach for continuous audio-visual speech recognition: Experiments on the M2VTS DATABASE
    • Oct.
    • S. Dupont and J. Luettin, "Using the multi-stream approach for continuous audio-visual speech recognition: experiments on the M2VTS DATABASE," Proc. International Conference on Spoken Language Processing (ICSLP'98), vol.4, pp.1283-1286, Oct. 1998.
    • (1998) Proc. International Conference on Spoken Language Processing (ICSLP'98) , vol.4 , pp. 1283-1286
    • Dupont, S.1    Luettin, J.2
  • 9
    • 0034270644
    • Audio-visual speech modelling for continuous speech recognition
    • Sept.
    • S. Dupont and J. Luettin, "Audio-visual speech modelling for continuous speech recognition," IEEE Trans. Multimed., vol.2, no.3, pp.141-151, Sept. 2000.
    • (2000) IEEE Trans. Multimed. , vol.2 , Issue.3 , pp. 141-151
    • Dupont, S.1    Luettin, J.2
  • 17
    • 0034825241
    • Multistream adaptive evidence combination for noise robust ASR
    • A. Morris, A. Hagen, H. Glotin, and H. Bourlard, "Multistream adaptive evidence combination for noise robust ASR," Speech Commun, vol.34, pp.25-40, 2001.
    • (2001) Speech Commun , vol.34 , pp. 25-40
    • Morris, A.1    Hagen, A.2    Glotin, H.3    Bourlard, H.4
  • 20
    • 0037662332
    • Overview on recent activities in multi-modal corpora
    • Oct.
    • S. Nakamura, "Overview on recent activities in multi-modal corpora," COCOSDA Workshop, Oct. 2000.
    • (2000) COCOSDA Workshop
    • Nakamura, S.1
  • 23
    • 0035251712
    • Speech-to-lip movement synthesis by maximizing audio-visual joint probability based on the EM algorithm
    • S. Nakamura and E. Yamamoto, "Speech-to-lip movement synthesis by maximizing audio-visual joint probability based on the EM algorithm," J. VLSI Signal Processing, vol.27, no.1/2, pp.119-126, 2001.
    • (2001) J. VLSI Signal Processing , vol.27 , Issue.1-2 , pp. 119-126
    • Nakamura, S.1    Yamamoto, E.2
  • 28
    • 0006132736
    • A minimum error rate pattern recognition approach to speech recognition
    • Vol. VIII
    • W. Chou, B.-H. Juang, C.-H. Lee, and F.K. Soong, "A minimum error rate pattern recognition approach to speech recognition," J. Pattern Recog. Art. Intell., Vol. VIII, pp.5-31, 1994.
    • (1994) J. Pattern Recog. Art. Intell. , pp. 5-31
    • Chou, W.1    Juang, B.-H.2    Lee, C.-H.3    Soong, F.K.4


* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.