SCOPUS Information Search Platform

IEICE Transactions on Information and Systems

Volume E86-D, Issue 3, 2003, Pages 454-463

Audio-visual speech recognition based on optimized product HMMs and GMM based-MCE-GPD stream weight estimation

(2) Kumatani, Kenichi a Nakamura, Satoshi b

b ADVANCED TELECOMMUNICATIONS RESEARCH INSTITUTE INTERNATIONAL (Japan)

Author keywords

Audio visual speech recognition; Bi modal; Generalized probabilistic descent (GPD); Minimum classification error (MCE); Stream weight

Indexed keywords

ALGORITHMS; AUDITION; COMPUTATIONAL METHODS; DATABASE SYSTEMS; ERROR ANALYSIS; IMAGE PROCESSING; MARKOV PROCESSES; OPTIMIZATION; PROBABILITY; VISION;

GENERALIZED PROBABILISTIC DESCENT (GPD);

SPEECH RECOGNITION;

EID: 0038381727 PISSN: 09168532 EISSN: None Source Type: Journal
DOI: None Document Type: Article

Times cited : (2)

References (29)

1
- 0032134085
- Eye movement of perceivers during audio-visual speech intelligibility in noise
- E. Vatikiotis-Bateson, I.M. Eigsti, S. Yano, and K. Munhall, "Eye movement of perceivers during audio-visual speech intelligibility in noise," Perception and Psychophysics, vol.60, no.6, pp.926-940, 1998.
- (1998) Perception and Psychophysics , vol.60 , Issue.6 , pp. 926-940
- Vatikiotis-Bateson, E.¹ Eigsti, I.M.² Yano, S.³ Munhall, K.⁴

2
- 0001048664
- Visual contribution to speech intelligibility in noise
- March
- W.H. Sumby and I. Pollack, "Visual contribution to speech intelligibility in noise," J. Acoust. Soc. Am., vol.26, pp.212-215, March 1954.
- (1954) J. Acoust. Soc. Am. , vol.26 , pp. 212-215
- Sumby, W.H.¹ Pollack, I.²

3
- 0001055701
- Which components of the face do humans and machines best speechread?
- Speechreading by Humans and Machines: Models, Systems and Applications, Springer-Verlag
- C. Benoit, T. Guiard Marigny, B. Le Goff, and A. Adjoudani, "Which components of the face do humans and machines best speechread?," in Speechreading by Humans and Machines: Models, Systems and Applications, NATO ASI Series, pp.315-328, Springer-Verlag, 1996.
- (1996) NATO ASI Series , pp. 315-328
- Benoit, C.¹ Guiard Marigny, T.² Le Goff, B.³ Adjoudani, A.⁴

4
- 0027228958
- Improving connected letter recognition by lipreading
- April
- C. Bregler, H. Hild, S. Manke, and A. Waibel, "Improving connected letter recognition by lipreading," Proc. IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP'93), vol.1, pp.557-560, April 1993.
- (1993) Proc. IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP'93) , vol.1 , pp. 557-560
- Bregler, C.¹ Hild, H.² Manke, S.³ Waibel, A.⁴

5
- 0002517880
- Audio-visual large vocabulary continuous speech recognition in the broadcast domain
- Dec.
- S. Basu, C. Neti, N. Rajput, A. Senior, L. Subramaniam, and A. Verma, "Audio-visual large vocabulary continuous speech recognition in the broadcast domain," Workshop on Multimedia Signal Processing, pp.475-481, Dec. 1998.
- (1998) Workshop on Multimedia Signal Processing , pp. 475-481
- Basu, S.¹ Neti, C.² Rajput, N.³ Senior, A.⁴ Subramaniam, L.⁵ Verma, A.⁶

6
- 33646906672
- Improved bimodal speech recognition using tied-mixture HMMs and 5000 word audio-visual Synchronous database
- S. Nakamura, R. Nagai, and K. Shikano, "Improved bimodal speech recognition using tied-mixture HMMs and 5000 word Audio-Visual Synchronous database," Proc. EUROSPEECH'97, pp.1623-1626, 1997.
- (1997) Proc. EUROSPEECH'97 , pp. 1623-1626
- Nakamura, S.¹ Nagai, R.² Shikano, K.³

7
- 0029747053
- Integrating audio and visual information to provide highly robust speech recognition
- May
- M.J. Tomlinson, M.J. Russell, and N.M. Brooke, "Integrating audio and visual information to provide highly robust speech recognition," Proc. IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP'96), vol.2, pp.821-824, May 1996.
- (1996) Proc. IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP'96) , vol.2 , pp. 821-824
- Tomlinson, M.J.¹ Russell, M.J.² Brooke, N.M.³

8
- 84949458153
- Using the multi-stream approach for continuous audio-visual speech recognition: Experiments on the M2VTS DATABASE
- Oct.
- S. Dupont and J. Luettin, "Using the multi-stream approach for continuous audio-visual speech recognition: experiments on the M2VTS DATABASE," Proc. International Conference on Spoken Language Processing (ICSLP'98), vol.4, pp.1283-1286, Oct. 1998.
- (1998) Proc. International Conference on Spoken Language Processing (ICSLP'98) , vol.4 , pp. 1283-1286
- Dupont, S.¹ Luettin, J.²

9
- 0034270644
- Audio-visual speech modelling for continuous speech recognition
- Sept.
- S. Dupont and J. Luettin, "Audio-visual speech modelling for continuous speech recognition," IEEE Trans. Multimed., vol.2, no.3, pp.141-151, Sept. 2000.
- (2000) IEEE Trans. Multimed. , vol.2 , Issue.3 , pp. 141-151
- Dupont, S.¹ Luettin, J.²

10
- 0002100804
- Adaptive determination of audio and visual weights for automatic speech recognition
- Sept.
- A. Rogozan, P. Deleglise, and M. Alissali, "Adaptive determination of audio and visual weights for Automatic speech recognition," Proc. Europ. Tut. Work. Audio-Visual Speech Process (AVSP), pp.61-64, Sept. 1997.
- (1997) Proc. Europ. Tut. Work. Audio-visual Speech Process (AVSP) , pp. 61-64
- Rogozan, A.¹ Deleglise, P.² Alissali, M.³

11
- 85009153179
- Stream confidence estimation for audio-visual speech recognition
- Oct.
- G. Potamianos and C. Neti, "Stream confidence estimation for audio-visual speech recognition," Proc. International Conference on Spoken Language Processing (ICSLP'00), vol.3, pp.746-749, Oct. 2000.
- (2000) Proc. International Conference on Spoken Language Processing (ICSLP'00) , vol.3 , pp. 746-749
- Potamianos, G.¹ Neti, C.²

12
- 85009154155
- Stream weight optimization of speech and lip image sequence for audio-visual speech recognition
- Oct.
- S. Nakamura, H. Ito and K. Shikano, "Stream weight optimization of speech and lip image sequence for Audio-Visual speech recognition," Proc. International Conference on Spoken Language Processing (ICSLP'00), vol.3, pp.20-23, Oct. 2000.
- (2000) Proc. International Conference on Spoken Language Processing (ICSLP'00) , vol.3 , pp. 20-23
- Nakamura, S.¹ Ito, H.² Shikano, K.³

13
- 0031624666
- Discriminative training of HMM stream exponents for audio-visual speech recognition
- May
- G. Potamianos and H.P. Graf, "Discriminative training of HMM stream exponents for Audio-Visual speech recognition," Proc. IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP'98), vol.6, pp.3733-3736, May 1998.
- (1998) Proc. IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP'98) , vol.6 , pp. 3733-3736
- Potamianos, G.¹ Graf, H.P.²

14
- 85009091822
- Audio-visual speech recognition using MCE-based HMMs and model-dependent stream weights
- Oct.
- C. Miyajima, K. Tokuda, and T. Kitamura, "Audio-Visual speech recognition using MCE-based HMMs and model-dependent stream weights," Proc. International Conference on Spoken Language Processing (ICSLP'00), vol.2, pp.1023-1026, Oct. 2000.
- (2000) Proc. International Conference on Spoken Language Processing (ICSLP'00) , vol.2 , pp. 1023-1026
- Miyajima, C.¹ Tokuda, K.² Kitamura, T.³

15
- 84885664026
- An adaptive integration based on product HMM for audio-visual speech recognition
- Aug.
- K. Kumatani, S. Nakamura, and K. Shikano, "An adaptive integration based on product HMM for audio-visual speech recognition," Proc. IEEE International Conference Multimedia and Expo (ICME'01), vol.1, Aug. 2001.
- (2001) Proc. IEEE International Conference Multimedia and Expo (ICME'01) , vol.1
- Kumatani, K.¹ Nakamura, S.² Shikano, K.³

16
- 0030355935
- A new ASR approach based on independent processing and recombination of partial frequency bands
- H. Bourlard and S. Dupont, "A new ASR approach based on independent processing and recombination of partial frequency bands," Proc. International Conference on Spoken Language Processing (ICSLP'96), pp.426-429, 1996.
- (1996) Proc. International Conference on Spoken Language Processing (ICSLP'96) , pp. 426-429
- Bourlard, H.¹ Dupont, S.²

17
- 0034825241
- Multistream adaptive evidence combination for noise robust ASR
- A. Morris, A. Hagen, H. Glotin, and H. Bourlard, "Multistream adaptive evidence combination for noise robust ASR," Speech Commun, vol.34, pp.25-40, 2001.
- (2001) Speech Commun , vol.34 , pp. 25-40
- Morris, A.¹ Hagen, A.² Glotin, H.³ Bourlard, H.⁴

18
- 0030676381
- Maximum likelihood weighting of dynamic speech features for CDHMM speech recognition
- April
- J. Hernando, "Maximum likelihood weighting of dynamic speech features for CDHMM speech recognition," Proc. IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP'97), vol.2, pp.1267-1270, April 1997.
- (1997) Proc. IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP'97) , vol.2 , pp. 1267-1270
- Hernando, J.¹

19
- 0029765665
- Visual speech recognition using active shape models and hidden Markov models
- May
- J. Luettin, N.A. Thacker, and S.W. Beet, "Visual speech recognition using active shape models and hidden Markov models," Proc. IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP-96), vol.2, pp.817-820, May 1996.
- (1996) Proc. IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP-96) , vol.2 , pp. 817-820
- Luettin, J.¹ Thacker, N.A.² Beet, S.W.³

20
- 0037662332
- Overview on recent activities in multi-modal corpora
- Oct.
- S. Nakamura, "Overview on recent activities in multi-modal corpora," COCOSDA Workshop, Oct. 2000.
- (2000) COCOSDA Workshop
- Nakamura, S.¹

21
- 0033708747
- Asynchronous-transition HMM
- May
- S. Matsuda, M. Nakai, H. Shimodaira, and S. Sagayama, "Asynchronous-transition HMM," Proc. IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP'00), pp.1005-1008, May 2000.
- (2000) Proc. IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP'00) , pp. 1005-1008
- Matsuda, S.¹ Nakai, M.² Shimodaira, H.³ Sagayama, S.⁴

22
- 0040413488
- Quantitative association of orofacial and vocal-tract shapes
- Sept.
- H. Yehia, P. Rubin, and E. Vatikiotis-Bateson, "Quantitative association of orofacial and vocal-tract shapes," Proc. Europ. Tut. Work. Audio-Visual Speech Process (AVSP), pp.41-44, Sept. 1997.
- (1997) Proc. Europ. Tut. Work. Audio-visual Speech Process (AVSP) , pp. 41-44
- Yehia, H.¹ Rubin, P.² Vatikiotis-Bateson, E.³

23
- 0035251712
- Speech-to-lip movement synthesis by maximizing audio-visual joint probability based on the EM algorithm
- S. Nakamura and E. Yamamoto, "Speech-to-lip movement synthesis by maximizing audio-visual joint probability based on the EM algorithm," J. VLSI Signal Processing, vol.27, no.1/2, pp.119-126, 2001.
- (2001) J. VLSI Signal Processing , vol.27 , Issue.1-2 , pp. 119-126
- Nakamura, S.¹ Yamamoto, E.²

24
- 0034501586
- Speech-to-face movement synthesis based on HMMs
- K. Kakihara, S. Nakamura, and K. Shikano, "Speech-to-face movement synthesis based on HMMs," Proc. IEEE International Conference Multimedia and Expo (ICME'00), no.MP7.07, 2000.
- (2000) Proc. IEEE International Conference Multimedia and Expo (ICME'00) , Issue.MP7.07
- Kakihara, K.¹ Nakamura, S.² Shikano, K.³

25
- 0038676522
- Model based lip synchronization with an automatic translation system
- Aug.
- S. Ogata, K. Murai, S. Nakamura, and S. Morishima, "Model based lip synchronization with an automatic translation system," Proc. IEEE International Conference Multimedia and Expo (ICME'01), Aug. 2001.
- (2001) Proc. IEEE International Conference Multimedia and Expo (ICME'01)
- Ogata, S.¹ Murai, K.² Nakamura, S.³ Morishima, S.⁴

26
- 0003462715
- Hidden Markov models for speech recognition
- Edinburgh University Press, Edinburgh
- X.D. Huang, Y. Ariki, and M.A. Jack, Hidden Markov Models for Speech Recognition, Edinburgh Information Technology Series, Edinburgh University Press, Edinburgh, 1990.
- (1990) Edinburgh Information Technology Series
- Huang, X.D.¹ Ariki, Y.² Jack, M.A.³

27
- 85009263395
- Segmental GPD training of HMM based speech recognizer
- May
- W. Chou, B.-H. Juang, and C.-H. Lee, "Segmental GPD training of HMM based speech recognizer," Proc. IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP'92), vol.1, pp.473-476, May 1992.
- (1992) Proc. IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP'92) , vol.1 , pp. 473-476
- Chou, W.¹ Juang, B.-H.² Lee, C.-H.³

28
- 0006132736
- A minimum error rate pattern recognition approach to speech recognition
- vol.8
- W. Chou, B.-H. Juang, C.-H. Lee, and F.K. Soong, "A minimum error rate pattern recognition approach to speech recognition," J. Pattern Recog. Art. Intell., vol.8, pp.5-31, 1994.
- (1994) J. Pattern Recog. Art. Intell. , pp. 5-31
- Chou, W.¹ Juang, B.-H.² Lee, C.-H.³ Soong, F.K.⁴

29
- 0003483593
- Microsoft Corporation
- S. Young, D. Kershaw, J. Odell, D. Ollason, V. Valtchev, and P. Woodland, HTK-Hidden Markov Model Toolkit, Version 3.0, Microsoft Corporation, 2000.
- (2000) HTK-hidden Markov Model Toolkit, Version 3.0
- Young, S.¹ Kershaw, D.² Odell, J.³ Ollason, D.⁴ Valtchev, V.⁵ Woodland, P.⁶

* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.