SCOPUS Information Retrieval Platform

IEEE Transactions on Audio, Speech and Language Processing

Volume 14, Issue 3, 2006, Pages 1082-1089

Visual model structures and synchrony constraints for audio-visual speech recognition

(1) Hazen, Timothy J a,b

a IEEE (United States)

b MASSACHUSETTS INSTITUTE OF TECHNOLOGY (United States)

Author keywords

Audio visual speech recognition; Lip reading; Multimodal speech processing

Indexed keywords

AUDIO VISUAL SPEECH RECOGNITION (AVSR); LIP READING; MULTIMODAL SPEECH PROCESSING; WORD ERROR RATE;

AUDIO ACOUSTICS; BIT ERROR RATE; HIDDEN MARKOV MODELS; SIGNAL TO NOISE RATIO; SPEECH PROCESSING; VISUAL COMMUNICATION;

SPEECH RECOGNITION;

EID: 34047263009 PISSN: 15587916 EISSN: None Source Type: Journal
DOI: 10.1109/TSA.2005.857572 Document Type: Article

Times cited: 61

References (36)

1
- 34047262788
- The intrinsic bimodality of speech communication and the synthesis of talking faces
- M. M. Taylor, F. Néel, and D. Bouwhuis, Eds. Amsterdam, The Netherlands: John Benjamins
- C. Benoit, "The intrinsic bimodality of speech communication and the synthesis of talking faces," in The Structure of Multimodal Dialogue II, M. M. Taylor, F. Néel, and D. Bouwhuis, Eds. Amsterdam, The Netherlands: John Benjamins, 2000, pp. 485-502.
- (2000) The Structure of Multimodal Dialogue II , pp. 485-502
- Benoit, C.¹

2
- 0036502797
- A review of speech-based bimodal recognition
- Mar
- C. Chibelushi, F. Deravi, and J. Mason, "A review of speech-based bimodal recognition," IEEE Trans. Multimedia, vol. 4, no. 1, pp. 23-37, Mar. 2002.
- (2002) IEEE Trans. Multimedia , vol.4 , Issue.1 , pp. 23-37
- Chibelushi, C.¹ Deravi, F.² Mason, J.³

3
- 85009074661
- Large-vocabulary audio-visual speech recognition by machines and humans
- Aalborg, Denmark, Sep
- G. Potamianos, C. Neti, G. Iyengar, and E. Helmuth, "Large-vocabulary audio-visual speech recognition by machines and humans," in Proc. Eurospeech, Aalborg, Denmark, Sep. 2001, pp. 1293-1296.
- (2001) Proc. Eurospeech , pp. 1293-1296
- Potamianos, G.¹ Neti, C.² Iyengar, G.³ Helmuth, E.⁴

4
- 4544290191
- Recent advances in the automatic recognition of audiovisual speech
- Sep
- G. Potamianos, C. Neti, G. Gravier, A. Garg, and A. Senior, "Recent advances in the automatic recognition of audiovisual speech," Proc. IEEE, vol. 91, no. 9, pp. 1306-1326, Sep. 2003.
- (2003) Proc. IEEE , vol.91 , Issue.9 , pp. 1306-1326
- Potamianos, G.¹ Neti, C.² Gravier, G.³ Garg, A.⁴ Senior, A.⁵

5
- 84955023977
- Visual contribution to speech intelligibility in noise
- Mar
- W. Sumby and I. Pollack, "Visual contribution to speech intelligibility in noise," J. Acoust. Soc. Amer., vol. 26, no. 2, pp. 212-215, Mar. 1954.
- (1954) J. Acoust. Soc. Amer , vol.26 , Issue.2 , pp. 212-215
- Sumby, W.¹ Pollack, I.²

6
- 85009230873
- Audio-visual speech recognition in challenging environments
- Geneva, Switzerland, Sep
- G. Potamianos and C. Neti, "Audio-visual speech recognition in challenging environments," in Proc. Eurospeech, Geneva, Switzerland, Sep. 2003, pp. 1293-1296.
- (2003) Proc. Eurospeech , pp. 1293-1296
- Potamianos, G.¹ Neti, C.²

7
- 0038359548
- A probabilistic framework for segment-based speech recognition
- Apr./Jul
- J. Glass, "A probabilistic framework for segment-based speech recognition," Comput. Speech Lang., vol. 17, no. 2-3, pp. 137-152, Apr./Jul. 2003.
- (2003) Comput. Speech Lang , vol.17 , Issue.2-3 , pp. 137-152
- Glass, J.¹

8
- 14944353581
- A segment-based audio-visual speech recognizer: Data collection, development, and initial experiments
- PA, Oct
- T. Hazen, K. Saenko, C. La, and J. Glass, "A segment-based audio-visual speech recognizer: Data collection, development, and initial experiments," in Proc. Int. Conf. Multimodal Interfaces, PA, Oct. 2004.
- (2004) Proc. Int. Conf. Multimodal Interfaces
- Hazen, T.¹ Saenko, K.² La, C.³ Glass, J.⁴

9
- 84925639646
- Real-time lip tracking and bimodal continuous speech recognition
- Redondo Beach, CA
- M. Chan, Y. Zhang, and T. Huang, "Real-time lip tracking and bimodal continuous speech recognition," in Proc. Workshop on Multimedia Signal Processing, Redondo Beach, CA, 1998, pp. 65-70.
- (1998) Proc. Workshop on Multimedia Signal Processing , pp. 65-70
- Chan, M.¹ Zhang, Y.² Huang, T.³

10
- 0030355932
- Audio-visual speech recognition using multiscale nonlinear image decomposition
- Philadelphia, PA
- I. Matthews, J. Bangham, and S. Cox, "Audio-visual speech recognition using multiscale nonlinear image decomposition," in Proc. Int. Conf. Spoken Language Processing, Philadelphia, PA, 1996, pp. 38-41.
- (1996) Proc. Int. Conf. Spoken Language Processing , pp. 38-41
- Matthews, I.¹ Bangham, J.² Cox, S.³

11
- 0006184263
- The M2VTS multimodal face database
- S. Pigeon and L. Vandendorpe, "The M2VTS multimodal face database," in Proc. Audio- Video-based Biometric Person Authentication Workshop, 1997.
- (1997) Proc. Audio- Video-based Biometric Person Authentication Workshop
- Pigeon, S.¹ Vandendorpe, L.²

12
- 85009135946
- Bimodal speech recognition using coupled hidden Markov models
- Beijing, China, Oct
- S. Chu and T. Huang, "Bimodal speech recognition using coupled hidden Markov models," in Proc. Int. Conf. Spoken Language Processing, vol. II, Beijing, China, Oct. 2000.
- (2000) Proc. Int. Conf. Spoken Language Processing , vol.2
- Chu, S.¹ Huang, T.²

13
- 0036299249
- CUAVE: A new audio-visual database for multi-modal human-computer interface research
- Orlando, FL, May
- E. Patterson, S. Gurbuz, Z. Tufekci, and J. Gowdy, "CUAVE: A new audio-visual database for multi-modal human-computer interface research," in Proc. IEEE Int. Conf. Acoustics, Speech, and Signal Processing, vol. 2, Orlando, FL, May 2002, pp. 2017-2020.
- (2002) Proc. IEEE Int. Conf. Acoustics, Speech, and Signal Processing , vol.2 , pp. 2017-2020
- Patterson, E.¹ Gurbuz, S.² Tufekci, Z.³ Gowdy, J.⁴

14
- 0001935972
- XM2VTSDB: The extended M2VTS database
- Washington, DC, Mar, 16 IDIAP-RR 99-02
- K. Messer, J. Matas, J. Kittler, and K. Jonsson, "XM2VTSDB: The extended M2VTS database," in Audio- and Video-based Biometric Person Authentication, AVBPA'99, Washington, DC, Mar. 1999, pp. 72-77. 16 IDIAP-RR 99-02.
- (1999) Audio- and Video-based Biometric Person Authentication, AVBPA'99 , pp. 72-77
- Messer, K.¹ Matas, J.² Kittler, J.³ Jonsson, K.⁴

15
- 0036295989
- Audio-visual speech modeling using coupled hidden Markov models
- Orlando, FL, May
- S. Chu and T. Huang, "Audio-visual speech modeling using coupled hidden Markov models," in Proc. IEEE Int. Conf. Acoustics, Speech, and Signal Processing, vol. 2, Orlando, FL, May 2002, pp. 2009-2012.
- (2002) Proc. IEEE Int. Conf. Acoustics, Speech, and Signal Processing , vol.2 , pp. 2009-2012
- Chu, S.¹ Huang, T.²

16
- 0041355006
- The VidTIMIT database
- Martigny, Switzerland: IDIAP, Aug
- C. Sanderson, "The VidTIMIT database," in IDIAP Communication 02-06. Martigny, Switzerland: IDIAP, Aug. 2002.
- (2002) IDIAP Communication 02-06
- Sanderson, C.¹

17
- 0025477640
- Speech database development: TIMIT and beyond
- V. Zue, S. Seneff, and J. Glass, "Speech database development: TIMIT and beyond," Speech Commun., vol. 9, no. 4, pp. 351-356, 1990.
- (1990) Speech Commun , vol.9 , Issue.4 , pp. 351-356
- Zue, V.¹ Seneff, S.² Glass, J.³

18
- 0027623210
- Assessment for automatic speech recognition II: NOISEX-92: A database and an experiment to study the effect of additive noise on speech recognition systems
- Jul
- A. Varga and H. Steeneken, "Assessment for automatic speech recognition II: NOISEX-92: A database and an experiment to study the effect of additive noise on speech recognition systems," Speech Commun., vol. 12, no. 3, pp. 247-251, Jul. 1993.
- (1993) Speech Commun , vol.12 , Issue.3 , pp. 247-251
- Varga, A.¹ Steeneken, H.²

19
- 0034270644
- Audio-visual speech modeling for continuous speech recognition
- Sep
- S. Dupont and J. Luettin, "Audio-visual speech modeling for continuous speech recognition," IEEE Trans. Multimedia, vol. 2, no. 3, pp. 141-151, Sep. 2000.
- (2000) IEEE Trans. Multimedia , vol.2 , Issue.3 , pp. 141-151
- Dupont, S.¹ Luettin, J.²

20
- 0036297183
- A coupled HMM for audio-visual speech recognition
- Orlando, FL, May
- A. Nefian, L. Liang, X. Pi, L. Xiaoxiang, C. Mao, and K. Murphy, "A coupled HMM for audio-visual speech recognition," in Proc. IEEE Int. Conf. Acoustics, Speech, and Signal Processing, vol. 2, Orlando, FL, May 2002, pp. 2013-2016.
- (2002) Proc. IEEE Int. Conf. Acoustics, Speech, and Signal Processing , vol.2 , pp. 2013-2016
- Nefian, A.¹ Liang, L.² Pi, X.³ Xiaoxiang, L.⁴ Mao, C.⁵ Murphy, K.⁶

21
- 0012668146
- Asynchrony modeling for audio-visual speech recognition
- San Diego, CA, Mar
- G. Gravier, G. Potamianos, and C. Neti, "Asynchrony modeling for audio-visual speech recognition," in Proc. Human Language Technology Conf., San Diego, CA, Mar. 2002, pp. 1-6.
- (2002) Proc. Human Language Technology Conf , pp. 1-6
- Gravier, G.¹ Potamianos, G.² Neti, C.³

22
- 0034238554
- Toward unrestricted lip reading
- Aug
- U. Meier, R. Stiefelhagen, J. Yang, and A. Waibel, "Toward unrestricted lip reading," Int. J. Pattern Recognit. Artif. Intell., no. 14, pp. 571-585, Aug. 2000.
- (2000) Int. J. Pattern Recognit. Artif. Intell , Issue.14 , pp. 571-585
- Meier, U.¹ Stiefelhagen, R.² Yang, J.³ Waibel, A.⁴

23
- 0034848499
- Optimal weighting of posteriors for audio-visual speech recognition
- Salt Lake City, UT, May
- M. Heckmann, F. Berthommier, and K. Kroschel, "Optimal weighting of posteriors for audio-visual speech recognition," in Proc. IEEE Int. Conf. Acoustics, Speech, and Signal Processing, vol. 1, Salt Lake City, UT, May 2001, pp. 161-164.
- (2001) Proc. IEEE Int. Conf. Acoustics, Speech, and Signal Processing , vol.1 , pp. 161-164
- Heckmann, M.¹ Berthommier, F.² Kroschel, K.³

24
- 0034842451
- Weighting schemes for audio-visual fusion in speech recognition
- Salt Lake City, UT, May
- H. Glotin, D. Vergyri, C. Neti, G. Potamianos, and J. Luettin, "Weighting schemes for audio-visual fusion in speech recognition," in Proc. IEEE Int. Conf. Acoustics, Speech, and Signal Processing, vol. 1, Salt Lake City, UT, May 2001, pp. 165-168.
- (2001) Proc. IEEE Int. Conf. Acoustics, Speech, and Signal Processing , vol.1 , pp. 165-168
- Glotin, H.¹ Vergyri, D.² Neti, C.³ Potamianos, G.⁴ Luettin, J.⁵

25
- 4544224863
- A stream-weight optimization method for audio-visual speech recognition using multi-stream HMMs
- Montreal, QC, Canada, May
- S. Tamura, K. Iwano, and S. Furui, "A stream-weight optimization method for audio-visual speech recognition using multi-stream HMMs," in Proc. IEEE Int. Conf. Acoustics, Speech, Signal Processing, vol. 1, Montreal, QC, Canada, May 2004, pp. 857-860.
- (2004) Proc. IEEE Int. Conf. Acoustics, Speech, Signal Processing , vol.1 , pp. 857-860
- Tamura, S.¹ Iwano, K.² Furui, S.³

26
- 0014366349
- Confusions among visually perceived consonants
- C. Fisher, "Confusions among visually perceived consonants," J. Speech Hearing Res., vol. 11, pp. 796-804, 1968.
- (1968) J. Speech Hearing Res , vol.11 , pp. 796-804
- Fisher, C.¹

27
- 34047264545
- K. Berger, Speechreading: Principles and Methods. Baltimore, MD: National Educational, 1972.

28
- 0032074310
- Audio-visual integration in multimodal communication
- May
- T. Chen and R. Rao, "Audio-visual integration in multimodal communication," Proc. IEEE, vol. 86, no. 5, pp. 837-852, May 1998.
- (1998) Proc. IEEE , vol.86 , Issue.5 , pp. 837-852
- Chen, T.¹ Rao, R.²

29
- 85013580214
- Sensory integration in audiovisual automatic speech recognition
- Pacific Grove, CA, Oct./Nov
- P. Silsbee, "Sensory integration in audiovisual automatic speech recognition," in Proc. 28th Annual Asilomar Conf. Signals, Systems, and Computers, vol. I, Pacific Grove, CA, Oct./Nov. 1994, pp. 561-565.
- (1994) Proc. 28th Annual Asilomar Conf. Signals, Systems, and Computers , vol.1 , pp. 561-565
- Silsbee, P.¹

30
- 0002358797
- Discriminative learning of visual data for audiovisual speech recognition
- Mar
- A. Rogozan, "Discriminative learning of visual data for audiovisual speech recognition," Int. J. Artif. Intell. Tools, vol. 8, no. 1, pp. 43-52, Mar. 1999.
- (1999) Int. J. Artif. Intell. Tools , vol.8 , Issue.1 , pp. 43-52
- Rogozan, A.¹

31
- 0004052871
- Center Lang. Speech Process, The Johns Hopkins Univ, Baltimore, MD, Tech. Rep
- C. Neti et al., "Audio-visual speech recognition," Center Lang. Speech Process., The Johns Hopkins Univ., Baltimore, MD, Tech. Rep., 2000.
- (2000) Audio-visual speech recognition
- Neti, C.¹

32
- 38249029471
- Automatic optically-based recognition of speech
- Oct
- K. Finn and A. Montgomery, "Automatic optically-based recognition of speech," Pattern Recognit. Lett., vol. 8, no. 3, pp. 159-164, Oct. 1988.
- (1988) Pattern Recognit. Lett , vol.8 , Issue.3 , pp. 159-164
- Finn, K.¹ Montgomery, A.²

33
- 0027228958
- Improving connected letter recognition by lipreading
- Minneapolis, MN, Apr
- C. Bregler, H. Hild, S. Manke, and A. Waibel, "Improving connected letter recognition by lipreading," in Proc. IEEE Int. Conf. Acoustics, Speech, and Signal Processing, vol. 1, Minneapolis, MN, Apr. 1993, pp. 557-560.
- (1993) Proc. IEEE Int. Conf. Acoustics, Speech, and Signal Processing , vol.1 , pp. 557-560
- Bregler, C.¹ Hild, H.² Manke, S.³ Waibel, A.⁴

34
- 0033329811
- VHD: A system for directing real-time virtual actors
- Nov
- G. Sannier, S. Balcisoy, N. Magnenat-Thalmann, and D. Thalmann, "VHD: A system for directing real-time virtual actors," Vis. Comput., vol. 15, no. 7/8, pp. 320-329, Nov. 1999.
- (1999) Vis. Comput , vol.15 , Issue.7-8 , pp. 320-329
- Sannier, G.¹ Balcisoy, S.² Magnenat-Thalmann, N.³ Thalmann, D.⁴

35
- 0002144369
- Tree-based state tying for high accuracy acoustic modeling
- Princeton, NJ, Mar
- S. Young, J. Odell, and P. Woodland, "Tree-based state tying for high accuracy acoustic modeling," in Proc. ARPA Human Language Technology Workshop, Princeton, NJ, Mar. 1994, pp. 307-312.
- (1994) Proc. ARPA Human Language Technology Workshop , pp. 307-312
- Young, S.¹ Odell, J.² Woodland, P.³

36
- 84898971246
- An asynchronous hidden Markov model for audio-visual speech recognition
- S. Becker, S. Thrun, and K. Obermayer, Eds. Cambridge, MA: MIT Press
- S. Bengio, "An asynchronous hidden Markov model for audio-visual speech recognition," in Advances in Neural Information Processing Systems, NIPS 15, S. Becker, S. Thrun, and K. Obermayer, Eds. Cambridge, MA: MIT Press, 2003, pp. 1237-1244.
- (2003) Advances in Neural Information Processing Systems, NIPS 15 , pp. 1237-1244
- Bengio, S.¹

* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.