SCOPUS information retrieval platform

Proceedings of the IEEE

Volume 91, Issue 9, 2003, Pages 1306-1325

Recent advances in the automatic recognition of audiovisual speech

(5) Potamianos, Gerasimos a; Neti, Chalapathy a; Gravier, Guillaume b; Garg, Ashutosh c; Senior, Andrew W. a

a IBM T J WATSON RESEARCH CENTER (United States)

b INRIA (France)

c IBM ALMADEN RESEARCH CENTER (United States)

Author keywords

Adaptation; Audiovisual fusion; Audiovisual speech recognition (ASR); Face tracking; Hidden Markov models (HMM); Multimedia databases; Multistream hidden Markov models; Product hidden Markov models; Speechreading; Stream reliability; Visual feature extraction

Indexed keywords

ALGORITHMS; COMPUTER SIMULATION; DATABASE SYSTEMS; FACE RECOGNITION; FEATURE EXTRACTION; HUMAN COMPUTER INTERACTION; IMAGE PROCESSING; MARKOV PROCESSES; RELIABILITY; SPEECH ANALYSIS; VISUAL COMMUNICATION;

ADAPTATION; AUDIOVISUAL FUSION; AUDIOVISUAL SPEECH RECOGNITION (ASR); FACE TRACKING; HIDDEN MARKOV MODELS (HMM); MULTIMEDIA DATABASES; MULTISTREAM HIDDEN MARKOV MODELS; PRODUCT HIDDEN MARKOV MODELS; SPEECHREADING; STREAM RELIABILITY;

SPEECH RECOGNITION;

EID: 4544290191 PISSN: 00189219 EISSN: None Source Type: Journal
DOI: 10.1109/JPROC.2003.817150 Document Type: Conference Paper

Times cited : (700)

References (134)

1
- 0031187171
- Speech recognition by machines and humans
- R. P. Lippmann, "Speech recognition by machines and humans," Speech Commun., vol. 22, pp. 1-15, 1997.
- (1997) Speech Commun. , vol.22 , pp. 1-15
- Lippmann, R.P.¹

2
- 0026189808
- Speech recognition in adverse environments
- B. H. Juang, "Speech recognition in adverse environments," Comput. Speech Lang., vol. 5, pp. 275-294, 1991.
- (1991) Comput. Speech Lang. , vol.5 , pp. 275-294
- Juang, B.H.¹

3
- 0002788784
- Signal processing for robust speech recognition
- C.-H. Lee, F. K. Soong, and Y. Ohshima, Eds. Norwell, MA: Kluwer, ch. 15
- R. Stern, A. Acero, F.-H. Liu, and Y. Ohshima, "Signal processing for robust speech recognition," in Automatic Speech and Speaker Recognition. Advanced Topics, C.-H. Lee, F. K. Soong, and Y. Ohshima, Eds. Norwell, MA: Kluwer, 1997, ch. 15, pp. 357-384.
- (1997) Automatic Speech and Speaker Recognition. Advanced Topics , pp. 357-384
- Stern, R.¹ Acero, A.² Liu, F.-H.³ Ohshima, Y.⁴

4
- 0003544881
- Berlin, Germany: Springer-Verlag
- D. G. Stork and M. E. Hennecke, Eds., Speechreading by Humans and Machines. Berlin, Germany: Springer-Verlag, 1996.
- (1996) Speechreading by Humans and Machines
- Stork, D.G.¹ Hennecke, M.E.²

5
- 0003835127
- Hove, U.K.: Psychology
- R. Campbell, B. Dodd, and D. Burnham, Eds., Hearing by Eye II. Hove, U.K.: Psychology, 1998.
- (1998) Hearing by Eye II
- Campbell, R.¹ Dodd, B.² Burnham, D.³

6
- 0001048664
- Visual contribution to speech intelligibility in noise
- W. H. Sumby and I. Pollack, "Visual contribution to speech intelligibility in noise," J. Acoust. Soc. Amer., vol. 26, pp. 212-215, 1954.
- (1954) J. Acoust. Soc. Amer. , vol.26 , pp. 212-215
- Sumby, W.H.¹ Pollack, I.²

7
- 0017199877
- Hearing lips and seeing voices
- H. McGurk and J. MacDonald, "Hearing lips and seeing voices," Nature, vol. 264, pp. 746-748, 1976.
- (1976) Nature , vol.264 , pp. 746-748
- McGurk, H.¹ MacDonald, J.²

8
- 85058246934
- Mouth movement and signed communication
- R. Campbell, B. Dodd, and D. Burnham, Eds. Hove, U.K.: Psychology, ch. 13
- M. Marschark, D. LePoutre, and L. Bement, "Mouth movement and signed communication," in Hearing by Eye II, R. Campbell, B. Dodd, and D. Burnham, Eds. Hove, U.K.: Psychology, 1998, ch. 13, pp. 245-266.
- (1998) Hearing by Eye II , pp. 245-266
- Marschark, M.¹ LePoutre, D.² Bement, L.³

9
- 85069146767
- What makes a good speechreader? First you have to find one
- R. Campbell, B. Dodd, and D. Burnham, Eds. Hove, U.K.: Psychology, ch. 11
- L. E. Bernstein, M. E. Demorest, and P. E. Tucker, "What makes a good speechreader? First you have to find one," in Hearing by Eye II, R. Campbell, B. Dodd, and D. Burnham, Eds. Hove, U.K.: Psychology, 1998, ch. 11, pp. 211-227.
- (1998) Hearing by Eye II , pp. 211-227
- Bernstein, L.E.¹ Demorest, M.E.² Tucker, P.E.³

10
- 0002028032
- Some preliminaries to a comprehensive account of audio visual speech perception
- R. Campbell and B. Dodd, Eds. London, U.K.: Lawrence Erlbaum
- A. Q. Summerfield, "Some preliminaries to a comprehensive account of audio visual speech perception," in Hearing by Eye: The Psychology of Lip-Reading, R. Campbell and B. Dodd, Eds. London, U.K.: Lawrence Erlbaum, 1987, pp. 3-51.
- (1987) Hearing by Eye: The Psychology of Lip-reading , pp. 3-51
- Summerfield, A.Q.¹

11
- 0032072433
- Speech recognition and sensory integration
- D. W. Massaro and D. G. Stork, "Speech recognition and sensory integration," Amer. Sci., vol. 86, pp. 236-244, 1998.
- (1998) Amer. Sci. , vol.86 , pp. 236-244
- Massaro, D.W.¹ Stork, D.G.²

12
- 0032178592
- Quantitative association of vocal-tract and facial behavior
- H. Yehia, P. Rubin, and E. Vatikiotis-Bateson, "Quantitative association of vocal-tract and facial behavior," Speech Commun., vol. 26, pp. 23-43, 1998.
- (1998) Speech Commun. , vol.26 , pp. 23-43
- Yehia, H.¹ Rubin, P.² Vatikiotis-Bateson, E.³

13
- 0012725678
- Estimation of speech acoustics from visual speech features: A comparison of linear and nonlinear models
- J. P. Barker and F. Berthommier, "Estimation of speech acoustics from visual speech features: A comparison of linear and nonlinear models," in Proc. Conf. Audio-Visual Speech Processing, 1999, pp. 112-117.
- (1999) Proc. Conf. Audio-visual Speech Processing , pp. 112-117
- Barker, J.P.¹ Berthommier, F.²

14
- 0036874551
- On the relationship between face movements, tongue movements, and speech acoustics
- Nov.
- J. Jiang, A. Alwan, P. A. Keating, B. Chaney, E. T. Auer Jr., and L. E. Bernstein, "On the relationship between face movements, tongue movements, and speech acoustics," EURASIP J. Appl. Signal Processing, vol. 2002, pp. 1174-1188, Nov. 2002.
- (2002) EURASIP J. Appl. Signal Processing , vol.2002 , pp. 1174-1188
- Jiang, J.¹ Alwan, A.² Keating, P.A.³ Chaney, B.⁴ Auer Jr., E.T.⁵ Bernstein, L.E.⁶

15
- 0002955163
- Lips, teeth, and the benefits of lipreading
- H. D. Ellis and A. W. Young, Eds. Amsterdam, The Netherlands: Elsevier
- Q. Summerfield, A. MacLeod, M. McGrath, and M. Brooke, "Lips, teeth, and the benefits of lipreading," in Handbook of Research on Face Processing, H. D. Ellis and A. W. Young, Eds. Amsterdam, The Netherlands: Elsevier, 1989, pp. 223-233.
- (1989) Handbook of Research on Face Processing , pp. 223-233
- Summerfield, Q.¹ MacLeod, A.² McGrath, M.³ Brooke, M.⁴

16
- 0002700689
- Psychology of human speechreading
- D. G. Stork and M. E. Hennecke, Eds. Berlin, Germany: Springer-Verlag
- P. M. T. Smeele, "Psychology of human speechreading," in Speechreading by Humans and Machines, D. G. Stork and M. E. Hennecke, Eds. Berlin, Germany: Springer-Verlag, 1996, pp. 3-15.
- (1996) Speechreading by Humans and Machines , pp. 3-15
- Smeele, P.M.T.¹

17
- 0003544881
- Visionary speech: Looking ahead to practical speechreading systems
- D. G. Stork and M. E. Hennecke, Eds. Berlin, Germany: Springer-Verlag
- M. E. Hennecke, D. G. Stork, and K. V. Prasad, "Visionary speech: Looking ahead to practical speechreading systems," in Speechreading by Humans and Machines, D. G. Stork and M. E. Hennecke, Eds. Berlin, Germany: Springer-Verlag, 1996, pp. 331-349.
- (1996) Speechreading by Humans and Machines , pp. 331-349
- Hennecke, M.E.¹ Stork, D.G.² Prasad, K.V.³

18
- 0021541159
- Automatic lipreading to enhance speech recognition
- E. D. Petajan, "Automatic lipreading to enhance speech recognition," in Proc. Global Telecommunications Conf., 1984, pp. 265-272.
- (1984) Proc. Global Telecommunications Conf. , pp. 265-272
- Petajan, E.D.¹

19
- 0004244302
- Englewood Cliffs, NJ: Prentice-Hall
- L. Rabiner and B.-H. Juang, Fundamentals of Speech Recognition. Englewood Cliffs, NJ: Prentice-Hall, 1993.
- (1993) Fundamentals of Speech Recognition
- Rabiner, L.¹ Juang, B.-H.²

20
- 0036502797
- A review of speech-based bimodal recognition
- Mar.
- C. C. Chibelushi, F. Deravi, and J. S. D. Mason, "A review of speech-based bimodal recognition," IEEE Trans. Multimedia, vol. 4, pp. 23-37, Mar. 2002.
- (2002) IEEE Trans. Multimedia , vol.4 , pp. 23-37
- Chibelushi, C.C.¹ Deravi, F.² Mason, J.S.D.³

21
- 0001432664
- On the integration of auditory and visual parameters in an HMM-based ASR
- D. G. Stork and M. E. Hennecke, Eds. Berlin, Germany: Springer-Verlag
- A. Adjoudani and C. Benoît, "On the integration of auditory and visual parameters in an HMM-based ASR," in Speechreading by Humans and Machines, D. G. Stork and M. E. Hennecke, Eds. Berlin, Germany: Springer-Verlag, 1996, pp. 461-471.
- (1996) Speechreading by Humans and Machines , pp. 461-471
- Adjoudani, A.¹ Benoît, C.²

22
- 0030376248
- Robust audiovisual integration using semicontinuous hidden Markov models
- Q. Su and P. L. Silsbee, "Robust audiovisual integration using semicontinuous hidden Markov models," in Proc. Int. Conf. Spoken Language Processing, 1996, pp. 42-45.
- (1996) Proc. Int. Conf. Spoken Language Processing , pp. 42-45
- Su, Q.¹ Silsbee, P.L.²

23
- 0000789852
- Channel separability in the audio visual integration of speech: A Bayesian approach
- D. G. Stork and M. E. Hennecke, Eds. Berlin, Germany: Springer-Verlag
- J. R. Movellan and G. Chadderdon, "Channel separability in the audio visual integration of speech: A Bayesian approach," in Speechreading by Humans and Machines, D. G. Stork and M. E. Hennecke, Eds. Berlin, Germany: Springer-Verlag, 1996, pp. 473-487.
- (1996) Speechreading by Humans and Machines , pp. 473-487
- Movellan, J.R.¹ Chadderdon, G.²

24
- 84925595128
- Combining noise compensation with visual information in speech recognition
- S. Cox, I. Matthews, and A. Bangham, "Combining noise compensation with visual information in speech recognition," in Proc. Eur. Workshop Audio-Visual Speech Processing, 1997, pp. 53-56.
- (1997) Proc. Eur. Workshop Audio-visual Speech Processing , pp. 53-56
- Cox, S.¹ Matthews, I.² Bangham, A.³

25
- 84925639646
- Real-time lip tracking and bimodal continuous speech recognition
- M. T. Chan, Y. Zhang, and T. S. Huang, "Real-time lip tracking and bimodal continuous speech recognition," in Proc. Workshop Multimedia Signal Processing, 1998, pp. 65-70.
- (1998) Proc. Workshop Multimedia Signal Processing , pp. 65-70
- Chan, M.T.¹ Zhang, Y.² Huang, T.S.³

26
- 0034270644
- Audio-visual speech modeling for continuous speech recognition
- Sept.
- S. Dupont and J. Luettin, "Audio-visual speech modeling for continuous speech recognition," IEEE Trans. Multimedia, vol. 2, pp. 141-151, Sept. 2000.
- (2000) IEEE Trans. Multimedia , vol.2 , pp. 141-151
- Dupont, S.¹ Luettin, J.²

27
- 0034841727
- Application of affine-invariant Fourier descriptors to lipreading for audio-visual speech recognition
- S. Gurbuz, Z. Tufekci, E. Patterson, and J. N. Gowdy, "Application of affine-invariant Fourier descriptors to lipreading for audio-visual speech recognition," in Proc. Int. Conf. Acoustics, Speech, and Signal Processing, 2001, pp. 177-180.
- (2001) Proc. Int. Conf. Acoustics, Speech, and Signal Processing , pp. 177-180
- Gurbuz, S.¹ Tufekci, Z.² Patterson, E.³ Gowdy, J.N.⁴

28
- 0035791629
- Consideration of Lombard effect for speechreading
- F. J. Huang and T. Chen, "Consideration of Lombard effect for speechreading," in Proc. Workshop Multimedia Signal Processing, 2001, pp. 613-618.
- (2001) Proc. Workshop Multimedia Signal Processing , pp. 613-618
- Huang, F.J.¹ Chen, T.²

29
- 0036874999
- Dynamic Bayesian networks for audio-visual speech recognition
- Nov.
- A. V. Nefian, L. Liang, X. Pi, X. Liu, and K. Murphy, "Dynamic Bayesian networks for audio-visual speech recognition," EURASIP J. Appl. Signal Process., vol. 2002, pp. 1274-1288, Nov. 2002.
- (2002) EURASIP J. Appl. Signal Process. , vol.2002 , pp. 1274-1288
- Nefian, A.V.¹ Liang, L.² Pi, X.³ Liu, X.⁴ Murphy, K.⁵

30
- 0031624666
- Discriminative training of HMM stream exponents for audio-visual speech recognition
- G. Potamianos and H. P. Graf, "Discriminative training of HMM stream exponents for audio-visual speech recognition," in Proc. Int. Conf. Acoustics, Speech, and Signal Processing, 1998, pp. 3733-3736.
- (1998) Proc. Int. Conf. Acoustics, Speech, and Signal Processing , pp. 3733-3736
- Potamianos, G.¹ Graf, H.P.²

31
- 0034502214
- Speaker independent audiovisual speech recognition
- Y. Zhang, S. Levinson, and T. Huang, "Speaker independent audiovisual speech recognition," in Proc. Int. Conf. Multimedia Expo, 2000, pp. 1073-1076.
- (2000) Proc. Int. Conf. Multimedia Expo , pp. 1073-1076
- Zhang, Y.¹ Levinson, S.² Huang, T.³

32
- 0003544881
- Rationale for phoneme-viseme mapping and feature selection in visual speech recognition
- D. G. Stork and M. E. Hennecke, Eds. Berlin, Germany: Springer-Verlag
- A. J. Goldschen, O. N. Garcia, and E. D. Petajan, "Rationale for phoneme-viseme mapping and feature selection in visual speech recognition," in Speechreading by Humans and Machines, D. G. Stork and M. E. Hennecke, Eds. Berlin, Germany: Springer-Verlag, 1996, pp. 505-515.
- (1996) Speechreading by Humans and Machines , pp. 505-515
- Goldschen, A.J.¹ Garcia, O.N.² Petajan, E.D.³

33
- 0002100804
- Adaptive determination of audio and visual weights for automatic speech recognition
- A. Rogozan, P. Deléglise, and M. Alissali, "Adaptive determination of audio and visual weights for automatic speech recognition," in Proc. Eur. Workshop Audio-Visual Speech Processing, 1997, pp. 61-64.
- (1997) Proc. Eur. Workshop Audio-visual Speech Processing , pp. 61-64
- Rogozan, A.¹ Deléglise, P.² Alissali, M.³

34
- 85013597845
- 'Eigenlips' for robust speech recognition
- C. Bregler and Y. Konig, "'Eigenlips' for robust speech recognition," in Proc. Int. Conf. Acoustics, Speech, and Signal Processing, 1994, pp. 669-672.
- (1994) Proc. Int. Conf. Acoustics, Speech, and Signal Processing , pp. 669-672
- Bregler, C.¹ Konig, Y.²

35
- 14944340400
- Neural architectures for sensorfusion in speech recognition
- G. Krone, B. Talle, A. Wichert, and G. Palm, "Neural architectures for sensorfusion in speech recognition," in Proc. Eur. Workshop Audio-Visual Speech Processing, 1997, pp. 57-60.
- (1997) Proc. Eur. Workshop Audio-visual Speech Processing , pp. 57-60
- Krone, G.¹ Talle, B.² Wichert, A.³ Palm, G.⁴

36
- 85009154155
- Stream weight optimization of speech and lip image sequence for audio-visual speech recognition
- S. Nakamura, H. Ito, and K. Shikano, "Stream weight optimization of speech and lip image sequence for audio-visual speech recognition," in Proc. Int. Conf. Spoken Language Processing, vol. 3, 2000, pp. 20-23.
- (2000) Proc. Int. Conf. Spoken Language Processing , vol.3 , pp. 20-23
- Nakamura, S.¹ Ito, H.² Shikano, K.³

37
- 0004052871
- Audio-visual speech recognition
- Johns Hopkins Univ., Baltimore, MD
- C. Neti, G. Potamianos, J. Luettin, I. Matthews, H. Glotin, D. Vergyri, J. Sison, A. Mashari, and J. Zhou, "Audio-visual speech recognition," Center Lang. Speech Process., Johns Hopkins Univ., Baltimore, MD, 2000.
- (2000) Center Lang. Speech Process.
- Neti, C.¹ Potamianos, G.² Luettin, J.³ Matthews, I.⁴ Glotin, H.⁵ Vergyri, D.⁶ Sison, J.⁷ Mashari, A.⁸ Zhou, J.⁹

38
- 0141589490
- Automatic speechreading of impaired speech
- G. Potamianos and C. Neti, "Automatic speechreading of impaired speech," in Proc. Conf. Audio-Visual Speech Processing, 2001, pp. 177-182.
- (2001) Proc. Conf. Audio-visual Speech Processing , pp. 177-182
- Potamianos, G.¹ Neti, C.²

39
- 85135321224
- See me, hear me: Integrating automatic speech recognition and lip-reading
- P. Duchnowski, U. Meier, and A. Waibel, "See me, hear me: Integrating automatic speech recognition and lip-reading," in Proc. Int. Conf. Spoken Language Processing, 1994, pp. 547-550.
- (1994) Proc. Int. Conf. Spoken Language Processing , pp. 547-550
- Duchnowski, P.¹ Meier, U.² Waibel, A.³

40
- 0032314380
- An image transform approach for HMM based automatic lipreading
- G. Potamianos, H. P. Graf, and E. Cosatto, "An image transform approach for HMM based automatic lipreading," in Proc. Int. Conf. Image Processing, vol. 1, 1998, pp. 173-177.
- (1998) Proc. Int. Conf. Image Processing , vol.1 , pp. 173-177
- Potamianos, G.¹ Graf, H.P.² Cosatto, E.³

41
- 0031211240
- Lipreading from color video
- Aug.
- G. Chiou and J.-N. Hwang, "Lipreading from color video," IEEE Trans. Image Processing, vol. 6, pp. 1192-1195, Aug. 1997.
- (1997) IEEE Trans. Image Processing , vol.6 , pp. 1192-1195
- Chiou, G.¹ Hwang, J.-N.²

42
- 0345166088
- Lipreading using eigensequences
- N. Li, S. Dettmer, and M. Shah, "Lipreading using eigensequences," in Proc. Int. Workshop Automatic Face Gesture Recognition, 1995, pp. 30-34.
- (1995) Proc. Int. Workshop Automatic Face Gesture Recognition , pp. 30-34
- Li, N.¹ Dettmer, S.² Shah, M.³

43
- 0035791204
- Feature analysis for automatic speechreading
- P. Scanlon and R. Reilly, "Feature analysis for automatic speechreading," in Proc. Workshop Multimedia Signal Processing, 2001, pp. 625-630.
- (2001) Proc. Workshop Multimedia Signal Processing , pp. 625-630
- Scanlon, P.¹ Reilly, R.²

44
- 0035386489
- A cascade visual front end for speaker independent automatic speechreading
- July/Oct.
- G. Potamianos, C. Neti, G. Iyengar, A. W. Senior, and A. Verma, "A cascade visual front end for speaker independent automatic speechreading," Int. J. Speech Technol., vol. 4, pp. 193-208, July/Oct. 2001.
- (2001) Int. J. Speech Technol. , vol.4 , pp. 193-208
- Potamianos, G.¹ Neti, C.² Iyengar, G.³ Senior, A.W.⁴ Verma, A.⁵

45
- 84925619981
- Word dependent acoustic-labial weights in HMM-based speech recognition
- P. Jourlin, "Word dependent acoustic-labial weights in HMM-based speech recognition," in Proc. Eur. Workshop Audio-Visual Speech Processing, 1997, pp. 69-72.
- (1997) Proc. Eur. Workshop Audio-visual Speech Processing , pp. 69-72
- Jourlin, P.¹

46
- 0003770986
- Comparing models for audiovisual fusion in a noisy-vowel recognition task
- Nov.
- P. Teissier, J. Robert-Ribes, and J. L. Schwartz, "Comparing models for audiovisual fusion in a noisy-vowel recognition task," IEEE Trans. Speech Audio Processing, vol. 7, pp. 629-642, Nov. 1999.
- (1999) IEEE Trans. Speech Audio Processing , vol.7 , pp. 629-642
- Teissier, P.¹ Robert-Ribes, J.² Schwartz, J.L.³

47
- 0036874527
- Noise adaptive stream weighting in audio-visual speech recognition
- Nov.
- M. Heckmann, F. Berthommier, and K. Kroschel, "Noise adaptive stream weighting in audio-visual speech recognition," EURASIP J. Appl. Signal Processing, vol. 2002, pp. 1260-1273, Nov. 2002.
- (2002) EURASIP J. Appl. Signal Processing , vol.2002 , pp. 1260-1273
- Heckmann, M.¹ Berthommier, F.² Kroschel, K.³

48
- 85009110558
- Lip representation by image ellipse
- L. Czap, "Lip representation by image ellipse," in Proc. Int. Conf. Spoken Language Processing, vol. 4, 2000, pp. 93-96.
- (2000) Proc. Int. Conf. Spoken Language Processing , vol.4 , pp. 93-96
- Czap, L.¹

49
- 0031069562
- Speechreading using probabilistic models
- J. Luettin and N. A. Thacker, "Speechreading using probabilistic models," Comput. Vision Image Understanding, vol. 65, pp. 163-178, 1997.
- (1997) Comput. Vision Image Understanding , vol.65 , pp. 163-178
- Luettin, J.¹ Thacker, N.A.²

50
- 0030374887
- A multiple deformable template approach for visual speech recognition
- D. Chandramohan and P. L. Silsbee, "A multiple deformable template approach for visual speech recognition," in Proc. Int. Conf. Spoken Language Processing, 1996, pp. 50-53.
- (1996) Proc. Int. Conf. Spoken Language Processing , pp. 50-53
- Chandramohan, D.¹ Silsbee, P.L.²

51
- 0038706765
- Automatic speechreading using dynamic contours
- D. G. Stork and M. E. Hennecke, Eds. Berlin, Germany: Springer-Verlag
- B. Dalton, R. Kaucic, and A. Blake, "Automatic speechreading using dynamic contours," in Speechreading by Humans and Machines, D. G. Stork and M. E. Hennecke, Eds. Berlin, Germany: Springer-Verlag, 1996, pp. 373-382.
- (1996) Speechreading by Humans and Machines , pp. 373-382
- Dalton, B.¹ Kaucic, R.² Blake, A.³

52
- 0036874915
- Audio-visual speech recognition using MPEG-4 compliant visual features
- Nov.
- P. S. Aleksic, J. J. Williams, Z. Wu, and A. K. Katsaggelos, "Audio-visual speech recognition using MPEG-4 compliant visual features," EURASIP J. Appl. Signal Processing, vol. 2002, pp. 1213-1227, Nov. 2002.
- (2002) EURASIP J. Appl. Signal Processing , vol.2002 , pp. 1213-1227
- Aleksic, P.S.¹ Williams, J.J.² Wu, Z.³ Katsaggelos, A.K.⁴

53
- 0035791288
- HMM based audio-visual speech recognition integrating geometric- and appearance-based visual features
- M. T. Chan, "HMM based audio-visual speech recognition integrating geometric- and appearance-based visual features," in Proc. Workshop Multimedia Signal Processing, 2001, pp. 9-14.
- (2001) Proc. Workshop Multimedia Signal Processing , pp. 9-14
- Chan, M.T.¹

54
- 84957810778
- Active appearance models
- T. F. Cootes, G. J. Edwards, and C. J. Taylor, "Active appearance models," in Proc. Eur. Conf. Computer Vision, 1998, pp. 484-498.
- (1998) Proc. Eur. Conf. Computer Vision , pp. 484-498
- Cootes, T.F.¹ Edwards, G.J.² Taylor, C.J.³

55
- 84925640716
- A multimedia platform for audio-visual speech processing
- A. Adjoudani, T. Guiard-Marigny, B. Le Goff, L. Reveret, and C. Benoît, "A multimedia platform for audio-visual speech processing," in Proc. Eur. Conf. Speech Communication Technology, 1997, pp. 1671-1674.
- (1997) Proc. Eur. Conf. Speech Communication Technology , pp. 1671-1674
- Adjoudani, A.¹ Guiard-Marigny, T.² Le Goff, B.³ Reveret, L.⁴ Benoît, C.⁵

56
- 34250090755
- Snakes: Active contour models
- M. Kass, A. Witkin, and D. Terzopoulos, "Snakes: Active contour models," Int. J. Comput. Vision, vol. 4, pp. 321-331, 1988.
- (1988) Int. J. Comput. Vision , vol.4 , pp. 321-331
- Kass, M.¹ Witkin, A.² Terzopoulos, D.³

57
- 0026903014
- Feature extraction from faces using deformable templates
- A. L. Yuille, P. W. Hallinan, and D. S. Cohen, "Feature extraction from faces using deformable templates," Int. J. Comput. Vision, vol. 8, pp. 99-111, 1992.
- (1992) Int. J. Comput. Vision , vol.8 , pp. 99-111
- Yuille, A.L.¹ Hallinan, P.W.² Cohen, D.S.³

58
- 0029182228
- Active shape models - Their training and application
- Jan.
- T. F. Cootes, C. J. Taylor, D. H. Cooper, and J. Graham, "Active shape models - their training and application," Comput. Vision Image Understanding, vol. 61, pp. 38-59, Jan. 1995.
- (1995) Comput. Vision Image Understanding , vol.61 , pp. 38-59
- Cootes, T.F.¹ Taylor, C.J.² Cooper, D.H.³ Graham, J.⁴

59
- 0031361424
- Robust recognition of faces and facial features with a multi-modal system
- H. P. Graf, E. Cosatto, and G. Potamianos, "Robust recognition of faces and facial features with a multi-modal system," in Proc. Int. Conf. Systems, Man, Cybernetics, 1997, pp. 2034-2039.
- (1997) Proc. Int. Conf. Systems, Man, Cybernetics , pp. 2034-2039
- Graf, H.P.¹ Cosatto, E.² Potamianos, G.³

60
- 0036875048
- Automatic speechreading with applications to human-computer interfaces
- Nov.
- X. Zhang, C. C. Broun, R. M. Mersereau, and M. Clements, "Automatic speechreading with applications to human-computer interfaces," EURASIP J. Appl. Signal Processing, vol. 2002, pp. 1228-1247, Nov. 2002.
- (2002) EURASIP J. Appl. Signal Processing , vol.2002 , pp. 1228-1247
- Zhang, X.¹ Broun, C.C.² Mersereau, R.M.³ Clements, M.⁴

61
- 0036875015
- Automatically building and evaluating statistical models for lipreading
- Nov.
- P. Daubias and P. Deléglise, "Automatically building and evaluating statistical models for lipreading," EURASIP J. Appl. Signal Processing, vol. 2002, pp. 1202-1212, Nov. 2002.
- (2002) EURASIP J. Appl. Signal Processing , vol.2002 , pp. 1202-1212
- Daubias, P.¹ Deléglise, P.²

62
- 0031672526
- Neural network-based face detection
- Jan.
- H. A. Rowley, S. Baluja, and T. Kanade, "Neural network-based face detection," IEEE Trans. Pattern Anal. Machine Intell., vol. 20, pp. 23-38, Jan. 1998.
- (1998) IEEE Trans. Pattern Anal. Machine Intell. , vol.20 , pp. 23-38
- Rowley, H.A.¹ Baluja, S.² Kanade, T.³

63
- 0031648023
- Example-based learning for view-based human face detection
- Jan.
- K.-K. Sung and T. Poggio, "Example-based learning for view-based human face detection," IEEE Trans. Pattern Anal. Machine Intell., vol. 20, pp. 39-51, Jan. 1998.
- (1998) IEEE Trans. Pattern Anal. Machine Intell. , vol.20 , pp. 39-51
- Sung, K.-K.¹ Poggio, T.²

64
- 0002656434
- Face and feature finding for a face recognition system
- A. W. Senior, "Face and feature finding for a face recognition system," in Proc. Int. Conf. Audio Video-based Biometric Person Authentication, 1999, pp. 154-159.
- (1999) Proc. Int. Conf. Audio Video-based Biometric Person Authentication , pp. 154-159
- Senior, A.W.¹

65
- 0003436776
- New York: Wiley
- C. R. Rao, Linear Statistical Inference and its Applications. New York: Wiley, 1965.
- (1965) Linear Statistical Inference and Its Applications
- Rao, C.R.¹

66
- 0003515753
- London, U.K.: Chapman & Hall
- C. Chatfield and A. J. Collins, Introduction to Multivariate Analysis. London, U.K.: Chapman & Hall, 1991.
- (1991) Introduction to Multivariate Analysis
- Chatfield, C.¹ Collins, A.J.²

67
- 0035167625
- Improved ROI and within frame discriminant features for lipreading
- G. Potamianos and C. Neti, "Improved ROI and within frame discriminant features for lipreading," in Proc. Int. Conf. Image Processing, vol. 3, 2001, pp. 250-253.
- (2001) Proc. Int. Conf. Image Processing , vol.3 , pp. 250-253
- Potamianos, G.¹ Neti, C.²

68
- 0003626435
- Reading, MA: Addison-Wesley
- R. C. Gonzalez and P. Wintz, Digital Image Processing. Reading, MA: Addison-Wesley, 1977.
- (1977) Digital Image Processing
- Gonzalez, R.C.¹ Wintz, P.²

69
- 0004232640
- Philadelphia, PA: SIAM
- I. Daubechies, Wavelets. Philadelphia, PA: SIAM, 1992.
- (1992) Wavelets
- Daubechies, I.¹

70
- 0029747053
- Integrating audio and visual information to provide highly robust speech recognition
- M. J. Tomlinson, M. J. Russell, and N. M. Brooke, "Integrating audio and visual information to provide highly robust speech recognition," in Proc. Int. Conf. Acoustics, Speech, and Signal Processing, 1996, pp. 821-824.
- (1996) Proc. Int. Conf. Acoustics, Speech, and Signal Processing , pp. 821-824
- Tomlinson, M.J.¹ Russell, M.J.² Brooke, N.M.³

71
- 0000874921
- Dynamic features for visual speech-reading: A systematic comparison
- M. C. Mozer, M. I. Jordan, and T. Petsche, Eds. Cambridge, MA: MIT Press
- M. S. Gray, J. R. Movellan, and T. J. Sejnowski, "Dynamic features for visual speech-reading: A systematic comparison," in Advances in Neural Information Processing Systems, M. C. Mozer, M. I. Jordan, and T. Petsche, Eds. Cambridge, MA: MIT Press, 1997, vol. 9, pp. 751-757.
- (1997) Advances in Neural Information Processing Systems , vol.9 , pp. 751-757
- Gray, M.S.¹ Movellan, J.R.² Sejnowski, T.J.³

72
- 0003822743
- Cambridge, U.K.: Entropic
- S. Young, D. Kershaw, J. Odell, D. Ollason, V. Valtchev, and P. Woodland, The HTK Book. Cambridge, U.K.: Entropic, 1999.
- (1999) The HTK Book
- Young, S.¹ Kershaw, D.² Odell, J.³ Ollason, D.⁴ Valtchev, V.⁵ Woodland, P.⁶

73
- 0001769235
- Time-varying information for visual speech perception
- R. Campbell, B. Dodd, and D. Burnham, Eds. Hove, U.K.: Psychology, ch. 3
- L. D. Rosenblum and H. M. Saldana, "Time-varying information for visual speech perception," in Hearing by Eye II, R. Campbell, B. Dodd, and D. Burnham, Eds. Hove, U.K.: Psychology, 1998, ch. 3, pp. 61-81.
- (1998) Hearing by Eye II , pp. 61-81
- Rosenblum, L.D.¹ Saldana, H.M.²

74
- 84892187452
- Maximum likelihood modeling with Gaussian distributions for classification
- R. A. Gopinath, "Maximum likelihood modeling with Gaussian distributions for classification," in Proc. Int. Conf. Acoustics, Speech, and Signal Processing, 1998, pp. 661-664.
- (1998) Proc. Int. Conf. Acoustics, Speech, and Signal Processing , pp. 661-664
- Gopinath, R.A.¹

75
- 0000633724
- Transcription of broadcast news - Some recent improvements to IBM's LVCSR system
- L. Polymenakos, P. Olsen, D. Kanevsky, R. A. Gopinath, P. Gopalakrishnan, and S. Chen, "Transcription of broadcast news - some recent improvements to IBM's LVCSR system," in Proc. Int. Conf. Acoustics, Speech, and Signal Processing, 1998, pp. 901-904.
- (1998) Proc. Int. Conf. Acoustics, Speech, and Signal Processing , pp. 901-904
- Polymenakos, L.¹ Olsen, P.² Kanevsky, D.³ Gopinath, R.A.⁴ Gopalakrishnan, P.⁵ Chen, S.⁶

76
- 0029725863
- Adaptive bimodal sensor fusion for automatic speechreading
- U. Meier, W. Hurst, and P. Duchnowski, "Adaptive bimodal sensor fusion for automatic speechreading," in Proc. Int. Conf. Acoustics, Speech, and Signal Processing, 1996, pp. 833-836.
- (1996) Proc. Int. Conf. Acoustics, Speech, and Signal Processing , pp. 833-836
- Meier, U.¹ Hurst, W.² Duchnowski, P.³

77
- 0003216515
- HMM-based visual speech recognition using intensity and location normalization
- O. Vanegas, A. Tanaka, K. Tokuda, and T. Kitamura, "HMM-based visual speech recognition using intensity and location normalization," in Proc. Int. Conf. Spoken Language Processing, 1998, pp. 289-292.
- (1998) Proc. Int. Conf. Spoken Language Processing , pp. 289-292
- Vanegas, O.¹ Tanaka, A.² Tokuda, K.³ Kitamura, T.⁴

78
- 85032752352
- Audiovisual speech processing. Lip reading and lip synchronization
- Jan.
- T. Chen, "Audiovisual speech processing. Lip reading and lip synchronization," IEEE Signal Processing Mag., vol. 18, pp. 9-21, Jan. 2001.
- (2001) IEEE Signal Processing Mag. , vol.18 , pp. 9-21
- Chen, T.¹

79
- 0036875002
- A support vector machine-based dynamic network for visual speech recognition applications
- Nov.
- M. Gordan, C. Kotropoulos, and I. Pitas, "A support vector machine-based dynamic network for visual speech recognition applications," EURASIP J. Appl. Signal Processing, vol. 2002, pp. 1248-1259, Nov. 2002.
- (2002) EURASIP J. Appl. Signal Processing , vol.2002 , pp. 1248-1259
- Gordan, M.¹ Kotropoulos, C.² Pitas, I.³

80
- 0036295989
- Audio-visual speech modeling using coupled hidden Markov models
- S. M. Chu and T. S. Huang, "Audio-visual speech modeling using coupled hidden Markov models," in Proc. Int. Conf. Acoustics, Speech, and Signal Processing, 2002, pp. 2009-2012.
- (2002) Proc. Int. Conf. Acoustics, Speech, and Signal Processing , pp. 2009-2012
- Chu, S.M.¹ Huang, T.S.²

81
- 0036874756
- Moving talker, speaker-independent feature study, and baseline results using the CUAVE multimodal speech corpus
- Nov.
- E. K. Patterson, S. Gurbuz, Z. Tufekci, and J. N. Gowdy, "Moving talker, speaker-independent feature study, and baseline results using the CUAVE multimodal speech corpus," EURASIP J. Appl. Signal Processing, vol. 2002, pp. 1189-1201, Nov. 2002.
- (2002) EURASIP J. Appl. Signal Processing , vol.2002 , pp. 1189-1201
- Patterson, E.K.¹ Gurbuz, S.² Tufekci, Z.³ Gowdy, J.N.⁴

82
- 0002358797
- Discriminative learning of visual data for audiovisual speech recognition
- A. Rogozan, "Discriminative learning of visual data for audiovisual speech recognition," Int. J. Artif. Intell. Tools, vol. 8, pp. 43-52, 1999.
- (1999) Int. J. Artif. Intell. Tools , vol.8 , pp. 43-52
- Rogozan, A.¹

83
- 0030247984
- Computer lipreading for improved accuracy in automatic speech recognition
- Sept.
- P. L. Silsbee and A. C. Bovik, "Computer lipreading for improved accuracy in automatic speech recognition," IEEE Trans. Speech Audio Processing, vol. 4, pp. 337-351, Sept. 1996.
- (1996) IEEE Trans. Speech Audio Processing , vol.4 , pp. 337-351
- Silsbee, P.L.¹ Bovik, A.C.²

84
- 0002629270
- Maximum likelihood from incomplete data via the EM algorithm
- A. P. Dempster, N. M. Laird, and D. B. Rubin, "Maximum likelihood from incomplete data via the EM algorithm," J. Royal Statist. Soc., vol. 39, pp. 1-38, 1977.
- (1977) J. Royal Statist. Soc. , vol.39 , pp. 1-38
- Dempster, A.P.¹ Laird, N.M.² Rubin, D.B.³

85
- 0022890536
- Maximum mutual information estimation of hidden Markov model parameters for speech recognition
- L. R. Bahl, P. F. Brown, P. V. DeSouza, and R. L. Mercer, "Maximum mutual information estimation of hidden Markov model parameters for speech recognition," in Proc. Int. Conf. Acoustics, Speech, and Signal Processing, 1986, pp. 49-52.
- (1986) Proc. Int. Conf. Acoustics, Speech, and Signal Processing , pp. 49-52
- Bahl, L.R.¹ Brown, P.F.² DeSouza, P.V.³ Mercer, R.L.⁴

86
- 0006132736
- A minimum error rate pattern recognition approach to speech recognition
- Jan.
- W. Chou, B.-H. Juang, C.-H. Lee, and F. Soong, "A minimum error rate pattern recognition approach to speech recognition," Int. J. Pattern Recognit. Artif. Intell., vol. 8, pp. 5-31, Jan. 1994.
- (1994) Int. J. Pattern Recognit. Artif. Intell. , vol.8 , pp. 5-31
- Chou, W.¹ Juang, B.-H.² Lee, C.-H.³ Soong, F.⁴

87
- 0009626553
- Noisy speech enhancement with filters estimated from the speaker's lips
- L. Girin, G. Feng, and J.-L. Schwartz, "Noisy speech enhancement with filters estimated from the speaker's lips," in Proc. Eur. Conf. Speech Communication Technology, 1995, pp. 1559-1562.
- (1995) Proc. Eur. Conf. Speech Communication Technology , pp. 1559-1562
- Girin, L.¹ Feng, G.² Schwartz, J.-L.³

88
- 0034974093
- Audio-visual enhancement of speech in noise
- L. Girin, J.-L. Schwartz, and G. Feng, "Audio-visual enhancement of speech in noise," J. Acoust. Soc. Amer., vol. 109, pp. 3007-3020, 2001.
- (2001) J. Acoust. Soc. Amer. , vol.109 , pp. 3007-3020
- Girin, L.¹ Schwartz, J.-L.² Feng, G.³

89
- 0036295990
- Noisy audio feature enhancement using audio-visual speech data
- R. Goecke, G. Potamianos, and C. Neti, "Noisy audio feature enhancement using audio-visual speech data," in Proc. Int. Conf. Acoustics, Speech, and Signal Processing, 2002, pp. 2025-2028.
- (2002) Proc. Int. Conf. Acoustics, Speech, and Signal Processing , pp. 2025-2028
- Goecke, R.¹ Potamianos, G.² Neti, C.³

90
- 85009232030
- Audio-visual speech enhancement with AVCDCN (audio-visual codebook dependent cepstral normalization)
- S. Deligne, G. Potamianos, and C. Neti, "Audio-visual speech enhancement with AVCDCN (audio-visual codebook dependent cepstral normalization)," in Proc. Int. Conf. Spoken Language Processing, 2002, pp. 1449-1452.
- (2002) Proc. Int. Conf. Spoken Language Processing , pp. 1449-1452
- Deligne, S.¹ Potamianos, G.² Neti, C.³

91
- 0026860706
- Methods of combining multiple classifiers and their applications in handwritten recognition
- May/June
- L. Xu, A. Krzyzak, and C. Y. Suen, "Methods of combining multiple classifiers and their applications in handwritten recognition," IEEE Trans. Syst., Man, Cybern., vol. 22, pp. 418-435, May/June 1992.
- (1992) IEEE Trans. Syst., Man, Cybern. , vol.22 , pp. 418-435
- Xu, L.¹ Krzyzak, A.² Suen, C.Y.³

92
- 0032021555
- On combining classifiers
- Mar.
- J. Kittler, M. Hatef, R. P. W. Duin, and J. Matas, "On combining classifiers," IEEE Trans. Pattern Anal. Machine Intell., vol. 20, pp. 226-239, Mar. 1998.
- (1998) IEEE Trans. Pattern Anal. Machine Intell. , vol.20 , pp. 226-239
- Kittler, J.¹ Hatef, M.² Duin, R.P.W.³ Matas, J.⁴

93
- 0033640646
- Statistical pattern recognition: A review
- Jan.
- A. K. Jain, R. P. W. Duin, and J. Mao, "Statistical pattern recognition: A review," IEEE Trans. Pattern Anal. Machine Intell., vol. 22, pp. 4-37, Jan. 2000.
- (2000) IEEE Trans. Pattern Anal. Machine Intell. , vol.22 , pp. 4-37
- Jain, A.K.¹ Duin, R.P.W.² Mao, J.³

94
- 4544258314
- Introduction to biometrics
- A. Jain, R. Bolle, and S. Pankanti, Eds. Norwell, MA: Kluwer, ch. 1
- A. Jain, R. Bolle, and S. Pankanti, "Introduction to biometrics," in Biometrics. Personal Identification in Networked Society, A. Jain, R. Bolle, and S. Pankanti, Eds. Norwell, MA: Kluwer, 1999, ch. 1, pp. 1-41.
- (1999) Biometrics. Personal Identification in Networked Society , pp. 1-41
- Jain, A.¹ Bolle, R.² Pankanti, S.³

95
- 0030355935
- A new ASR approach based on independent processing and recombination of partial frequency bands
- H. Bourlard and S. Dupont, "A new ASR approach based on independent processing and recombination of partial frequency bands," in Proc. Int. Conf. Spoken Language Processing, 1996, pp. 426-429.
- (1996) Proc. Int. Conf. Spoken Language Processing , pp. 426-429
- Bourlard, H.¹ Dupont, S.²

96
- 84960942890
- Test of several external posterior weighting functions for multiband full combination ASR
- H. Glotin and F. Berthommier, "Test of several external posterior weighting functions for multiband full combination ASR," in Proc. Int. Conf. Spoken Language Processing, vol. 1, 2000, pp. 333-336.
- (2000) Proc. Int. Conf. Spoken Language Processing , vol.1 , pp. 333-336
- Glotin, H.¹ Berthommier, F.²

97
- 85009091822
- Audio-visual speech recognition using MCE-based HMM's and model-dependent stream weights
- C. Miyajima, K. Tokuda, and T. Kitamura, "Audio-visual speech recognition using MCE-based HMM's and model-dependent stream weights," in Proc. Int. Conf. Spoken Language Processing, vol. 2, 2000, pp. 1023-1026.
- (2000) Proc. Int. Conf. Spoken Language Processing , vol.2 , pp. 1023-1026
- Miyajima, C.¹ Tokuda, K.² Kitamura, T.³

98
- 0031625499
- Discriminative model combination
- P. Beyerlein, "Discriminative model combination," in Proc. Int. Conf. Acoustics, Speech, and Signal Processing, 1998, pp. 481-484.
- (1998) Proc. Int. Conf. Acoustics, Speech, and Signal Processing , pp. 481-484
- Beyerlein, P.¹

99
- 0034842451
- Weighting schemes for audio-visual fusion in speech recognition
- H. Glotin, D. Vergyri, C. Neti, G. Potamianos, and J. Luettin, "Weighting schemes for audio-visual fusion in speech recognition," in Proc. Int. Conf. Acoustics, Speech, and Signal Processing, 2001, pp. 173-176.
- (2001) Proc. Int. Conf. Acoustics, Speech, and Signal Processing , pp. 173-176
- Glotin, H.¹ Vergyri, D.² Neti, C.³ Potamianos, G.⁴ Luettin, J.⁵

100
- 0025681008
- Hidden Markov model decomposition of speech and noise
- P. Varga and R. K. Moore, "Hidden Markov model decomposition of speech and noise," in Proc. Int. Conf. Acoustics, Speech, and Signal Processing, 1990, pp. 845-848.
- (1990) Proc. Int. Conf. Acoustics, Speech, and Signal Processing , pp. 845-848
- Varga, P.¹ Moore, R.K.²

101
- 0030685285
- Coupled hidden Markov models for complex action recognition
- M. Brand, N. Oliver, and A. Pentland, "Coupled hidden Markov models for complex action recognition," in Proc. Conf. Computer Vision Pattern Recognition, 1997, pp. 994-999.
- (1997) Proc. Conf. Computer Vision Pattern Recognition , pp. 994-999
- Brand, M.¹ Oliver, N.² Pentland, A.³

102
- 85133343575
- Speech intelligibility derived from asynchronous processing of auditory-visual information
- K. W. Grant and S. Greenberg, "Speech intelligibility derived from asynchronous processing of auditory-visual information," in Proc. Conf. Audio-Visual Speech Processing, 2001, pp. 132-137.
- (2001) Proc. Conf. Audio-visual Speech Processing , pp. 132-137
- Grant, K.W.¹ Greenberg, S.²

103
- 85009257778
- Audio-visual continuous speech recognition using a coupled hidden Markov model
- X. Liu, Y. Zhao, X. Pi, L. Liang, and A. V. Nefian, "Audio-visual continuous speech recognition using a coupled hidden Markov model," in Proc. Int. Conf. Spoken Language Processing, 2002, pp. 213-216.
- (2002) Proc. Int. Conf. Spoken Language Processing , pp. 213-216
- Liu, X.¹ Zhao, Y.² Pi, X.³ Liang, L.⁴ Nefian, A.V.⁵

104
- 0012668146
- Asynchrony modeling for audio-visual speech recognition
- G. Gravier, G. Potamianos, and C. Neti, "Asynchrony modeling for audio-visual speech recognition," in Proc. Human Language Technology Conf., 2002, pp. 1-6.
- (2002) Proc. Human Language Technology Conf. , pp. 1-6
- Gravier, G.¹ Potamianos, G.² Neti, C.³

105
- 82055176921
- Fusion of audio-visual information for integrated speech processing
- J. Bigun and F. Smeraldi, Eds. Berlin, Germany: Springer-Verlag
- S. Nakamura, "Fusion of audio-visual information for integrated speech processing," in Audio- and Video-Based Biometric Person Authentication, J. Bigun and F. Smeraldi, Eds. Berlin, Germany: Springer-Verlag, 2001, pp. 127-143.
- (2001) Audio- and Video-based Biometric Person Authentication , pp. 127-143
- Nakamura, S.¹

106
- 17344376380
- Maximum entropy and MCE based HMM stream weight estimation for audio-visual ASR
- G. Gravier, S. Axelrod, G. Potamianos, and C. Neti, "Maximum entropy and MCE based HMM stream weight estimation for audio-visual ASR," in Proc. Int. Conf. Acoustics, Speech, and Signal Processing, 2002, pp. 853-856.
- (2002) Proc. Int. Conf. Acoustics, Speech, and Signal Processing , pp. 853-856
- Gravier, G.¹ Axelrod, S.² Potamianos, G.³ Neti, C.⁴

107
- 0000238336
- A simplex method for function minimization
- J. A. Nelder and R. Mead, "A simplex method for function minimization," Comput. J., vol. 7, pp. 308-313, 1965.
- (1965) Comput. J. , vol.7 , pp. 308-313
- Nelder, J.A.¹ Mead, R.²

108
- 0001437767
- A new SNR-feature mapping for robust multistream speech recognition
- F. Berthommier and H. Glotin, "A new SNR-feature mapping for robust multistream speech recognition," in Proc. Int. Congress Phonetic Sciences, 1999, pp. 711-715.
- (1999) Proc. Int. Congress Phonetic Sciences , pp. 711-715
- Berthommier, F.¹ Glotin, H.²

109
- 85009153179
- Stream confidence estimation for audio-visual speech recognition
- G. Potamianos and C. Neti, "Stream confidence estimation for audio-visual speech recognition," in Proc. Int. Conf. Spoken Language Processing, vol. 3, 2000, pp. 746-749.
- (2000) Proc. Int. Conf. Spoken Language Processing , vol.3 , pp. 746-749
- Potamianos, G.¹ Neti, C.²

110
- 0028419019
- Maximum a posteriori estimation for multivariate Gaussian mixture observations of Markov chains
- Apr.
- J.-L. Gauvain and C.-H. Lee, "Maximum a posteriori estimation for multivariate Gaussian mixture observations of Markov chains," IEEE Trans. Speech Audio Processing, vol. 2, pp. 291-298, Apr. 1994.
- (1994) IEEE Trans. Speech Audio Processing , vol.2 , pp. 291-298
- Gauvain, J.-L.¹ Lee, C.-H.²

111
- 0029288633
- Maximum likelihood linear regression for speaker adaptation of continuous density hidden Markov models
- C. J. Leggetter and P. C. Woodland, "Maximum likelihood linear regression for speaker adaptation of continuous density hidden Markov models," Comput. Speech Lang., vol. 9, pp. 171-185, 1995.
- (1995) Comput. Speech Lang. , vol.9 , pp. 171-185
- Leggetter, C.J.¹ Woodland, P.C.²

112
- 85135155427
- A comparative study of speaker adaptation techniques
- L. Neumeyer, A. Sankar, and V. Digalakis, "A comparative study of speaker adaptation techniques," in Proc. Eur. Conf. Speech Communication Technology, 1995, pp. 1127-1130.
- (1995) Proc. Eur. Conf. Speech Communication Technology , pp. 1127-1130
- Neumeyer, L.¹ Sankar, A.² Digalakis, V.³

113
- 0030677475
- Speaker adaptive training: A maximum likelihood approach to speaker normalization
- T. Anastasakos, J. McDonough, and J. Makhoul, "Speaker adaptive training: A maximum likelihood approach to speaker normalization," in Proc. Int. Conf. Acoustics, Speech, and Signal Processing, 1997, pp. 1043-1046.
- (1997) Proc. Int. Conf. Acoustics, Speech, and Signal Processing , pp. 1043-1046
- Anastasakos, T.¹ McDonough, J.² Makhoul, J.³

114
- 0008808956
- Cambridge University, Cambridge, U.K.
- M. J. F. Gales, "Maximum likelihood multiple projection schemes for hidden Markov models," Cambridge University, Cambridge, U.K., 1999.
- (1999) Maximum Likelihood Multiple Projection Schemes for Hidden Markov Models
- Gales, M.J.F.¹

115
- 0010127090
- Speaker adaptation for audio-visual speech recognition
- G. Potamianos and A. Potamianos, "Speaker adaptation for audio-visual speech recognition," in Proc. Eur. Conf. Speech Communication Technology, 1999, pp. 1291-1294.
- (1999) Proc. Eur. Conf. Speech Communication Technology , pp. 1291-1294
- Potamianos, G.¹ Potamianos, A.²

116
- 84947917954
- The M2VTS multimodal face database
- J. Bigün, G. Chollet, and G. Borgefors, Eds. Berlin, Germany: Springer-Verlag
- S. Pigeon and L. Vandendorpe, "The M2VTS multimodal face database," in Audio- and Video-based Biometric Person Authentication, J. Bigün, G. Chollet, and G. Borgefors, Eds. Berlin, Germany: Springer-Verlag, 1997, pp. 403-409.
- (1997) Audio- and Video-based Biometric Person Authentication , pp. 403-409
- Pigeon, S.¹ Vandendorpe, L.²

117
- 0001935972
- XM2VTS: The extended M2VTS database
- K. Messer, J. Matas, J. Kittler, J. Luettin, and G. Maitre, "XM2VTS: The extended M2VTS database," in Proc. Int. Conf. Audio Video-based Biometric Person Authentication, 1999, pp. 72-76.
- (1999) Proc. Int. Conf. Audio Video-based Biometric Person Authentication , pp. 72-76
- Messer, K.¹ Matas, J.² Kittler, J.³ Luettin, J.⁴ Maitre, G.⁵

118
- 0035791211
- Detection of faces under shadows and lighting variations
- G. Iyengar and C. Neti, "Detection of faces under shadows and lighting variations," in Proc. Workshop Multimedia Signal Processing, 2001, pp. 15-20.
- (2001) Proc. Workshop Multimedia Signal Processing , pp. 15-20
- Iyengar, G.¹ Neti, C.²

119
- 0002068237
- A fast approximate acoustic match for large vocabulary speech recognition
- Jan.
- L. R. Bahl, S. V. De Gennaro, P. S. Gopalakrishnan, and R. L. Mercer, "A fast approximate acoustic match for large vocabulary speech recognition," IEEE Trans. Speech Audio Processing, vol. 1, pp. 59-67, Jan. 1993.
- (1993) IEEE Trans. Speech Audio Processing , vol.1 , pp. 59-67
- Bahl, L.R.¹ De Gennaro, S.V.² Gopalakrishnan, P.S.³ Mercer, R.L.⁴

120
- 0032074310
- Audio-visual integration in multimodal communication
- May
- T. Chen and R. R. Rao, "Audio-visual integration in multimodal communication," Proc. IEEE, vol. 86, pp. 837-852, May 1998.
- (1998) Proc. IEEE , vol.86 , pp. 837-852
- Chen, T.¹ Rao, R.R.²

121
- 0031640392
- A syntactic approach to automatic lip feature extraction for speaker identification
- T. Wark and S. Sridharan, "A syntactic approach to automatic lip feature extraction for speaker identification," in Proc. Int. Conf. Acoustics, Speech, and Signal Processing, 1998, pp. 3693-3696.
- (1998) Proc. Int. Conf. Acoustics, Speech, and Signal Processing , pp. 3693-3696
- Wark, T.¹ Sridharan, S.²

122
- 0002347773
- Multisensor biometric person recognition in an access control system
- B. Fröba, C. Küblbeck, C. Rothe, and P. Plankensteiner, "Multisensor biometric person recognition in an access control system," in Proc. Int. Conf. Audio Video-based Biometric Person Authentication, 1999, pp. 55-59.
- (1999) Proc. Int. Conf. Audio Video-based Biometric Person Authentication , pp. 55-59
- Fröba, B.¹ Küblbeck, C.² Rothe, C.³ Plankensteiner, P.⁴

123
- 21244474602
- Audio-visual speaker recognition for broadcast news: Some fusion techniques
- B. Maison, C. Neti, and A. Senior, "Audio-visual speaker recognition for broadcast news: Some fusion techniques," in Proc. Workshop Multimedia Signal Processing, 1999, pp. 161-167.
- (1999) Proc. Workshop Multimedia Signal Processing , pp. 161-167
- Maison, B.¹ Neti, C.² Senior, A.³

124
- 33747294606
- What can visual speech synthesis tell visual speech recognition?
- Pacific Grove, CA
- M. M. Cohen and D. W. Massaro, "What can visual speech synthesis tell visual speech recognition?," presented at the Asilomar Conf. Signals, Systems, Computers, Pacific Grove, CA, 1994.
- (1994) Asilomar Conf. Signals, Systems, Computers
- Cohen, M.M.¹ Massaro, D.W.²

125
- 0029291072
- Lip synchronization using speech-assisted video processing
- Apr.
- T. Chen, H. P. Graf, and K. Wang, "Lip synchronization using speech-assisted video processing," IEEE Signal Processing Lett., vol. 2, pp. 57-59, Apr. 1995.
- (1995) IEEE Signal Processing Lett. , vol.2 , pp. 57-59
- Chen, T.¹ Graf, H.P.² Wang, K.³

126
- 85069404424
- Audio-visual unit selection for the synthesis of photo-realistic talking-heads
- E. Cosatto, G. Potamianos, and H. P. Graf, "Audio-visual unit selection for the synthesis of photo-realistic talking-heads," in Proc. Int. Conf. Multimedia Expo, 2000, pp. 1097-1100.
- (2000) Proc. Int. Conf. Multimedia Expo , pp. 1097-1100
- Cosatto, E.¹ Potamianos, G.² Graf, H.P.³

127
- 0034271782
- Photo-realistic talking-heads from image samples
- Sept.
- E. Cosatto and H. P. Graf, "Photo-realistic talking-heads from image samples," IEEE Trans. Multimedia, vol. 2, pp. 152-163, Sept. 2000.
- (2000) IEEE Trans. Multimedia , vol.2 , pp. 152-163
- Cosatto, E.¹ Graf, H.P.²

128
- 0036650527
- An HMM-based speech-to-video synthesizer
- July
- J. J. Williams and A. K. Katsaggelos, "An HMM-based speech-to-video synthesizer," IEEE Trans. Neural Networks, vol. 13, pp. 900-915, July 2002.
- (2002) IEEE Trans. Neural Networks , vol.13 , pp. 900-915
- Williams, J.J.¹ Katsaggelos, A.K.²

129
- 0033708494
- Audio-visual intent to speak detection for human computer interaction
- P. De Cuetos, C. Neti, and A. Senior, "Audio-visual intent to speak detection for human computer interaction," in Proc. Int. Conf. Acoustics, Speech, and Signal Processing, 2000, pp. 1325-1328.
- (2000) Proc. Int. Conf. Acoustics, Speech, and Signal Processing , pp. 1325-1328
- De Cuetos, P.¹ Neti, C.² Senior, A.³

130
- 0003342953
- Integration of multimodal features for video scene classification based on HMM
- J. Huang, Z. Liu, Y. Wang, Y. Chen, and E. Wong, "Integration of multimodal features for video scene classification based on HMM," in Proc. Workshop Multimedia Signal Processing, 1999, pp. 53-58.
- (1999) Proc. Workshop Multimedia Signal Processing , pp. 53-58
- Huang, J.¹ Liu, Z.² Wang, Y.³ Chen, Y.⁴ Wong, E.⁵

131
- 84925591950
- Audiovisual speech coder: Using vector quantization to exploit the audio/video correlation
- E. Foucher, L. Girin, and G. Feng, "Audiovisual speech coder: Using vector quantization to exploit the audio/video correlation," in Proc. Conf. Audio-Visual Speech Processing, 1998, pp. 67-71.
- (1998) Proc. Conf. Audio-visual Speech Processing , pp. 67-71
- Foucher, E.¹ Girin, L.² Feng, G.³

132
- 0036874541
- Separation of audio-visual speech sources: A new approach exploiting the audio-visual coherence of speech stimuli
- Nov.
- D. Sodoyer, J.-L. Schwartz, L. Girin, J. Klinkisch, and C. Jutten, "Separation of audio-visual speech sources: A new approach exploiting the audio-visual coherence of speech stimuli," EURASIP J. Appl. Signal Processing, vol. 2002, pp. 1165-1173, Nov. 2002.
- (2002) EURASIP J. Appl. Signal Processing , vol.2002 , pp. 1165-1173
- Sodoyer, D.¹ Schwartz, J.-L.² Girin, L.³ Klinkisch, J.⁴ Jutten, C.⁵

133
- 0028997041
- Knowing who to listen to in speech recognition: Visually guided beamforming
- U. Bub, M. Hunke, and A. Waibel, "Knowing who to listen to in speech recognition: Visually guided beamforming," in Proc. Int. Conf. Acoustics, Speech, and Signal Processing, 1995, pp. 848-851.
- (1995) Proc. Int. Conf. Acoustics, Speech, and Signal Processing , pp. 848-851
- Bub, U.¹ Hunke, M.² Waibel, A.³

134
- 0036874485
- Joint audio-visual tracking using particle filters
- Nov.
- D. N. Zotkin, R. Duraiswami, and L. S. Davis, "Joint audio-visual tracking using particle filters," EURASIP J. Appl. Signal Processing, vol. 2002, pp. 1154-1164, Nov. 2002.
- (2002) EURASIP J. Appl. Signal Processing , vol.2002 , pp. 1154-1164
- Zotkin, D.N.¹ Duraiswami, R.² Davis, L.S.³

* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.