SCOPUS Information Retrieval Platform

IEEE Transactions on Multimedia

Volume 2, Issue 3, 2000, Pages 141-151

Audio-visual speech modeling for continuous speech recognition

(2) Dupont, Stéphane a Luettin, Juergen b

a FACULTÉ POLYTECHNIQUE DE MONS (Belgium)

b IDIAP RESEARCH INSTITUTE (Switzerland)

Author keywords

[No Author keywords available]

Indexed keywords

MARKOV PROCESSES; MATHEMATICAL MODELS; SENSOR DATA FUSION; SIGNAL TO NOISE RATIO; SPEECH ANALYSIS; SPEECH RECOGNITION;

AUDIO-VISUAL SPEECH MODELING; HIDDEN MARKOV MODELS; JOINT TEMPORAL MODELING; PERCEPTUAL LINEAR PREDICTION; SENSOR INTEGRATION;

MULTIMEDIA SYSTEMS;

EID: 0034270644 PISSN: 15209210 EISSN: None Source Type: Journal
DOI: 10.1109/6046.865479 Document Type: Article

Times cited : (538)

References (48)

1
- 0027128576
- "Lipreading and audio-visual speech perception,"
- A. Q. Summerfield, "Lipreading and audio-visual speech perception," Philos. Trans. R. Soc. London B. vol. 335, pp. 71-78, 1992.
- (1992) Philos. Trans. R. Soc. London B. , vol.335 , pp. 71-78
- Summerfield, A.Q.¹

2
- 0001048664
- "Visual contributions to speech intelligibility in noise,"
- W. H. Sumby and I. Pollack, "Visual contributions to speech intelligibility in noise," J. Acoust. Soc. Amer., vol. 26, pp. 212-215, 1954.
- (1954) J. Acoust. Soc. Amer. , vol.26 , pp. 212-215
- Sumby, W.H.¹ Pollack, I.²

3
- 0025767028
- "Evaluating the articulation index for auditory-visual input,"
- K. W. Grant and L. D. Braida, "Evaluating the articulation index for auditory-visual input," J. Acoust. Soc. Amer., vol. 89, no. 6, pp. 2952-2960, 1991.
- (1991) J. Acoust. Soc. Amer. , vol.89 , Issue.6 , pp. 2952-2960
- Grant, K.W.¹ Braida, L.D.²

4
- 33749915404
- J. Luettin, "Visual Speech and Speaker Recognition," Ph.D. dissertation, Univ. Sheffield, Sheffield, U.K., 1997.
- (1997) "Visual Speech and Speaker Recognition," Ph.D. dissertation, Univ. Sheffield, Sheffield, U.K
- Luettin, J.¹

5
- 0002132290
- "Easy to hear but hard to understand: A lip-reading advantage with intact auditory stimuli," in
- D. Reisberg, J. McLean, and A. Goldfield, "Easy to hear but hard to understand: A lip-reading advantage with intact auditory stimuli," in Hearing by Eye: The Psychology of Lip-Reading, B. Dodd and R. Campbell, Eds. London, U.K.: Lawrence Erlbaum, 1987, pp. 97-113.
- (1987) Hearing by Eye: the Psychology of Lip-Reading, B. Dodd and R. Campbell, Eds. London, U.K.: Lawrence Erlbaum , pp. 97-113
- Reisberg, D.¹ McLean, J.² Goldfield, A.³

6
- 0017199877
- "Hearing lips and seeing voices,"
- H. McGurk and J. MacDonald, "Hearing lips and seeing voices," Nature, vol. 264, pp. 746-748, 1976.
- (1976) Nature , vol.264 , pp. 746-748
- McGurk, H.¹ MacDonald, J.²

7
- 0031187171
- "Speech recognition by machines and humans,"
- R. P. Lippmann, "Speech recognition by machines and humans," Speech Commun., vol. 22, pp. 1-15, 1997.
- (1997) Speech Commun. , vol.22 , pp. 1-15
- Lippmann, R.P.¹

8
- 0029288202
- "Speech recognition in noisy environments: A survey,"
- Y. Gong, "Speech recognition in noisy environments: A survey," Speech Commun., vol. 16, pp. 261-291, 1995.
- (1995) Speech Commun. , vol.16 , pp. 261-291
- Gong, Y.¹

9
- 0029230678
- "The challenge of spoken language processing: Research directions for the nineties,"
- R. Cole, L. Hirschman, and L. Atlas et al., "The challenge of spoken language processing: Research directions for the nineties," IEEE Trans. Speech Audio Processing, vol. 3, no. 1, pp. 1-20, 1995.
- (1995) IEEE Trans. Speech Audio Processing , vol.3 , Issue.1 , pp. 1-20
- Cole, R.¹ Hirschman, L.² Atlas, L.³

10
- 0025503485
- "Neural network models of sensory integration for improved vowel recognition,"
- B. P. Yuhas, M. H. Goldstein, T. J. Sejnowski, and R. E. Jenkins, "Neural network models of sensory integration for improved vowel recognition," Proc. IEEE, vol. 78, pp. 1658-1668, Oct. 1990.
- (1990) Proc. IEEE , vol.78 , pp. 1658-1668
- Yuhas, B.P.¹ Goldstein, M.H.² Sejnowski, T.J.³ Jenkins, R.E.⁴

11
- 0029234004
- "Nonlinear manifold learning for visual speech recognition," in
- C. Bregler and S. M. Omohundro, "Nonlinear manifold learning for visual speech recognition," in IEEE Int. Conf. Computer Vision, Piscataway, NJ, 1995, pp. 494-499.
- (1995) IEEE Int. Conf. Computer Vision, Piscataway, NJ , pp. 494-499
- Bregler, C.¹ Omohundro, S.M.²

12
- 0030247984
- "Computer lipreading for improved accuracy in automatic speech recognition,"
- P. L. Silsbee and A. C. Bovik, "Computer lipreading for improved accuracy in automatic speech recognition," IEEE Trans. Speech Audio Processing, vol. 4, no. 5, pp. 337-351, 1996.
- (1996) IEEE Trans. Speech Audio Processing , vol.4 , Issue.5 , pp. 337-351
- Silsbee, P.L.¹ Bovik, A.C.²

13
- 0032314380
- "An image transform approach for HMM based automatic lipreading," in
- G. Potamianos, H. P. Graf, and E. Cosatto, "An image transform approach for HMM based automatic lipreading," in Proc. IEEE Int. Conf. Image Processing, 1998, pp. 173-177.
- (1998) Proc. IEEE Int. Conf. Image Processing , pp. 173-177
- Potamianos, G.¹ Graf, H.P.² Cosatto, E.³

14
- 0025750892
- "Automatic lipreading by optical flow analysis,"
- K. Mase and A. Pentland, "Automatic lipreading by optical flow analysis," Syst. Comput. Jpn., vol. 22, no. 6, 1991.
- (1991) Syst. Comput. Jpn. , vol.22 , Issue.6
- Mase, K.¹ Pentland, A.²

15
- 0022228262
- "Automatic lipreading to enhance speech recognition," in
- E. D. Petajan, "Automatic lipreading to enhance speech recognition," in Proc. IEEE Conf. Computer Vision and Pattern Recognition, 1985, pp. 40-47.
- (1985) Proc. IEEE Conf. Computer Vision and Pattern Recognition , pp. 40-47
- Petajan, E.D.¹

16
- 0000134331
- "2D deformable models for visual speech analysis," in
- T. Coianiz, L. Torresani, and B. Caprile, "2D deformable models for visual speech analysis," in Speechreading by Humans and Machines: Models, Systems and Applications, D. G. Stork and M. E. Hennecke, Eds. Berlin, Germany: Springer-Verlag, 1996, vol. 150 of NATO ASI Series, Series F: Computer and Systems Sciences, pp. 391-398.
- (1996) Speechreading by Humans and Machines: Models, Systems and Applications , pp. 391-398
- Coianiz, T.¹ Torresani, L.² Caprile, B.³

17
- 0031069562
- "Speechreading using probabilistic models,"
- J. Luettin and N. A. Thacker, "Speechreading using probabilistic models," Comput. Vis. Image Understand., vol. 65, no. 2, pp. 163-178, Feb. 1997.
- (1997) Comput. Vis. Image Understand. , vol.65 , Issue.2 , pp. 163-178
- Luettin, J.¹ Thacker, N.A.²

18
- 0032309170
- "3D modeling and tracking of human lip motion," in
- S. Basu, N. Oliver, and A. Pentland, "3D modeling and tracking of human lip motion," in Proc. IEEE Int. Conf. Computer Vision, 1998.
- (1998) Proc. IEEE Int. Conf. Computer Vision
- Basu, S.¹ Oliver, N.² Pentland, A.³

19
- 0029182228
- "Active shape models-their training and application,"
- T. F. Cootes, C. J. Taylor, D. H. Cooper, and J. Graham, "Active shape models-their training and application," Comput. Vis. Image Understand., vol. 61, pp. 38-59, Jan. 1995.
- (1995) Comput. Vis. Image Understand. , vol.61 , pp. 38-59
- Cootes, T.F.¹ Taylor, C.J.² Cooper, D.H.³ Graham, J.⁴

20
- 0031187270
- "Automatic interpretation and coding of face images using flexible models,"
- A. Lanitis, C. J. Taylor, and T. F. Cootes, "Automatic interpretation and coding of face images using flexible models," IEEE Trans. Pattern Anal. Machine Intell., vol. 19, pp. 743-756, July 1997.
- (1997) IEEE Trans. Pattern Anal. Machine Intell. , vol.19 , pp. 743-756
- Lanitis, A.¹ Taylor, C.J.² Cootes, T.F.³

21
- 0000238336
- "A simplex method for function optimization,"
- J. A. Nelder and R. Mead, "A simplex method for function optimization," Comput. J., vol. 7, no. 4, pp. 308-313, 1965.
- (1965) Comput. J. , vol.7 , Issue.4 , pp. 308-313
- Nelder, J.A.¹ Mead, R.²

22
- 0000874921
- "Dynamic features for visual speechreading: A systematic comparison," in
- M. S. Gray, J. R. Movellan, and T. J. Sejnowski, "Dynamic features for visual speechreading: A systematic comparison," in Advances in Neural Information Processing Systems, M. C. Mozer, M. I. Jordan, and T. Petsche, Eds. Cambridge, MA: MIT Press, 1997, vol. 9.
- (1997) Advances in Neural Information Processing Systems, M. C. Mozer, M. I. Jordan, and T. Petsche, Eds. Cambridge, MA: MIT Press , vol.9
- Gray, M.S.¹ Movellan, J.R.² Sejnowski, T.J.³

23
- 0029747053
- "Integrating audio and visual information to provide highly robust speech recognition," in
- M. J. Tomlinson, M. J. Russell, and N. M. Brooke, "Integrating audio and visual information to provide highly robust speech recognition," in Proc. IEEE Int. Conf. Acoust. Speech and Signal Processing, 1996, pp. 821-824.
- (1996) Proc. IEEE Int. Conf. Acoust. Speech and Signal Processing , pp. 821-824
- Tomlinson, M.J.¹ Russell, M.J.² Brooke, N.M.³

24
- 0026200336
- "Crossmodal integration in the identification of consonants,"
- L. Braida, "Crossmodal integration in the identification of consonants," Q. J. Exp. Psych., vol. 43A, no. 3, pp. 647-677, 1991.
- (1991) Q. J. Exp. Psych. , vol.43 , Issue.3 , pp. 647-677
- Braida, L.¹

25
- 0018074107
- "Voice-mouth synthesis of tactual/visual perception of /pa, ba, ma/,"
- N. P. Erber and C. L. De Filippo, "Voice-mouth synthesis of tactual/visual perception of /pa, ba, ma/," J. Acoust. Soc. Amer., vol. 64, pp. 1015-1019, 1978.
- (1978) J. Acoust. Soc. Amer. , vol.64 , pp. 1015-1019
- Erber, N.P.¹ De Filippo, C.L.²

26
- 0022411853
- "On the role of visual rate information in phonetic perception,"
- K. P. Green and J. L. Miller, "On the role of visual rate information in phonetic perception," Percept. Psychophys., vol. 38, no. 3, pp. 269-276, 1985.
- (1985) Percept. Psychophys. , vol.38 , Issue.3 , pp. 269-276
- Green, K.P.¹ Miller, J.L.²

27
- 33749889711
- H. Fletcher, Speech and Hearing in Communication. New York: Krieger, 1953.
- (1953) Speech and Hearing in Communication. New York: Krieger
- Fletcher, H.¹

28
- 0028516073
- "How do humans process and recognize speech?,"
- J. B. Allen, "How do humans process and recognize speech?," IEEE Trans. Speech Audio Processing, vol. 2, no. 4, pp. 567-577, 1994.
- (1994) IEEE Trans. Speech Audio Processing , vol.2 , Issue.4 , pp. 567-577
- Allen, J.B.¹

29
- 0030643240
- "Sub-band-based speech recognition," in
- H. Bourlard and S. Dupont, "Sub-band-based speech recognition," in Proc. IEEE Int. Conf. Acoustic Speech and Signal Processing, Apr. 1997, pp. 1251-1254.
- (1997) Proc. IEEE Int. Conf. Acoustic Speech and Signal Processing, Apr. , pp. 1251-1254
- Bourlard, H.¹ Dupont, S.²

30
- 0004244302
- L. Rabiner and B. H. Juang, Fundamentals of Speech Recognition. Englewood Cliffs, NJ: Prentice-Hall Signal Processing Series, 1993.
- (1993) Fundamentals of Speech Recognition. Englewood Cliffs, NJ: Prentice-Hall Signal Processing Series
- Rabiner, L.¹ Juang, B.H.²

31
- 84899026353
- "Hidden Markov decision trees," in
- M.I. Jordan, Z. Ghahramani, and L. K. Saul, "Hidden Markov decision trees," in Advances in Neural Information Processing Systems, M. C. Mozer, M. I. Jordan, and T. Petsche, Eds. Cambridge, MA: MIT Press, 1997, vol. 9.
- (1997) Advances in Neural Information Processing Systems, M. C. Mozer, M. I. Jordan, and T. Petsche, Eds. Cambridge, MA: MIT Press , vol.9
- Jordan, M.I.¹ Ghahramani, Z.² Saul, L.K.³

32
- 33750924107
- "Toward Markov random field modeling of speech," in
- G. Gravier, M. Sigelle, and G. Chollet, "Toward Markov random field modeling of speech," in Proc. Int. Conf. Spoken Language Processing, Sydney, Australia, Dec. 1998.
- (1998) Proc. Int. Conf. Spoken Language Processing, Sydney, Australia, Dec.
- Gravier, G.¹ Sigelle, M.² Chollet, G.³

33
- 0018724280
- "Two level DP matching-a dynamic time warping based pattern matching algorithm for continuous speech recognition,"
- H. Sakoe, "Two level DP matching-a dynamic time warping based pattern matching algorithm for continuous speech recognition," IEEE Trans. IECE Jpn., vol. 3, 1979.
- (1979) IEEE Trans. IECE Jpn. , vol.3
- Sakoe, H.¹

34
- 0025681008
- "Hidden Markov model decomposition of speech and noise," in
- A. P. Varga and R. K. Moore, "Hidden Markov model decomposition of speech and noise," in Proc. IEEE Int. Conf. Acoustic Speech and Signal Processing, 1990, pp. 845-848.
- (1990) Proc. IEEE Int. Conf. Acoustic Speech and Signal Processing , pp. 845-848
- Varga, A.P.¹ Moore, R.K.²

35
- 84892184580
- "Speech intelligibility in the presence of crosschannel spectral asynchrony," in
- T. Arai and S. Greenberg, "Speech intelligibility in the presence of crosschannel spectral asynchrony," in Proc. ICASSP, 1998, pp. 933-936.
- (1998) Proc. ICASSP , pp. 933-936
- Arai, T.¹ Greenberg, S.²

36
- 0005089970
- "Perceiving asynchronous bimodal speech in consonant vowel and vowel syllables,"
- D. W. Massaro and M. M. Cohen, "Perceiving asynchronous bimodal speech in consonant vowel and vowel syllables," Speech Commun., vol. 13, pp. 127-134, 1993.
- (1993) Speech Commun. , vol.13 , pp. 127-134
- Massaro, D.W.¹ Cohen, M.M.²

37
- 78649589093
- "Intelligibility of audio-visually desynchronized speech: Asymmetrical effect of phoneme position," in
- P. M. Smeele et al., "Intelligibility of audio-visually desynchronized speech: Asymmetrical effect of phoneme position," in Proc. Int. Conf. Spoken Language Processing, Alberta, Canada, 1992, pp. 65-68.
- (1992) Proc. Int. Conf. Spoken Language Processing, Alberta, Canada , pp. 65-68
- Smeele, P.M.¹

38
- 0003573244
- H. Bourlard and N. Morgan, Connectionist Speech Recognition-A Hybrid Approach. Norwell, MA: Kluwer, 1994.
- (1994) Connectionist Speech Recognition-A Hybrid Approach. Norwell, MA: Kluwer
- Bourlard, H.¹ Morgan, N.²

39
- 0031624666
- "Discriminative training of HMM stream exponents for audio-visual speech recognition," in
- G. Potamianos and H. P. Graf, "Discriminative training of HMM stream exponents for audio-visual speech recognition," in Proc. IEEE Int. Conf. Acoustic Speech and Signal Processing, Seattle, WA, 1998, pp. 3733-3736.
- (1998) Proc. IEEE Int. Conf. Acoustic Speech and Signal Processing, Seattle, WA , pp. 3733-3736
- Potamianos, G.¹ Graf, H.P.²

40
- 84892174007
- "Weighted Viterbi algorithm and state duration modeling for speech recognition in noise," in
- N. B. Yoma, F. R. McInnes, and M. A. Jack, "Weighted Viterbi algorithm and state duration modeling for speech recognition in noise," in Proc. IEEE Int. Conf. Acoustic Speech and Signal Processing, Seattle, WA, 1998, pp. 709-712.
- (1998) Proc. IEEE Int. Conf. Acoustic Speech and Signal Processing, Seattle, WA , pp. 709-712
- Yoma, N.B.¹ McInnes, F.R.² Jack, M.A.³

41
- 0006184263
- "The M2VTS multimodal face database (release 1.00)," in
- S. Pigeon and L. Vandendorpe, "The M2VTS multimodal face database (release 1.00)," in Proc. of the First International Conference on Audio- and Video-based Biometric Person Authentication, Crans-Montana, Switzerland, 1997.
- (1997) Proc. of the First International Conference on Audio- and Video-based Biometric Person Authentication, Crans-Montana, Switzerland
- Pigeon, S.¹ Vandendorpe, L.²

42
- 0025041264
- "Perceptual linear predictive (PLP) analysis of speech,"
- H. Hermansky, "Perceptual linear predictive (PLP) analysis of speech," J. Acoust. Soc. Amer., vol. 87, no. 4, pp. 1738-1752, Apr. 1990.
- (1990) J. Acoust. Soc. Amer. , vol.87 , Issue.4 , pp. 1738-1752
- Hermansky, H.¹

43
- 0022667694
- "Speaker independent isolated word recognizer using dynamic features of speech spectrum,"
- S. Furui, "Speaker independent isolated word recognizer using dynamic features of speech spectrum," IEEE Trans. Acoust., Speech, Signal Processing, vol. 34, no. 1, pp. 52-59, 1986.
- (1986) IEEE Trans. Acoust., Speech, Signal Processing , vol.34 , Issue.1 , pp. 52-59
- Furui, S.¹

44
- 33749951737
- Swiss French polyphone and polyvar: Telephone speech databases to model inter and intra-speaker variability
- G. Chollet, J. L. Cochard, C. Jaboulet, A. Constantinescu, and P. Langlais, Swiss French polyphone and polyvar: Telephone speech databases to model inter and intra-speaker variability, IDIAP, Martigny, Switzerland, 1996.
- (1996) Swiss French polyphone and polyvar: Telephone speech databases to model inter and intra-speaker variability, IDIAP, Martigny, Switzerland
- Chollet, G.¹ Cochard, J.L.² Jaboulet, C.³ Constantinescu, A.⁴ Langlais, P.⁵

45
- 33749928608
- N. N. Mirghafori, "A Multi-Band Approach to Automatic Speech Recognition," Ph.D. dissertation, Int. Comput. Sci. Inst., Berkeley, CA, Jan. 1999.
- (1999) "A Multi-Band Approach to Automatic Speech Recognition," Ph.D. dissertation, Int. Comput. Sci. Inst., Berkeley, CA, Jan
- Mirghafori, N.N.¹

46
- 0028517164
- "RASTA processing of speech,"
- H. Hermansky and N. Morgan, "RASTA processing of speech," IEEE Trans. Speech Audio Processing, vol. 2, no. 4, pp. 578-589, 1994.
- (1994) IEEE Trans. Speech Audio Processing , vol.2 , Issue.4 , pp. 578-589
- Hermansky, H.¹ Morgan, N.²

47
- 84988755174
- ULg, "ULg-acoustics laboratory-The MADRAS project," Aug. 1998. [Online]. Available: http://www.montefiore.ulg.ac.be/services/acous/homelab.html
- (1998) ULg-acoustics laboratory-The MADRAS project

48
- 0002768123
- "Assessing local noise level estimation methods," in
- S. Dupont and C. Ris, "Assessing local noise level estimation methods," in Workshop on Robust Methods for Speech Recognition in Adverse Conditions (Nokia, COST249, IEEE) Tampere, Finland, May 1999, pp. 115-118.
- (1999) Workshop on Robust Methods for Speech Recognition in Adverse Conditions (Nokia, COST249, IEEE), Tampere, Finland, May , pp. 115-118
- Dupont, S.¹ Ris, C.²

* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.