



Volume 2, Issue 3, 2000, Pages 141-151

Audio-visual speech modeling for continuous speech recognition

Author keywords

[No Author keywords available]

Indexed keywords

MARKOV PROCESSES; MATHEMATICAL MODELS; SENSOR DATA FUSION; SIGNAL TO NOISE RATIO; SPEECH ANALYSIS; SPEECH RECOGNITION

EID: 0034270644     PISSN: 15209210     EISSN: None     Source Type: Journal    
DOI: 10.1109/6046.865479     Document Type: Article
Times cited: 538

References (48)
  • [1] A. Q. Summerfield, "Lipreading and audio-visual speech perception," Philos. Trans. R. Soc. London B, vol. 335, pp. 71-78, 1992.
  • [2] W. H. Sumby and I. Pollack, "Visual contribution to speech intelligibility in noise," J. Acoust. Soc. Amer., vol. 26, pp. 212-215, 1954.
  • [3] K. W. Grant and L. D. Braida, "Evaluating the articulation index for auditory-visual input," J. Acoust. Soc. Amer., vol. 89, no. 6, pp. 2952-2960, 1991.
  • [6] H. McGurk and J. MacDonald, "Hearing lips and seeing voices," Nature, vol. 264, pp. 746-748, 1976.
  • [7] R. P. Lippmann, "Speech recognition by machines and humans," Speech Commun., vol. 22, pp. 1-15, 1997.
  • [8] Y. Gong, "Speech recognition in noisy environments: A survey," Speech Commun., vol. 16, pp. 261-291, 1995.
  • [9] R. Cole, L. Hirschman, L. Atlas, et al., "The challenge of spoken language processing: Research directions for the nineties," IEEE Trans. Speech Audio Processing, vol. 3, no. 1, pp. 1-20, 1995.
  • [10] B. P. Yuhas, M. H. Goldstein, T. J. Sejnowski, and R. E. Jenkins, "Neural network models of sensory integration for improved vowel recognition," Proc. IEEE, vol. 78, pp. 1658-1668, Oct. 1990.
  • [12] P. L. Silsbee and A. C. Bovik, "Computer lipreading for improved accuracy in automatic speech recognition," IEEE Trans. Speech Audio Processing, vol. 4, no. 5, pp. 337-351, 1996.
  • [14] K. Mase and A. Pentland, "Automatic lipreading by optical flow analysis," Syst. Comput. Jpn., vol. 22, no. 6, 1991.
  • [16] T. Coianiz, L. Torresani, and B. Caprile, "2D deformable models for visual speech analysis," in Speechreading by Humans and Machines: Models, Systems and Applications, D. G. Stork and M. E. Hennecke, Eds. Berlin, Germany: Springer-Verlag, 1996, vol. 150 of NATO ASI Series, Series F: Computer and Systems Sciences, pp. 391-398.
  • [17] J. Luettin and N. A. Thacker, "Speechreading using probabilistic models," Comput. Vis. Image Understand., vol. 65, no. 2, pp. 163-178, Feb. 1997.
  • [20] A. Lanitis, C. J. Taylor, and T. F. Cootes, "Automatic interpretation and coding of face images using flexible models," IEEE Trans. Pattern Anal. Machine Intell., vol. 19, pp. 743-756, July 1997.
  • [21] J. A. Nelder and R. Mead, "A simplex method for function minimization," Comput. J., vol. 7, no. 4, pp. 308-313, 1965.
  • [24] L. Braida, "Crossmodal integration in the identification of consonants," Q. J. Exp. Psych., vol. 43A, no. 3, pp. 647-677, 1991.
  • [25] N. P. Erber and C. L. De Filippo, "Voice-mouth synthesis of tactual/visual perception of /pa, ba, ma/," J. Acoust. Soc. Amer., vol. 64, pp. 1015-1019, 1978.
  • [26] K. P. Green and J. L. Miller, "On the role of visual rate information in phonetic perception," Percept. Psychophys., vol. 38, no. 3, pp. 269-276, 1985.
  • [28] J. B. Allen, "How do humans process and recognize speech?," IEEE Trans. Speech Audio Processing, vol. 2, no. 4, pp. 567-577, 1994.
  • [33] H. Sakoe, "Two level DP matching: A dynamic time warping based pattern matching algorithm for continuous speech recognition," IEEE Trans. IECE Jpn., vol. 3, 1979.
  • [35] T. Arai and S. Greenberg, "Speech intelligibility in the presence of cross-channel spectral asynchrony," in Proc. ICASSP, 1998, pp. 933-936.
  • [36] D. W. Massaro and M. M. Cohen, "Perceiving asynchronous bimodal speech in consonant-vowel and vowel syllables," Speech Commun., vol. 13, pp. 127-134, 1993.
  • [37] P. M. Smeele et al., "Intelligibility of audio-visually desynchronized speech: Asymmetrical effect of phoneme position," in Proc. Int. Conf. Spoken Language Processing, Alberta, Canada, 1992, pp. 65-68.
  • [42] H. Hermansky, "Perceptual linear predictive (PLP) analysis of speech," J. Acoust. Soc. Amer., vol. 87, no. 4, pp. 1738-1752, Apr. 1990.
  • [43] S. Furui, "Speaker-independent isolated word recognition using dynamic features of speech spectrum," IEEE Trans. Acoust., Speech, Signal Processing, vol. 34, no. 1, pp. 52-59, 1986.


* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.