



Volume 4, Issue 5, 1996, Pages 337-350

Computer lipreading for improved accuracy in automatic speech recognition

Author keywords

[No Author keywords available]

Indexed keywords

AUDIO SYSTEMS; CHARACTER RECOGNITION; COMPUTER VISION; ERRORS; MARKOV PROCESSES; MATHEMATICAL MODELS; PERFORMANCE;

EID: 0030247984     PISSN: 10636676     EISSN: None     Source Type: Journal    
DOI: 10.1109/89.536928     Document Type: Article
Times cited: 88

References (46)
  • 1
    • 0013143830
    • Energy-conditioned spectral estimation for recognition of noisy speech
    • Jan.
    • A. Erell and M. Weintraub, "Energy-conditioned spectral estimation for recognition of noisy speech," IEEE Trans. Speech Audio Processing, vol. 1, no. 1, pp. 84-89, Jan. 1993.
    • (1993) IEEE Trans. Speech Audio Processing , vol.1 , Issue.1 , pp. 84-89
    • Erell, A.1    Weintraub, M.2
  • 2
    • 0026843273
    • Gain-adapted hidden Markov models for recognition of clean and noisy speech
    • Apr.
    • Y. Ephraim, "Gain-adapted hidden Markov models for recognition of clean and noisy speech," IEEE Trans. Acoust., Speech, Signal Processing, vol. 40, no. 4, pp. 725-735, Apr. 1992.
    • (1992) IEEE Trans. Acoust., Speech, Signal Processing , vol.40 , Issue.4 , pp. 725-735
    • Ephraim, Y.1
  • 3
    • 0025041264
    • Perceptual linear predictive (PLP) analysis of speech
    • Apr.
    • H. Hermansky, "Perceptual linear predictive (PLP) analysis of speech," J. Acoust. Soc. Amer., vol. 87, no. 4, pp. 1738-1752, Apr. 1990.
    • (1990) J. Acoust. Soc. Amer. , vol.87 , Issue.4 , pp. 1738-1752
    • Hermansky, H.1
  • 4
    • 0000030810
    • Auditory nerve representation as a basis for speech processing
    • S. Furui and M. M. Sondhi, Eds., New York: Marcel Dekker
    • O. Ghitza, "Auditory nerve representation as a basis for speech processing," in S. Furui and M. M. Sondhi, Eds., Advances in Speech Signal Processing. New York: Marcel Dekker, 1992, pp. 453-485.
    • (1992) Advances in Speech Signal Processing. , pp. 453-485
    • Ghitza, O.1
  • 5
    • 0003071809
    • Evaluation and optimization of perceptually based ASR front end
    • Jan.
    • J.-C. Junqua, H. Wakita, and H. Hermansky, "Evaluation and optimization of perceptually based ASR front end," IEEE Trans. Speech Audio Processing, vol. 1, no. 1, pp. 39-48, Jan. 1993.
    • (1993) IEEE Trans. Speech Audio Processing , vol.1 , Issue.1 , pp. 39-48
    • Junqua, J.-C.1    Wakita, H.2    Hermansky, H.3
  • 6
    • 33646938054
    • Language processing for speech understanding
    • A. Waibel and K.-F. Lee, Eds., San Mateo, CA: Morgan Kaufmann
    • W. A. Woods, "Language processing for speech understanding," in A. Waibel and K.-F. Lee, Eds., Readings in Speech Recognition. San Mateo, CA: Morgan Kaufmann, 1990, pp. 519-533.
    • (1990) Readings in Speech Recognition. , pp. 519-533
    • Woods, W.A.1
  • 7
    • 0344685169
    • High level knowledge sources in usable speech recognition systems
    • A. Waibel and K.-F. Lee, Eds., San Mateo, CA: Morgan Kaufmann
    • S. R. Young, A. G. Hauptmann, W. H. Ward, E. T. Smith, and P. Werner, "High level knowledge sources in usable speech recognition systems," in A. Waibel and K.-F. Lee, Eds., Readings in Speech Recognition, San Mateo, CA: Morgan Kaufmann, 1990, pp. 538-549.
    • (1990) Readings in Speech Recognition , pp. 538-549
    • Young, S.R.1    Hauptmann, A.G.2    Ward, W.H.3    Smith, E.T.4    Werner, P.5
  • 9
    • 33646941794
    • Prosodic knowledge sources for word hypothesization in a continuous speech recognition system
    • A. Waibel and K.-F. Lee, Eds., San Mateo, CA: Morgan Kaufmann
    • A. Waibel, "Prosodic knowledge sources for word hypothesization in a continuous speech recognition system," in A. Waibel and K.-F. Lee, Eds., Readings in Speech Recognition. San Mateo, CA: Morgan Kaufmann, 1990, pp. 534-537.
    • (1990) Readings in Speech Recognition. , pp. 534-537
    • Waibel, A.1
  • 11
    • 0002365852
    • Surface learning with applications to lipreading
    • J. D. Cowan, G. Tesauro, and J. Alspector, Eds., San Francisco, CA: Morgan Kaufmann
    • C. Bregler and S. M. Omohundro, "Surface learning with applications to lipreading," in J. D. Cowan, G. Tesauro, and J. Alspector, Eds., Advances in Neural Information Processing Systems. San Francisco, CA: Morgan Kaufmann, 1994, pp. 43-50, vol. 6.
    • (1994) Advances in Neural Information Processing Systems. , vol.6 , pp. 43-50
    • Bregler, C.1    Omohundro, S.M.2
  • 15
    • 38249029471
    • Automatic optically-based recognition of speech
    • K. E. Finn and A. A. Montgomery, "Automatic optically-based recognition of speech," Patt. Recogn. Lett., vol. 8, no. 3, pp. 159-164, 1988.
    • (1988) Patt. Recogn. Lett. , vol.8 , Issue.3 , pp. 159-164
    • Finn, K.E.1    Montgomery, A.A.2
  • 18
    • 85029619676
    • Visual speech recognition with stochastic networks
    • G. Tesauro, D. Touretzky, and T. Leen, Eds., Cambridge, MA: MIT Press
    • J. R. Movellan, "Visual speech recognition with stochastic networks," in G. Tesauro, D. Touretzky, and T. Leen, Eds., Advances in Neural Information Processing Systems. Cambridge, MA: MIT Press, vol. 7, 1995, pp. 851-858.
    • (1995) Advances in Neural Information Processing Systems. , vol.7 , pp. 851-858
    • Movellan, J.R.1
  • 19
    • 84921138344
    • Speech recognition enhancement by lip information
    • S. Nishida, "Speech recognition enhancement by lip information," in Proc. Comput. Human Interfaces '86, pp. 198-204.
    • Proc. Comput. Human Interfaces '86 , pp. 198-204
    • Nishida, S.1
  • 20
    • 4244043499
    • An improved automatic lipreading system to enhance speech recognition
    • AT&T Bell Labs.
    • E. D. Petajan, "An improved automatic lipreading system to enhance speech recognition," Tech. Rep. 11251-871012-111TM, AT&T Bell Labs., 1987.
    • (1987) Tech. Rep. 11251-871012-111TM
    • Petajan, E.D.1
  • 23
  • 24
    • 0025503485
    • Neural network models of sensory integration for improved vowel recognition
    • Oct.
    • B. P. Yuhas, M. H. Goldstein, T. J. Sejnowski, and R. E. Jenkins, "Neural network models of sensory integration for improved vowel recognition," Proc. IEEE, vol. 78, no. 10, pp. 1658-1668, Oct. 1990.
    • (1990) Proc. IEEE , vol.78 , Issue.10 , pp. 1658-1668
    • Yuhas, B.P.1    Goldstein, M.H.2    Sejnowski, T.J.3    Jenkins, R.E.4
  • 26
    • 85013580214
    • Sensory integration in audiovisual automatic speech recognition
    • Nov.
    • P. L. Silsbee, "Sensory integration in audiovisual automatic speech recognition," in 28th Ann. Asilomar Conf. Signals, Syst., Comput., vol. 1, Nov. 1994, pp. 561-565.
    • (1994) 28th Ann. Asilomar Conf. Signals, Syst., Comput. , vol.1 , pp. 561-565
    • Silsbee, P.L.1
  • 27
    • 2542503213
    • Visual lipreading by computer to improve automatic speech recognition accuracy
    • Univ. of Texas Comput. Vision Res. Center, Austin, TX
    • P. L. Silsbee and A. C. Bovik, "Visual lipreading by computer to improve automatic speech recognition accuracy," Tech. Rep., TR93-02-90, Univ. of Texas Comput. Vision Res. Center, Austin, TX, 1993.
    • (1993) Tech. Rep., TR93-02-90
    • Silsbee, P.L.1    Bovik, A.C.2
  • 28
    • 0000585224
    • Lipreading by neural networks: Visual preprocessing, learning and sensory integration
    • J. D. Cowan, G. Tesauro, and J. Alspector, Eds., San Francisco, CA: Morgan Kaufmann
    • G. J. Wolff, K. V. Prasad, D. G. Stork, and M. E. Hennecke, "Lipreading by neural networks: Visual preprocessing, learning and sensory integration," in J. D. Cowan, G. Tesauro, and J. Alspector, Eds., Advances in Neural Information Processing Systems. San Francisco, CA: Morgan Kaufmann, 1994, pp. 1027-1034, vol. 6.
    • (1994) Advances in Neural Information Processing Systems. , vol.6 , pp. 1027-1034
    • Wolff, G.J.1    Prasad, K.V.2    Stork, D.G.3    Hennecke, M.E.4
  • 29
    • 0026369237
    • Neural network vowel recognition jointly using voice features and mouth shape image
    • J. Wu et al., "Neural network vowel recognition jointly using voice features and mouth shape image," Patt. Recogn., vol. 24, no. 10, pp. 921-927, 1991.
    • (1991) Patt. Recogn. , vol.24 , Issue.10 , pp. 921-927
    • Wu, J.1
  • 31
    • 0001048664
    • Visual contribution to speech intelligibility in noise
    • W. H. Sumby and I. Pollack, "Visual contribution to speech intelligibility in noise," J. Acoust. Soc. Amer., vol. 26, pp. 212-215, 1954.
    • (1954) J. Acoust. Soc. Amer. , vol.26 , pp. 212-215
    • Sumby, W.H.1    Pollack, I.2
  • 32
    • 0002028032
    • Some preliminaries to a comprehensive account of audio-visual speech perception
    • B. Dodd and R. Campbell, Eds., London: Lawrence Erlbaum
    • Q. Summerfield, "Some preliminaries to a comprehensive account of audio-visual speech perception," in B. Dodd and R. Campbell, Eds., Hearing by Eye: The Psychology of Lip-reading. London: Lawrence Erlbaum, 1987, pp. 3-51.
    • (1987) Hearing by Eye: the Psychology of Lip-reading. , pp. 3-51
    • Summerfield, Q.1
  • 33
    • 0002132290
    • Easy to hear but hard to understand: A lip-reading advantage with intact auditory stimuli
    • B. Dodd and R. Campbell, Eds., London: Lawrence Erlbaum
    • D. Reisberg, "Easy to hear but hard to understand: A lip-reading advantage with intact auditory stimuli," in B. Dodd and R. Campbell, Eds., Hearing by Eye: The Psychology of Lip-reading. London: Lawrence Erlbaum, 1987, pp. 97-113.
    • (1987) Hearing by Eye: The Psychology of Lip-reading. , pp. 97-113
    • Reisberg, D.1
  • 34
    • 0008745835
    • Speech perception by ear and eye
    • B. Dodd and R. Campbell, Eds., London: Lawrence Erlbaum
    • D. W. Massaro, "Speech perception by ear and eye," in B. Dodd and R. Campbell, Eds., Hearing by Eye: The Psychology of Lip-reading. London: Lawrence Erlbaum, 1987, pp. 53-83.
    • (1987) Hearing by Eye: The Psychology of Lip-reading. , pp. 53-83
    • Massaro, D.W.1
  • 35
    • 0017199877
    • Hearing lips and seeing voices
    • H. McGurk and J. MacDonald, "Hearing lips and seeing voices," Nature, vol. 264, pp. 746-748, 1976.
    • (1976) Nature , vol.264 , pp. 746-748
    • McGurk, H.1    MacDonald, J.2
  • 36
    • 0040914411
    • Lip-reading in the prelingually deaf
    • B. Dodd and R. Campbell, Eds., London: Lawrence Erlbaum
    • K. Mogford, "Lip-reading in the prelingually deaf," in B. Dodd and R. Campbell, Eds., Hearing by Eye: The Psychology of Lip-reading. London: Lawrence Erlbaum, 1987, pp. 191-211.
    • (1987) Hearing by Eye: the Psychology of Lip-reading. , pp. 191-211
    • Mogford, K.1
  • 39
    • 0017060763
    • Perceptual dimensions underlying vowel lipreading performance
    • P. L. Jackson, A. A. Montgomery, and C. A. Binnie, "Perceptual dimensions underlying vowel lipreading performance," J. Speech Hearing Res., vol. 19, pp. 796-812, 1976.
    • (1976) J. Speech Hearing Res. , vol.19 , pp. 796-812
    • Jackson, P.L.1    Montgomery, A.A.2    Binnie, C.A.3
  • 44
    • 0024752328
    • A new vector quantization clustering algorithm
    • Oct.
    • W. H. Equitz, "A new vector quantization clustering algorithm," IEEE Trans. Acoust., Speech, Signal Processing, vol. 37, no. 10, pp. 1568-1575, Oct. 1989.
    • (1989) IEEE Trans. Acoust., Speech, Signal Processing , vol.37 , Issue.10 , pp. 1568-1575
    • Equitz, W.H.1


* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.