SCOPUS Information Retrieval Platform

IEEE Transactions on Speech and Audio Processing

Volume 4, Issue 5, 1996, Pages 337-350

Computer lipreading for improved accuracy in automatic speech recognition

(2) Silsbee, Peter L a Bovik, Alan C a

a The University of Texas at Austin (United States)

Author keywords

[No Author keywords available]

Indexed keywords

AUDIO SYSTEMS; CHARACTER RECOGNITION; COMPUTER VISION; ERRORS; MARKOV PROCESSES; MATHEMATICAL MODELS; PERFORMANCE;

AUDIOVISUAL SYSTEM; AUTOMATIC SPEECH RECOGNITION SYSTEMS; HIDDEN MARKOV MODELS;

SPEECH RECOGNITION;

EID: 0030247984 PISSN: 10636676 EISSN: None Source Type: Journal
DOI: 10.1109/89.536928 Document Type: Article

Times cited : (88)

References (46)

1
- 0013143830
- Energy-conditioned spectral estimation for recognition of noisy speech
- Jan.
- A. Erell and M. Weintraub, "Energy-conditioned spectral estimation for recognition of noisy speech," IEEE Trans. Speech Audio Processing, vol. 1, no. 1, pp. 84-89, Jan. 1993.
- (1993) IEEE Trans. Speech Audio Processing , vol.1 , Issue.1 , pp. 84-89
- Erell, A.¹ Weintraub, M.²

2
- 0026843273
- Gain-adapted hidden Markov models for recognition of clean and noisy speech
- Apr.
- Y. Ephraim, "Gain-adapted hidden Markov models for recognition of clean and noisy speech," IEEE Trans. Acoust., Speech, Signal Processing, vol. 40, no. 4, pp. 725-735, Apr. 1992.
- (1992) IEEE Trans. Acoust., Speech, Signal Processing , vol.40 , Issue.4 , pp. 725-735
- Ephraim, Y.¹

3
- 0025041264
- Perceptual linear predictive (PLP) analysis of speech
- Apr.
- H. Hermansky, "Perceptual linear predictive (PLP) analysis of speech," J. Acoust. Soc. Amer., vol. 87, no. 4, pp. 1738-1752, Apr. 1990.
- (1990) J. Acoust. Soc. Amer. , vol.87 , Issue.4 , pp. 1738-1752
- Hermansky, H.¹

4
- 0000030810
- Auditory nerve representation as a basis for speech processing
- S. Furui and M. M. Sondhi, Eds., New York: Marcel Dekker
- O. Ghitza, "Auditory nerve representation as a basis for speech processing," in S. Furui and M. M. Sondhi, Eds., Advances in Speech Signal Processing. New York: Marcel Dekker, 1992, pp. 453-485.
- (1992) Advances in Speech Signal Processing. , pp. 453-485
- Ghitza, O.¹

5
- 0003071809
- Evaluation and optimization of perceptually based ASR front end
- Jan.
- J.-C. Junqua, H. Wakita, and H. Hermansky, "Evaluation and optimization of perceptually based ASR front end," IEEE Trans. Speech Audio Processing, vol. 1, no. 1, pp. 39-48, Jan. 1993.
- (1993) IEEE Trans. Speech Audio Processing , vol.1 , Issue.1 , pp. 39-48
- Junqua, J.-C.¹ Wakita, H.² Hermansky, H.³

6
- 33646938054
- Language processing for speech understanding
- A. Waibel and K.-F. Lee, Eds., San Mateo, CA: Morgan Kaufmann
- W. A. Woods, "Language processing for speech understanding," in A. Waibel and K.-F. Lee, Eds., Readings in Speech Recognition. San Mateo, CA: Morgan Kaufmann, 1990, pp. 519-533.
- (1990) Readings in Speech Recognition. , pp. 519-533
- Woods, W.A.¹

7
- 0344685169
- High level knowledge sources in usable speech recognition systems
- A. Waibel and K.-F. Lee, Eds., San Mateo, CA: Morgan Kaufmann
- S. R. Young, A. G. Hauptmann, W. H. Ward, E. T. Smith, and P. Werner, "High level knowledge sources in usable speech recognition systems," in A. Waibel and K.-F. Lee, Eds., Readings in Speech Recognition, San Mateo, CA: Morgan Kaufmann, 1990, pp. 538-549.
- (1990) Readings in Speech Recognition , pp. 538-549
- Young, S.R.¹ Hauptmann, A.G.² Ward, W.H.³ Smith, E.T.⁴ Werner, P.⁵

8
- 0003539541
- Ph.D. thesis, Carnegie-Mellon Univ., Pittsburgh, PA
- K.-F. Lee, "Large-vocabulary speaker-independent continuous speech recognition: The SPHINX system," Ph.D. thesis, Carnegie-Mellon Univ., Pittsburgh, PA, 1988.
- (1988) Large-Vocabulary Speaker-Independent Continuous Speech Recognition: the SPHINX System
- Lee, K.-F.¹

9
- 33646941794
- Prosodic knowledge sources for word hypothesization in a continuous speech recognition system
- A. Waibel and K.-F. Lee, Eds., San Mateo, CA: Morgan Kaufmann
- A. Waibel, "Prosodic knowledge sources for word hypothesization in a continuous speech recognition system," in A. Waibel and K.-F. Lee, Eds., Readings in Speech Recognition. San Mateo, CA: Morgan Kaufmann, 1990, pp. 534-537.
- (1990) Readings in Speech Recognition. , pp. 534-537
- Waibel, A.¹

10
- 0003699540
- Ph.D. thesis, Univ. of Illinois, Urbana, IL
- E. D. Petajan, "Automatic lipreading to enhance speech recognition," Ph.D. thesis, Univ. of Illinois, Urbana, IL, 1984.
- (1984) Automatic Lipreading to Enhance Speech Recognition
- Petajan, E.D.¹

11
- 0002365852
- Surface learning with applications to lipreading
- J. D. Cowan, G. Tesauro, and J. Alspector, Eds, San Francisco, CA: Morgan Kaufmann
- C. Bregler and S. M. Omohundro, "Surface learning with applications to lipreading," in J. D. Cowan, G. Tesauro, and J. Alspector, Eds, Advances in Neural Information Processing Systems. San Francisco, CA: Morgan Kaufmann, 1994, pp. 43-50, vol. 6.
- (1994) Advances in Neural Information Processing Systems. , vol.6 , pp. 43-50
- Bregler, C.¹ Omohundro, S.M.²

12
- 85009082168
- A hybrid approach to bimodal speech recognition
- Pacific Grove, CA, Nov.
- C. Bregler, S. M. Omohundro, and Y. Konig, "A hybrid approach to bimodal speech recognition," in Proc. 28th Ann. Asilomar Conf. Signals, Syst., Comput., vol. 1, Pacific Grove, CA, Nov. 1994, pp. 556-560.
- (1994) Proc. 28th Ann. Asilomar Conf. Signals, Syst., Comput. , vol.1 , pp. 556-560
- Bregler, C.¹ Omohundro, S.M.² Konig, Y.³

13
- 84947971415
- Bimodal recognition experiments with recurrent neural networks
- P. Cosi, E. M. Caldognetto, K. Vagges, G. A. Mian, and M. Contolini, "Bimodal recognition experiments with recurrent neural networks," in Proc. Int. Conf. Acoust., Speech, Signal Processing, vol. 2, 1994, pp. II/553-556.
- (1994) Proc. Int. Conf. Acoust., Speech, Signal Processing , vol.2 , pp. II/553-556
- Cosi, P.¹ Caldognetto, E.M.² Vagges, K.³ Mian, G.A.⁴ Contolini, M.⁵

14
- 85135321224
- See me, hear me: Integrating automatic speech recognition and lip-reading
- P. Duchnowski, U. Meier, and A. Waibel, "See me, hear me: Integrating automatic speech recognition and lip-reading," in Proc. Int. Conf. Spoken Language Processing, 1994.
- (1994) Proc. Int. Conf. Spoken Language Processing
- Duchnowski, P.¹ Meier, U.² Waibel, A.³

15
- 38249029471
- Automatic optically-based recognition of speech
- K. E. Finn and A. A. Montgomery, "Automatic optically-based recognition of speech," Patt. Recogn. Lett., vol. 8, no. 3, pp. 159-164, 1988.
- (1988) Patt. Recogn. Lett. , vol.8 , Issue.3 , pp. 159-164
- Finn, K.E.¹ Montgomery, A.A.²

16
- 78650077027
- Ph.D. thesis, Dept. of Elect. Eng. and Comput. Sci., George Washington Univ.
- A. J. Goldschen, "Continuous automatic speech recognition by lipreading," Ph.D. thesis, Dept. of Elect. Eng. and Comput. Sci., George Washington Univ., 1993.
- (1993) Continuous Automatic Speech Recognition by Lipreading
- Goldschen, A.J.¹

17
- 78649238564
- Using deformable templates to infer visual speech dynamics
- Pacific Grove, CA, Nov.
- M. E. Hennecke, K. V. Prasad, and D. G. Stork, "Using deformable templates to infer visual speech dynamics," in Proc. 28th Ann. Asilomar Conf. Signals, Syst., Comput., vol. 1, Pacific Grove, CA, Nov. 1994, pp. 578-582.
- (1994) Proc. 28th Ann. Asilomar Conf. Signals, Syst., Comput. , vol.1 , pp. 578-582
- Hennecke, M.E.¹ Prasad, K.V.² Stork, D.G.³

18
- 85029619676
- Visual speech recognition with stochastic networks
- G. Tesauro, D. Touretzky, and T. Leen, Eds., Cambridge, MA: MIT Press
- J. R. Movellan, "Visual speech recognition with stochastic networks," in G. Tesauro, D. Touretzky, and T. Leen, Eds., Advances in Neural Information Processing Systems. Cambridge, MA: MIT Press, vol. 7, 1995, pp. 851-858.
- (1995) Advances in Neural Information Processing Systems. , vol.7 , pp. 851-858
- Movellan, J.R.¹

19
- 84921138344
- Speech recognition enhancement by lip information
- S. Nishida, "Speech recognition enhancement by lip information," in Proc. Comput. Human Interfaces '86, pp. 198-204.
- Proc. Comput. Human Interfaces '86 , pp. 198-204
- Nishida, S.¹

20
- 4244043499
- An improved automatic lipreading system to enhance speech recognition
- AT&T Bell Labs.
- E. D. Petajan, "An improved automatic lipreading system to enhance speech recognition," Tech. Rep. 11251-871012-111TM, AT&T Bell Labs., 1987.
- (1987) Tech. Rep. 11251-871012-111TM
- Petajan, E.D.¹

21
- 85060684689
- Lip modeling for visual speech recognition
- Pacific Grove, CA, Nov.
- R. R. Rao and R. M. Mersereau, "Lip modeling for visual speech recognition," in Proc. 28th Ann. Asilomar Conf. Signals, Syst., Comput., vol. 1, Pacific Grove, CA, Nov. 1994, pp. 587-590.
- (1994) Proc. 28th Ann. Asilomar Conf. Signals, Syst., Comput. , vol.1 , pp. 587-590
- Rao, R.R.¹ Mersereau, R.M.²

22
- 0041900541
- Ph.D. thesis, Institut National Polytechnique de Grenoble, Grenoble, France
- J. Robert-Ribes, "Modèles d'integration audio-visuelle de signaux linguistiques: De la perception humaine à la reconnaissance automatique des voyelles," Ph.D. thesis, Institut National Polytechnique de Grenoble, Grenoble, France, 1995.
- (1995) Modèles D'integration Audio-visuelle De Signaux Linguistiques: De La Perception Humaine à La Reconnaissance Automatique Des Voyelles
- Robert-Ribes, J.¹

23
- 33747692222
- Lip reading: Automatic visual recognition of spoken words
- June
- K. Mase and A. Pentland, "Lip reading: Automatic visual recognition of spoken words," Opt. Soc. Amer. Topical Mtg. Machine Vision, June 1989, pp. 1565-1570.
- (1989) Opt. Soc. Amer. Topical Mtg. Machine Vision , pp. 1565-1570
- Mase, K.¹ Pentland, A.²

24
- 0025503485
- Neural network models of sensory integration for improved vowel recognition
- Oct.
- B. P. Yuhas, M. H. Goldstein, T. J. Sejnowski, and R. E. Jenkins, "Neural network models of sensory integration for improved vowel recognition," Proc. IEEE, vol. 78, no. 10, pp. 1658-1668, Oct. 1990.
- (1990) Proc. IEEE , vol.78 , Issue.10 , pp. 1658-1668
- Yuhas, B.P.¹ Goldstein, M.H.² Sejnowski, T.J.³ Jenkins, R.E.⁴

25
- 85132038963
- Neural network lipreading system for improved speech recognition
- D. G. Stork, G. Wolff, and E. Levine, "Neural network lipreading system for improved speech recognition," in Proc. Int. Joint Conf. Neural Networks, 1992, pp. 285-295.
- (1992) Proc. Int. Joint Conf. Neural Networks , pp. 285-295
- Stork, D.G.¹ Wolff, G.² Levine, E.³

26
- 85013580214
- Sensory integration in audiovisual automatic speech recognition
- Nov.
- P. L. Silsbee, "Sensory integration in audiovisual automatic speech recognition," in 28th Ann. Asilomar Conf. Signals, Syst., Comput., vol. 1, Nov. 1994, pp. 561-565.
- (1994) 28th Ann. Asilomar Conf. Signals, Syst., Comput. , vol.1 , pp. 561-565
- Silsbee, P.L.¹

27
- 2542503213
- Visual lipreading by computer to improve automatic speech recognition accuracy
- Univ. of Texas Comput. Vision Res. Center, Austin, TX
- P. L. Silsbee and A. C. Bovik, "Visual lipreading by computer to improve automatic speech recognition accuracy," Tech. Rep., TR93-02-90, Univ. of Texas Comput. Vision Res. Center, Austin, TX, 1993.
- (1993) Tech. Rep., TR93-02-90
- Silsbee, P.L.¹ Bovik, A.C.²

28
- 0000585224
- Lipreading by neural networks: Visual preprocessing, learning and sensory integration
- J. D. Cowan, G. Tesauro, and J. Alspector, Eds., San Francisco, CA: Morgan Kaufmann
- G. J. Wolff, K. V. Prasad, D. G. Stork, and M. E. Hennecke, "Lipreading by neural networks: Visual preprocessing, learning and sensory integration," in J. D. Cowan, G. Tesauro, and J. Alspector, Eds., Advances in Neural Information Processing Systems. San Francisco, CA: Morgan Kaufmann, 1994, pp. 1027-1034, vol. 6.
- (1994) Advances in Neural Information Processing Systems. , vol.6 , pp. 1027-1034
- Wolff, G.J.¹ Prasad, K.V.² Stork, D.G.³ Hennecke, M.E.⁴

29
- 0026369237
- Neural network vowel recognition jointly using voice features and mouth shape image
- J. Wu et al., "Neural network vowel recognition jointly using voice features and mouth shape image," Patt. Recogn., vol. 24, no. 10, pp. 921-927, 1991.
- (1991) Patt. Recogn. , vol.24 , Issue.10 , pp. 921-927
- Wu, J.¹

30
- 84919209139
- Automatic recognition of noisy speech
- J. P. Haton, "Automatic recognition of noisy speech," in New Advances and Trends in Speech Recognition and Coding, NATO Advanced Study Institute, 1993.
- (1993) New Advances and Trends in Speech Recognition and Coding, NATO Advanced Study Institute
- Haton, J.P.¹

31
- 0001048664
- Visual contribution to speech intelligibility in noise
- W. H. Sumby and I. Pollack, "Visual contribution to speech intelligibility in noise," J. Acoust. Soc. Amer., vol. 26, pp. 212-215, 1954.
- (1954) J. Acoust. Soc. Amer. , vol.26 , pp. 212-215
- Sumby, W.H.¹ Pollack, I.²

32
- 0002028032
- Some preliminaries to a comprehensive account of audio-visual speech perception
- B. Dodd and R. Campbell, Eds., London: Lawrence Erlbaum
- Q. Summerfield, "Some preliminaries to a comprehensive account of audio-visual speech perception," in B. Dodd and R. Campbell, Eds., Hearing by Eye: The Psychology of Lip-reading. London: Lawrence Erlbaum, 1987, pp. 3-51.
- (1987) Hearing by Eye: the Psychology of Lip-reading. , pp. 3-51
- Summerfield, Q.¹

33
- 0002132290
- Easy to hear but hard to understand: A lip-reading advantage with intact auditory stimuli
- B. Dodd and R. Campbell, Eds., London: Lawrence Erlbaum
- D. Reisberg, "Easy to hear but hard to understand: A lip-reading advantage with intact auditory stimuli," in B. Dodd and R. Campbell, Eds., Hearing by Eye: The Psychology of Lip-reading. London: Lawrence Erlbaum, 1987, pp. 97-113.
- (1987) Hearing by Eye: the Psychology of Lip-reading. , pp. 97-113
- Reisberg, D.¹

34
- 0008745835
- Speech perception by ear and eye
- B. Dodd and R. Campbell, Eds., London: Lawrence Erlbaum
- D. W. Massaro, "Speech perception by ear and eye," in B. Dodd and R. Campbell, Eds., Hearing by Eye: The Psychology of Lip-reading. London: Lawrence Erlbaum, 1987, pp. 53-83.
- (1987) Hearing by Eye: the Psychology of Lip-reading. , pp. 53-83
- Massaro, D.W.¹

35
- 0017199877
- Hearing lips and seeing voices
- H. McGurk and J. MacDonald, "Hearing lips and seeing voices," Nature, vol. 264, pp. 746-748, 1976.
- (1976) Nature , vol.264 , pp. 746-748
- McGurk, H.¹ MacDonald, J.²

36
- 0040914411
- Lip-reading in the prelingually deaf
- B. Dodd and R. Campbell, Eds., London: Lawrence Erlbaum
- K. Mogford, "Lip-reading in the prelingually deaf," in B. Dodd and R. Campbell, Eds., Hearing by Eye: The Psychology of Lip-reading. London: Lawrence Erlbaum, 1987, pp. 191-211.
- (1987) Hearing by Eye: the Psychology of Lip-reading. , pp. 191-211
- Mogford, K.¹

37
- 0003966402
- Hillsdale, NJ: Lawrence Erlbaum
- D. W. Massaro, Speech Perception by Ear and Eye: A Paradigm for Psychological Inquiry. Hillsdale, NJ: Lawrence Erlbaum, 1987.
- (1987) Speech Perception by Ear and Eye: a Paradigm for Psychological Inquiry.
- Massaro, D.W.¹

38
- 0004238403
- Baltimore, MD: National Education
- K. W. Berger, Speechreading: Principles and Methods. Baltimore, MD: National Education, 1972.
- (1972) Speechreading: Principles and Methods.
- Berger, K.W.¹

39
- 0017060763
- Perceptual dimensions underlying vowel lipreading performance
- P. L. Jackson, A. A. Montgomery, and C. A. Binnie, "Perceptual dimensions underlying vowel lipreading performance," J. Speech Hearing Res., vol. 19, pp. 796-812, 1976.
- (1976) J. Speech Hearing Res. , vol.19 , pp. 796-812
- Jackson, P.L.¹ Montgomery, A.A.² Binnie, C.A.³

40
- 0346080351
- Roles of lips and teeth in lipreading vowels
- M. McGrath, A. Q. Summerfield, and N. M. Brooke, "Roles of lips and teeth in lipreading vowels," Proc. Inst. Acoust., pp. 401-408, 1984.
- (1984) Proc. Inst. Acoust. , pp. 401-408
- McGrath, M.¹ Summerfield, A.Q.² Brooke, N.M.³

41
- 0004319972
- Ph.D. thesis, Univ. of Texas
- P. L. Silsbee, "Computer lipreading for improved accuracy in automatic speech recognition," Ph.D. thesis, Univ. of Texas, 1993.
- (1993) Computer Lipreading for Improved Accuracy in Automatic Speech Recognition
- Silsbee, P.L.¹

42
- 0023211284
- Integration of acoustic information in a large vocabulary word recognizer
- V. M. Gupta, M. Lennig, and P. Mermelstein, "Integration of acoustic information in a large vocabulary word recognizer," in Proc. Int. Conf. Acoust., Speech, Signal Processing, 1987, pp. 697-700.
- (1987) Proc. Int. Conf. Acoust., Speech, Signal Processing , pp. 697-700
- Gupta, V.M.¹ Lennig, M.² Mermelstein, P.³

43
- 0003770715
- Boston: Kluwer
- K.-F. Lee, Automatic Speech Recognition: The Development of the SPHINX System. Boston: Kluwer, 1989.
- (1989) Automatic Speech Recognition: the Development of the SPHINX System.
- Lee, K.-F.¹

44
- 0024752328
- A new vector quantization clustering algorithm
- Oct.
- W. H. Equitz, "A new vector quantization clustering algorithm," IEEE Trans. Acoust., Speech, Signal Processing, vol. 37, no. 10, pp. 1568-1575, Oct. 1989.
- (1989) IEEE Trans. Acoust., Speech, Signal Processing , vol.37 , Issue.10 , pp. 1568-1575
- Equitz, W.H.¹

45
- 0021412027
- Vector quantization
- Apr.
- R. M. Gray, "Vector quantization," IEEE Acoust., Speech, Signal Processing Mag., vol. 2, pp. 4-29, Apr. 1984.
- (1984) IEEE Acoust., Speech, Signal Processing Mag. , vol.2 , pp. 4-29
- Gray, R.M.¹

46
- 0004117631
- Englewood Cliffs, NJ, Prentice-Hall
- D. H. Ballard and C. M. Brown, Computer Vision. Englewood Cliffs, NJ, Prentice-Hall, 1982.
- (1982) Computer Vision.
- Ballard, D.H.¹ Brown, C.M.²

* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.