SCOPUS Information Retrieval Platform

Journal of Research and Practice in Information Technology

Volume 35, Issue 1, 2003, Pages 41-64

Audio-visual speech recognition using red exclusion and neural networks

(2) Lewis, Trent W.ᵃ; Powers, David M. W.ᵃ

ᵃ FLINDERS UNIVERSITY (Australia)

Author keywords

Audio Visual Speech Recognition; Feature Extraction; Neural Networks; Sensor Fusion

Indexed keywords

ACOUSTIC NOISE; LINGUISTICS; NEURAL NETWORKS; SENSOR DATA FUSION; AUDIO-VISUAL SPEECH RECOGNITION; SPEECH RECOGNITION

EID: 0041624571 PISSN: 1443458X EISSN: None Source Type: Journal
DOI: None Document Type: Article

Times cited : (12)

References (54)

1
- 0001432664
- STORK and HENNECKE (1996)
- ADJOUDANI, A. and BENOIT, C. (1996): On the integration of auditory and visual parameters in an HMM-based ASR, in STORK and HENNECKE (1996), 461-471.
- (1996) On the Integration of Auditory and Visual Parameters in an HMM-based ASR , pp. 461-471
- Adjoudani, A.¹ Benoit, C.²

2
- 4244194696
- Multispectral color modeling
- University of Pennsylvania, CIS
- ANGELOPOULOU, E., MOLANA, R., and DANIILIDIS, K. (2001): Multispectral color modeling. Technical Report MS-CIS-01-22, University of Pennsylvania, CIS.
- (2001) Technical Report , vol.MS-CIS-01-22
- Angelopoulou, E.¹ Molana, R.² Daniilidis, K.³

3
- 0033329956
- The learning behavior of single neuron classifiers on linearly separable or nonseparable input
- Washington, D.C.
- BASU, M. and HO, T.K. (1999): The learning behavior of single neuron classifiers on linearly separable or nonseparable input. In Proceedings of the 1999 International Joint Conference on Neural Networks, Washington, D.C.
- (1999) Proceedings of the 1999 International Joint Conference on Neural Networks
- Basu, M.¹ Ho, T.K.²

4
- 84943272400
- Bimodal sensor integration on the example of "speechreading"
- BREGLER, C., MANKE, S., HILD, H., and WAIBEL, A. (1993): Bimodal sensor integration on the example of "speechreading". Proceedings of the IEEE International Conference on Neural Networks, 667-671.
- (1993) Proceedings of the IEEE International Conference on Neural Networks , pp. 667-671
- Bregler, C.¹ Manke, S.² Hild, H.³ Waibel, A.⁴

5
- 0042453502
- STORK and HENNECKE (1996)
- BREGLER, C., OMOHUNDRO, S.M., SHI, J., and KONIG, Y. (1996): Towards a robust speechreading dialog system. In STORK and HENNECKE (1996), 410-423.
- (1996) Towards a Robust Speechreading Dialog System , pp. 410-423
- Bregler, C.¹ Omohundro, S.M.² Shi, J.³ Konig, Y.⁴

6
- 48349113750
- World Wide Web
- BROOKES, M. (2000): VOICEBOX: Speech Processing Toolbox for MATLAB. World Wide Web, http://www.ee.ic.ac.uk/hp/staff/dmb/voicebox/voicebox.html.
- (2000) VOICEBOX: Speech Processing Toolbox for MATLAB
- Brookes, M.¹

7
- 0004199188
- MIT Press, Cambridge, MA
- CHARNIAK, E. (1993): Statistical language learning. MIT Press, Cambridge, MA.
- (1993) Statistical Language Learning
- Charniak, E.¹

8
- 0029304865
- Human and machine recognition of faces: A survey
- CHELLAPPA, R., WILSON, C., and SIROHEY, S. (1995): Human and machine recognition of faces: A survey, in Proceedings of the IEEE, 83(5): 705-739.
- (1995) Proceedings of the IEEE , vol.83 , Issue.5 , pp. 705-739
- Chellappa, R.¹ Wilson, C.² Sirohey, S.³

9
- 0010424566
- STORK and HENNECKE (1996)
- COHEN, M., WALKER, R., and MASSARO, D. (1996): Perception of synthetic visual speech. In STORK and HENNECKE (1996), 153-168.
- (1996) Perception of Synthetic Visual Speech , pp. 153-168
- Cohen, M.¹ Walker, R.² Massaro, D.³

10
- 0000134331
- STORK and HENNECKE (1996)
- COIANIZ, T., TORRESANI, L. and CAPRILE, B. (1996): 2d deformable models for visual speech analysis. In STORK and HENNECKE (1996), 391-398.
- (1996) 2d Deformable Models for Visual Speech Analysis , pp. 391-398
- Coianiz, T.¹ Torresani, L.² Caprile, B.³

11
- 0042453480
- Comparison of parametric representations for monosyllabic word recognition in continuously spoken sentences
- WAIBEL, A. and LEE, K., editors, Morgan Kaufmann Publishers Inc., San Mateo, CA
- DAVIS, S. and MERMELSTEIN, P. (1990): Comparison of parametric representations for monosyllabic word recognition in continuously spoken sentences. In WAIBEL, A. and LEE, K., editors, Readings in Speech Recognition, 64-74. Morgan Kaufmann Publishers Inc., San Mateo, CA.
- (1990) Readings in Speech Recognition , pp. 64-74
- Davis, S.¹ Mermelstein, P.²

12
- 0003396255
- DEMUTH, H. and BEALE, M. (1998): Neural Network Toolbox: User's Guide. The MathWorks, http://www.mathworks.com.
- (1998) Neural Network Toolbox: User's Guide
- Demuth, H.¹ Beale, M.²

13
- 0003616059
- Lawrence Erlbaum Associates, Hillsdale NJ
- DODD, B. and CAMPBELL, R., editors (1987): Hearing by Eye: The psychology of lip-reading. Lawrence Erlbaum Associates, Hillsdale NJ.
- (1987) Hearing by Eye: The Psychology of Lip-reading
- Dodd, B.¹ Campbell, R.²

14
- 0028996862
- Toward movement-invariant automatic lip-reading and speech recognition
- Detroit USA
- DUCHNOWSKI, P., HUNKE, P., BUSCHING, M., MEIER, U., and WAIBEL, A. (1995): Toward movement-invariant automatic lip-reading and speech recognition. In Proceedings of the International Conference of Acoustics, Speech, and Signal Processing, Detroit USA.
- (1995) Proceedings of the International Conference of Acoustics, Speech, and Signal Processing
- Duchnowski, P.¹ Hunke, P.² Busching, M.³ Meier, U.⁴ Waibel, A.⁵

15
- 0034270644
- Audio-visual speech modeling for continuous speech recognition
- DUPONT, S. and LUETTIN, J. (2000): Audio-visual speech modeling for continuous speech recognition. IEEE Transactions on Multimedia, 2(3):141-151.
- (2000) IEEE Transactions on Multimedia , vol.2 , Issue.3 , pp. 141-151
- Dupont, S.¹ Luettin, J.²

16
- 0003824723
- Harcourt Brace and Company, Sydney, 3rd edition
- FROMKIN, V., RODMAN, R., COLLINS, P., and BLAIR, D. (1996): An Introduction to Language. Harcourt Brace and Company, Sydney, 3rd edition.
- (1996) An Introduction to Language
- Fromkin, V.¹ Rodman, R.² Collins, P.³ Blair, D.⁴

17
- 0034842451
- Weighting schemes for audio-visual fusion in speech recognition
- GLOTIN, H., VERGYRI, D., NETI, C., POTAMIANOS, G., and LUETTIN, J. (2001): Weighting schemes for audio-visual fusion in speech recognition. In Proc. Int. Conf. Acoust, Speech Signal Process.
- (2001) Proc. Int. Conf. Acoust. Speech Signal Process.
- Glotin, H.¹ Vergyri, D.² Neti, C.³ Potamianos, G.⁴ Luettin, J.⁵

18
- 0012745879
- STORK and HENNECKE (1996)
- GOLDSCHEN, A., GARCIA, O., and PETAJAN, E. (1996): Rationale for phoneme-viseme mapping and feature selection in visual speech recognition. In STORK and HENNECKE (1996), 505-515.
- (1996) Rationale for Phoneme-viseme Mapping and Feature Selection in Visual Speech Recognition , pp. 505-515
- Goldschen, A.¹ Garcia, O.² Petajan, E.³

19
- 0041451439
- The use of visible speech cues (speechreading) for directing auditory attention: Reducing temporal and spectral uncertainty in auditory detection of spoken utterances
- GRANT, K. and SEITZ, P. (1998): The use of visible speech cues (speechreading) for directing auditory attention: Reducing temporal and spectral uncertainty in auditory detection of spoken utterances. In 16th International Congress on Acoustics.
- (1998) 16th International Congress on Acoustics
- Grant, K.¹ Seitz, P.²

20
- 0000874921
- Dynamic features for visual speechreading: A systematic comparison
- MOZER, JORDAN, and PETSCHE, editors, MIT Press, Cambridge MA
- GRAY, M., MOVELLAN, J., and SEJNOWSKI, T. (1997): Dynamic features for visual speechreading: A systematic comparison. In MOZER, JORDAN, and PETSCHE, editors, Advances in Neural Information Processing Systems, volume 9. MIT Press, Cambridge MA.
- (1997) Advances in Neural Information Processing Systems , vol.9
- Gray, M.¹ Movellan, J.² Sejnowski, T.³

21
- 84881270561
- A hybrid ANN/HMM audio-visual speech recognition system
- HECKMANN, M., BERTHOMMIER, F., and KROSCHEL, K. (2001a): A hybrid ANN/HMM audio-visual speech recognition system. In Proceedings of AVSP-2001.
- (2001) Proceedings of AVSP-2001
- Heckmann, M.¹ Berthommier, F.² Kroschel, K.³

22
- 0034848499
- Optimal weighting of posteriors for audio-visual speech recognition
- Salt Lake City, Utah
- HECKMANN, M., BERTHOMMIER, F., and KROSCHEL, K. (2001b): Optimal weighting of posteriors for audio-visual speech recognition. In Proceedings of ICASSP 2001, Salt Lake City, Utah.
- (2001) Proceedings of ICASSP 2001
- Heckmann, M.¹ Berthommier, F.² Kroschel, K.³

23
- 4243462047
- Automatic speech recognition using acoustic and visual signals
- Ricoh California Research Centre
- HENNECKE, M., PRASAD, K.V., and STORK, D. (1995): Automatic speech recognition using acoustic and visual signals. Technical Report CRC-TR-95-37, Ricoh California Research Centre.
- (1995) Technical Report , vol.CRC-TR-95-37
- Hennecke, M.¹ Prasad, K.V.² Stork, D.³

24
- 78649238564
- Using deformable templates to infer visual speech dynamics
- Pacific Grove, CA. IEEE Computer
- HENNECKE, M., PRASAD, V., and STORK, D. (1994): Using deformable templates to infer visual speech dynamics. In 28th Annual Asilomar Conference on Signals, Systems, and Computers, Pacific Grove, CA. IEEE Computer. 2:576-582.
- (1994) 28th Annual Asilomar Conference on Signals, Systems, and Computers , vol.2 , pp. 576-582
- Hennecke, M.¹ Prasad, V.² Stork, D.³

25
- 0000417467
- STORK and HENNECKE (1996)
- HENNECKE, M., STORK, D., and PRASAD, K.V. (1996): Visionary speech: Looking ahead to practical speech reading systems. In STORK and HENNECKE (1996), 331-350.
- (1996) Visionary Speech: Looking Ahead to Practical Speech Reading Systems , pp. 331-350
- Hennecke, M.¹ Stork, D.² Prasad, K.V.³

26
- 84992590661
- Face locating and tracking for human-computer interaction
- IEEE Computer Society, Pacific Grove, CA
- HUNKE, M. and WAIBEL, A. (1994): Face locating and tracking for human-computer interaction. In 28th Annual Asilomar Conference on Signals, Systems, and Computers, IEEE Computer Society, Pacific Grove, CA. 2: 1277-1281.
- (1994) 28th Annual Asilomar Conference on Signals, Systems, and Computers , vol.2 , pp. 1277-1281
- Hunke, M.¹ Waibel, A.²

27
- 84957804529
- Continuous audio-visual speech recognition
- LUETTIN, J. and DUPONT, S. (1998): Continuous audio-visual speech recognition. In Proceedings of the 5th European Conference on Computer Vision, 2: 657-673.
- (1998) Proceedings of the 5th European Conference on Computer Vision , vol.2 , pp. 657-673
- Luettin, J.¹ Dupont, S.²

28
- 0041952887
- LEWIS, T.W. (2000). Audio visual speech recognition: Extraction, recognition, and integration
- (2000) Audio Visual Speech Recognition: Extraction, Recognition, and Integration
- Lewis, T.W.¹

29
- 1542320375
- Lip feature extraction using red exclusion
- EADES, P. and JIN, J., editors
- LEWIS, T.W. and POWERS, D. (2001): Lip feature extraction using red exclusion. In EADES, P. and JIN, J., editors, CRPIT: Visualisation, 2000, 2: 61-70.
- (2000) CRPIT: Visualisation , vol.2 , pp. 61-70
- Lewis, T.W.¹ Powers, D.²

30
- 0042954456
- World Wide Web
- M2VTS (2000): M2VTS Multimodal face database, release 1.0. World Wide Web, http://www.tele.ucl.ac.be/PROJECTS/M2VTS/.
- (2000) M2VTS Multimodal Face Database, Release 1.0

31
- 0032072433
- Speech recognition and sensory integration: A 240-year old theorem helps explain how people and machines can integrate auditory and visual information to understand speech
- MASSARO, D. and STORK, D. (1998): Speech recognition and sensory integration: a 240-year old theorem helps explain how people and machines can integrate auditory and visual information to understand speech. American Scientist, 86(3): 236-245.
- (1998) American Scientist , vol.86 , Issue.3 , pp. 236-245
- Massaro, D.¹ Stork, D.²

32
- 0017199877
- Hearing lips and seeing voices
- MCGURK, H. and MACDONALD, J. (1976): Hearing lips and seeing voices. Nature, 264:746-748.
- (1976) Nature , vol.264 , pp. 746-748
- McGurk, H.¹ Macdonald, J.²

33
- 0029725863
- Adaptive bimodal sensor fusion for automatic speechreading
- MEIER, U., HURST, W., and DUCHNOWSKI, P. (1996): Adaptive bimodal sensor fusion for automatic speechreading. In Proceedings of the International Conference of Acoustics, Speech, and Signal Processing, 2: 833-837.
- (1996) Proceedings of the International Conference of Acoustics, Speech, and Signal Processing , vol.2 , pp. 833-837
- Meier, U.¹ Hurst, W.² Duchnowski, P.³

34
- 2642559942
- Towards unrestricted lip reading
- Hong Kong
- MEIER, U., STIEFELHAGEN, R., YANG, J., and WAIBEL, A. (1999): Towards unrestricted lip reading. In Second International Conference on Multimedia Interfaces, Hong Kong, http://wemer.ir.uks.de/js.
- (1999) Second International Conference on Multimedia Interfaces
- Meier, U.¹ Stiefelhagen, R.² Yang, J.³ Waibel, A.⁴

35
- 85029619676
- Visual speech recognition with stochastic networks
- Tesauro, G., Touretzky, D., and Leen, T., editors, MIT Press, Cambridge
- MOVELLAN, J. (1995): Visual speech recognition with stochastic networks. In Tesauro, G., Touretzky, D., and Leen, T., editors, Advances in Neural Information Processing Systems, 7: 851-858. MIT Press, Cambridge.
- (1995) Advances in Neural Information Processing Systems , vol.7 , pp. 851-858
- Movellan, J.¹

36
- 0032138429
- Robust sensor fusion: Analysis and application to audio visual speech recognition
- MOVELLAN, J. and MINEIRO, P. (1998): Robust sensor fusion: Analysis and application to audio visual speech recognition. Machine Learning, 32: 85-100.
- (1998) Machine Learning , vol.32 , pp. 85-100
- Movellan, J.¹ Mineiro, P.²

37
- 0035790960
- Large-vocabulary audio-visual speech recognition: A summary of the Johns Hopkins summer 2000 workshop
- Cannes
- NETI, C., POTAMIANOS, G., LUETTIN, J., MATTHEWS, I., GLOTIN, H., and VERGYRI, D. (2001): Large-vocabulary audio-visual speech recognition: A summary of the Johns Hopkins summer 2000 workshop. In Workshop on Multimedia Signal Processing. Special Session on Joint Audio-Visual Processing, Cannes.
- (2001) Workshop on Multimedia Signal Processing. Special Session on Joint Audio-Visual Processing
- Neti, C.¹ Potamianos, G.² Luettin, J.³ Matthews, I.⁴ Glotin, H.⁵ Vergyri, D.⁶

38
- 85009153179
- Stream confidence estimation for audio-visual speech recognition
- Beijing
- POTAMIANOS, G. and NETI, C. (2000): Stream confidence estimation for audio-visual speech recognition. In Proceedings of the International Conference on Spoken Language Processing, 746-749, Beijing.
- (2000) Proceedings of the International Conference on Spoken Language Processing , pp. 746-749
- Potamianos, G.¹ Neti, C.²

39
- 0010127090
- Speaker adaptation for audio-visual speech recognition
- Budapest
- POTAMIANOS, G. and POTAMIANOS, A. (1999): Speaker adaptation for audio-visual speech recognition. In Proceedings of EUROSPEECH (3), 1291-1294, Budapest.
- (1999) Proceedings of EUROSPEECH (3) , pp. 1291-1294
- Potamianos, G.¹ Potamianos, A.²

40
- 0003552976
- Preprocessing video images for neural learning of lipreading
- Ricoh California Research Centre
- PRASAD, K., STORK, D., and WOLFF, G. (1993): Preprocessing video images for neural learning of lipreading. Technical Report CRC-TR-93-26, Ricoh California Research Centre.
- (1993) Technical Report , vol.CRC-TR-93-26
- Prasad, K.¹ Stork, D.² Wolff, G.³

41
- 0004244302
- Prentice Hall, Englewood Cliffs, NJ
- RABINER, L. and JUANG, B. (1993): Fundamentals of Speech Recognition. Prentice Hall, Englewood Cliffs, NJ.
- (1993) Fundamentals of Speech Recognition
- Rabiner, L.¹ Juang, B.²

42
- 85060684689
- Lip modeling for visual speech recognition
- IEEE Computer Society, Pacific Grove CA
- RAO, R. and MERSEREAU, R. (1994): Lip modeling for visual speech recognition. In 28th Annual Asilomar Conference on Signals, Systems, and Computers, volume 2. IEEE Computer Society, Pacific Grove CA.
- (1994) 28th Annual Asilomar Conference on Signals, Systems, and Computers , vol.2
- Rao, R.¹ Mersereau, R.²

43
- 0042453473
- STORK and HENNECKE (1996)
- ROBERT-RIBES, J., PIQUEMAL, M., SCHWARTZ, J., and ESCUDIER, P. (1996): Exploiting sensor fusion and stimuli complementary in av speech recognition. In STORK and HENNECKE (1996), 194-219.
- (1996) Exploiting Sensor Fusion and Stimuli Complementary in av Speech Recognition , pp. 194-219
- Robert-Ribes, J.¹ Piquemal, M.² Schwartz, J.³ Escudier, P.⁴

44
- 0002358797
- Discriminative learning of visual data for audiovisual speech recognition
- ROGOZAN, A. (1999): Discriminative learning of visual data for audiovisual speech recognition. International Journal of Artificial Intelligence Tools, 8(1):43-52.
- (1999) International Journal of Artificial Intelligence Tools , vol.8 , Issue.1 , pp. 43-52
- Rogozan, A.¹

45
- 0038133938
- Digital representations of speech signals
- WAIBEL, A. and LEE, K., editors, Morgan Kaufmann Publishers Inc., San Mateo, CA
- SCHAFER, R. and RABINER, L. (1990): Digital representations of speech signals. In WAIBEL, A. and LEE, K., editors, Readings in Speech Recognition, 49-64. Morgan Kaufmann Publishers Inc., San Mateo, CA.
- (1990) Readings in Speech Recognition , pp. 49-64
- Schafer, R.¹ Rabiner, L.²

46
- 0042954452
- Master's thesis, University of Karlsruhe
- SCHIFFERDECKER, G. (1994): Finding structure in language. Master's thesis, University of Karlsruhe.
- (1994) Finding Structure in Language
- Schifferdecker, G.¹

47
- 78649891234
- Real time lip tracking for lipreading
- STIEFELHAGEN, R., YANG, J., and MEIER, U. (1997): Real time lip tracking for lipreading. In Proceedings of Eurospeech '97.
- (1997) Proceedings of Eurospeech '97
- Stiefelhagen, R.¹ Yang, J.² Meier, U.³

48
- 0003544881
- NATO/Springer-Verlag, New York
- STORK, D. and HENNECKE, M., editors (1996): Speechreading by Man and Machine: Models, System, and Applications. NATO/Springer-Verlag, New York.
- (1996) Speechreading by Man and Machine: Models, System, and Applications
- Stork, D.¹ Hennecke, M.²

49
- 0002028032
- DODD and CAMPBELL (1987)
- SUMMERFIELD, Q. (1987): Some preliminaries to a comprehensive account of audio-visual speech perception, 3-52. In DODD and CAMPBELL (1987).
- (1987) Some Preliminaries to a Comprehensive Account of Audio-visual Speech Perception , pp. 3-52
- Summerfield, Q.¹

50
- 0042954451
- Late integration in audio-visual continuous speech recognition
- VERMA, A., FARUQUIE, T., NETI, C., BASU, S., and SENIOR, A. (1999): Late integration in audio-visual continuous speech recognition. In Automatic Speech Recognition and Understanding.
- (1999) Automatic Speech Recognition and Understanding
- Verma, A.¹ Faruquie, T.² Neti, C.³ Basu, S.⁴ Senior, A.⁵

51
- 0005277057
- STORK and HENNECKE (1996)
- VOGT, M. (1996): Fast matching of a dynamic lip model to color video sequences under regular illumination conditions. In STORK and HENNECKE (1996), 399-407.
- (1996) Fast Matching of a Dynamic Lip Model to Color Video Sequences under Regular Illumination Conditions , pp. 399-407
- Vogt, M.¹

52
- 0017357502
- Effect of training on the visual recognition of consonants
- WALDEN, B., PROSEK, R., MONTGOMERY, A., SCHERR, C., and JONES, C. (1977): Effect of training on the visual recognition of consonants. Journal of Speech and Hearing Research, 20:130-145.
- (1977) Journal of Speech and Hearing Research , vol.20 , pp. 130-145
- Walden, B.¹ Prosek, R.² Montgomery, A.³ Scherr, C.⁴ Jones, C.⁵

53
- 0004524499
- An approach to statistical lip modelling for speaker identification via chromatic feature extraction
- WARK, T., SRIDHARAN, S., and CHANDRAN, V. (1998): An approach to statistical lip modelling for speaker identification via chromatic feature extraction. In Proceedings of the IEEE International Conference on Pattern Recognition, 123-125.
- (1998) Proceedings of the IEEE International Conference on Pattern Recognition , pp. 123-125
- Wark, T.¹ Sridharan, S.² Chandran, V.³

54
- 0030418169
- A real-time face tracker
- YANG, J. and WAIBEL, A. (1996): A real-time face tracker. In Proceedings of WACV'96, 142-147.
- (1996) Proceedings of WACV'96 , pp. 142-147
- Yang, J.¹ Waibel, A.²

* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS DB.