SCOPUS Information Retrieval Platform

Information Fusion

Volume 5, Issue 2, 2004, Pages 91-101

Continuous audio-visual digit recognition using N-best decision fusion

(3) Meyer, Georg F. (a); Mulligan, Jeffrey B. (b); Wuerger, Sophie M. (a)

a UNIVERSITY OF LIVERPOOL (United Kingdom)

b NASA AMES RESEARCH CENTER (United States)

Author keywords

Audio visual speech; Decision fusion; Lip reading; Speech recognition

Indexed keywords

ACOUSTIC SIGNAL PROCESSING; ALGORITHMS; DATABASE SYSTEMS; MARKOV PROCESSES; MAXIMUM LIKELIHOOD ESTIMATION; SPURIOUS SIGNAL NOISE; VIDEO SIGNAL PROCESSING; WORD PROCESSING;

AUDIO-VISUAL SPEECH; DECISION FUSION; LINEAR DISCRIMINANT ANALYSIS; LIP READING;

SPEECH RECOGNITION;

EID: 1842854571 PISSN: 15662535 EISSN: None Source Type: Journal
DOI: 10.1016/j.inffus.2003.07.001 Document Type: Article

Times cited: (26)

References (37)

1
- 0031187171
- Speech recognition by machines and humans
- Lippmann R.P. Speech recognition by machines and humans. Speech Communication. 22:1997;1-15.
- (1997) Speech Communication , vol.22 , pp. 1-15
- Lippmann, R.P.¹

2
- 0004084456
- Cambridge, MA: MIT Press
- Massaro D.W. Perceiving Talking Faces. From Speech Perception to a Behavioral Principle. 1998;MIT Press, Cambridge, MA.
- (1998) Perceiving Talking Faces. From Speech Perception to a Behavioral Principle
- Massaro, D.W.¹

3
- 0037999967
- Large-vocabulary audio-visual speech recognition by machines and humans
- G. Potamianos, C. Neti, G. Iyengar, E. Helmuth, Large-vocabulary audio-visual speech recognition by machines and humans, in: Proceedings on Eurospeech, Aalborg, 2001, pp. 1899-1902.
- (2001) Proceedings on Eurospeech, Aalborg , pp. 1899-1902
- Potamianos, G.¹ Neti, C.² Iyengar, G.³ Helmuth, E.⁴

4
- 85013580214
- Sensory integration in audiovisual automatic speech recognition
- P.L. Silsbee, Sensory integration in audiovisual automatic speech recognition, in: 28th Annual Asilomar Conference on Signals, Systems, and Computers, vol. 1, 1994, pp. 561-565.
- (1994) 28th Annual Asilomar Conference on Signals, Systems, and Computers , vol.1 , pp. 561-565
- Silsbee, P.L.¹

5
- 0033640646
- Statistical pattern recognition: A review
- Jain A.K., Duin R.P.W., Mao J. Statistical pattern recognition: a review. IEEE Transactions on Pattern Analysis and Machine Intelligence. 22:2000;4-37.
- (2000) IEEE Transactions on Pattern Analysis and Machine Intelligence , vol.22 , pp. 4-37
- Jain, A.K.¹ Duin, R.P.W.² Mao, J.³

6
- 0034848499
- Optimal weighting of posteriors for audio-visual speech recognition
- Salt Lake
- M. Heckmann, F. Berthommier, K. Kroschel, Optimal weighting of posteriors for audio-visual speech recognition, in: Proceedings on ICASSP 2001, Salt Lake, 2001, pp. 161-164.
- (2001) Proceedings on ICASSP 2001 , pp. 161-164
- Heckmann, M.¹ Berthommier, F.² Kroschel, K.³

7
- 0036874527
- Noise adaptive weighting in audio-visual speech recognition
- Heckmann M., Berthommier F., Kroschel K. Noise adaptive weighting in audio-visual speech recognition. Journal on Applied Signal Processing. 11:2002;1260-1273.
- (2002) Journal on Applied Signal Processing , vol.11 , pp. 1260-1273
- Heckmann, M.¹ Berthommier, F.² Kroschel, K.³

8
- 85009083793
- Comparing audio- and a-posteriori-probability-based stream confidence measures for audio-visual speech recognition
- Aalborg
- M. Heckmann, T. Wild, F. Berthommier, K. Kroschel, Comparing audio- and a-posteriori-probability-based stream confidence measures for audio-visual speech recognition, in: Proceedings on Eurospeech 2001, Aalborg, 2001, pp. 1023-1026.
- (2001) Proceedings on Eurospeech 2001 , pp. 1023-1026
- Heckmann, M.¹ Wild, T.² Berthommier, F.³ Kroschel, K.⁴

9
- 85009153179
- Stream confidence estimation for audio-visual speech recognition
- Beijing
- G. Potamianos, C. Neti, Stream confidence estimation for audio-visual speech recognition, in: Proceedings on ICSLP 2000, Beijing, 2000, pp. 746-749.
- (2000) Proceedings on ICSLP 2000 , pp. 746-749
- Potamianos, G.¹ Neti, C.²

10
- 0034842451
- Weighting schemes for audio-visual fusion in speech recognition
- Salt Lake
- H. Glotin, D. Vergyri, C. Neti, G. Potamianos, J. Luettin, Weighting schemes for audio-visual fusion in speech recognition, in: Proceedings on ICASSP 2001, Salt Lake, 2001.
- (2001) Proceedings on ICASSP 2001
- Glotin, H.¹ Vergyri, D.² Neti, C.³ Potamianos, G.⁴ Luettin, J.⁵

11
- 0003322357
- Audio visual speech recognition
- Center for Language and Speech Processing, The Johns Hopkins University, Baltimore
- C. Neti, G. Potamianos, J. Luettin, I. Matthews, H. Glotin, D. Vergyri, J. Sison, A. Mashari, J. Zhou, Audio visual speech recognition, Final Workshop 2000 Report, Center for Language and Speech Processing, The Johns Hopkins University, Baltimore, 2000.
- (2000) Final Workshop 2000 Report
- Neti, C.¹ Potamianos, G.² Luettin, J.³ Matthews, I.⁴ Glotin, H.⁵ Vergyri, D.⁶ Sison, J.⁷ Mashari, A.⁸ Zhou, J.⁹

12
- 0020836249
- Evaluation and integration of visual and auditory information in speech perception
- Massaro D.W., Cohen M.M. Evaluation and integration of visual and auditory information in speech perception. Journal of Experimental Psychology: HPP. 9:1983;751-753.
- (1983) Journal of Experimental Psychology: HPP , vol.9 , pp. 751-753
- Massaro, D.W.¹ Cohen, M.M.²

13
- 85009080413
- Auditory visual speech processing
- Aalborg
- D.W. Massaro, Auditory visual speech processing, in: Proceedings on Eurospeech 2001, Aalborg, 2001, pp. 1153-1156.
- (2001) Proceedings on Eurospeech 2001 , pp. 1153-1156
- Massaro, D.W.¹

14
- 0000789852
- Channel separability in the audio-visual integration of speech: A Bayesian approach
- Speechreading by Man and Machine, Models, Systems and Applications, Berlin: Springer-Verlag
- Movellan J.R., Chadderdon G. Channel separability in the audio-visual integration of speech: a Bayesian approach. Speechreading by Man and Machine, Models, Systems and Applications. NATO ASI Series. 1996;473-487 Springer-Verlag, Berlin.
- (1996) NATO ASI Series , pp. 473-487
- Movellan, J.R.¹ Chadderdon, G.²

15
- 0002028032
- Some preliminaries to a comprehensive account of audio-visual speech perception
- B. Dodd, & R. Campbell. Hillsdale, NJ: Lawrence Erlbaum Associates
- Summerfield A.Q. Some preliminaries to a comprehensive account of audio-visual speech perception. Dodd B., Campbell R. Hearing by Eye, the Psychology of Lip-reading. 1987;3-51 Lawrence Erlbaum Associates, Hillsdale, NJ.
- (1987) Hearing by Eye, the Psychology of Lip-reading , pp. 3-51
- Summerfield, A.Q.¹

16
- 0036297183
- A coupled HMM for audio-visual speech recognition
- A.V. Nefian, L. Liang, X. Pi, L. Xiaoxiang, C. Mao, K. Murphy, A coupled HMM for audio-visual speech recognition, in: Proceedings on ICASSP 2002, vol. 2, 2002, pp. 2013-2016.
- (2002) Proceedings on ICASSP 2002 , vol.2 , pp. 2013-2016
- Nefian, A.V.¹ Liang, L.² Pi, X.³ Xiaoxiang, L.⁴ Mao, C.⁵ Murphy, K.⁶

17
- 0032314380
- An image transform approach for HMM based automatic lipreading
- Chicago
- G. Potamianos, H.P. Graf, E. Cosatto, An image transform approach for HMM based automatic lipreading, in: Proceedings of the International Conference on Image Processing, Chicago, vol. III, 1998, pp. 173-177.
- (1998) Proceedings of the International Conference on Image Processing , vol.3 , pp. 173-177
- Potamianos, G.¹ Graf, H.P.² Cosatto, E.³

18
- 0034270644
- Audio-visual speech modelling for continuous speech recognition
- Dupont S., Luettin J. Audio-visual speech modelling for continuous speech recognition. IEEE Transactions on Multimedia. 2:2000;141-151.
- (2000) IEEE Transactions on Multimedia , vol.2 , pp. 141-151
- Dupont, S.¹ Luettin, J.²

19
- 84957810405
- A comparison of active shape model and scale decomposition based features for visual speech recognition
- Freiburg
- I. Matthews, J.A. Bangham, R. Harvey, S. Cox, A comparison of active shape model and scale decomposition based features for visual speech recognition, in: Proceedings of the European Conference on Computer Vision, Freiburg, 1998, pp. 514-528.
- (1998) Proceedings of the European Conference on Computer Vision , pp. 514-528
- Matthews, I.¹ Bangham, J.A.² Harvey, R.³ Cox, S.⁴

20
- 84987702417
- The Aurora experimental framework for the performance evaluation of speech recognition systems under noisy conditions
- Beijing
- D. Pearce, H.-G. Hirsch, The Aurora experimental framework for the performance evaluation of speech recognition systems under noisy conditions, in: Proceedings on ICSLP'00, Beijing, vol. 4, 2000, pp. 29-32.
- (2000) Proceedings on ICSLP'00 , vol.4 , pp. 29-32
- Pearce, D.¹ Hirsch, H.-G.²

21
- 0003822743
- S. Young, D. Kershaw, J. Odell, D. Ollason, V. Valtchev, P. Woodland, The HTK Book, Revised version for HTK V 3.0, 2000, http://htk.eng.cam.ac.uk/index.shtml.
- (2000) The HTK Book, Revised Version for HTK V 3.0
- Young, S.¹ Kershaw, D.² Odell, J.³ Ollason, D.⁴ Valtchev, V.⁵ Woodland, P.⁶

22
- 0004244302
- Englewood Cliffs: Prentice Hall
- Rabiner L., Juang B.H. Fundamentals of Speech Recognition. 1993;Prentice Hall, Englewood Cliffs.
- (1993) Fundamentals of Speech Recognition
- Rabiner, L.¹ Juang, B.H.²

23
- 0003552976
- Preprocessing video images for neural learning of lipreading
- K.V. Prasad, G. Storck, G.J. Wolf, Preprocessing video images for neural learning of lipreading, Ricoh CRC Technical Report 93-26, 1993.
- (1993) Ricoh CRC Technical Report , vol.93 , Issue.26
- Prasad, K.V.¹ Storck, G.² Wolf, G.J.³

24
- 85013597845
- Eigenlips for robust speech recognition
- Adelaide
- C. Bregler, Y. Konig, Eigenlips for robust speech recognition, in: Proceedings on ICASSP'94, Adelaide, 1994, pp. 669-672.
- (1994) Proceedings on ICASSP'94 , pp. 669-672
- Bregler, C.¹ Konig, Y.²

25
- 85133465985
- Lipreading using shape, shade and scale
- Terrigal
- I. Matthews, S. Cootes, S. Cox, R. Harvey, J.A. Bangham, Lipreading using shape, shade and scale, in: Proceedings of Workshop on Audio Visual Speech Processing, Terrigal, 1998, pp. 73-78.
- (1998) Proceedings of Workshop on Audio Visual Speech Processing , pp. 73-78
- Matthews, I.¹ Cootes, S.² Cox, S.³ Harvey, R.⁴ Bangham, J.A.⁵

26
- 0031211240
- Lipreading from color video
- Chiou G., Hwang J.N. Lipreading from color video. IEEE Transactions on Image Processing. 6:1997;1192-1195.
- (1997) IEEE Transactions on Image Processing , vol.6 , pp. 1192-1195
- Chiou, G.¹ Hwang, J.N.²

27
- 1842841598
- Snakes: Active contour models
- Kass M., Witkin A., Terzopoulos D. Snakes: active contour models. Computer, Speech and Language. 5:1991;275-294.
- (1991) Computer, Speech and Language , vol.5 , pp. 275-294
- Kass, M.¹ Witkin, A.² Terzopoulos, D.³

28
- 0026903014
- Feature extraction from faces using deformable templates
- Yuille A.L., Hallinan P.W., Cohen D.S. Feature extraction from faces using deformable templates. International Journal of Computer Vision. 8:1992;99-111.
- (1992) International Journal of Computer Vision , vol.8 , pp. 99-111
- Yuille, A.L.¹ Hallinan, P.W.² Cohen, D.S.³

29
- 0008571982
- PCA image coding schemes and visual speech intelligibility
- Windermere
- N.M. Brooke, S.D. Scott, PCA image coding schemes and visual speech intelligibility, in: Proceedings on Institute of Acoustics, Windermere, vol. 16, 1994, pp. 123-129.
- (1994) Proceedings on Institute of Acoustics , vol.16 , pp. 123-129
- Brooke, N.M.¹ Scott, S.D.²

30
- 0034270644
- Audio-visual speech modeling for continuous speech recognition
- Dupont S., Luettin J. Audio-visual speech modeling for continuous speech recognition. IEEE Transactions on Multimedia. 2:2000;141-151.
- (2000) IEEE Transactions on Multimedia , vol.2 , pp. 141-151
- Dupont, S.¹ Luettin, J.²

31
- 0028996862
- Toward movement invariant automatic lip-reading and speech recognition
- Philadelphia
- P. Duchnowski, M. Hunke, D. Büsching, U. Meier, A. Waibel, Toward movement invariant automatic lip-reading and speech recognition, in: Proceedings of International Conference on Spoken Language Processing, Philadelphia, vol. 1, 1995, pp. 109-112.
- (1995) Proceedings of International Conference on Spoken Language Processing , vol.1 , pp. 109-112
- Duchnowski, P.¹ Hunke, M.² Büsching, D.³ Meier, U.⁴ Waibel, A.⁵

32
- 0034517163
- A cascade image transform for speaker independent automatic speech reading
- G. Potamianos, A. Verma, C. Neti, G. Iyengar, S. Basu, A cascade image transform for speaker independent automatic speech reading, in: Proceedings of International Conference on Multimedia and Expo, vol. 2, 2000, pp. 1097-1100.
- (2000) Proceedings of International Conference on Multimedia and Expo , vol.2 , pp. 1097-1100
- Potamianos, G.¹ Verma, A.² Neti, C.³ Iyengar, G.⁴ Basu, S.⁵

33
- 0000813366
- Talking heads and speech recognisers that can see: The computer processing of visual speech signals
- D.G. Stork, & M.E. Hennecke. Berlin: Springer-Verlag
- Brooke N.M. Talking heads and speech recognisers that can see: the computer processing of visual speech signals. Stork D.G., Hennecke M.E. Speechreading by Humans and Machines. 1996;351-371 Springer-Verlag, Berlin.
- (1996) Speechreading by Humans and Machines , pp. 351-371
- Brooke, N.M.¹

34
- 0003770986
- Comparing models of audio-visual fusion in a noisy vowel recognition task
- Tessier P., Robert-Ribes N., Schwartz J.-L., Guerin-Dugue A. Comparing models of audio-visual fusion in a noisy vowel recognition task. IEEE Transactions on SAP. 7:1999;629-642.
- (1999) IEEE Transactions on SAP , vol.7 , pp. 629-642
- Tessier, P.¹ Robert-Ribes, N.² Schwartz, J.-L.³ Guerin-Dugue, A.⁴

35
- 85009284526
- DCT-based video features for audio-visual speech recognition
- Beijing
- M. Heckmann, K. Kroschel, C. Savariaux, F. Berthommier, DCT-based video features for audio-visual speech recognition, in: Proceedings on ICSLP, Beijing, vol. 3, 2002, pp. 1925-1928.
- (2002) Proceedings on ICSLP , vol.3 , pp. 1925-1928
- Heckmann, M.¹ Kroschel, K.² Savariaux, C.³ Berthommier, F.⁴

36
- 0002358797
- Discriminative learning of visual data for audiovisual speech recognition
- Rogozan A. Discriminative learning of visual data for audiovisual speech recognition. International Journal on Artificial Intelligence Tools. 8:1999;43-52.
- (1999) International Journal on Artificial Intelligence Tools , vol.8 , pp. 43-52
- Rogozan, A.¹

37
- 0025547193
- Links between Markov models and multilayer perceptrons
- Bourlard H., Wellekens C.J. Links between Markov models and multilayer perceptrons. IEEE Transactions on Pattern Analysis and Machine Intelligence. 12(2):1990;1167-1178.
- (1990) IEEE Transactions on Pattern Analysis and Machine Intelligence , vol.12 , Issue.2 , pp. 1167-1178
- Bourlard, H.¹ Wellekens, C.J.²

* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.