SCOPUS Information Search Platform

Eurasip Journal on Applied Signal Processing

Volume 2002, Issue 11, 2002, Pages 1213-1227

Audio-visual speech recognition using MPEG-4 compliant visual features

Authors (4): Aleksic, Petar S. (a); Williams, Jay J. (a); Wu, Zhilin (a); Katsaggelos, Aggelos K. (a)

(a) Northwestern University (United States)

Author keywords

Audio visual speech recognition; Facial animation parameters; Snake

Indexed keywords

ACOUSTIC SIGNAL PROCESSING; ANIMATION; FEATURE EXTRACTION; GESTURE RECOGNITION; PRINCIPAL COMPONENT ANALYSIS; SIGNAL TO NOISE RATIO; VIDEO SIGNAL PROCESSING; WHITE ACOUSTIC NOISE; AUDIO-VISUAL SIGNAL PROCESSING; AUTOMATIC SPEECH RECOGNITION (ASR); SPEECH RECOGNITION

EID: 0036874915
PISSN: 11108657
EISSN: None
Source Type: Journal
DOI: 10.1155/S1110865702206162
Document Type: Article

Times cited (51)

References (37)

1
- 0027128576
- Lipreading and audio-visual speech perception
- Q. Summerfield, "Lipreading and audio-visual speech perception," Philos. Trans. Roy. Soc. London Ser. B, vol. 335, pp. 71-78, 1992.
- (1992) Philos. Trans. Roy. Soc. London Ser. B , vol.335 , pp. 71-78
- Summerfield, Q.¹

2
- 0025767028
- Evaluating the articulation index for auditory visual input
- K. W. Grant and L. D. Braida, "Evaluating the articulation index for auditory visual input," Journal of the Acoustical Society of America, vol. 89, no. 6, pp. 2952-2960, 1991.
- (1991) Journal of the Acoustical Society of America , vol.89 , Issue.6 , pp. 2952-2960
- Grant, K.W.¹ Braida, L.D.²

3
- 84937322907
- Ph.D. thesis, Northwestern University, Evanston, Ill, USA, June
- J. J. Williams, Speech-to-video conversion for individuals with impaired hearing, Ph.D. thesis, Northwestern University, Evanston, Ill, USA, June 2000.
- (2000) Speech-to-Video Conversion for Individuals with Impaired Hearing
- Williams, J.J.¹

4
- 0036650527
- An HMM-based speech-to-video synthesizer
- Special Issue on Intelligent Multi-media
- J. J. Williams and A. K. Katsaggelos, "An HMM-based speech-to-video synthesizer," IEEE Trans. on Neural Networks, vol. 13, no. 4, pp. 900-915, 2002, Special Issue on Intelligent Multi-media.
- (2002) IEEE Trans. on Neural Networks , vol.13 , Issue.4 , pp. 900-915
- Williams, J.J.¹ Katsaggelos, A.K.²

5
- 0002028032
- Some preliminaries to a comprehensive account of audio-visual speech perception
- B. Dodd and R. Campbell, Eds., Lawrence Erlbaum Associates, Hillside, Minn, USA
- Q. Summerfield, "Some preliminaries to a comprehensive account of audio-visual speech perception," in Hearing by Eye: The Psychology of Lip-Reading, B. Dodd and R. Campbell, Eds., pp. 97-113, Lawrence Erlbaum Associates, Hillside, Minn, USA, 1987.
- (1987) Hearing by Eye: The Psychology of Lip-Reading , pp. 97-113
- Summerfield, Q.¹

6
- 0031187171
- Speech recognition by machines and humans
- R. Lippman, "Speech recognition by machines and humans," Speech Communication, vol. 22, no. 1, pp. 1-15, 1997.
- (1997) Speech Communication , vol.22 , Issue.1 , pp. 1-15
- Lippman, R.¹

7
- 0029288202
- Speech recognition in noisy environments: A survey
- Y. Gong, "Speech recognition in noisy environments: A survey," Speech Communication, vol. 16, no. 3, pp. 261-291, 1995.
- (1995) Speech Communication , vol.16 , Issue.3 , pp. 261-291
- Gong, Y.¹

8
- 0022228262
- Automatic lipreading to enhance speech recognition
- San Francisco, Calif, USA
- E. Petajan, "Automatic lipreading to enhance speech recognition," in Proc. IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 40-47, San Francisco, Calif, USA, 1985.
- (1985) Proc. IEEE Computer Society Conference on Computer Vision and Pattern Recognition , pp. 40-47
- Petajan, E.¹

9
- 84893671339
- An improved automatic lipreading system to enhance speech recognition
- ACM, Washington, DC, USA
- E. Petajan, B. Bischoff, D. Bodoff, and N. M. Brooke, "An improved automatic lipreading system to enhance speech recognition," in CHI-88, pp. 19-25, ACM, Washington, DC, USA, 1988.
- (1988) CHI-88 , pp. 19-25
- Petajan, E.¹ Bischoff, B.² Bodoff, D.³ Brooke, N.M.⁴

10
- 84875584220
- Continuous optical automatic speech recognition by lipreading
- Pacific Grove, Calif, USA, October
- A. J. Goldschen, O. N. Garcia, and E. Petajan, "Continuous optical automatic speech recognition by lipreading," in Proc. IEEE 28th Asilomar Conference on Signals, Systems and Computers, Pacific Grove, Calif, USA, October 1994.
- (1994) Proc. IEEE 28th Asilomar Conference on Signals, Systems and Computers
- Goldschen, A.J.¹ Garcia, O.N.² Petajan, E.³

11
- 0003544881
- Springer-Verlag, New York, NY, USA
- D. G. Stork and M. E. Hennecke, Eds., Speechreading by Man and Machine, Springer-Verlag, New York, NY, USA, 1996.
- (1996) Speechreading by Man and Machine
- Stork, D.G.¹ Hennecke, M.E.²

12
- 0030421449
- Robust face feature analysis for automatic speechreading and character animation
- Killington, Vt, USA
- E. Petajan and H. P. Graf, "Robust face feature analysis for automatic speechreading and character animation," in Proc. 2nd Int. Conf. Automatic Face and Gesture Recognition, pp. 357-362, Killington, Vt, USA, 1996.
- (1996) Proc. 2nd Int. Conf. Automatic Face and Gesture Recognition , pp. 357-362
- Petajan, E.¹ Graf, H.P.²

13
- 0004052871
- Tech. Rep., Johns Hopkins University, Baltimore, Md, USA, October
- C. Neti, G. Potamianos, J. Luettin, et al., "Audio-visual speech recognition," Tech. Rep., Johns Hopkins University, Baltimore, Md, USA, October 2000.
- (2000) Audio-Visual Speech Recognition
- Neti, C.¹ Potamianos, G.² Luettin, J.³

14
- 0034853041
- Hierarchical discriminant features for audio-visual LVCSR
- Salt Lake City, Utah, USA
- G. Potamianos, J. Luettin, and C. Neti, "Hierarchical discriminant features for audio-visual LVCSR," in Proc. IEEE Int. Conf. Acoustics, Speech, Signal Processing, vol. 1, pp. 165-168, Salt Lake City, Utah, USA, 2001.
- (2001) Proc. IEEE Int. Conf. Acoustics, Speech, Signal Processing , vol.1 , pp. 165-168
- Potamianos, G.¹ Luettin, J.² Neti, C.³

15
- 0031624666
- Discriminative training of HMM stream exponents for audio-visual speech recognition
- Seattle, Wash, USA
- G. Potamianos and H. P. Graf, "Discriminative training of HMM stream exponents for audio-visual speech recognition," in Proc. IEEE Int. Conf. Acoustics, Speech, Signal Processing, vol. 6, pp. 3733-3736, Seattle, Wash, USA, 1998.
- (1998) Proc. IEEE Int. Conf. Acoustics, Speech, Signal Processing , vol.6 , pp. 3733-3736
- Potamianos, G.¹ Graf, H.P.²

16
- 85013597845
- "Eigenlips" for robust speech recognition
- Adelaide, Australia
- C. Bregler and Y. Konig, "'Eigenlips' for robust speech recognition," in Proc. IEEE Int. Conf. Acoustics, Speech, Signal Processing, pp. 669-672, Adelaide, Australia, 1994.
- (1994) Proc. IEEE Int. Conf. Acoustics, Speech, Signal Processing , pp. 669-672
- Bregler, C.¹ Konig, Y.²

17
- 0031069562
- Speechreading using probabilistic models
- J. Luettin and N. A. Thacker, "Speechreading using probabilistic models," Computer Vision and Image Understanding, vol. 65, no. 2, pp. 163-178, 1997.
- (1997) Computer Vision and Image Understanding , vol.65 , Issue.2 , pp. 163-178
- Luettin, J.¹ Thacker, N.A.²

18
- 0034270644
- Audio-visual speech modeling for continuous speech recognition
- S. Dupont and J. Luettin, "Audio-visual speech modeling for continuous speech recognition," IEEE Trans. Multimedia, vol. 2, no. 3, pp. 141-151, 2000.
- (2000) IEEE Trans. Multimedia , vol.2 , Issue.3 , pp. 141-151
- Dupont, S.¹ Luettin, J.²

19
- 85009060634
- Perceptual interfaces for information interaction: Joint processing of audio and visual information for human-computer interaction
- Beijing, China, October
- C. Neti, G. Iyengar, G. Potamianos, A. Senior, and B. Maison, "Perceptual interfaces for information interaction: Joint processing of audio and visual information for human-computer interaction," in Proc. International Conference on Spoken Language Processing, vol. III, pp. 11-14, Beijing, China, October 2000.
- (2000) Proc. International Conference on Spoken Language Processing , vol.3 , pp. 11-14
- Neti, C.¹ Iyengar, G.² Potamianos, G.³ Senior, A.⁴ Maison, B.⁵

20
- 84957886748
- Real-time lip tracking for audio-visual speech recognition applications
- Cambridge, UK
- R. Kaucic, B. Dalton, and A. Blake, "Real-time lip tracking for audio-visual speech recognition applications," in Proc. European Conference on Computer Vision, vol. 2, pp. 376-387, Cambridge, UK, 1996.
- (1996) Proc. European Conference on Computer Vision , vol.2 , pp. 376-387
- Kaucic, R.¹ Dalton, B.² Blake, A.³

21
- 0031211240
- Lipreading from color video
- G. Chiou and J.-N. Hwang, "Lipreading from color video," IEEE Trans. Image Processing, vol. 6, no. 8, pp. 1192-1195, 1997.
- (1997) IEEE Trans. Image Processing , vol.6 , Issue.8 , pp. 1192-1195
- Chiou, G.¹ Hwang, J.-N.²

22
- 84925639646
- Real-time lip tracking and bimodal continuous speech recognition
- Redondo Beach, Los Angeles, Calif, USA, December
- M. T. Chan, Y. Zhang, and T. S. Huang, "Real-time lip tracking and bimodal continuous speech recognition," in Proc. IEEE 2nd Workshop on Multimedia Signal Processing, pp. 65-70, Redondo Beach, Los Angeles, Calif, USA, December 1998.
- (1998) Proc. IEEE 2nd Workshop on Multimedia Signal Processing , pp. 65-70
- Chan, M.T.¹ Zhang, Y.² Huang, T.S.³

23
- 0026903014
- Feature extraction from faces using deformable templates
- A. L. Yuille, P. W. Hallinan, and D. S. Cohen, "Feature extraction from faces using deformable templates," International Journal of Computer Vision, vol. 8, no. 2, pp. 99-111, 1992.
- (1992) International Journal of Computer Vision , vol.8 , Issue.2 , pp. 99-111
- Yuille, A.L.¹ Hallinan, P.W.² Cohen, D.S.³

24
- 0012725683
- Text for ISO/IEC FDIS 14496-2 Visual, ISO/IEC JTC1/SC29/WG11 N2502, November 1998
- Text for ISO/IEC FDIS 14496-2 Visual, ISO/IEC JTC1/SC29/WG11 N2502, November 1998.

25
- 0012705505
- Text for ISO/IEC FDIS 14496-1 Systems, ISO/IEC JTC1/SC29/WG11 N2502, November 1998
- Text for ISO/IEC FDIS 14496-1 Systems, ISO/IEC JTC1/SC29/WG11 N2502, November 1998.

26
- 0035472468
- An efficient use of MPEG-4 FAP interpolation for facial animation at 70 bits/frame
- F. Lavagetto and R. Pockaj, "An efficient use of MPEG-4 FAP interpolation for facial animation at 70 bits/frame," IEEE Trans. Circuits and Systems for Video Technology, vol. 11, no. 10, pp. 1085-1097, 2001.
- (2001) IEEE Trans. Circuits and Systems for Video Technology , vol.11 , Issue.10 , pp. 1085-1097
- Lavagetto, F.¹ Pockaj, R.²

27
- 0034502124
- Approaches to visual speech processing based on the MPEG-4 face animation standard
- New York, NY, USA, 30 July-2 August
- E. Petajan, "Approaches to visual speech processing based on the MPEG-4 face animation standard," in Proc. IEEE International Conference on Multimedia and Expo(I), pp. 575-587, New York, NY, USA, 30 July-2 August 2000.
- (2000) Proc. IEEE International Conference on Multimedia and Expo(I) , pp. 575-587
- Petajan, E.¹

28
- 0003505216
- Tech. Rep., Johns Hopkins University, Baltimore, Md, USA
- L. Bernstein and S. Eberhardt, "Johns Hopkins lipreading corpus I-II," Tech. Rep., Johns Hopkins University, Baltimore, Md, USA, 1986.
- (1986) Johns Hopkins Lipreading Corpus I-II
- Bernstein, L.¹ Eberhardt, S.²

29
- 0003526379
- Instituto Superior Técnico, Lisbon, Portugal
- G. Abrantes, FACE-Facial Animation System, version 3.3.1, Instituto Superior Técnico, Lisbon, Portugal, 1997-1998.
- (1997) FACE-Facial Animation System, Version 3.3.1
- Abrantes, G.¹

30
- 34250090755
- Snakes: Active contour models
- M. Kass, A. Witkin, and D. Terzopoulos, "Snakes: Active contour models," International Journal of Computer Vision, vol. 1, no. 4, pp. 321-331, 1988.
- (1988) International Journal of Computer Vision , vol.1 , Issue.4 , pp. 321-331
- Kass, M.¹ Witkin, A.² Terzopoulos, D.³

31
- 0027693887
- Finite element methods for active contour models and balloons for 2-D and 3-D images
- L. D. Cohen and I. Cohen, "Finite element methods for active contour models and balloons for 2-D and 3-D images," IEEE Trans. on Pattern Analysis and Machine Intelligence, vol. 15, no. 11, pp. 1131-1147, 1993.
- (1993) IEEE Trans. on Pattern Analysis and Machine Intelligence , vol.15 , Issue.11 , pp. 1131-1147
- Cohen, L.D.¹ Cohen, I.²

32
- 0030715832
- Gradient vector flow: A new external force for snakes
- Puerto Rico, June
- C. Xu and J. L. Prince, "Gradient vector flow: A new external force for snakes," in Proc. IEEE International Conf. on Computer Vision and Pattern Recognition, pp. 66-71, Puerto Rico, June 1997.
- (1997) Proc. IEEE International Conf. on Computer Vision and Pattern Recognition , pp. 66-71
- Xu, C.¹ Prince, J.L.²

33
- 0032028944
- Snakes, shapes, and gradient vector flow
- C. Xu and J. L. Prince, "Snakes, shapes, and gradient vector flow," IEEE Trans. Image Processing, vol. 7, no. 3, pp. 359-369, 1998.
- (1998) IEEE Trans. Image Processing , vol.7 , Issue.3 , pp. 359-369
- Xu, C.¹ Prince, J.L.²

34
- 0003792917
- McGraw-Hill, New York, NY, USA
- R. Jain, R. Kasturi, and B. G. Schunck, Machine Vision, McGraw-Hill, New York, NY, USA, 1995.
- (1995) Machine Vision
- Jain, R.¹ Kasturi, R.² Schunck, B.G.³

35
- 0003822743
- Entropic, Cambridge, UK
- S. Young, D. Kershaw, J. Odell, D. Ollason, V. Valtchev, and P. Woodland, The HTK Book, Entropic, Cambridge, UK, 1999.
- (1999) The HTK Book
- Young, S.¹ Kershaw, D.² Odell, J.³ Ollason, D.⁴ Valtchev, V.⁵ Woodland, P.⁶

36
- 0004244302
- Prentice-Hall, Englewood Cliffs, NJ, USA
- L. Rabiner and B.-H. Juang, Fundamentals of Speech Recognition, Prentice-Hall, Englewood Cliffs, NJ, USA, 1993.
- (1993) Fundamentals of Speech Recognition
- Rabiner, L.¹ Juang, B.-H.²

37
- 0034842451
- Weighting schemes for audio-visual fusion in speech recognition
- Salt Lake City, Utah, USA
- H. Glotin, D. Vergyri, C. Neti, G. Potamianos, and J. Luettin, "Weighting schemes for audio-visual fusion in speech recognition," in Proc. IEEE Int. Conf. Acoustics, Speech, Signal Processing, vol. 1, pp. 165-168, Salt Lake City, Utah, USA, 2001.
- (2001) Proc. IEEE Int. Conf. Acoustics, Speech, Signal Processing , vol.1 , pp. 165-168
- Glotin, H.¹ Vergyri, D.² Neti, C.³ Potamianos, G.⁴ Luettin, J.⁵

* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.