SCOPUS Information Search Platform

IEEE Transactions on Circuits and Systems for Video Technology

Volume 14, Issue 5, 2004, Pages 682-692

Speech-to-video synthesis using MPEG-4 compliant visual features

(2) Aleksic, Petar S.ᵃ; Katsaggelos, Aggelos K.ᵃ

ᵃ Northwestern University (United States)

Author keywords

Audio visual speech recognition; Correlation hidden Markov models (CHMMs); Facial animation parameters (FAPs); Speech to video synthesis

Indexed keywords

ALGORITHMS; CORRELATION METHODS; DATA REDUCTION; MARKOV PROCESSES; SIGNAL TO NOISE RATIO; SPEECH RECOGNITION; SPEECH SYNTHESIS; SYNCHRONIZATION; TOPOLOGY;

AUDIO-VISUAL SPEECH RECOGNITION; CORRELATION HIDDEN MARKOV MODELS (CHMM); FACIAL ANIMATION PARAMETERS (FAP); SPEECH-TO-VIDEO SYNTHESIS;

IMAGE COMPRESSION;

EID: 2542499812 PISSN: 10518215 EISSN: None Source Type: Journal
DOI: 10.1109/TCSVT.2004.826760 Document Type: Article

Times cited : (21)

References (32)

1
- 0031187171
- Speech recognition by machines and humans
- July
- R. Lippman, "Speech recognition by machines and humans," Speech Commun., vol. 22, no. 1, pp. 1-15, July 1997.
- (1997) Speech Commun. , vol.22 , Issue.1 , pp. 1-15
- Lippman, R.¹

2
- 0029288202
- Speech recognition in noisy environments: A survey
- Y. Gong, "Speech recognition in noisy environments: A survey," Speech Commun., vol. 16, pp. 261-291, 1995.
- (1995) Speech Commun. , vol.16 , pp. 261-291
- Gong, Y.¹

3
- 0004244302
- Englewood Cliffs, NJ: Prentice-Hall
- L. Rabiner and B.-H. Juang, Fundamentals of Speech Recognition. Englewood Cliffs, NJ: Prentice-Hall, 1993.
- (1993) Fundamentals of Speech Recognition
- Rabiner, L.¹ Juang, B.-H.²

4
- 0024610919
- A tutorial on hidden Markov models and selected applications in speech recognition
- Feb.
- L. R. Rabiner, "A tutorial on hidden Markov models and selected applications in speech recognition," Proc. IEEE, vol. 77, pp. 257-286, Feb. 1989.
- (1989) Proc. IEEE , vol.77 , pp. 257-286
- Rabiner, L.R.¹

5
- 0029270677
- Converting speech into lip movements: A multimedia telephone for hard of hearing people
- Mar.
- F. Lavagetto, "Converting speech into lip movements: A multimedia telephone for hard of hearing people," IEEE Trans. Rehab. Eng., vol. 3, pp. 1-14, Mar. 1995.
- (1995) IEEE Trans. Rehab. Eng. , vol.3 , pp. 1-14
- Lavagetto, F.¹

6
- 0000051247
- Generation of mouth shapes for a synthetic talking head
- A. Simons and S. Cox, "Generation of mouth shapes for a synthetic talking head," Proc. Inst. Acoust., vol. 12, pp. 475-482, 1990.
- (1990) Proc. Inst. Acoust. , vol.12 , pp. 475-482
- Simons, A.¹ Cox, S.²

7
- 0032074310
- Audio-visual integration in multimedia communication
- May
- T. Chen and R. R. Rao, "Audio-visual integration in multimedia communication," Proc. IEEE, vol. 86, pp. 837-852, May 1998.
- (1998) Proc. IEEE , vol.86 , pp. 837-852
- Chen, T.¹ Rao, R.R.²

8
- 0030677313
- Video rewrite: Driving visual speech with audio
- C. Bregler, M. Covell, and M. Slaney, "Video rewrite: Driving visual speech with audio," in Proc. ACM SIGGRAPH, 1997, pp. 353-360.
- (1997) Proc. ACM SIGGRAPH , pp. 353-360
- Bregler, C.¹ Covell, M.² Slaney, M.³

9
- 0000497160
- Baum-Welch hidden Markov model inversion for reliable audio-to-video conversion
- K. Choi and J.-N. Hwang, "Baum-Welch hidden Markov model inversion for reliable audio-to-video conversion," in Proc. IEEE 3rd Workshop Multimedia Signal Processing, 1999, pp. 175-180.
- (1999) Proc. IEEE 3rd Workshop Multimedia Signal Processing , pp. 175-180
- Choi, K.¹ Hwang, J.-N.²

10
- 0031100269
- Robust speech recognition based on joint model and feature space optimization of hidden Markov models
- Mar.
- S. Moon and J.-N. Hwang, "Robust speech recognition based on joint model and feature space optimization of hidden Markov models," IEEE Trans. Neural Networks, vol. 8, pp. 194-204, Mar. 1997.
- (1997) IEEE Trans. Neural Networks , vol.8 , pp. 194-204
- Moon, S.¹ Hwang, J.-N.²

11
- 82055176921
- Fusion of audio-visual information for integrated speech processing
- Halmstad, Sweden
- S. Nakamura, "Fusion of audio-visual information for integrated speech processing," in Proc. Third Int. Conf. Audio- and Video-Based Biometric Person Authentication, Halmstad, Sweden, 2001, pp. 127-143.
- (2001) Proc. Third Int. Conf. Audio- and Video-based Biometric Person Authentication , pp. 127-143
- Nakamura, S.¹

12
- 0036650527
- An HMM-based speech-to-video synthesizer
- July
- J. J. Williams and A. K. Katsaggelos, "An HMM-based speech-to-video synthesizer," IEEE Trans. Neural Networks, vol. 13, pp. 900-915, July 2002.
- (2002) IEEE Trans. Neural Networks , vol.13 , pp. 900-915
- Williams, J.J.¹ Katsaggelos, A.K.²

13
- 0034779303
- Subjective analysis of an HMM-based visual speech synthesizer
- San Jose, CA, Jan.
- J. J. Williams, A. K. Katsaggelos, and D. C. Garstecki, "Subjective analysis of an HMM-based visual speech synthesizer," in Proc. SPIE Conf. Human Vision and Electronic Imaging, vol. 4299, San Jose, CA, Jan. 2001, pp. 544-555.
- (2001) Proc. SPIE Conf. Human Vision and Electronic Imaging , vol.4299 , pp. 544-555
- Williams, J.J.¹ Katsaggelos, A.K.² Garstecki, D.C.³

14
- 0003455864
- ISO/IEC JTC1/SC29/WG11 N2502, Nov.
- Text for ISO/IEC FDIS 14496-2 Visual, ISO/IEC JTC1/SC29/WG11 N2502, Nov. 1998.
- (1998) Text for ISO/IEC FDIS 14496-2 Visual

15
- 0003455860
- ISO/IEC JTC1/SC29/WG11 N2502, Nov.
- Text for ISO/IEC FDIS 14496-1 Systems, ISO/IEC JTC1/SC29/WG11 N2502, Nov. 1998.
- (1998) Text for ISO/IEC FDIS 14496-1 Systems

16
- 0003526379
- Instituto Superior Tecnico, (c)
- G. A. Abrantes, FACE-Facial Animation System, Version 3.3.1: Instituto Superior Tecnico, (c), 1997-98.
- (1997) FACE-facial Animation System, Version 3.3.1
- Abrantes, G.A.¹

17
- 0035472468
- An efficient use of MPEG-4 FAP interpolation for facial animation at 70 bits/frame
- Oct.
- F. Lavagetto and R. Pockaj, "An efficient use of MPEG-4 FAP interpolation for facial animation at 70 bits/frame," IEEE Trans. Circuits Syst. Video Technol., vol. 11, pp. 1085-1097, Oct. 2001.
- (2001) IEEE Trans. Circuits Syst. Video Technol. , vol.11 , pp. 1085-1097
- Lavagetto, F.¹ Pockaj, R.²

18
- 0036874915
- Audio-visual speech recognition using MPEG-4 compliant visual features
- P. S. Aleksic, J. J. Williams, Z. Wu, and A. K. Katsaggelos, "Audio-visual speech recognition using MPEG-4 compliant visual features," EURASIP J. Appl. Signal Processing, pp. 1213-1227, 2002.
- (2002) EURASIP J. Appl. Signal Processing , pp. 1213-1227
- Aleksic, P.S.¹ Williams, J.J.² Wu, Z.³ Katsaggelos, A.K.⁴

19
- 0036447870
- Audio-visual continuous speech recognition using MPEG-4 compliant visual feature
- Rochester, NY, Sept.
- P. S. Aleksic, J. J. Williams, Z. Wu, and A. K. Katsaggelos, "Audio-visual continuous speech recognition using MPEG-4 compliant visual feature," in Proc. Int. Conf. Image Processing, Rochester, NY, Sept. 2002, pp. 960-963.
- (2002) Proc. Int. Conf. Image Processing , pp. 960-963

20
- 0344044794
- Washington, DC: Gallaudet University
- L. E. Bernstein, Lipreading Corpus V-VI: Disc 3. Washington, DC: Gallaudet University, 1991.
- (1991) Lipreading Corpus V-VI: Disc 3
- Bernstein, L.E.¹

21
- 2542459982
- [Online]. Available
- (1990) Carnegie Mellon University Pronunciation Dictionary. [Online]. Available: http://www.speech.cs.cmu.edu/cgi-bin/cmudict
- (1990) Carnegie Mellon University Pronunciation Dictionary

22
- 0003922190
- New York: Wiley
- R. O. Duda, P. E. Hart, and D. G. Stork, Pattern Classification. New York: Wiley, 2001.
- (2001) Pattern Classification
- Duda, R.O.¹ Hart, P.E.² Stork, D.G.³

23
- 0022228262
- Automatic lipreading to enhance speech recognition
- San Francisco, CA
- E. Petajan, "Automatic lipreading to enhance speech recognition," in Proc. IEEE Conf. Computer Vision and Pattern Recognition, San Francisco, CA, 1985, pp. 40-47.
- (1985) Proc. IEEE Conf. Computer Vision and Pattern Recognition , pp. 40-47
- Petajan, E.¹

24
- 0003544881
- New York: Springer-Verlag
- D. G. Stork and M. E. Hennecke, Eds., Speechreading by Man and Machine. New York: Springer-Verlag, 1996.
- (1996) Speechreading by Man and Machine
- Stork, D.G.¹ Hennecke, M.E.²

25
- 2542482407
- Baltimore, MD, Oct.
- C. Neti, G. Potamianos, J. Luettin, I. Matthews, H. Glotin, D. Vergyri, J. Sison, A. Mashari, and J. Zhou, Workshop Audio-Visual Speech Recognition, Final Report. Baltimore, MD, Oct. 2000.
- (2000) Workshop Audio-visual Speech Recognition, Final Report
- Neti, C.¹ Potamianos, G.² Luettin, J.³ Matthews, I.⁴ Glotin, H.⁵ Vergyri, D.⁶ Sison, J.⁷ Mashari, A.⁸ Zhou, J.⁹

26
- 4544290191
- Recent advances in the automatic recognition of audio-visual speech
- Sept.
- G. Potamianos, C. Neti, G. Gravier, A. Garg, and A. W. Senior, "Recent advances in the automatic recognition of audio-visual speech," Proc. IEEE, vol. 91, pp. 1306-1326, Sept. 2003.
- (2003) Proc. IEEE , vol.91 , pp. 1306-1326
- Potamianos, G.¹ Neti, C.² Gravier, G.³ Garg, A.⁴ Senior, A.W.⁵

27
- 0034853041
- Hierarchical discriminant features for audio-visual LVCSR
- G. Potamianos, J. Luettin, and C. Neti, "Hierarchical discriminant features for audio-visual LVCSR," in Proc. IEEE Int. Conf. Acoustics, Speech and Signal Processing, vol. 1, 2001, pp. 165-168.
- (2001) Proc. IEEE Int. Conf. Acoustics, Speech and Signal Processing , vol.1 , pp. 165-168
- Potamianos, G.¹ Luettin, J.² Neti, C.³

28
- 85013597845
- 'Eigenlips' for robust speech recognition
- Adelaide, Australia
- C. Bregler and Y. Konig, "'Eigenlips' for robust speech recognition," in Proc. Int. Conf. Acoustics, Speech and Signal Processing, Adelaide, Australia, 1994, pp. 669-672.
- (1994) Proc. Int. Conf. Acoustics, Speech and Signal Processing , pp. 669-672
- Bregler, C.¹ Konig, Y.²

29
- 0034270644
- Audio-visual speech modeling for continuous speech recognition
- Mar.
- S. Dupont and J. Luettin, "Audio-visual speech modeling for continuous speech recognition," IEEE Trans. Multimedia, vol. 2, pp. 141-151, Mar. 2000.
- (2000) IEEE Trans. Multimedia , vol.2 , pp. 141-151
- Dupont, S.¹ Luettin, J.²

30
- 0034842451
- Weighting schemes for audio-visual fusion in speech recognition
- H. Glotin, D. Vergyri, C. Neti, G. Potamianos, and J. Luettin, "Weighting schemes for audio-visual fusion in speech recognition," in Proc. IEEE Int. Conf. Acoustics, Speech and Signal Processing, vol. 1, 2001, pp. 165-168.
- (2001) Proc. IEEE Int. Conf. Acoustics, Speech and Signal Processing , vol.1 , pp. 165-168
- Glotin, H.¹ Vergyri, D.² Neti, C.³ Potamianos, G.⁴ Luettin, J.⁵

31
- 0003822743
- Cambridge, U.K.: Entropic Ltd.
- S. Young, D. Kershaw, J. Odell, D. Ollason, V. Valtchev, and P. Woodland, The HTK Book. Cambridge, U.K.: Entropic Ltd., 2002.
- (2002) The HTK Book
- Young, S.¹ Kershaw, D.² Odell, J.³ Ollason, D.⁴ Valtchev, V.⁵ Woodland, P.⁶

32
- 0345134263
- Speech-to-video synthesis using facial animation parameters
- Barcelona, Spain, Sept.
- P. S. Aleksic and A. K. Katsaggelos, "Speech-to-video synthesis using facial animation parameters," in Proc. Int. Conf. Image Processing, Barcelona, Spain, Sept. 2003, pp. 1-4.
- (2003) Proc. Int. Conf. Image Processing , pp. 1-4
- Aleksic, P.S.¹ Katsaggelos, A.K.²

* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.