SCOPUS Information Retrieval Platform - View Article

Pattern Recognition

Volume 40, Issue 8, 2007, Pages 2325-2340

A coupled HMM approach to video-realistic speech animation

Authors (2): Xie, Lei (a); Liu, Zhi Qiang (a)

(a) CITY UNIVERSITY OF HONG KONG (Hong Kong)

Author keywords

Audio to visual conversion; Coupled hidden Markov models (CHMMs); Facial animation; Speech animation; Talking faces

Indexed keywords

FACE RECOGNITION; HIDDEN MARKOV MODELS; OPTIMIZATION; PARAMETER ESTIMATION; SPEECH RECOGNITION; VIDEO RECORDING;

AUDIO TO VISUAL CONVERSION; COUPLED HIDDEN MARKOV MODELS (CHMM); FACIAL ANIMATION; SPEECH ANIMATION; TALKING FACES;

ANIMATION;

EID: 34147186624
PISSN: 00313203
EISSN: None
Source Type: Journal
DOI: 10.1016/j.patcog.2006.12.001
Document Type: Article

Times cited : (77)

References (44)

1
- 0031187171
- Speech recognition by machines and humans
- Lippmann R. Speech recognition by machines and humans. Speech Commun. 22 1 (1997) 1-15
- (1997) Speech Commun. , vol.22 , Issue.1 , pp. 1-15
- Lippmann, R.¹

2
- 10044221981
- J. Ostermann, A. Weissenfeld, Talking faces-technologies and applications, in: Proceedings of ICPR'04, vol. 3, 2004, pp. 826-833.

3
- 0001514782
- Modeling coarticulation in synthetic visual speech
- Magnenat-Thalmann M., and Thalmann D. (Eds), Springer, Tokyo
- Cohen M.M., and Massaro D.W. Modeling coarticulation in synthetic visual speech. In: Magnenat-Thalmann M., and Thalmann D. (Eds). Models and Techniques in Computer Animation (1993), Springer, Tokyo 139-156
- (1993) Models and Techniques in Computer Animation , pp. 139-156
- Cohen, M.M.¹ Massaro, D.W.²

4
- 79952193244
- F. Pighin, J. Hecker, D. Lischinski, R. Szeliski, D.H. Salesin, Synthesizing realistic facial expressions from photographs, in: Proceedings of ACM SIGGRAPH'98, vol. 3, 1998, pp. 75-84.

5
- 0035501711
- Synthesizing realistic facial animations using energy minimization for model-based coding
- Yin L., Basu A., Bernogger S., and Pinz A. Synthesizing realistic facial animations using energy minimization for model-based coding. Pattern Recognition 34 11 (2001) 2201-2213
- (2001) Pattern Recognition , vol.34 , Issue.11 , pp. 2201-2213
- Yin, L.¹ Basu, A.² Bernogger, S.³ Pinz, A.⁴

6
- 10044281988
- Lifelike talking faces for interactive services
- Cosatto E., Ostermann J., Graf H.P., and Schroeter J. Lifelike talking faces for interactive services. Proc. IEEE 91 9 (2003) 1406-1428
- (2003) Proc. IEEE , vol.91 , Issue.9 , pp. 1406-1428
- Cosatto, E.¹ Ostermann, J.² Graf, H.P.³ Schroeter, J.⁴

7
- 0030677313
- C. Bregler, M. Covell, M. Slaney, Video rewrite: driving visual speech with audio, in: Proceedings of ACM SIGGRAPH'97, 1997.

8
- 0036989560
- T. Ezzat, G. Geiger, T. Poggio, Trainable videorealistic speech animation, in: Proceedings of ACM SIGGRAPH, 2002, pp. 388-397.

9
- 84872004031
- E. Cosatto, H. Graf, Sample-based synthesis of photo-realistic talking heads, in: Proceedings of IEEE Computer Animation, 1998, pp. 103-110.

10
- 0034271782
- Photo-realistic talking heads from image samples
- Cosatto E., and Graf H. Photo-realistic talking heads from image samples. IEEE Trans. Multimedia 2 3 (2000) 152-163
- (2000) IEEE Trans. Multimedia , vol.2 , Issue.3 , pp. 152-163
- Cosatto, E.¹ Graf, H.²

11
- 0036650837
- Real-time speech-driven face animation with expressions using neural networks
- Hong P., Wen Z., and Huang T.S. Real-time speech-driven face animation with expressions using neural networks. IEEE Trans. Neural Networks 13 4 (2002) 916-927
- (2002) IEEE Trans. Neural Networks , vol.13 , Issue.4 , pp. 916-927
- Hong, P.¹ Wen, Z.² Huang, T.S.³

12
- 85017188218
- F.J. Huang, T. Chen, Real-time lip-synch face animation driven by human voice, in: IEEE Second Workshop on Multimedia Signal Processing, 1998, pp. 352-357.

13
- 0031997085
- Audio-to-visual conversion for multimedia communication
- Rao R.R., Chen T., and Mersereau R.M. Audio-to-visual conversion for multimedia communication. IEEE Trans. Ind. Electron. 45 1 (1998) 15-22
- (1998) IEEE Trans. Ind. Electron. , vol.45 , Issue.1 , pp. 15-22
- Rao, R.R.¹ Chen, T.² Mersereau, R.M.³

14
- 0032179320
- Lip movement synthesis from speech based on Hidden Markov Models
- Yamamoto E., Nakamura S., and Shikano K. Lip movement synthesis from speech based on Hidden Markov Models. Speech Commun. 26 1-2 (1998) 105-115
- (1998) Speech Commun. , vol.26 , Issue.1-2 , pp. 105-115
- Yamamoto, E.¹ Nakamura, S.² Shikano, K.³

15
- 84937437186
- M. Brand, Voice puppetry, in: SIGGRAPH'99, Los Angeles, 1999, pp. 21-28.

16
- 34147127210
- K. Choi, J. N. Hwang, Baum-Welch hidden Markov model inversion for reliable audio-to-visual conversion, in: Proceedings of the IEEE 3rd Workshop Multimedia Signal Processing, 1999, pp. 175-180.

17
- 0035426641
- Hidden Markov model inversion for audio-to-visual conversion in an MPEG-4 facial animation system
- Choi K., Luo Y., and Hwang J.N. Hidden Markov model inversion for audio-to-visual conversion in an MPEG-4 facial animation system. J. VLSI Signal Process. 29 1-2 (2001) 51-61
- (2001) J. VLSI Signal Process. , vol.29 , Issue.1-2 , pp. 51-61
- Choi, K.¹ Luo, Y.² Hwang, J.N.³

18
- 84919327072
- S. Lee, D. Yook, Audio-to-visual conversion using hidden Markov models, in: M. Ishizuka, S. A. (Eds.), Proceedings of PRICAI2002, Lecture Notes in Artificial Intelligence, Springer, Berlin, 2002, pp. 563-570.

19
- 33845277490
- L. Xie, D.-M. Jiang, I. Ravyse, W. Verhelst, H. Sahli, V. Slavova, R.-C. Zhao, Context dependent viseme models for voice driven animation, in: The 4th EURASIP Conference on Video/Image Processing and Multimedia Communications, vol. 2, 2003, pp. 649-654.

20
- 2542499812
- Speech-to-video synthesis using MPEG-4 compliant visual features
- Aleksic P.S., and Katsaggelos A.K. Speech-to-video synthesis using MPEG-4 compliant visual features. IEEE Trans. Circuits Systems Video Technol. 14 5 (2004) 682-692
- (2004) IEEE Trans. Circuits Systems Video Technol. , vol.14 , Issue.5 , pp. 682-692
- Aleksic, P.S.¹ Katsaggelos, A.K.²

21
- 0024610919
- A tutorial on hidden Markov models and selected applications in speech recognition
- Rabiner L.R. A tutorial on hidden Markov models and selected applications in speech recognition. Proc. IEEE 77 2 (1989) 257-286
- (1989) Proc. IEEE , vol.77 , Issue.2 , pp. 257-286
- Rabiner, L.R.¹

22
- 16244385915
- Audio/visual mapping with cross-modal hidden Markov models
- Fu S., Gutierrez-Osuna R., Esposito A., Kakumanu K.P., and Garcia O.N. Audio/visual mapping with cross-modal hidden Markov models. IEEE Trans. Multimedia 7 2 (2005) 243-251
- (2005) IEEE Trans. Multimedia , vol.7 , Issue.2 , pp. 243-251
- Fu, S.¹ Gutierrez-Osuna, R.² Esposito, A.³ Kakumanu, K.P.⁴ Garcia, O.N.⁵

23
- 0028996864
- S.Y. Moon, J.N. Hwang, Noisy speech recognition using robust inversion of hidden Markov models, in: Proceedings of ICASSP'95, 1995, pp. 145-148.

24
- 85009254391
- T. Ezzat, T. Poggio, Miketalk: A talking facial display based on morphing visemes, in: Proceedings of the Computer Animation Conference, 1998, pp. 96-102.

25
- 34147133577
- D.G. Stork, M.E. Hennecke (Eds.), Speechreading by Humans and Machines, Springer, Berlin, 1996.

26
- 34147108960
- L. Xie, Research on key issues of audio visual speech recognition, Ph.D. Thesis, Northwestern Polytechnical University, September 2004.

27
- 34147143176
- K.W. Grant, S. Greenberg, Speech intelligibility derived from asynchronous processing of auditory-visual information, in: Proceedings of the International Conference on Auditory-Visual Speech Processing, Aalborg, Denmark, 2001, pp. 132-137.

28
- 0029270677
- Converting speech into lip movements: a multimedia telephone for hard of hearing people
- Lavagetto F. Converting speech into lip movements: a multimedia telephone for hard of hearing people. IEEE Trans. Rehabil. Eng. 3 (1995) 90-102
- (1995) IEEE Trans. Rehabil. Eng. , vol.3 , pp. 90-102
- Lavagetto, F.¹

29
- 0022019614
- Intermodal timing relations and audio-visual speech recognition
- McGrath M., and Summerfield Q. Intermodal timing relations and audio-visual speech recognition. J. Acoust. Soc. Am. 77 (1985) 678-685
- (1985) J. Acoust. Soc. Am. , vol.77 , pp. 678-685
- McGrath, M.¹ Summerfield, Q.²

30
- 4544290191
- Recent advances in the automatic recognition of audio-visual speech
- Potamianos G., Neti C., Gravier G., Garg A., and Senior A.W. Recent advances in the automatic recognition of audio-visual speech. Proc. IEEE 91 9 (2003) 1306-1326
- (2003) Proc. IEEE , vol.91 , Issue.9 , pp. 1306-1326
- Potamianos, G.¹ Neti, C.² Gravier, G.³ Garg, A.⁴ Senior, A.W.⁵

31
- 0003448310
- Springer, Berlin
- Jensen F.V. Bayesian Networks and Decision Graphs (2001), Springer, Berlin
- (2001) Bayesian Networks and Decision Graphs
- Jensen, F.V.¹

32
- 34147156067
- K. Murphy, Dynamic Bayesian networks: representation, inference and learning, Ph.D. Thesis, University of California, Berkeley, 2002.

33
- 0030355935
- H. Bourlard, S. Dupont, A new ASR approach based on independent processing and recombination of partial frequency bands, in: Proceedings of the International Conference on Spoken Language Processing, Philadelphia, 1996, pp. 426-429.

34
- 34147132541
- B. Logan, P.J. Moreno, Factorial hidden Markov models for speech recognition: preliminary experiments, Technical Reports of Cambridge Research Lab (CRL-97-7).

35
- 0030685285
- M. Brand, N. Oliver, A. Pentland, Coupled hidden Markov models for complex action recognition, in: IEEE International Conference on Computer Vision and Pattern Recognition, 1997, pp. 994-999.

36
- 0036297183
- A.V. Nefian, L. Liang, X. Pi, X. Liu, C. Mao, K. Murphy, A coupled HMM for audio-visual speech recognition, in: Proceedings of ICASSP'02, 2002.

37
- 10044240183
- F. Pernkopf, 3D surface inspection using coupled HMMs, in: Proceedings of 17th ICPR'04, 2004.

38
- 33646806777
- S. Ananthakrishnan, S.S. Narayanan, An automatic prosody recognizer using a coupled multi-stream acoustic model and a syntactic-prosodic language model, in: Proceedings of ICASSP'05, 2005.

39
- 34147130506
- L. Xie, Z. Ye, The JEWEL audio visual dataset for facial animation, URL 〈http://www.cityu.edu.hk/rcmt/mouth-synching/jewel.htm〉.

40
- 6344258662
- L. Xie, X.-L. Cai, R.-C. Zhao, A robust hierarchical lip tracking approach for lipreading and audio visual speech recognition, in: The 3rd IEEE International Conference on Machine Learning and Cybernetics, vol. 6, Shanghai, China, 2004, pp. 3620-3624.

41
- 0002629270
- Maximum likelihood from incomplete data via the EM algorithm
- Dempster A., Laird N.M., and Rubin D. Maximum likelihood from incomplete data via the EM algorithm. J. R. Statist. Soc. (Ser. B) 39 (1977) 89-111
- (1977) J. R. Statist. Soc. (Ser. B) , vol.39 , pp. 89-111
- Dempster, A.¹ Laird, N.M.² Rubin, D.³

42
- 34147163731
- S. Young, G. Evermann, D. Kershaw, J. Odell, D. Ollason, D. Povey, V. Valtchev, P. Woodland, The HTK Book (Version 3.2), Cambridge University Engineering Department, Cambridge, 2002, URL 〈http://htk.eng.cam.ac.uk/〉.

43
- 0034270644
- Audio-visual speech modelling for continuous speech recognition
- Dupont S., and Luettin J. Audio-visual speech modelling for continuous speech recognition. IEEE Trans. Multimedia 2 3 (2000) 141-151
- (2000) IEEE Trans. Multimedia , vol.2 , Issue.3 , pp. 141-151
- Dupont, S.¹ Luettin, J.²

44
- 4644303413
- Poisson image editing
- Pérez P., Gangnet M., and Blake A. Poisson image editing. ACM Trans. Graphics (SIGGRAPH) 22 3 (2003) 313-318
- (2003) ACM Trans. Graphics (SIGGRAPH) , vol.22 , Issue.3 , pp. 313-318
- Pérez, P.¹ Gangnet, M.² Blake, A.³

* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS DB.