Volume 40, Issue 8, 2007, Pages 2325-2340

A coupled HMM approach to video-realistic speech animation

Author keywords

Audio to visual conversion; Coupled hidden Markov models (CHMMs); Facial animation; Speech animation; Talking faces

Indexed keywords

FACE RECOGNITION; HIDDEN MARKOV MODELS; OPTIMIZATION; PARAMETER ESTIMATION; SPEECH RECOGNITION; VIDEO RECORDING

EID: 34147186624     PISSN: 00313203     EISSN: None     Source Type: Journal    
DOI: 10.1016/j.patcog.2006.12.001     Document Type: Article
Times cited: 77

References (44)
  • 1
    • 0031187171
    • Speech recognition by machines and humans
    • Lippmann R. Speech recognition by machines and humans. Speech Commun. 22 1 (1997) 1-15
    • (1997) Speech Commun. , vol.22 , Issue.1 , pp. 1-15
    • Lippmann, R.1
  • 2
    • 10044221981
    • J. Ostermann, A. Weissenfeld, Talking faces-technologies and applications, in: Proceedings of ICPR'04, vol. 3, 2004, pp. 826-833.
  • 3
    • 0001514782
    • Modeling coarticulation in synthetic visual speech
    • Magnenat-Thalmann M., and Thalmann D. (Eds), Springer, Tokyo
    • Cohen M.M., and Massaro D.W. Modeling coarticulation in synthetic visual speech. In: Magnenat-Thalmann M., and Thalmann D. (Eds). Models and Techniques in Computer Animation (1993), Springer, Tokyo 139-156
    • (1993) Models and Techniques in Computer Animation , pp. 139-156
    • Cohen, M.M.1    Massaro, D.W.2
  • 4
    • 79952193244
    • F. Pighin, J. Hecker, D. Lischinski, R. Szeliski, D.H. Salesin, Synthesizing realistic facial expressions from photographs, in: Proceedings of ACM SIGGRAPH'98, vol. 3, 1998, pp. 75-84.
  • 5
    • 0035501711
    • Synthesizing realistic facial animations using energy minimization for model-based coding
    • Yin L., Basu A., Bernogger S., and Pinz A. Synthesizing realistic facial animations using energy minimization for model-based coding. Pattern Recognition 34 11 (2001) 2201-2213
    • (2001) Pattern Recognition , vol.34 , Issue.11 , pp. 2201-2213
    • Yin, L.1    Basu, A.2    Bernogger, S.3    Pinz, A.4
  • 6
    • 10044281988
    • Lifelike talking faces for interactive services
    • Cosatto E., Ostermann J., Graf H.P., and Schroeter J. Lifelike talking faces for interactive services. Proc. IEEE 91 9 (2003) 1406-1428
    • (2003) Proc. IEEE , vol.91 , Issue.9 , pp. 1406-1428
    • Cosatto, E.1    Ostermann, J.2    Graf, H.P.3    Schroeter, J.4
  • 7
    • 0030677313
    • C. Bregler, M. Covell, M. Slaney, Video rewrite: driving visual speech with audio, in: Proceedings of ACM SIGGRAPH'97, 1997.
  • 8
    • 0036989560
    • T. Ezzat, G. Geiger, T. Poggio, Trainable videorealistic speech animation, in: Proceedings of ACM SIGGRAPH, 2002, pp. 388-397.
  • 9
    • 84872004031
    • E. Cosatto, H. Graf, Sample-based synthesis of photo-realistic talking heads, in: Proceedings of IEEE Computer Animation, 1998, pp. 103-110.
  • 10
    • 0034271782
    • Photo-realistic talking heads from image samples
    • Cosatto E., and Graf H. Photo-realistic talking heads from image samples. IEEE Trans. Multimedia 2 3 (2000) 152-163
    • (2000) IEEE Trans. Multimedia , vol.2 , Issue.3 , pp. 152-163
    • Cosatto, E.1    Graf, H.2
  • 11
    • 0036650837
    • Real-time speech-driven face animation with expressions using neural networks
    • Hong P., Wen Z., and Huang T.S. Real-time speech-driven face animation with expressions using neural networks. IEEE Trans. Neural Networks 13 4 (2002) 916-927
    • (2002) IEEE Trans. Neural Networks , vol.13 , Issue.4 , pp. 916-927
    • Hong, P.1    Wen, Z.2    Huang, T.S.3
  • 12
    • 85017188218
    • F.J. Huang, T. Chen, Real-time lip-synch face animation driven by human voice, in: IEEE Second Workshop on Multimedia Signal Processing, 1998, pp. 352-357.
  • 13
    • 0031997085
    • Audio-to-visual conversion for multimedia communication
    • Rao R.R., Chen T., and Mersereau R.M. Audio-to-visual conversion for multimedia communication. IEEE Trans. Ind. Electron. 45 1 (1998) 15-22
    • (1998) IEEE Trans. Ind. Electron. , vol.45 , Issue.1 , pp. 15-22
    • Rao, R.R.1    Chen, T.2    Mersereau, R.M.3
  • 14
    • 0032179320
    • Lip movement synthesis from speech based on Hidden Markov Models
    • Yamamoto E., Nakamura S., and Shikano K. Lip movement synthesis from speech based on Hidden Markov Models. Speech Commun. 26 1-2 (1998) 105-115
    • (1998) Speech Commun. , vol.26 , Issue.1-2 , pp. 105-115
    • Yamamoto, E.1    Nakamura, S.2    Shikano, K.3
  • 15
    • 84937437186
    • M. Brand, Voice puppetry, in: SIGGRAPH'99, Los Angeles, 1999, pp. 21-28.
  • 16
    • 34147127210
    • K. Choi, J. N. Hwang, Baum-Welch hidden Markov model inversion for reliable audio-to-visual conversion, in: Proceedings of the IEEE 3rd Workshop Multimedia Signal Processing, 1999, pp. 175-180.
  • 17
    • 0035426641
    • Hidden Markov model inversion for audio-to-visual conversion in an MPEG-4 facial animation system
    • Choi K., Luo Y., and Hwang J.N. Hidden Markov model inversion for audio-to-visual conversion in an MPEG-4 facial animation system. J. VLSI Signal Process. 29 1-2 (2001) 51-61
    • (2001) J. VLSI Signal Process. , vol.29 , Issue.1-2 , pp. 51-61
    • Choi, K.1    Luo, Y.2    Hwang, J.N.3
  • 18
    • 84919327072
    • S. Lee, D. Yook, Audio-to-visual conversion using hidden Markov models, in: M. Ishizuka, S. A. (Eds.), Proceedings of PRICAI2002, Lecture Notes in Artificial Intelligence, Springer, Berlin, 2002, pp. 563-570.
  • 19
    • 33845277490
    • L. Xie, D.-M. Jiang, I. Ravyse, W. Verhelst, H. Sahli, V. Slavova, R.-C. Zhao, Context dependent viseme models for voice driven animation, in: The 4th EURASIP Conference on Video/Image Processing and Multimedia Communications, vol. 2, 2003, pp. 649-654.
  • 21
    • 0024610919
    • A tutorial on hidden Markov models and selected applications in speech recognition
    • Rabiner L.R. A tutorial on hidden Markov models and selected applications in speech recognition. Proc. IEEE 77 2 (1989) 257-286
    • (1989) Proc. IEEE , vol.77 , Issue.2 , pp. 257-286
    • Rabiner, L.R.1
  • 23
    • 0028996864
    • S.Y. Moon, J.N. Hwang, Noisy speech recognition using robust inversion of hidden Markov models, in: Proceedings of ICASSP'95, 1995, pp. 145-148.
  • 24
    • 85009254391
    • T. Ezzat, T. Poggio, Miketalk: A talking facial display based on morphing visemes, in: Proceedings of the Computer Animation Conference, 1998, pp. 96-102.
  • 25
    • 34147133577
    • D.G. Stork, M.E. Hennecke (Eds.), Speechreading by Humans and Machines, Springer, Berlin, 1996.
  • 26
    • 34147108960
    • L. Xie, Research on key issues of audio visual speech recognition, Ph.D. Thesis, Northwestern Polytechnical University, September 2004.
  • 27
    • 34147143176
    • K.W. Grant, S. Greenberg, Speech intelligibility derived from asynchronous processing of auditory-visual information, in: Proceedings of the International Conference on Auditory-Visual Speech Processing, Aalborg, Denmark, 2001, pp. 132-137.
  • 28
    • 0029270677
    • Converting speech into lip movements: a multimedia telephone for hard of hearing people
    • Lavagetto F. Converting speech into lip movements: a multimedia telephone for hard of hearing people. IEEE Trans. Rehabil. Eng. 3 (1995) 90-102
    • (1995) IEEE Trans. Rehabil. Eng. , vol.3 , pp. 90-102
    • Lavagetto, F.1
  • 29
    • 0022019614
    • Intermodal timing relations and audio-visual speech recognition
    • McGrath M., and Summerfield Q. Intermodal timing relations and audio-visual speech recognition. J. Acoust. Soc. Am. 77 (1985) 678-685
    • (1985) J. Acoust. Soc. Am. , vol.77 , pp. 678-685
    • McGrath, M.1    Summerfield, Q.2
  • 30
    • 4544290191
    • Recent advances in the automatic recognition of audio-visual speech
    • Potamianos G., Neti C., Gravier G., Garg A., and Senior A.W. Recent advances in the automatic recognition of audio-visual speech. Proc. IEEE 91 9 (2003) 1306-1326
    • (2003) Proc. IEEE , vol.91 , Issue.9 , pp. 1306-1326
    • Potamianos, G.1    Neti, C.2    Gravier, G.3    Garg, A.4    Senior, A.W.5
  • 32
    • 34147156067
    • K. Murphy, Dynamic Bayesian networks: representation, inference and learning, Ph.D. Thesis, University of California, Berkeley, 2002.
  • 33
    • 0030355935
    • H. Bourlard, S. Dupont, A new ASR approach based on independent processing and recombination of partial frequency bands, in: Proceedings of the International Conference on Spoken Language Processing, Philadelphia, 1996, pp. 426-429.
  • 34
    • 34147132541
    • B. Logan, P.J. Moreno, Factorial hidden Markov models for speech recognition: preliminary experiments, Technical Reports of Cambridge Research Lab (CRL-97-7).
  • 35
    • 0030685285
    • M. Brand, N. Oliver, A. Pentland, Coupled hidden Markov models for complex action recognition, in: IEEE International Conference on Computer Vision and Pattern Recognition, 1997, pp. 994-999.
  • 36
    • 0036297183
    • A.V. Nefian, L. Liang, X. Pi, X. Liu, C. Mao, K. Murphy, A coupled HMM for audio-visual speech recognition, in: Proceedings of ICASSP'02, 2002.
  • 37
    • 10044240183
    • F. Pernkopf, 3D surface inspection using coupled HMMs, in: Proceedings of 17th ICPR'04, 2004.
  • 38
    • 33646806777
    • S. Ananthakrishnan, S.S. Narayanan, An automatic prosody recognizer using a coupled multi-stream acoustic model and a syntactic-prosodic language model, in: Proceedings of ICASSP'05, 2005.
  • 39
    • 34147130506
    • L. Xie, Z. Ye, The JEWEL audio visual dataset for facial animation, URL 〈http://www.cityu.edu.hk/rcmt/mouth-synching/jewel.htm〉.
  • 40
    • 6344258662
    • L. Xie, X.-L. Cai, R.-C. Zhao, A robust hierarchical lip tracking approach for lipreading and audio visual speech recognition, in: The 3rd IEEE International Conference on Machine Learning and Cybernetics, vol. 6, Shanghai, China, 2004, pp. 3620-3624.
  • 41
    • 0002629270
    • Maximum likelihood from incomplete data via the EM algorithm
    • Dempster A., Laird A.N., and Rubin D. Maximum likelihood from incomplete data via the EM algorithm. J. R. Statist. Soc. (Ser. B) 39 (1977) 89-111
    • (1977) J. R. Statist. Soc. (Ser. B) , vol.39 , pp. 89-111
    • Dempster, A.1    Laird, A.N.2    Rubin, D.3
  • 42
    • 34147163731
    • S. Young, G. Evermann, D. Kershaw, J. Odell, D. Ollason, D. Povey, V. Valtchev, P. Woodland, The HTK Book (Version 3.2), Cambridge University Engineering Department, Cambridge, 2002, URL 〈http://htk.eng.cam.ac.uk/〉.
  • 43
    • 0034270644
    • Audio-visual speech modelling for continuous speech recognition
    • Dupont S., and Luettin J. Audio-visual speech modelling for continuous speech recognition. IEEE Trans. Multimedia 2 3 (2000) 141-151
    • (2000) IEEE Trans. Multimedia , vol.2 , Issue.3 , pp. 141-151
    • Dupont, S.1    Luettin, J.2


* This information was extracted by KISTI through analysis of Elsevier's SCOPUS database.