SCOPUS Information Retrieval Platform

IEEE Transactions on Audio, Speech and Language Processing

Volume 17, Issue 3, 2009, Pages 469-477

Realistic visual speech synthesis based on hybrid concatenation method

(3) Tao, Jianhua (a); Xin, Le (a); Yin, Panrong (a)

(a) INSTITUTE OF AUTOMATION (China)

Author keywords

Fused hidden Markov model (HMM); Inversion; Speech driven facial animation; Unit concatenation; Visual speech synthesis

Indexed keywords

A-FRAMES; COMPUTING EFFICIENCY; FACIAL ANIMATION; FACIAL EXPRESSIONS; FUSED HIDDEN MARKOV MODEL (HMM); GAUSSIAN MIXTURE MODELS; HIGH QUALITY; HYBRID CONCATENATION; INVERSION; LOOSE SYNCHRONIZATIONS; MAPPING METHOD; REAL-TIME APPLICATION; RUNNING SPEED; SECOND LAYER; TIGHTLY-COUPLED; TWO LAYERS; UNIT CONCATENATION; UNIT SELECTION; VISUAL SPEECH SYNTHESIS; VITERBI SEARCH;

ANIMATION; MAPPING; OBJECT RECOGNITION; SPEECH SYNTHESIS; SYNTHESIS (CHEMICAL);

HIDDEN MARKOV MODELS;

EID: 70350437421 PISSN: 15587916 EISSN: None Source Type: Journal
DOI: 10.1109/TASL.2008.2011538 Document Type: Article

Times cited : (17)

References (35)

1
- 48149105519
- Dynamic audio-visual mapping using fused hidden Markov model inversion method
- San Antonio, TX, 2007
- L. Xin, J. H. Tao, and T. N. Tan, "Dynamic audio-visual mapping using fused hidden Markov model inversion method, " in Proc. ICIP, San Antonio, TX, 2007, pp. 293-296.
- Proc. ICIP , pp. 293-296
- Xin, L.¹ Tao, J.H.² Tan, T.N.³

2
- 34547732498
- Speech driven face animation based on dynamic concatenation model
- J. H. Tao and P. R. Yin, "Speech driven face animation based on dynamic concatenation model, " J. Inf. Computat. Sci., vol. 4, no. 1, pp. 271-280, 2007.
- (2007) J. Inf. Computat. Sci. , vol.4 , Issue.1 , pp. 271-280
- Tao, J.H.¹ Yin, P.R.²

3
- 38049013313
- Expressive face animation synthesis based on dynamic mapping method
- ser. Lecture Notes in Computer Science, A. Paiva, R. Prada, and R. W. Picard, Eds. New York: Springer-Verlag
- P. R. Yin, L. Y. Zhao, L. X. Huang, and J. H. Tao, "Expressive face animation synthesis based on dynamic mapping method, " in Affective Computing and Intelligent Interaction, ser. Lecture Notes in Computer Science, A. Paiva, R. Prada, and R. W. Picard, Eds. New York: Springer-Verlag, 2007, pp. 1-11.
- (2007) Affective Computing and Intelligent Interaction , pp. 1-11
- Yin, P.R.¹ Zhao, L.Y.² Huang, L.X.³ Tao, J.H.⁴

4
- 1542303714
- A fused hidden Markov model with application to bimodal speech processing
- Mar
- H. Pan, S. Levinson, T. S. Huang, and Z. P. Liang, "A fused hidden Markov model with application to bimodal speech processing, " IEEE Trans. Signal Process., vol. 52, no. 3, pp. 573-581, Mar. 2004.
- (2004) IEEE Trans. Signal Process. , vol.52 , Issue.3 , pp. 573-581
- Pan, H.¹ Levinson, S.² Huang, T.S.³ Liang, Z.P.⁴

5
- 24644432083
- Audio-visual affect recognition through multi-stream fused HMM for HCI
- Z. H. Zeng, J. L. Tu, B. Pianfetti, M. Liu, T. Zhang, Z. Q. Zhang, T. S. Huang, and S. Levinson, "Audio-visual affect recognition through multi-stream fused HMM for HCI, " in Proc. CVPR, 2005, pp. 967-972.
- (2005) Proc. CVPR , pp. 967-972
- Zeng, Z.H.¹ Tu, J.L.² Pianfetti, B.³ Liu, M.⁴ Zhang, T.⁵ Zhang, Z.Q.⁶ Huang, T.S.⁷ Levinson, S.⁸

6
- 0034517331
- Audio-visual unit selection for the synthesis of photo-realistic talking-heads
- E. Cosatto, G. Potamianos, and H. P. Graf, "Audio-visual unit selection for the synthesis of photo-realistic talking-heads, " in Proc. IEEE Int. Conf. Multimedia Expo (ICME), 2000, pp. 619-622.
- (2000) Proc. IEEE Int. Conf. Multimedia Expo (ICME) , pp. 619-622
- Cosatto, E.¹ Potamianos, G.² Graf, H.P.³

7
- 0030677313
- Video rewrite: Driving visual speech with audio
- C. Bregler, M. Covell, and M. Slaney, "Video rewrite: Driving visual speech with audio, " in Proc. ACM SIGGRAPH, 1997, pp. 353-360.
- (1997) Proc. ACM SIGGRAPH , pp. 353-360
- Bregler, C.¹ Covell, M.² Slaney, M.³

8
- 34047240820
- Speech animation using coupled hidden Markov models
- Hong Kong, China
- L. Xie and Z. Liu, "Speech animation using coupled Hidden Markov Models, " in Proc. 18th Int. Conf. Pattern Recognition (ICPR), Hong Kong, China, 2006, pp. 1128-1131.
- (2006) Proc. 18th Int. Conf. Pattern Recognition (ICPR) , pp. 1128-1131
- Xie, L.¹ Liu, Z.²

9
- 85133709259
- Picture my voice: Audio to visual speech synthesis using artificial neural networks
- Santa Cruz, CA
- D. W. Massaro, J. Beskow, M. M. Cohen, C. L. Fry, and T. Rodriguez, "Picture my voice: Audio to visual speech synthesis using artificial neural networks, " in Proc. AVSP, Santa Cruz, CA, 1999, pp. 133-138.
- (1999) Proc. AVSP , pp. 133-138
- Massaro, D.W.¹ Beskow, J.² Cohen, M.M.³ Fry, C.L.⁴ Rodriguez, T.⁵

10
- 85009254391
- MikeTalk: A talking facial display based on morphing visemes
- Philadelphia, PA
- T. Ezzat and T. Poggio, "MikeTalk: A talking facial display based on morphing visemes, " in Proc. Comput. Animation Conf., Philadelphia, PA, 1998, pp. 96-102.
- (1998) Proc. Comput. Animation Conf. , pp. 96-102
- Ezzat, T.¹ Poggio, T.²

11
- 0032179320
- Lip movement synthesis from speech based on hidden Markov models
- E. Yamamoto, S. Nakamura, and K. Shikano, "Lip movement synthesis from speech based on Hidden Markov Models, " Speech Commun., vol. 26, pp. 105-115, 1998.
- (1998) Speech Commun. , vol.26 , pp. 105-115
- Yamamoto, E.¹ Nakamura, S.² Shikano, K.³

12
- 0036650837
- Real-time speech-driven face animation with expressions using neural networks
- P. Y. Hong, Z. Wen, and T. S. Huang, "Real-time speech-driven face animation with expressions using neural networks, " IEEE Trans. Neural Netw., vol. 13, no. 4, pp. 916-927, 2002.
- (2002) IEEE Trans. Neural Netw. , vol.13 , Issue.4 , pp. 916-927
- Hong, P.Y.¹ Wen, Z.² Huang, T.S.³

13
- 84937437186
- Voice puppetry
- M. Brand, "Voice puppetry, " in Proc. SIGGRAPH, 1999, pp. 21-28.
- (1999) Proc. SIGGRAPH , pp. 21-28
- Brand, M.¹

14
- 33646785065
- Dynamic mapping method based speech driven face animation system
- ser. Lecture Notes in Computer Science, J. Tao, T. Tie, and R. W. Picard, Eds. New York: Springer-Verlag
- P. R. Yin and J. H. Tao, "Dynamic mapping method based speech driven face animation system, " in Affective Computing and Intelligent Interaction, ser. Lecture Notes in Computer Science, J. Tao, T. Tie, and R. W. Picard, Eds. New York: Springer-Verlag, 2005.
- (2005) Affective Computing and Intelligent Interaction
- Yin, P.R.¹ Tao, J.H.²

15
- 0033879110
- Face and 2-D mesh animation in MPEG-4
- M. Tekalp and J. Ostermann, "Face and 2-D mesh animation in MPEG-4, " Signal Process.: Image Commun., vol. 15, pp. 387-421, 2000.
- (2000) Signal Process.: Image Commun. , vol.15 , pp. 387-421
- Tekalp, M.¹ Ostermann, J.²

16
- 4544227639
- A real-time Cantonese text-to-audiovisual speech synthesizer
- ICASSP
- J. Q. Wang, K. H. Wong, P. A. Pheng, H. M. Meng, and T. T. Wong, "A real-time Cantonese text-to-audiovisual speech synthesizer, " in Proc. IEEE Int. Conf. Acoust., Speech, Signal Process. ICASSP, 2004, pp. 653-659.
- (2004) Proc. IEEE Int. Conf. Acoust., Speech, Signal Process. , pp. 653-659
- Wang, J.Q.¹ Wong, K.H.² Pheng, P.A.³ Meng, H.M.⁴ Wong, T.T.⁵

17
- 10044251046
- Animating expressive faces across languages
- Dec
- A. Verma, L. V. Subramaniam, N. Rajput, C. Neti, and T. A. Faruquie, "Animating expressive faces across languages, " IEEE Trans. Multimedia, vol. 6, no. 6, pp. 791-800, Dec. 2004.
- (2004) IEEE Trans. Multimedia , vol.6 , Issue.6 , pp. 791-800
- Verma, A.¹ Subramaniam, L.V.² Rajput, N.³ Neti, C.⁴ Faruquie, T.A.⁵

18
- 14944376823
- Emotional Chinese talking head system
- State College, PA
- J. H. Tao and T. H. Tan, "Emotional Chinese Talking Head System, " in Proc. ACM 6th Int. Conf. Multimodal Interfaces (ICMI), State College, PA, 2004, pp. 273-280.
- (2004) Proc. ACM 6th Int. Conf. Multimodal Interfaces (ICMI) , pp. 273-280
- Tao, J.H.¹ Tan, T.H.²

19
- 13144278330
- Speech-driven facial animation with realistic dynamics
- Feb
- R. Gutierrez-Osuna, P. K. Kakumanu, A. Esposito, O. N. Garcia, A. Bojorquez, J. L. Castillo, and I. Rudomin, "Speech-driven facial animation with realistic dynamics, " IEEE Trans. Multimedia, vol. 7, no. 1, pp. 33-42, Feb. 2005.
- (2005) IEEE Trans. Multimedia , vol.7 , Issue.1 , pp. 33-42
- Gutierrez-Osuna, R.¹ Kakumanu, P.K.² Esposito, A.³ Garcia, O.N.⁴ Bojorquez, A.⁵ Castillo, J.L.⁶ Rudomin, I.⁷

20
- 70350498363
- Real-time lip synchronization based on hidden Markov models
- Y. Huang, S. Lin, X. Ding, B. Guo, and H. Shum, "Real-time Lip synchronization based on Hidden Markov Models, " in Proc. ACCV, 2002, pp. 176-181.
- (2002) Proc. ACCV , pp. 176-181
- Huang, Y.¹ Lin, S.² Ding, X.³ Guo, B.⁴ Shum, H.⁵

21
- 0031997085
- Audio-to-visual conversion for multimedia communication
- Feb
- R. Rao, T. Chen, and R. M. Mersereau, "Audio-to-visual conversion for multimedia communication, " IEEE Trans. Ind. Electron., vol. 45, no. 1, pp. 15-22, Feb. 1998.
- (1998) IEEE Trans. Ind. Electron , vol.45 , Issue.1 , pp. 15-22
- Rao, R.¹ Chen, T.² Mersereau, R.M.³

22
- 16244385915
- Audio-visual mapping with cross-modal hidden Markov models
- Apr
- S. L. Fu, R. Gutierrez-Osuna, A. Esposito, P. K. Kakumanu, and O. N. Garcia, "Audio/visual mapping with cross-modal Hidden Markov models, " IEEE Trans. Multimedia, vol. 7, no. 3, pp. 243-252, Apr. 2005.
- (2005) IEEE Trans. Multimedia , vol.7 , Issue.3 , pp. 243-252
- Fu, S.L.¹ Gutierrez-Osuna, R.² Esposito, A.³ Kakumanu, P.K.⁴ Garcia, O.N.⁵

23
- 33646752807
- Learning dynamic audio-visual mapping with input-output hidden Markov models
- Apr
- Y. Li and H. Y. Shum, "Learning dynamic audio-visual mapping with input-output Hidden Markov Models, " IEEE Trans. Multimedia, vol. 8, no. 3, pp. 542-549, Apr. 2006.
- (2006) IEEE Trans. Multimedia , vol.8 , Issue.3 , pp. 542-549
- Li, Y.¹ Shum, H.Y.²

24
- 0030685285
- Coupled hidden Markov models for complex action recognition
- M. Brand and N. Oliver, "Coupled hidden Markov models for complex action recognition, " in Proc. Comput. Vis. Pattern Recognition, 1997, pp. 201-206.
- (1997) Proc. Comput. Vis. Pattern Recognition , pp. 201-206
- Brand, M.¹ Oliver, N.²

25
- 84890517975
- Least-square fitting of two 3-D point sets
- May
- K. S. Arun, T. S. Huang, and S. D. Blostein, "Least-square fitting of two 3-D point sets, " IEEE Trans. Pattern Anal. Mach. Intell., vol. PAMI-9, no. 5, pp. 698-700, May 1987.
- (1987) IEEE Trans. Pattern Anal. Mach. Intell. , vol.PAMI-9 , Issue.5 , pp. 698-700
- Arun, K.S.¹ Huang, T.S.² Blostein, S.D.³

26
- 0034792570
- Speech-driven cartoon animation with emotions
- Y. Li, F. Yu, Y. Q. Xu, E. Chang, and H. Y. Shum, "Speech-driven cartoon animation with emotions, " in Proc. 9th ACM Int. Conf. Multimedia, 2001, pp. 365-371.
- (2001) Proc. 9th ACM Int. Conf. Multimedia , pp. 365-371
- Li, Y.¹ Yu, F.² Xu, Y.Q.³ Chang, E.⁴ Shum, H.Y.⁵

27
- 41549121431
- Exploiting audio-visual correlation in coding of talking head sequences
- Melbourne, Australia, Mar
- R. Rao and T. Chen, "Exploiting audio-visual correlation in coding of talking head sequences, " in Proc. Picture Coding Symp., Melbourne, Australia, Mar. 1996, pp. 653-658.
- (1996) Proc. Picture Coding Symp. , pp. 653-658
- Rao, R.¹ Chen, T.²

28
- 0000286376
- Using dynamic time warping to find patterns in time series
- D. Berndt and J. Clifford, "Using dynamic time warping to find patterns in time series, " in Proc. KDD Workshop, 1994, pp. 359-370.
- (1994) Proc. KDD Workshop , pp. 359-370
- Berndt, D.¹ Clifford, J.²

29
- 12444277784
- Smoothing techniques via the Bezier curve
- C. Kim, W. Kim, B. Park, C. Hong, and M. Jeong, "Smoothing techniques via the Bezier curve, " Commun. Statist.-Theory Methods, vol. 28, no. 7, pp. 1577-1597, 1999.
- (1999) Commun. Statist.-Theory Methods , vol.28 , Issue.7 , pp. 1577-1597
- Kim, C.¹ Kim, W.² Park, B.³ Hong, C.⁴ Jeong, M.⁵

30
- 84905560807
- Voice conversion with smoothed GMM and MAP adaptation
- Y. Chen, M. Chu, E. Chang, J. Liu, and R. Liu, "Voice conversion with smoothed GMM and MAP adaptation, " in Proc. Eurospeech, 2003, pp. 2413-2416.
- (2003) Proc. Eurospeech , pp. 2413-2416
- Chen, Y.¹ Chu, M.² Chang, E.³ Liu, J.⁴ Liu, R.⁵

31
- 34047263010
- Prosody conversion from neutral speech to emotional speech
- Jul
- J. H. Tao, Y. G. Kang, and A. J. Li, "Prosody conversion from neutral speech to emotional speech, " IEEE Trans. Audio, Speech, Lang. Process., vol. 14, no. 4, pp. 1145-1154, Jul. 2006.
- (2006) IEEE Trans. Audio, Speech, Lang. Process. , vol.14 , Issue.4 , pp. 1145-1154
- Tao, J.H.¹ Kang, Y.G.² Li, A.J.³

32
- 85009159448
- Emotional space improves emotion recognition
- Denver, CO
- R. Tato, R. Santos, R. Kompe, and J. M. Pardo, "Emotional space improves emotion recognition, " in Proc. ICSLP, Denver, CO, 2002, pp. 2029-2032.
- (2002) Proc. ICSLP , pp. 2029-2032
- Tato, R.¹ Santos, R.² Kompe, R.³ Pardo, J.M.⁴

33
- 84983154011
- Perception of affect in speech-towards an automatic processing of paralinguistic information in spoken conversation
- Jeju, Korea
- N. Campbell, "Perception of affect in speech-Towards an automatic processing of paralinguistic information in spoken conversation, " in Proc. ICSLP, Jeju, Korea, 2004, pp. 881-884.
- (2004) Proc. ICSLP , pp. 881-884
- Campbell, N.¹

34
- 0004203240
- The EM algorithm and extensions
- New York: Wiley
- G. McLachlan and T. Krishnan, "The EM algorithm and extensions, " in Wiley Series in Probability and Statistics. New York: Wiley, 1997.
- (1997) Wiley Series in Probability and Statistics
- McLachlan, G.¹ Krishnan, T.²

35
- 33745906824
- Automatic 3D face modeling from video
- L. Xin, Q. Wang, J. H. Tao, X. Tang, T. Tan, and H. Shum, "Automatic 3D face modeling from video, " in Proc. ICCV, 2005, vol. 2, pp. 1193-1199.
- (2005) Proc. ICCV , vol.2 , pp. 1193-1199
- Xin, L.¹ Wang, Q.² Tao, J.H.³ Tang, X.⁴ Tan, T.⁵ Shum, H.⁶

* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.