SCOPUS Information Search Platform

7th ISCA Workshop on Speech Synthesis, SSW 2010

Volume , Issue , 2010, Pages 217-222

Photo-Real Lips Synthesis with Trajectory-Guided Sample Selection

(4) Wang, Lijuan a; Qian, Xiaojun b; Han, Wei c; Soong, Frank K a

a MICROSOFT RESEARCH ASIA (China)

b CHINESE UNIVERSITY OF HONG KONG (Hong Kong)

c SHANGHAI JIAO TONG UNIVERSITY (China)

Author keywords

photo real; talking head; trajectory guided; visual speech synthesis

Indexed keywords

IMAGE PROCESSING; MAXIMUM LIKELIHOOD; SPEECH COMMUNICATION; SPEECH SYNTHESIS; TRAJECTORIES;

AUDIO-VISUAL DATABASE; HIDDEN-MARKOV MODELS; LIP MOVEMENTS; PHOTO-REAL; REAL IMAGES; SAMPLES SELECTION; SPEECH SIGNALS; TALKING HEADS; TRAJECTORY-GUIDED; VISUAL SPEECH SYNTHESIS;

HIDDEN MARKOV MODELS;

EID: 84996687897
PISSN: None
EISSN: None
Source Type: Conference Proceeding
DOI: None
Document Type: Conference Paper

Times cited : (8)

References (31)

1
- 0034271782
- Photo-realistic talking heads from image samples
- E. Cosatto and H.P. Graf, “Photo-realistic talking heads from image samples”, IEEE Trans. Multimedia, 2000, vol. 2, no. 3, pp. 152-163.
- (2000) IEEE Trans. Multimedia , vol.2 , Issue.3 , pp. 152-163
- Cosatto, E.¹ Graf, H.P.²

2
- 0030677313
- Video Rewrite: Driving Visual Speech with Audio
- Los Angeles, CA
- C. Bregler, M. Covell, M. Slaney, “Video Rewrite: Driving Visual Speech with Audio,” In Proc. ACM SIGGRAPH 97, Los Angeles, CA, 1997, pp. 353-360.
- (1997) Proc. ACM SIGGRAPH 97 , pp. 353-360
- Bregler, C.¹ Covell, M.² Slaney, M.³

3
- 0036289950
- Triphone based unit selection for concatenative visual speech synthesis
- F. Huang, E. Cosatto, H.P. Graf, “Triphone based unit selection for concatenative visual speech synthesis,” Proc. ICASSP 2002. Vol. 2, 2002 pp.2037-2040.
- (2002) Proc. ICASSP 2002 , vol.2 , pp. 2037-2040
- Huang, F.¹ Cosatto, E.² Graf, H.P.³

4
- 0036989560
- Trainable video realistic speech animation
- San Antonio, Texas
- T. Ezzat, G. Geiger, and T. Poggio, “Trainable video realistic speech animation,” Proc. ACM SIGGRAPH2002, San Antonio, Texas, 2002, pp. 388-398.
- (2002) Proc. ACM SIGGRAPH2002 , pp. 388-398
- Ezzat, T.¹ Geiger, G.² Poggio, T.³

5
- 57949116211
- Multimodal Unit Selection for 2D Audiovisual Text-to-Speech Synthesis
- The Netherlands
- W. Mattheyses, L. Latacz, W. Verhelst, H. Sahli, “Multimodal Unit Selection for 2D Audiovisual Text-to-Speech Synthesis,” Proc. MLMI 2008, The Netherlands, 2008, pp. 125-136.
- (2008) Proc. MLMI 2008 , pp. 125-136
- Mattheyses, W.¹ Latacz, L.² Verhelst, W.³ Sahli, H.⁴

6
- 84867227937
- Realistic Facial Animation System for Interactive Services
- Brisbane, Australia, Sept
- K. Liu, J. Ostermann, “Realistic Facial Animation System for Interactive Services,” Proc. Interspeech2008, Brisbane, Australia, Sept. 2008, pp.2330-2333.
- (2008) Proc. Interspeech2008 , pp. 2330-2333
- Liu, K.¹ Ostermann, J.²

7
- 79952258981
- K. Tokuda, H. Zen, et al., “The HMM-based speech synthesis system (HTS),” http://hts.ics.nitech.ac.jp/.
- The HMM-based speech synthesis system (HTS)
- Tokuda, K.¹ Zen, H.²

8
- 85009089413
- HMM-based Text-To-Audio-Visual Speech Synthesis
- S. Sako, K. Tokuda, T. Masuko, T. Kobayashi, and T. Kitamura, “HMM-based Text-To-Audio-Visual Speech Synthesis,” ICSLP 2000.
- (2000) ICSLP
- Sako, S.¹ Tokuda, K.² Masuko, T.³ Kobayashi, T.⁴ Kitamura, T.⁵

9
- 34047240820
- Speech Animation Using Coupled Hidden Markov Models
- August
- L. Xie, Z.Q. Liu, “Speech Animation Using Coupled Hidden Markov Models,” Proc. ICPR'06, August 2006, pp. 1128-1131.
- (2006) Proc. ICPR'06 , pp. 1128-1131
- Xie, L.¹ Liu, Z.Q.²

10
- 34547503417
- HMM-based unit selection using frame sized speech segments
- Sep
- Z.H. Ling and R.H. Wang, “HMM-based unit selection using frame sized speech segments,” Proc. Interspeech 2006, Sep. 2006, pp. 2034-2037.
- (2006) Proc. Interspeech 2006 , pp. 2034-2037
- Ling, Z.H.¹ Wang, R.H.²

11
- 78049399368
- Rich-Context Unit Selection (RUS) Approach to High Quality TTS
- March
- Z.J. Yan, Y. Qian, F. Soong, “Rich-Context Unit Selection (RUS) Approach to High Quality TTS,” Proc. ICASSP 2010, March 2010, pp.4798-4801.
- (2010) Proc. ICASSP 2010 , pp. 4798-4801
- Yan, Z.J.¹ Qian, Y.² Soong, F.³

12
- 84867222285
- LIPS2008: Visual Speech Synthesis Challenge
- Brisbane, Australia, Sept
- B. Theobald, S. Fagel, G. Bailly, and F. Elisei, “LIPS2008: Visual Speech Synthesis Challenge,” Proc. Interspeech2008, Brisbane, Australia, Sept. 2008, pp.2310-2313.
- (2008) Proc. Interspeech2008 , pp. 2310-2313
- Theobald, B.¹ Fagel, S.² Bailly, G.³ Elisei, F.⁴

13
- 85032752352
- Audiovisual speech processing
- Jan
- T. Chen, “Audiovisual speech processing,” Signal Processing Magazine, IEEE Vol.18, Issue 1, Jan. 2001, pp.9-21.
- (2001) Signal Processing Magazine, IEEE , vol.18 , Issue.1 , pp. 9-21
- Chen, T.¹

14
- 17444408556
- Creating speech-synchronized animation
- May-June
- S.A. King, R.E. Parent, “Creating speech-synchronized animation,” Visualization and Computer Graphics, IEEE Transactions on Vol. 11, Issue 3, May-June 2005, pp.341-352.
- (2005) Visualization and Computer Graphics, IEEE Transactions on , vol.11 , Issue.3 , pp. 341-352
- King, S.A.¹ Parent, R.E.²

15
- 84872004031
- Sample-based synthesis of photo-realistic talking heads
- E. Cosatto and H.P. Graf, “Sample-based synthesis of photo-realistic talking heads,” Proc. IEEE Computer Animation, pp. 103-110, 1998.
- (1998) Proc. IEEE Computer Animation , pp. 103-110
- Cosatto, E.¹ Graf, H.P.²

16
- 85009254391
- Miketalk: A talking facial display based on morphing visemes
- June
- T. Ezzat, T. Poggio, “Miketalk: A talking facial display based on morphing visemes,” Proc. Computer Animation, June 1998, pp. 96-102.
- (1998) Proc. Computer Animation , pp. 96-102
- Ezzat, T.¹ Poggio, T.²

17
- 10444256499
- Near videorealistic synthetic talking faces: implementation and evaluation
- B.J. Theobald, J.A. Bangham, I.A. Matthews, G.C. Cawley, “Near videorealistic synthetic talking faces: implementation and evaluation,” Speech Communication 2004, Vol. 44, pp.127-140.
- (2004) Speech Communication , vol.44 , pp. 127-140
- Theobald, B.J.¹ Bangham, J.A.² Matthews, I.A.³ Cawley, G.C.⁴

18
- 33947683441
- Parameterization of Mouth Images by LLE and PCA for Image-Based Facial Animation
- May
- K. Liu, A. Weissenfeld, J. Ostermann, “Parameterization of Mouth Images by LLE and PCA for Image-Based Facial Animation,” Proc. ICASSP 2006, Vol. V, May 2006, pp.461-464.
- (2006) Proc. ICASSP 2006 , vol.V , pp. 461-464
- Liu, K.¹ Weissenfeld, A.² Ostermann, J.³

19
- 33845676666
- Real-time Bayesian 3-d pose tracking
- Q. Wang, W. Zhang, X. Tang, H.Y. Shum, “Real-time Bayesian 3-d pose tracking,” IEEE Transactions on Circuits and Systems for Video Technology 16(12) (2006), pp.1533-1541.
- (2006) IEEE Transactions on Circuits and Systems for Video Technology , vol.16 , Issue.12 , pp. 1533-1541
- Wang, Q.¹ Zhang, W.² Tang, X.³ Shum, H.Y.⁴

20
- 0036650148
- Statistical Multimodal Integration for AudioVisual Speech Processing
- July
- S. Nakamura, “Statistical Multimodal Integration for AudioVisual Speech Processing,” IEEE Transactions on Neural Networks, Vol.13, No.4, July 2002, pp.854-866.
- (2002) IEEE Transactions on Neural Networks , vol.13 , Issue.4 , pp. 854-866
- Nakamura, S.¹

21
- 20444375102
- Integration Strategies for Audio-Visual Speech Processing: Applied to Text-Dependent Speaker Recognition
- June
- S. Lucey, T. Chen, S. Sridharan, and V. Chandran, “Integration Strategies for Audio-Visual Speech Processing: Applied to Text-Dependent Speaker Recognition,” IEEE Transactions on Multimedia, Vol.7, No.3, June 2005, pp.495-506.
- (2005) IEEE Transactions on Multimedia , vol.7 , Issue.3 , pp. 495-506
- Lucey, S.¹ Chen, T.² Sridharan, S.³ Chandran, V.⁴

22
- 0029725605
- Speech synthesis using HMMs with dynamic features
- K. Tokuda, T. Masuko, T. Kobayashi and S. Imai, “Speech synthesis using HMMs with dynamic features,” Proc. ICASSP 1996, Vol. I, pp. 389-392.
- (1996) Proc. ICASSP , vol.I , pp. 389-392
- Tokuda, K.¹ Masuko, T.² Kobayashi, T.³ Imai, S.⁴

23
- 0032678076
- Hidden Markov models based on Multi-space probability distribution for pitch pattern modeling
- K. Tokuda, T. Masuko, N. Miyazaki, T. Kobayashi, “Hidden Markov models based on Multi-space probability distribution for pitch pattern modeling,” Proc. ICASSP 1999, Vol. I, pp.229-232.
- (1999) Proc. ICASSP , vol.I , pp. 229-232
- Tokuda, K.¹ Masuko, T.² Miyazaki, N.³ Kobayashi, T.⁴

24
- 33646779506
- Spectral Conversion Based on Maximum Likelihood Estimation Considering Global Variance of Converted Parameter
- T. Toda, A. Black, K. Tokuda, “Spectral Conversion Based on Maximum Likelihood Estimation Considering Global Variance of Converted Parameter,” Proc. ICASSP 2005, Vol. I, pp. 9-12.
- (2005) Proc. ICASSP , vol.I , pp. 9-12
- Toda, T.¹ Black, A.² Tokuda, K.³

25
- 4644303413
- Poisson Image Editing
- P. Perez, M. Gangnet, A. Blake, “Poisson Image Editing,” Proc. ACM SIGGRAPH2003, pp.313-318.
- (2003) Proc. ACM SIGGRAPH , pp. 313-318
- Perez, P.¹ Gangnet, M.² Blake, A.³

26
- 0030683369
- Recent improvements on Microsoft's trainable text-to-speech system - Whistler
- X. Huang, A. Acero, H. Hon, Y. Ju, J. Liu, S. Meredith, and M. Plumpe, “Recent improvements on Microsoft's trainable text-to-speech system - Whistler,” Proc. ICASSP 1997, pp. 959-962.
- (1997) Proc. ICASSP , pp. 959-962
- Huang, X.¹ Acero, A.² Hon, H.³ Ju, Y.⁴ Liu, J.⁵ Meredith, S.⁶ Plumpe, M.⁷

27
- 84944962517
- The IBM trainable speech synthesis system
- R.E. Donovan, and E.M. Eide, “The IBM trainable speech synthesis system,” Proc. ICSLP 1998, pp.1703-1706.
- (1998) Proc. ICSLP , pp. 1703-1706
- Donovan, R.E.¹ Eide, E.M.²

28
- 85063141494
- Using 5 ms segments in concatenative speech synthesis
- Pittsburgh, PA, USA
- T. Hirai, and S. Tenpaku, “Using 5 ms segments in concatenative speech synthesis,” Proc. of 5th ISCA Speech Synthesis Workshop, Pittsburgh, PA, USA, 2004, pp. 37-42.
- (2004) Proc. of 5th ISCA Speech Synthesis Workshop , pp. 37-42
- Hirai, T.¹ Tenpaku, S.²

29
- 0029765811
- Unit selection in a concatenative speech synthesis system using a large speech database
- A. Hunt and A. Black, "Unit selection in a concatenative speech synthesis system using a large speech database," Proc. ICASSP 1996, pp. 373-376.
- (1996) Proc. ICASSP , pp. 373-376
- Hunt, A.¹ Black, A.²

30
- 0042905281
- Fast Normalized Cross-Correlation
- J.P. Lewis, "Fast Normalized Cross-Correlation," Industrial Light & Magic.
- Industrial Light & Magic
- Lewis, J.P.¹

31
- 4644303413
- Poisson Image Editing
- P. Perez, M. Gangnet, A. Blake, “Poisson Image Editing,” in ACM Transactions on Graphics (SIGGRAPH'03), 22(3), pp.313-318.
- ACM Transactions on Graphics (SIGGRAPH'03) , vol.22 , Issue.3 , pp. 313-318
- Perez, P.¹ Gangnet, M.² Blake, A.³

* This information was extracted by KISTI through analysis of Elsevier's SCOPUS DB.