SCOPUS Information Search Platform

ACM Transactions on Graphics

Volume 36, Issue 4, 2017, Pages

Synthesizing Obama: Learning lip sync from audio

(3) Suwajanakorn, Supasorn a; Seitz, Steven M. a; Kemelmacher-Shlizerman, Ira a

a UNIVERSITY OF WASHINGTON (United States)

Author keywords

Audio; Audiovisual speech; Big data; Face synthesis; Lip sync; LSTM; RNN; Uncanny valley; Videos

Indexed keywords

INTERACTIVE COMPUTER GRAPHICS; RECURRENT NEURAL NETWORKS; AUDIO; AUDIO-VISUAL SPEECH; FACE SYNTHESIS; LIP SYNC; LSTM; UNCANNY VALLEY; VIDEOS; BIG DATA

EID: 85030784278 PISSN: 07300301 EISSN: 15577368 Source Type: Journal
DOI: 10.1145/3072959.3073640 Document Type: Conference Paper

Times cited : (1114)

References (56)

1
- 84971577321
- others arXiv preprint arXiv:1603.04467 (2016)
- Martin Abadi, Ashish Agarwal, Paul Barham, Eugene Brevdo, Zhifeng Chen, Craig Citro, Greg S Corrado, Andy Davis, Jeffrey Dean, Matthieu Devin, and others. 2016. Tensorflow: Large-scale machine learning on heterogeneous distributed systems. arXiv preprint arXiv:1603.04467 (2016).
- (2016) Tensorflow: Large-scale Machine Learning on Heterogeneous Distributed Systems
- Abadi, M.¹ Agarwal, A.² Barham, P.³ Brevdo, E.⁴ Chen, Z.⁵ Citro, C.⁶ Corrado, G.S.⁷ Davis, A.⁸ Dean, J.⁹ Devin, M.¹⁰

2
- 84881575031
- An expressive text-driven 3D talking head
- ACM
- Robert Anderson, Björn Stenger, Vincent Wan, and Roberto Cipolla. 2013a. An expressive text-driven 3D talking head. In ACM SIGGRAPH 2013 Posters. ACM, 80.
- (2013) ACM SIGGRAPH 2013 Posters , pp. 80
- Anderson, R.¹ Stenger, B.² Wan, V.³ Cipolla, R.⁴

3
- 84887344163
- Expressive visual text-to-speech using active appearance models
- Robert Anderson, Bjorn Stenger, Vincent Wan, and Roberto Cipolla. 2013b. Expressive visual text-to-speech using active appearance models. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 3382-3389.
- (2013) Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition , pp. 3382-3389
- Anderson, R.¹ Stenger, B.² Wan, V.³ Cipolla, R.⁴

4
- 85030783704
- others (2012)
- Fabrice Bellard, M Niedermayer, and others. 2012. FFmpeg. Available from: http://ffmpeg.org (2012).
- (2012)
- Bellard, F.¹ Niedermayer, M.²

5
- 84872221378
- Tools for placing cuts and transitions in interview video
- (2012)
- Floraine Berthouzoz, Wilmot Li, and Maneesh Agrawala. 2012. Tools for placing cuts and transitions in interview video. ACM Trans. Graph. 31, 4 (2012), 67-1.
- (2012) ACM Trans. Graph. , vol.31 , Issue.4 , pp. 61-67
- Berthouzoz, F.¹ Li, W.² Agrawala, M.³

6
- 85030768808
- (2000)
- G. Bradski. 2000. Dr. Dobb's Journal of Software Tools (2000).
- (2000) Dr. Dobb's Journal of Software Tools
- Bradski, G.¹

7
- 84937437186
- Voice puppetry
- ACM Press/Addison-Wesley Publishing Co, New York, NY, USA
- Matthew Brand. 1999. Voice Puppetry. In Proceedings of the 26th Annual Conference on Computer Graphics and Interactive Techniques (SIGGRAPH '99). ACM Press/Addison-Wesley Publishing Co, New York, NY, USA, 21-28. DOI:https://doi.org/10.1145/311535.311537
- (1999) Proceedings of the 26th Annual Conference on Computer Graphics and Interactive Techniques (SIGGRAPH '99) , pp. 21-28
- Brand, M.¹

8
- 84942814785
- Video rewrite: Driving visual speech with audio
- ACM Press/Addison-Wesley Publishing Co
- Christoph Bregler, Michèle Covell, and Malcolm Slaney. 1997. Video rewrite: Driving visual speech with audio. In Proceedings of the 24th annual conference on Computer graphics and interactive techniques. ACM Press/Addison-Wesley Publishing Co, 353-360.
- (1997) Proceedings of the 24th Annual Conference on Computer Graphics and Interactive Techniques , pp. 353-360
- Bregler, C.¹ Covell, M.² Slaney, M.³

9
- 79551559765
- A multiresolution spline with application to image mosaics
- (1983)
- Peter J Burt and Edward H Adelson. 1983. A multiresolution spline with application to image mosaics. ACM Transactions on Graphics (TOG) 2, 4 (1983), 217-236.
- (1983) ACM Transactions on Graphics (TOG) , vol.2 , Issue.4 , pp. 217-236
- Burt, P.J.¹ Adelson, E.H.²

10
- 84980047577
- Real-time facial animation with image-based dynamic avatars
- (2016)
- Chen Cao, Hongzhi Wu, Yanlin Weng, Tianjia Shao, and Kun Zhou. 2016. Real-time facial animation with image-based dynamic avatars. ACM Transactions on Graphics (TOG) 35, 4 (2016), 126.
- (2016) ACM Transactions on Graphics (TOG) , vol.35 , Issue.4 , pp. 126
- Cao, C.¹ Wu, H.² Weng, Y.³ Shao, T.⁴ Zhou, K.⁵

11
- 33645777234
- Expressive speech-driven facial animation
- (2005)
- Yong Cao, Wen C Tien, Petros Faloutsos, and Frédéric Pighin. 2005. Expressive speech-driven facial animation. ACM Transactions on Graphics (TOG) 24, 4 (2005), 1283-1302.
- (2005) ACM Transactions on Graphics (TOG) , vol.24 , Issue.4 , pp. 1283-1302
- Cao, Y.¹ Tien, W.C.² Faloutsos, P.³ Pighin, F.⁴

12
- 0035363218
- Active appearance models
- others (2001)
- Timothy F Cootes, Gareth J Edwards, Christopher J Taylor, and others. 2001. Active appearance models. IEEE Transactions on pattern analysis and machine intelligence 23, 6(2001), 681-685.
- (2001) IEEE Transactions on Pattern Analysis and Machine Intelligence , vol.23 , Issue.6 , pp. 681-685
- Cootes, T.F.¹ Edwards, G.J.² Taylor, C.J.³

13
- 82455171679
- Video face replacement
- (2011)
- Kevin Dale, Kalyan Sunkavalli, Micah K Johnson, Daniel Vlasic, Wojciech Matusik, and Hanspeter Pfister. 2011. Video face replacement. ACM Transactions on Graphics (TOG) 30, 6 (2011), 130.
- (2011) ACM Transactions on Graphics (TOG) , vol.30 , Issue.6 , pp. 130
- Dale, K.¹ Sunkavalli, K.² Johnson, M.K.³ Vlasic, D.⁴ Matusik, W.⁵ Pfister, H.⁶

14
- 0036989560
- ACM
- Tony Ezzat, Gadi Geiger, and Tomaso Poggio. 2002. Trainable videorealistic speech animation. Vol. 21. ACM.
- (2002) Trainable Videorealistic Speech Animation , vol.21
- Ezzat, T.¹ Geiger, G.² Poggio, T.³

15
- 84946029513
- Photo-real talking head with deep bidirectional LSTM
- IEEE
- Bo Fan, Lijuan Wang, Frank K Soong, and Lei Xie. 2015a. Photo-real talking head with deep bidirectional LSTM. In 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 4884-4888.
- (2015) 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) , pp. 4884-4888
- Fan, B.¹ Wang, L.² Soong, F.K.³ Xie, L.⁴

16
- 84994256183
- A deep bidirectional LSTM approach for video-realistic talking head
- (2015)
- Bo Fan, Lei Xie, Shan Yang, Lijuan Wang, and Frank K Soong. 2015b. A deep bidirectional LSTM approach for video-realistic talking head. Multimedia Tools and Applications (2015), 1-23.
- (2015) Multimedia Tools and Applications , pp. 1-23
- Fan, B.¹ Xie, L.² Yang, S.³ Wang, L.⁴ Soong, F.K.⁵

17
- 16244385915
- Audio/visual mapping with cross-modal hidden Markov models
- (2005)
- Shengli Fu, Ricardo Gutierrez-Osuna, Anna Esposito, Praveen K Kakumanu, and Oscar N Garcia. 2005. Audio/visual mapping with cross-modal hidden Markov models. IEEE Transactions on Multimedia 7, 2 (2005), 243-252.
- (2005) IEEE Transactions on Multimedia , vol.7 , Issue.2 , pp. 243-252
- Fu, S.¹ Gutierrez-Osuna, R.² Esposito, A.³ Kakumanu, P.K.⁴ Garcia, O.N.⁵

18
- 84994531050
- arXiv preprint arXiv:1512.05287 (2015)
- Yarin Gal. 2015. A theoretically grounded application of dropout in recurrent neural networks. arXiv preprint arXiv:1512.05287 (2015).
- (2015) A Theoretically Grounded Application of Dropout in Recurrent Neural Networks
- Gal, Y.¹

19
- 84911366471
- Automatic face reenactment
- Pablo Garrido, Levi Valgaerts, Ole Rehmsen, Thorsten Thormahlen, Patrick Perez, and Christian Theobalt. 2014. Automatic face reenactment. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 4217-4224.
- (2014) Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition , pp. 4217-4224
- Garrido, P.¹ Valgaerts, L.² Rehmsen, O.³ Thormahlen, T.⁴ Perez, P.⁵ Theobalt, C.⁶

20
- 84932116100
- Vdub: Modifying face video of actors for plausible visual alignment to a dubbed audio track
- Wiley Online Library
- Pablo Garrido, Levi Valgaerts, Hamid Sarmadi, Ingmar Steiner, Kiran Varanasi, Patrick Perez, and Christian Theobalt. 2015. Vdub: Modifying face video of actors for plausible visual alignment to a dubbed audio track. In Computer Graphics Forum, Vol. 34. Wiley Online Library, 193-204.
- (2015) Computer Graphics Forum , vol.34 , pp. 193-204
- Garrido, P.¹ Valgaerts, L.² Sarmadi, H.³ Steiner, I.⁴ Varanasi, K.⁵ Perez, P.⁶ Theobalt, C.⁷

21
- 84906979661
- arXiv preprint arXiv:1308.0850 (2013)
- Alex Graves. 2013. Generating sequences with recurrent neural networks. arXiv preprint arXiv:1308.0850 (2013).
- (2013) Generating Sequences with Recurrent Neural Networks
- Graves, A.¹

22
- 84893701254
- Hybrid speech recognition with deep bidirectional LSTM
- IEEE
- Alex Graves, Navdeep Jaitly, and Abdel-rahman Mohamed. 2013. Hybrid speech recognition with deep bidirectional LSTM. In Automatic Speech Recognition and Understanding (ASRU), 2013 IEEE Workshop on. IEEE, 273-278.
- (2013) Automatic Speech Recognition and Understanding (ASRU), 2013 IEEE Workshop On , pp. 273-278
- Graves, A.¹ Jaitly, N.² Mohamed, A.-R.³

23
- 27744588611
- Framewise phoneme classification with bidirectional LSTM and other neural network architectures
- (2005)
- Alex Graves and Jürgen Schmidhuber. 2005. Framewise phoneme classification with bidirectional LSTM and other neural network architectures. Neural Networks 18, 5 (2005), 602-610.
- (2005) Neural Networks , vol.18 , Issue.5 , pp. 602-610
- Graves, A.¹ Schmidhuber, J.²

24
- 0031573117
- Long short-term memory
- (1997)
- Sepp Hochreiter and Jürgen Schmidhuber. 1997. Long short-term memory. Neural computation 9, 8 (1997), 1735-1780.
- (1997) Neural Computation , vol.9 , Issue.8 , pp. 1735-1780
- Hochreiter, S.¹ Schmidhuber, J.²

25
- 84898663109
- Data-driven speech animation synthesis focusing on realistic inside of the mouth
- (2014)
- Masahide Kawai, Tomoyori Iwao, Daisuke Mima, Akinobu Maejima, and Shigeo Morishima. 2014. Data-driven speech animation synthesis focusing on realistic inside of the mouth. Journal of information processing 22, 2 (2014), 401-409.
- (2014) Journal of Information Processing , vol.22 , Issue.2 , pp. 401-409
- Kawai, M.¹ Iwao, T.² Mima, D.³ Maejima, A.⁴ Morishima, S.⁵

26
- 84866672749
- Collection flow
- IEEE
- Ira Kemelmacher-Shlizerman and Steven M Seitz. 2012. Collection flow. In Computer Vision and Pattern Recognition (CVPR), 2012 IEEE Conference on. IEEE, 1792-1799.
- (2012) Computer Vision and Pattern Recognition (CVPR), 2012 IEEE Conference On , pp. 1792-1799
- Kemelmacher-Shlizerman, I.¹ Seitz, S.M.²

27
- 84954136808
- A decision tree framework for spatiotemporal sequence prediction
- ACM
- Taehwan Kim, Yisong Yue, Sarah Taylor, and Iain Matthews. 2015. A decision tree framework for spatiotemporal sequence prediction. In Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, 577-586.
- (2015) Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining , pp. 577-586
- Kim, T.¹ Yue, Y.² Taylor, S.³ Matthews, I.⁴

28
- 70349425850
- Dlib-ml: A machine learning toolkit
- (2009)
- Davis E. King. 2009. Dlib-ml: A Machine Learning Toolkit. Journal of Machine Learning Research 10 (2009), 1755-1758.
- (2009) Journal of Machine Learning Research , vol.10 , pp. 1755-1758
- King, D.E.¹

29
- 84941620184
- arXiv preprint arXiv:1412.6980 (2014)
- Diederik Kingma and Jimmy Ba. 2014. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014).
- (2014) Adam: A Method for Stochastic Optimization
- Kingma, D.¹ Ba, J.²

30
- 84866661849
- A data-driven approach for facial expression synthesis in video
- IEEE
- Kai Li, Feng Xu, Jue Wang, Qionghai Dai, and Yebin Liu. 2012. A data-driven approach for facial expression synthesis in video. In Computer Vision and Pattern Recognition (CVPR), 2012 IEEE Conference on. IEEE, 57-64.
- (2012) Computer Vision and Pattern Recognition (CVPR), 2012 IEEE Conference On , pp. 57-64
- Li, K.¹ Xu, F.² Wang, J.³ Dai, Q.⁴ Liu, Y.⁵

31
- 84990837263
- Head reconstruction from internet photos
- Springer
- Shu Liang, Linda G Shapiro, and Ira Kemelmacher-Shlizerman. 2016. Head reconstruction from internet photos. In European Conference on Computer Vision. Springer, 360-374.
- (2016) European Conference on Computer Vision , pp. 360-374
- Liang, S.¹ Shapiro, L.G.² Kemelmacher-Shlizerman, I.³

32
- 51949118316
- Human-assisted motion annotation
- CVPR 2008. IEEE Conference on. IEEE
- Ce Liu, William T Freeman, Edward H Adelson, and Yair Weiss. 2008. Human-assisted motion annotation. In Computer Vision and Pattern Recognition, 2008. CVPR 2008. IEEE Conference on. IEEE, 1-8.
- (2008) Computer Vision and Pattern Recognition, 2008 , pp. 1-8
- Liu, C.¹ Freeman, W.T.² Adelson, E.H.³ Weiss, Y.⁴

33
- 84879068811
- Comprehensive many-to-many phoneme-to-viseme mapping and its application for concatenative visual speech synthesis
- (2013)
- Wesley Mattheyses, Lukas Latacz, and Werner Verhelst. 2013. Comprehensive many-to-many phoneme-to-viseme mapping and its application for concatenative visual speech synthesis. Speech Communication 55, 7 (2013), 857-876.
- (2013) Speech Communication , vol.55 , Issue.7 , pp. 857-876
- Mattheyses, W.¹ Latacz, L.² Verhelst, W.³

34
- 84912553696
- Audiovisual speech synthesis: An overview of the state-of-the-art
- (2015)
- Wesley Mattheyses and Werner Verhelst. 2015. Audiovisual speech synthesis: An overview of the state-of-the-art. Speech Communication 66 (2015), 182-217.
- (2015) Speech Communication , vol.66 , pp. 182-217
- Mattheyses, W.¹ Verhelst, W.²

35
- 33749242231
- Hybrid images
- (July 2006)
- Aude Oliva, Antonio Torralba, and Philippe G. Schyns. 2006. Hybrid Images. ACM Trans. Graph. 25, 3 (July 2006), 527-532. DOI: https://doi.org/10.1145/1141911.1141919
- (2006) ACM Trans. Graph. , vol.25 , Issue.3 , pp. 527-532
- Oliva, A.¹ Torralba, A.² Schyns, P.G.³

36
- 85030788856
- (2016)
- Werner Robitza. 2016. ffmpeg-normalize. https://github.com/slhck/ffmpeg-normalize. (2016).
- (2016) Ffmpeg-Normalize
- Robitza, W.¹

37
- 85028069640
- arXiv preprint arXiv:1604.02647 (2016)
- Shunsuke Saito, Tianye Li, and Hao Li. 2016. Real-Time Facial Segmentation and Performance Capture from RGB Input. arXiv preprint arXiv:1604.02647 (2016).
- (2016) Real-Time Facial Segmentation and Performance Capture from RGB Input
- Saito, S.¹ Li, T.² Li, H.³

38
- 0010069372
- HMM-based text-to-audio-visual speech synthesis
- Shinji Sako, Keiichi Tokuda, Takashi Masuko, Takao Kobayashi, and Tadashi Kitamura. 2000. HMM-based text-to-audio-visual speech synthesis. In INTERSPEECH. 25-28.
- (2000) INTERSPEECH , pp. 25-28
- Sako, S.¹ Tokuda, K.² Masuko, T.³ Kobayashi, T.⁴ Kitamura, T.⁵

39
- 84988899143
- (2014)
- YiChang Shih, Sylvain Paris, Connelly Barnes, William T Freeman, and Frédo Durand. 2014. Style transfer for headshot portraits. (2014).
- (2014) Style Transfer for Headshot Portraits
- Shih, Y.C.¹ Paris, S.² Barnes, C.³ Freeman, W.T.⁴ Durand, F.⁵

40
- 84963745989
- Talking heads synthesis from audio with deep neural networks
- IEEE
- Taiki Shimba, Ryuhei Sakurai, Hirotake Yamazoe, and Joo-Ho Lee. 2015. Talking heads synthesis from audio with deep neural networks. In 2015 IEEE/SICE International Symposium on System Integration (SII). IEEE, 100-105.
- (2015) 2015 IEEE/SICE International Symposium on System Integration (SII) , pp. 100-105
- Shimba, T.¹ Sakurai, R.² Yamazoe, H.³ Lee, J.-H.⁴

41
- 84906498425
- Total moving face reconstruction
- Springer
- Supasorn Suwajanakorn, Ira Kemelmacher-Shlizerman, and Steven M Seitz. 2014. Total moving face reconstruction. In European Conference on Computer Vision. Springer, 796-812.
- (2014) European Conference on Computer Vision , pp. 796-812
- Suwajanakorn, S.¹ Kemelmacher-Shlizerman, I.² Seitz, S.M.³

42
- 84973890906
- What makes tom hanks look like tom hanks
- Supasorn Suwajanakorn, Steven M Seitz, and Ira Kemelmacher-Shlizerman. 2015. What Makes Tom Hanks Look Like Tom Hanks. In Proceedings of the IEEE International Conference on Computer Vision. 3952-3960.
- (2015) Proceedings of the IEEE International Conference on Computer Vision , pp. 3952-3960
- Suwajanakorn, S.¹ Seitz, S.M.² Kemelmacher-Shlizerman, I.³

43
- 85030785993
- (2016)
- Sarah Taylor, Akihiro Kato, Ben Milner, and Iain Matthews. 2016. Audio-to-Visual Speech Conversion using Deep Neural Networks. (2016).
- (2016) Audio-to-Visual Speech Conversion Using Deep Neural Networks
- Taylor, S.¹ Kato, A.² Milner, B.³ Matthews, I.⁴

44
- 84988955843
- Dynamic units of visual speech
- Eurographics Association
- Sarah L Taylor, Moshe Mahler, Barry-John Theobald, and Iain Matthews. 2012. Dynamic units of visual speech. In Proceedings of the 11th ACM SIGGRAPH/Eurographics conference on Computer Animation. Eurographics Association, 275-284.
- (2012) Proceedings of the 11th ACM SIGGRAPH/Eurographics Conference on Computer Animation , pp. 275-284
- Taylor, S.L.¹ Mahler, M.² Theobald, B.-J.³ Matthews, I.⁴

45
- 24644514008
- An image inpainting technique based on the fast marching method
- (2004)
- Alexandru Telea. 2004. An image inpainting technique based on the fast marching method. Journal of graphics tools 9, 1 (2004), 23-34.
- (2004) Journal of Graphics Tools , vol.9 , Issue.1 , pp. 23-34
- Telea, A.¹

46
- 84995921764
- Real-time expression transfer for facial reenactment
- (2015)
- Justus Thies, Michael Zollhöfer, Matthias Nießner, Levi Valgaerts, Marc Stamminger, and Christian Theobalt. 2015. Real-time expression transfer for facial reenactment. ACM Transactions on Graphics (TOG) 34, 6 (2015), 183.
- (2015) ACM Transactions on Graphics (TOG) , vol.34 , Issue.6 , pp. 183
- Thies, J.¹ Zollhöfer, M.² Nießner, M.³ Valgaerts, L.⁴ Stamminger, M.⁵ Theobalt, C.⁶

47
- 84986308411
- Face2face: Real-time face capture and reenactment of rgb videos
- IEEE (2016)
- Justus Thies, Michael Zollhöfer, Marc Stamminger, Christian Theobalt, and Matthias Nießner. 2016. Face2face: Real-time face capture and reenactment of rgb videos. Proc. Computer Vision and Pattern Recognition (CVPR), IEEE 1 (2016).
- (2016) Proc. Computer Vision and Pattern Recognition (CVPR) , vol.1
- Thies, J.¹ Zollhöfer, M.² Stamminger, M.³ Theobalt, C.⁴ Nießner, M.⁵

48
- 85011070895
- arXiv preprint arXiv:1609.03499 (2016)
- Aaron van den Oord, Sander Dieleman, Heiga Zen, Karen Simonyan, Oriol Vinyals, Alex Graves, Nal Kalchbrenner, Andrew Senior, and Koray Kavukcuoglu. 2016. Wavenet: A generative model for raw audio. arXiv preprint arXiv:1609.03499 (2016).
- (2016) Wavenet: A Generative Model for Raw Audio
- Van Den Oord, A.¹ Dieleman, S.² Zen, H.³ Simonyan, K.⁴ Vinyals, O.⁵ Graves, A.⁶ Kalchbrenner, N.⁷ Senior, A.⁸ Kavukcuoglu, K.⁹

49
- 33646016842
- Face transfer with multilinear models
- ACM
- Daniel Vlasic, Matthew Brand, Hanspeter Pfister, and Jovan Popovic. 2005. Face transfer with multilinear models. In ACM Transactions on Graphics (TOG), Vol. 24. ACM, 426-433.
- (2005) ACM Transactions on Graphics (TOG) , vol.24 , pp. 426-433
- Vlasic, D.¹ Brand, M.² Pfister, H.³ Popovic, J.⁴

50
- 84863490267
- High quality lip-sync animation for 3D photo-realistic talking head
- IEEE
- Lijuan Wang, Wei Han, and Frank K Soong. 2012. High quality lip-sync animation for 3D photo-realistic talking head. In 2012 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 4529-4532.
- (2012) 2012 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) , pp. 4529-4532
- Wang, L.¹ Han, W.² Soong, F.K.³

51
- 79959854294
- Synthesizing photo-real talking head via trajectory-guided sample selection
- Lijuan Wang, Xiaojun Qian, Wei Han, and Frank K Soong. 2010. Synthesizing photo-real talking head via trajectory-guided sample selection. In INTERSPEECH, Vol. 10. 446-449.
- (2010) INTERSPEECH , vol.10 , pp. 446-449
- Wang, L.¹ Qian, X.² Han, W.³ Soong, F.K.⁴

52
- 34147186624
- A coupled HMM approach to video-realistic speech animation
- (2007)
- Lei Xie and Zhi-Qiang Liu. 2007a. A coupled HMM approach to video-realistic speech animation. Pattern Recognition 40, 8 (2007), 2325-2340.
- (2007) Pattern Recognition , vol.40 , Issue.8 , pp. 2325-2340
- Xie, L.¹ Liu, Z.-Q.²

53
- 33947583073
- Realistic mouth-synching for speech-driven talking face using articulatory modelling
- (2007)
- Lei Xie and Zhi-Qiang Liu. 2007b. Realistic mouth-synching for speech-driven talking face using articulatory modelling. IEEE Transactions on Multimedia 9, 3 (2007), 500-510.
- (2007) IEEE Transactions on Multimedia , vol.9 , Issue.3 , pp. 500-510
- Xie, L.¹ Liu, Z.-Q.²

54
- 84887383859
- Supervised descent method and its applications to face alignment
- Xuehan Xiong and Fernando De la Torre. 2013. Supervised descent method and its applications to face alignment. In Proceedings of the IEEE conference on computer vision and pattern recognition. 532-539.
- (2013) Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition , pp. 532-539
- Xiong, X.¹ De La Torre, F.²

55
- 84944053926
- arXiv preprint arXiv:1409.2329 (2014)
- Wojciech Zaremba, Ilya Sutskever, and Oriol Vinyals. 2014. Recurrent neural network regularization. arXiv preprint arXiv:1409.2329 (2014).
- (2014) Recurrent Neural Network Regularization
- Zaremba, W.¹ Sutskever, I.² Vinyals, O.³

56
- 84906253471
- A new language independent, photo-realistic talking head driven by voice only
- Xinjian Zhang, Lijuan Wang, Gang Li, Frank Seide, and Frank K Soong. 2013. A new language independent, photo-realistic talking head driven by voice only. In INTERSPEECH. 2743-2747.
- (2013) INTERSPEECH , pp. 2743-2747
- Zhang, X.¹ Wang, L.² Li, G.³ Seide, F.⁴ Soong, F.K.⁵

* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.