Volume 36, Issue 4, 2017

Synthesizing Obama: Learning Lip Sync from Audio

Author keywords

Audio; Audiovisual speech; Big data; Face synthesis; Lip sync; LSTM; RNN; Uncanny valley; Videos

Indexed keywords

INTERACTIVE COMPUTER GRAPHICS; RECURRENT NEURAL NETWORKS

EID: 85030784278     PISSN: 07300301     EISSN: 15577368     Source Type: Journal    
DOI: 10.1145/3072959.3073640     Document Type: Conference Paper
Times cited: 1114

References (56)
  • 4
    • Fabrice Bellard, M. Niedermayer, and others. 2012. FFmpeg. Available from: http://ffmpeg.org.
  • 5
    • Floraine Berthouzoz, Wilmot Li, and Maneesh Agrawala. 2012. Tools for placing cuts and transitions in interview video. ACM Trans. Graph. 31, 4 (2012), 67.
  • 9
    • Peter J. Burt and Edward H. Adelson. 1983. A multiresolution spline with application to image mosaics. ACM Transactions on Graphics (TOG) 2, 4 (1983), 217-236.
  • 10
    • Chen Cao, Hongzhi Wu, Yanlin Weng, Tianjia Shao, and Kun Zhou. 2016. Real-time facial animation with image-based dynamic avatars. ACM Transactions on Graphics (TOG) 35, 4 (2016), 126.
  • 20
    • Pablo Garrido, Levi Valgaerts, Hamid Sarmadi, Ingmar Steiner, Kiran Varanasi, Patrick Perez, and Christian Theobalt. 2015. VDub: Modifying face video of actors for plausible visual alignment to a dubbed audio track. In Computer Graphics Forum, Vol. 34. Wiley Online Library, 193-204.
  • 23
    • Alex Graves and Jürgen Schmidhuber. 2005. Framewise phoneme classification with bidirectional LSTM and other neural network architectures. Neural Networks 18, 5 (2005), 602-610.
  • 24
    • Sepp Hochreiter and Jürgen Schmidhuber. 1997. Long short-term memory. Neural Computation 9, 8 (1997), 1735-1780.
  • 25
    • Masahide Kawai, Tomoyori Iwao, Daisuke Mima, Akinobu Maejima, and Shigeo Morishima. 2014. Data-driven speech animation synthesis focusing on realistic inside of the mouth. Journal of Information Processing 22, 2 (2014), 401-409.
  • 28
    • Davis E. King. 2009. Dlib-ml: A Machine Learning Toolkit. Journal of Machine Learning Research 10 (2009), 1755-1758.
  • 33
    • Wesley Mattheyses, Lukas Latacz, and Werner Verhelst. 2013. Comprehensive many-to-many phoneme-to-viseme mapping and its application for concatenative visual speech synthesis. Speech Communication 55, 7 (2013), 857-876.
  • 34
    • Wesley Mattheyses and Werner Verhelst. 2015. Audiovisual speech synthesis: An overview of the state-of-the-art. Speech Communication 66 (2015), 182-217.
  • 35
    • Aude Oliva, Antonio Torralba, and Philippe G. Schyns. 2006. Hybrid Images. ACM Trans. Graph. 25, 3 (July 2006), 527-532. DOI: https://doi.org/10.1145/1141911.1141919
  • 36
    • Werner Robitza. 2016. ffmpeg-normalize. https://github.com/slhck/ffmpeg-normalize.
  • 45
    • Alexandru Telea. 2004. An image inpainting technique based on the fast marching method. Journal of Graphics Tools 9, 1 (2004), 23-34.
  • 51
    • Lijuan Wang, Xiaojun Qian, Wei Han, and Frank K. Soong. 2010. Synthesizing photo-real talking head via trajectory-guided sample selection. In INTERSPEECH, Vol. 10. 446-449.
  • 52
    • Lei Xie and Zhi-Qiang Liu. 2007a. A coupled HMM approach to video-realistic speech animation. Pattern Recognition 40, 8 (2007), 2325-2340.
  • 53
    • Lei Xie and Zhi-Qiang Liu. 2007b. Realistic mouth-synching for speech-driven talking face using articulatory modelling. IEEE Transactions on Multimedia 9, 3 (2007), 500-510.
  • 56
    • Xinjian Zhang, Lijuan Wang, Gang Li, Frank Seide, and Frank K. Soong. 2013. A new language independent, photo-realistic talking head driven by voice only. In INTERSPEECH. 2743-2747.


* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.