Volume 48, Issue 6, 2006, Pages 598-615

A comparison of acoustic coding models for speech-driven facial animation

Author keywords

Audio visual mapping; Linear discriminant analysis; Speech driven facial animation

Indexed keywords

ANIMATION; COMPUTER SIMULATION; GESTURE RECOGNITION; LINEAR SYSTEMS; NATURAL FREQUENCIES; SPEECH ANALYSIS

EID: 33747766904     PISSN: 01676393     EISSN: None     Source Type: Journal    
DOI: 10.1016/j.specom.2005.09.005     Document Type: Article
Times cited: 18

References (63)
  • 1
    • Abry, C., Boe, L.J., Schwartz, J.L., 1989. Plateaus, catastrophes and the structuring of vowel systems. J. Phonet. 17, 47-54.
  • 2
    • Arslan, L.M., Talkin, D., 1999. Codebook based face point trajectory synthesis algorithm using speech input. Speech Commun. 27, 81-93.
  • 3
    • Arun, K.S., Huang, T.S., Blostein, S.D., 1987. Least-squares fitting of two 3-D point sets. IEEE Trans. PAMI 9 (5), 698-700.
  • 4
    • Aversano, G., Esposito, A., Esposito, A., Marinaro, M., 2001. A new text-independent method for phoneme segmentation. In: Proc. IEEE-MWSCAS Conference, Dayton, OH, pp. 516-519.
  • 5
    • Balan, N., 2003. Analysis and Evaluation of Factors Affecting Speech Driven Facial Animation, MS Thesis, Dept. of Computer Science and Engineering, Wright State University.
  • 6
    • Benoit, C., Le Goff, B., 1998. Audio-visual speech synthesis from French text: eight years of models, designs, and evaluation at the ICP. Speech Commun. 26, 117-129.
  • 7
    • Bernstein, L.E., Benoit, C., 1996. For speech perception by humans or machines, three senses are better than one. In: Proc. ICSLP, Philadelphia 3, pp. 1477-1480.
  • 8
    • Beskow, J., 1995. Rule-based visual speech synthesis, Proc. EUROSPEECH, Madrid, Spain 1, pp. 299-302.
  • 9
    • Boll, S.F., 1979. Suppression of acoustic noise in speech using spectral subtraction. IEEE Trans. ASSP 27 (2), 113-120.
  • 10
    • Brand, M., 1999. Voice puppetry, Proc. SIGGRAPH, LA, California, pp. 21-28.
  • 11
    • Bregler, C., Omohundro, S., 1995. Nonlinear image interpolation using manifold learning. In: Tesauro, G., Touretzky, D., Leen, T. (Eds.), Advances in Neural Information Processing Systems 7. MIT Press, Cambridge, pp. 401-408.
  • 12
    • Bryll, R., Ma, X., Quek, F., 1999. Camera calibration utility description, VisLab Tech. Rep., University of Illinois at Chicago.
  • 13
    • Caldognetto, E.M., Vagges, K., Borghese, N.A., Ferrigno, G., 1989. Automatic analysis of lip and jaw kinematics in VCV sequences. In: Proc. of EUROSPEECH, Paris 2, pp. 453-456.
  • 14
    • Cohen, M., Massaro, D.W., 1993. Modeling coarticulation in synthetic visual speech. In: Thalmann, N.M., Thalmann, D. (Eds.), Models and Techniques in Computer Animation. Springer, pp. 141-155.
  • 15
    • Coianiz, T., Torresani, L., Caprile, L., 1995. 2D deformable models for visual speech analysis. In: Stork, D., Hennecke, M. (Eds.), Speech Reading by Man and Machine. Springer, pp. 391-398.
  • 17
    • Duttweiler, D., Messerschmitt, D., 1976. Nearly instantaneous companding for nonuniformly quantized PCM. In: IEEE Trans. on Comm., COM-24, pp. 864-873.
  • 18
    • Essa, I., 1995. Analysis, interpretation, and synthesis of facial expression, Ph.D. thesis, MIT Media Arts and Sciences, Cambridge, MA.
  • 19
    • Ezzat, T., Poggio, T., 2000. Visual speech synthesis by morphing visemes. Int. J. Comput. Vis. 38 (1), 45-57.
  • 20
    • Finn, K., 1986. An investigation of visible lip information to be used in automatic speech recognition, Ph.D. dissertation, Dept. CS, Georgetown University, Washington, DC.
  • 21
    • Fu, S., 2002. Visual Mapping Based on Hidden Markov Models, MS Thesis, Dept. of Computer Science and Engineering, Wright State University.
  • 22
    • Fu, S., Gutierrez-Osuna, R., Esposito, A., Kakumanu, P.K., Garcia, O.N., 2005. Audio/Visual Mapping with Cross-Modal Hidden Markov Models. IEEE Transactions on Multimedia 7, No. 2, April.
  • 23
    • Garofolo, J., Lamel, L.F., Fisher, W.M., Fiscus, J.G., Pallett, D.S., Dahlgren, N.L., 1988. The DARPA TIMIT CDROM. Available from LDC: .
  • 24
    • Goldschen, A.J., 1993. Continuous automatic speech recognition by lipreading, Ph.D. thesis, George Washington University.
  • 25
    • Greenberg, S., Carvey, H., Hitchcock, L., Chang, S., 2003. Temporal properties of spontaneous speech: a syllable-centric perspective. J. Phonet. 31 (3-4), 465-485.
  • 26
    • Gutierrez-Osuna, R., Kakumanu, P., Esposito, A., Garcia, O.N., Bojorquez, A., Castillo, J., Rudomin, I., 2002. WSU Technical Report CS-WSU-02-03, Dayton, OH.
  • 28
    • Hermansky, H., Morgan, N., 1994. RASTA processing of speech. IEEE Trans. SAP 2 (4), 578-589.
  • 29
    • Hong, P., Wen, Z., Huang, T.S., 2002. Real-time speech-driven face animation with expressions using neural networks. IEEE Trans. Neural Networks 13 (4), 916-927.
  • 30
    • Itakura, F., 1975. Line spectrum representation of linear prediction coefficients of speech signal. JASA 57, S35 (abstract).
  • 33
    • Kim, H., Lee, H., 1999. Interlacing properties of line spectrum pair frequencies. IEEE Trans. SAP 7, 87-91.
  • 34
    • Klatt, D.H., 1982. Prediction of perceived phonetic distance from critical band spectra: a first step. In: Proc. of ICASSP, Paris, pp. 1278-1281.
  • 35
    • Kühnert, B., Nolan, F., 1999. The origin of coarticulation. In: Hardcastle, W., Hewlett, N. (Eds.), Coarticulation: Theory, Data, and Techniques. Cambridge University Press, pp. 7-29.
  • 36
    • Lavagetto, F., 1995. Converting speech into lip movements: a multimedia telephone for hard of hearing people. IEEE Trans. Rehab. Eng. 3 (1), 90-102.
  • 37
    • Lee, Y., Terzopoulos, D., Waters, K., 1995. Realistic modeling for facial animation. In: Proc. of SIGGRAPH, LA, California, pp. 55-62.
  • 38
    • Lepsøy, S., Curinga, S., 1998. Conversion of articulatory parameters into active shape model coefficients for lip motion representation and synthesis. Signal Process. Image Commun. 13, 209-225.
  • 39
    • Luettin, J., Thacker, N.A., Beet, S.W., 1996. Active shape models for visual speech feature extraction. In: Stork, D., Hennecke, M. (Eds.), Speech-Reading by Man and Machine, vol. 150. Springer, pp. 383-390.
  • 42
    • Massaro, D.W., Beskow, J., Cohen, M.M., Fry, C.L., Rodriguez, T., 1999. Picture my voice: audio to visual speech synthesis using artificial neural networks. In: Proc. AVSP, Santa Cruz, CA, pp. 133-138.
  • 44
    • Montgomery, A., Jackson, P., 1983. Physical characteristics of lips underlying vowel lipreading performances. JASA 73 (6), 2134-2144.
  • 45
    • Morishima, S., Harashima, H., 1991. A media conversion from speech to facial image for man-machine interface. IEEE J. Selected Areas Commun. 9 (4), 594-600.
  • 46
    • Nakamura, S., Yamamoto, E., 2001. Speech-to-lip movements synthesis by maximizing audio-visual joint probability. J. VLSI Signal Proc. 27, 119-126.
  • 47
    • Parke, F.I., 1982. Parameterized models for facial animation. IEEE Comput. Graph. Appl. 2 (9), 61-68.
  • 50
    • Picone, J.W., 1993. Signal modeling techniques in speech recognition. Proc. IEEE 81 (9), 1215-1247.
  • 52
    • Rogozan, A., Deléglise, P., 1998. Adaptive fusion of acoustic and visual sources for automatic speech recognition. Speech Commun. 26, 149-161.
  • 53
    • Seneff, S., 1988. A joint synchrony/mean-rate model of auditory speech processing. J. Phonet. 16 (1), 55-76.
  • 54
    • Sharma, S., Vermeulen, P., Hermansky, H., 1998. Combining information from multiple classifiers to speaker verification. In: Proc. RL2C, France, pp. 115-119.
  • 55
    • Soong, F.K., Juang, B.H., 1993. Optimal quantization of LSP parameters. IEEE Trans. SAP 1, 15-24.
  • 56
    • Summerfield, Q., 1979. Use of visual information for phonetic perception. Phonetica 36, 314-331.
  • 57
    • Tekalp, A.M., Ostermann, J., 2000. Face and 2-D mesh animation in MPEG-4. Signal Process. Image Commun. 15, 387-421.
  • 58
    • Tibrewala, S., Hermansky, H., 1997. Sub-band based recognition of noisy speech. In: Proc. of ICASSP, Munich, Germany, pp. 1255-1258.
  • 59
    • Tsai, R.Y., 1987. A versatile camera calibration technique for high-accuracy 3D machine vision metrology. IEEE J. Robot. Automat. 3, 323-344.
  • 60
    • Waters, K., Frisbie, J., 1995. A coordinated muscle model for speech animation. In: Proc. of Graphics Interface, Ontario, pp. 163-170.
  • 61
    • Waters, K., Levergood, T., 1993. DECface: an automatic lip synchronization algorithm for synthetic faces, RLE, Cambridge, MA Tech. Rep. CRL 93/4.
  • 62
    • Wilson, D.R., Martinez, T.R., 2000. Reduction techniques for exemplar-based learning algorithms. Mach. Learning 38 (3), 257-286.
  • 63
    • Yamamoto, E., Nakamura, S., Shikano, K., 1998. Lip movement synthesis from speech based on Hidden Markov models. Speech Commun. 28, 105-115.


* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.