SCOPUS Information Search Platform

Speech Communication

Volume 48, Issue 6, 2006, Pages 598-615

A comparison of acoustic coding models for speech-driven facial animation

(4) Kakumanu, Praveen a; Esposito, Anna b; Garcia, Oscar N c; Gutierrez Osuna, Ricardo d

a Wright State University (United States)

b SECOND UNIVERSITY OF NAPLES (Italy)

c UNIVERSITY OF NORTH TEXAS (United States)

d TEXAS A AND M UNIVERSITY (United States)

Author keywords

Audio visual mapping; Linear discriminants analysis; Speech driven facial animation

Indexed keywords

ANIMATION; COMPUTER SIMULATION; GESTURE RECOGNITION; LINEAR SYSTEMS; NATURAL FREQUENCIES; SPEECH ANALYSIS;

AUDIO VISUAL MAPPING; LINEAR DISCRIMINANTS ANALYSIS; SPECTRAL ENERGY; SPEECH DRIVEN FACIAL ANIMATION;

FACE RECOGNITION;

EID: 33747766904 PISSN: 01676393 EISSN: None Source Type: Journal
DOI: 10.1016/j.specom.2005.09.005 Document Type: Article

Times cited : (18)

References (63)

1
- 0011048689
- Plateaus, catastrophes and the structuring of vowel systems
- Abry C., Boe L.J., and Schwartz J.L. Plateaus, catastrophes and the structuring of vowel systems. J. Phonet. 17 (1989) 47-54
- (1989) J. Phonet. , vol.17 , pp. 47-54
- Abry, C.¹ Boe, L.J.² Schwartz, J.L.³

2
- 0033100056
- Codebook based face point trajectory synthesis algorithm using speech input
- Arslan L.M., and Talkin D. Codebook based face point trajectory synthesis algorithm using speech input. Speech Commun. 27 (1999) 81-93
- (1999) Speech Commun. , vol.27 , pp. 81-93
- Arslan, L.M.¹ Talkin, D.²

3
- 84890517975
- Least-square fitting of two 3-d point sets
- Arun K.S., Huang T.S., and Blostein S.D. Least-square fitting of two 3-d point sets. IEEE Trans. PAMI 9 5 (1987) 698-700
- (1987) IEEE Trans. PAMI , vol.9 , Issue.5 , pp. 698-700
- Arun, K.S.¹ Huang, T.S.² Blostein, S.D.³

4
- 0035574930
- Aversano, G., Esposito, A., Esposito, A., Marinaro, M., 2001. A new text-independent method for phoneme segmentation. In: Proc. IEEE-MWSCAS Conference, Dayton, OH, pp. 516-519.

5
- 33747792416
- Balan, N., 2003. Analysis and Evaluation of Factors Affecting Speech Driven Facial Animation, MS Thesis, Dept. of Computer Science and Engineering, Wright State University.

6
- 0032178686
- Benoit, C., Le Goff, B., 1998. Audio-visual speech synthesis from French text: eight years of models, designs, and evaluation at the ICP. Speech Commun. 26, 117-129.

7
- 0030362791
- Bernstein, L.E., Benoit, C., 1996. For speech perception by humans or machines, three senses are better than one. In: Proc. ICSLP, Philadelphia 3, pp. 1477-1480.

8
- 33747751287
- Beskow, J., 1995. Rule-based visual speech synthesis, Proc. EUROSPEECH, Madrid, Spain 1, pp. 299-302.

9
- 0018455310
- Suppression of acoustic noise in speech using spectral subtraction
- Boll S.F. Suppression of acoustic noise in speech using spectral subtraction. IEEE Trans. ASSP 27 2 (1979) 112-113
- (1979) IEEE Trans. ASSP , vol.27 , Issue.2 , pp. 112-113
- Boll, S.F.¹

10
- 84937437186
- Brand, M., 1999. Voice puppetry, Proc. SIGGRAPH, LA, California, pp. 21-28.

11
- 33747783965
- Bregler, C., Omohundro, S., 1995. Nonlinear image interpolation using manifold learning. In: Tesauro, G., Touretzky, D., Leen, T. (Eds.), Advances in Neural Information Processing Systems 7, MIT press, Cambridge, pp. 401-408.

12
- 33747755815
- Bryll, R., Ma, X., Quek, F., 1999. Camera calibration utility description, VisLab Tech. Rep., University of Illinois at Chicago.

13
- 33747764018
- Caldognetto, E.M., Vagges, K., Borghese, N.A., Ferrigno, G., 1989. Automatic analysis of lip and jaw kinematics in VCV sequences. In: Proc. of EUROSPEECH, Paris 2, pp. 453-456.

14
- 0001514782
- Modeling coarticulation in synthetic visual speech
- Thalmann N.M., and Thalmann D. (Eds), Springer
- Cohen M., and Massaro D.W. Modeling coarticulation in synthetic visual speech. In: Thalmann N.M., and Thalmann D. (Eds). Models and Techniques in Computer Animation (1993), Springer 141-155
- (1993) Models and Techniques in Computer Animation , pp. 141-155
- Cohen, M.¹ Massaro, D.W.²

15
- 33747779045
- Coianiz, T., Torresani, L., Caprile, L., 1995. 2D deformable models for visual speech analysis. In: Stork, D., Hennecke, M. (Eds.), Speech Reading by Man and Machine. Springer, pp. 391-398.

16
- 0003922190
- Wiley, New York
- Duda R.O., Hart P.E., and Stork D.G. Pattern Classification. second ed. (2001), Wiley, New York
- (2001) Pattern Classification. second ed.
- Duda, R.O.¹ Hart, P.E.² Stork, D.G.³

17
- 0016987103
- Duttweiler, D., Messerschmitt, D., 1976. Nearly instantaneous companding for nonuniformly quantized PCM. In: IEEE Trans. on Comm., COM-24, pp. 864-873.

18
- 33747764558
- Essa, I., 1995. Analysis, interpretation, and synthesis of facial expression, Ph.D. thesis, MIT Media Arts and Sciences, Cambridge, MA.

19
- 0034207427
- Visual speech synthesis by morphing visemes
- Ezzat T., and Poggio T. Visual speech synthesis by morphing visemes. J. Comput. Vis. 38 1 (2000) 45-57
- (2000) J. Comput. Vis. , vol.38 , Issue.1 , pp. 45-57
- Ezzat, T.¹ Poggio, T.²

20
- 33747756511
- Finn, K., 1986. An investigation of visible lip information to be used in automatic speech recognition, Ph.D. dissertation, Dept. CS, Georgetown University, Washington, DC.

21
- 33747750568
- Fu, S., 2002. Visual Mapping Based on Hidden Markov Models, MS Thesis, Dept. of Computer Science and Engineering, Wright State University.

22
- 16244385915
- Fu, S., Gutierrez-Osuna, R., Esposito, A., Kakumanu, P.K., Garcia, O.N., 2005. Audio/Visual Mapping with Cross-Modal Hidden Markov Models. IEEE Transactions on Multimedia 7, No. 2, April.

23
- 33747756510
- Garofolo, J., Lamel, L.F., Fisher, W.M., Fiscus, J.G., Pellet, D.S., Dahlgren, N.L, 1988. The DARPA TIMIT CDROM. Available from LDC: .

24
- 33747755814
- Goldschen, A.J., 1993. Continuous automatic speech recognition by lipreading, Ph.D. thesis, George Washington University.

25
- 0345443171
- Temporal properties of spontaneous speech-a syllable-centric perspective
- Greenberg S., Carvey H., Hitchcock L., and Chang S. Temporal properties of spontaneous speech-a syllable-centric perspective. J. Phonet. 31 3-4 (2003) 465-485
- (2003) J. Phonet. , vol.31 , Issue.3-4 , pp. 465-485
- Greenberg, S.¹ Carvey, H.² Hitchcock, L.³ Chang, S.⁴

26
- 33747784294
- Gutierrez-Osuna, R., Kakumanu, P., Esposito, A., Garcia, O.N., Bojorquez, A., Castillo, J., Rudomin, I., 2002. WSU Technical Report CS-WSU-02-03, Dayton, OH.

27
- 13144278330
- Speech-driven Facial Animation with Realistic Dynamics
- Gutierrez-Osuna R., Kakumanu P.K., Esposito A., Garcia O.N., Bojorquez A., Castillo J.L., and Rudomin I.J. Speech-driven Facial Animation with Realistic Dynamics. IEEE Trans. Multimedia 7 1 (2005) 33-42
- (2005) IEEE Trans. Multimedia , vol.7 , Issue.1 , pp. 33-42
- Gutierrez-Osuna, R.¹ Kakumanu, P.K.² Esposito, A.³ Garcia, O.N.⁴ Bojorquez, A.⁵ Castillo, J.L.⁶ Rudomin, I.J.⁷

28
- 0028517164
- RASTA processing of speech
- Hermansky H., and Morgan N. RASTA processing of speech. IEEE Trans. SAP 2 4 (1994) 578-589
- (1994) IEEE Trans. SAP , vol.2 , Issue.4 , pp. 578-589
- Hermansky, H.¹ Morgan, N.²

29
- 0036650837
- Real-time speech-driven face animation with expressions using neural networks
- Hong P., Wen Z., and Huang T.S. Real-time speech-driven face animation with expressions using neural networks. IEEE Trans. Neural Networks 13 4 (2002) 916-927
- (2002) IEEE Trans. Neural Networks , vol.13 , Issue.4 , pp. 916-927
- Hong, P.¹ Wen, Z.² Huang, T.S.³

30
- 33747751903
- Itakura, F., 1975. Line spectrum representation of linear prediction coefficients of speech signal, JASA 57, pp. 535 (abstract).

31
- 0031220766
- Acoustic-labial speaker verification
- Jourlin P., Luettin J., Genoud D., and Wassner H. Acoustic-labial speaker verification. Patt. Rec. Lett. 18 (1997) 853-858
- (1997) Patt. Rec. Lett. , vol.18 , pp. 853-858
- Jourlin, P.¹ Luettin, J.² Genoud, D.³ Wassner, H.⁴

32
- 34250090755
- Snakes: active contour models
- Kass M., Witkin A., and Terzopoulos D. Snakes: active contour models. Int. J. Comput. Vis. 1 4 (1988) 321-331
- (1988) Int. J. Comput. Vis. , vol.1 , Issue.4 , pp. 321-331
- Kass, M.¹ Witkin, A.² Terzopoulos, D.³

33
- 0032778055
- Interlacing properties of line spectrum pair frequencies
- Kim H., and Lee H. Interlacing properties of line spectrum pair frequencies. IEEE Trans. SAP 7 (1999) 87-91
- (1999) IEEE Trans. SAP , vol.7 , pp. 87-91
- Kim, H.¹ Lee, H.²

34
- 84989489267
- Klatt, D.H., 1982. Prediction of perceived phonetic distance from critical band spectra: a first step. In: Proc. of ICASSP, Paris, pp. 1278-1281.

35
- 33747757580
- Kühnert, B., Nolan, F., 1999. The origin of coarticulation. In: Hardcastle, W., Hewlett, N. (Eds.), Coarticulation: Theory, Data, and Techniques. Cambridge University Press, pp. 7-29.

36
- 0029270677
- Converting speech into lip movements: a multimedia telephone for hard of hearing people
- Lavagetto F. Converting speech into lip movements: a multimedia telephone for hard of hearing people. IEEE Trans. Rehab. Eng. 3 1 (1995) 90-102
- (1995) IEEE Trans. Rehab. Eng. , vol.3 , Issue.1 , pp. 90-102
- Lavagetto, F.¹

37
- 0029182694
- Lee, Y., Terzopoulos, D., Waters, K., 1995. Realistic modeling for facial animation. In: Proc. of SIGGRAPH, LA, California, pp. 55-62.

38
- 0032165329
- Conversion of articulatory parameters into active shape model coefficients for lip motion representation and synthesis
- Lepsøy S., and Curinga S. Conversion of articulatory parameters into active shape model coefficients for lip motion representation and synthesis. Signal Process. Image Commun. 13 (1998) 209-225
- (1998) Signal Process. Image Commun. , vol.13 , pp. 209-225
- Lepsøy, S.¹ Curinga, S.²

39
- 33747756160
- Luettin, J., Thacker, N.A., Beet, S.W., 1996. Active shape models for visual speech feature extraction. In: Stork, D., Hennecke, M. (Eds.), Speech-Reading by Man and Machine, vol. 150. Springer, pp. 383-390.

40
- 0003874959
- Springer
- Markel J., and Gray A. Linear Prediction of Speech (1976), Springer
- (1976) Linear Prediction of Speech
- Markel, J.¹ Gray, A.²

41
- 0004084456
- MIT Press
- Massaro D.W. Perceiving Talking Faces: From Speech Perception to a Behavioral Principle (1997), MIT Press
- (1997) Perceiving Talking Faces: From Speech Perception to a Behavioral Principle
- Massaro, D.W.¹

42
- 33747786384
- Massaro, D.W., Beskow, J., Cohen, M.M., Fry, C.L., Rodriguez, T., 1999. Picture my voice: audio to visual speech synthesis using Artificial Neural Networks. In: Proc. AVSP, Santa Cruz, CA, pp. 133-138.

43
- 0032207399
- Speaker independence in automated lip-sync for audio-video communication
- McAllister D.F., Rodman R.D., Bitzer D.K., and Freeman A.S. Speaker independence in automated lip-sync for audio-video communication. Comput. Networks ISDN Syst. 30 (1998) 1975-1980
- (1998) Comput. Networks ISDN Syst. , vol.30 , pp. 1975-1980
- McAllister, D.F.¹ Rodman, R.D.² Bitzer, D.K.³ Freeman, A.S.⁴

44
- 0020960564
- Physical characteristics of lips underlying vowel lipreading performances
- Montgomery A., and Jackson P. Physical characteristics of lips underlying vowel lipreading performances. JASA 73 6 (1983) 2134-2144
- (1983) JASA , vol.73 , Issue.6 , pp. 2134-2144
- Montgomery, A.¹ Jackson, P.²

45
- 0026156861
- A Media conversion from speech to facial image for man-machine interface
- Morishima S., and Harashima H. A Media conversion from speech to facial image for man-machine interface. IEEE J. Selected Areas Commun. 9 4 (1991) 594-600
- (1991) IEEE J. Selected Areas Commun. , vol.9 , Issue.4 , pp. 594-600
- Morishima, S.¹ Harashima, H.²

46
- 0035251712
- Speech-to-lip movements synthesis by maximizing audio-visual joint probability
- Nakamura S., and Yamamoto E. Speech-to-lip movements synthesis by maximizing audio-visual joint probability. J. VLSI Signal Proc. 27 (2001) 119-126
- (2001) J. VLSI Signal Proc. , vol.27 , pp. 119-126
- Nakamura, S.¹ Yamamoto, E.²

47
- 0020202671
- Parameterized models for facial animation
- Parke F.I. Parameterized models for facial animation. IEEE Comput. Graph. Appl. 2 9 (1982) 61-68
- (1982) IEEE Comput. Graph. Appl. , vol.2 , Issue.9 , pp. 61-68
- Parke, F.I.¹

48
- 0004274888
- McGraw Hill (Chapter 3)
- Parsons T.W. Voice and Speech Processing (1986), McGraw Hill (Chapter 3)
- (1986) Voice and Speech Processing
- Parsons, T.W.¹

49
- 0002473893
- Generating facial expressions for speech
- Pelachaud C., Badler N.I., and Steedman M. Generating facial expressions for speech. Cognit. Sci. 20 (1996) 1-46
- (1996) Cognit. Sci. , vol.20 , pp. 1-46
- Pelachaud, C.¹ Badler, N.I.² Steedman, M.³

50
- 0027659197
- Signal modeling techniques in speech recognition
- Picone J.W. Signal modeling techniques in speech recognition. Proc. IEEE 81 9 (1993) 1215-1247
- (1993) Proc. IEEE , vol.81 , Issue.9 , pp. 1215-1247
- Picone, J.W.¹

51
- 0003425258
- Prentice-Hall
- Rabiner L.R., and Schafer R.W. Digital Processing of Speech Signals (1978), Prentice-Hall
- (1978) Digital Processing of Speech Signals
- Rabiner, L.R.¹ Schafer, R.W.²

52
- 0032180188
- Adaptive fusion of acoustic and visual sources for automatic speech recognition
- Rogozan A., and Deléglise P. Adaptive fusion of acoustic and visual sources for automatic speech recognition. Speech Commun. 26 (1998) 149-161
- (1998) Speech Commun. , vol.26 , pp. 149-161
- Rogozan, A.¹ Deléglise, P.²

53
- 84928837806
- A joint synchrony/mean-rate model of auditory speech processing
- Seneff S. A joint synchrony/mean-rate model of auditory speech processing. J. Phonetics 16 1 (1988) 55-76
- (1988) J. Phonetics , vol.16 , Issue.1 , pp. 55-76
- Seneff, S.¹

54
- 33747807632
- Sharma, S., Vermeulen, P., Hermansky, H., 1998. Combining information from multiple classifiers to speaker verification. In: Proc. RL2C, France, pp. 115-119.

55
- 84885499464
- Optimal quantization of line LSP parameters
- Soong F.K., and Juang B.H. Optimal quantization of line LSP parameters. IEEE Trans. SAP 1 (1993) 15-24
- (1993) IEEE Trans. SAP , vol.1 , pp. 15-24
- Soong, F.K.¹ Juang, B.H.²

56
- 0018701386
- Use of visual information for phonetic perception
- Summerfield Q. Use of visual information for phonetic perception. Phonetica 36 (1979) 314-331
- (1979) Phonetica , vol.36 , pp. 314-331
- Summerfield, Q.¹

57
- 0033879110
- Tekalp, A.M., Ostermann, J., 2000. Face and 2-D mesh animation in MPEG-4. In: Sig. Processing: Image Comm. 15, pp. 387-421.

58
- 0030682291
- Tibrewala, S., Hermansky, H., 1997. Sub-band based recognition of noisy speech. In: Proc. of ICASSP, Munich, Germany, pp. 1255-1258.

59
- 0023397578
- A versatile camera calibration technique for high-accuracy 3D machine vision metrology
- Tsai R.Y. A versatile camera calibration technique for high-accuracy 3D machine vision metrology. IEEE J. Robot. Automat. 3 (1987) 323-344
- (1987) IEEE J. Robot. Automat. , vol.3 , pp. 323-344
- Tsai, R.Y.¹

60
- 0029462324
- Waters, K., Frisbie, J., 1995. A coordinated muscle model for speech animation. In: Proc. of Graphics Interface, Ontario, pp. 163-170.

61
- 33747788055
- Waters, K., Levergood, T., 1993. DECface: an automatic lip synchronization algorithm for synthetic faces, RLE, Cambridge, MA Tech. Rep. CRL 93/4.

62
- 0343081513
- Reduction techniques for exemplar-based learning algorithms
- Wilson D.R., and Martinez T.R. Reduction techniques for exemplar-based learning algorithms. Mach. Learning 38 3 (2000) 257-286
- (2000) Mach. Learning , vol.38 , Issue.3 , pp. 257-286
- Wilson, D.R.¹ Martinez, T.R.²

63
- 0032179320
- Lip movement synthesis from speech based on Hidden Markov models
- Yamamoto E., Nakamura S., and Shikano K. Lip movement synthesis from speech based on Hidden Markov models. Speech Commun. 28 (1998) 105-115
- (1998) Speech Commun. , vol.28 , pp. 105-115
- Yamamoto, E.¹ Nakamura, S.² Shikano, K.³

* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.