The Journal of the Acoustical Society of America, Volume 124, Issue 5, 2008, Pages 3183-3190

A linear model of acoustic-to-facial mapping: Model parameters, data set size, and generalization across speakers

Author keywords

[No Author keywords available]

Indexed keywords

ACOUSTIC SIGNALS; ACOUSTIC WINDOWS; AUDIO VISUALS; CRITICAL SIZES; DATA SETS; FACIAL MAPPINGS; FACIAL MOTIONS; LINEAR MODELS; LINEAR TRANSFORMATIONS; MODEL PARAMETERS; RECORDED MOTIONS; SPEECH PERCEPTIONS; TRAINING SETS; VISUAL ASPECTS; VISUAL SPEECHES; WINDOW SIZES;

EID: 56749174163     PISSN: 0001-4966     EISSN: None     Source Type: Journal
DOI: 10.1121/1.2982369     Document Type: Article
Times cited: 10
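The record above is bibliographic only, but the title and indexed terms (linear models, linear transformations, acoustic windows, training sets) point to the paper's core technique: a linear transformation fit from windowed acoustic parameters to recorded facial motion. The sketch below is a minimal illustration only, assuming ordinary least-squares estimation on synthetic stand-in arrays; the feature choices, array shapes, and numpy workflow are assumptions for illustration, not the authors' implementation.

```python
# Illustrative sketch (not the paper's code): fitting a linear
# acoustic-to-facial mapping by ordinary least squares.
# Assumed shapes: A is (n_frames, n_acoustic_features), e.g. acoustic
# parameters stacked over a window; F is (n_frames, n_facial_channels),
# e.g. facial marker coordinates describing facial motion.
import numpy as np

rng = np.random.default_rng(0)
n_frames, n_acoustic, n_facial = 2000, 24, 18

# Synthetic stand-in data; a real data set would come from recorded motion.
A = rng.standard_normal((n_frames, n_acoustic))
true_W = rng.standard_normal((n_acoustic, n_facial))
F = A @ true_W + 0.1 * rng.standard_normal((n_frames, n_facial))

# Augment with a bias column and solve F ≈ [A 1] W in the least-squares sense.
A_aug = np.hstack([A, np.ones((n_frames, 1))])
W, *_ = np.linalg.lstsq(A_aug, F, rcond=None)

# Predict facial trajectories from the acoustics and report the fit.
F_hat = A_aug @ W
rmse = np.sqrt(np.mean((F - F_hat) ** 2))
print(f"training RMSE: {rmse:.3f}")
```

With real data, A would hold acoustic features stacked over an acoustic window and F the corresponding facial-motion trajectories; generalization across speakers, as in the title, would be assessed by estimating W on one speaker's training set and predicting another speaker's recorded motion.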

References (34)
  • 2
    • Alfonso, P. J., and Van Lieshout, P. (1997). "Spatial and temporal variability in gestural specification," in Speech Production: Motor Control, Brain Research and Fluency Disorders, edited by W. Hulstijn, F. Peters, and P. van Lieshout (Elsevier Science, Amsterdam), pp. 151-160.
  • 3
    • Atal, B. S., Chang, J. J., Mathews, M. V., and Tukey, J. W. (1978). "Inversion of articulatory-to-acoustic transformation in the vocal tract by a computer sorting technique," J. Acoust. Soc. Am. 63, 1535-1555.
  • 8
    • Craig, M., Van Lieshout, P., and Wong, W. (2007). "Suitability of a UV-based video recording system for the analysis of small facial motions during speech," Speech Commun. 49, 679-686.
  • 12
    • Fletcher, J., and Harrington, J. (1999). "Lip and jaw coarticulation," in Coarticulation: Theory, Data and Techniques, edited by W. Hardcastle and N. Hewlett (Cambridge University Press, Cambridge), pp. 164-178.
  • 14
    • Hertrich, I., and Ackermann, H. (2000). "Lip-jaw and tongue-jaw coordination during rate-controlled syllable repetitions," J. Acoust. Soc. Am. 107, 2236-2246.
  • 15
    • Hogden, J., Rubin, P., McDermott, E., Katagiri, S., and Goldstein, L. (2007). "Inverting mappings from smooth paths through Rn to paths through Rm: A technique applied to recovering articulation from acoustics," Speech Commun. 49, 361-383.
  • 16
    • Hong, P., Wen, Z., and Huang, T. S. (2002). "Real-time speech-driven face animation with expressions using neural networks," IEEE Trans. Neural Netw. 13, 916-927.
  • 18
    • Jiang, T., Li, Y., and Chen, H. (2000b). "A 1.44 Kbps vocoder based on LSP," Proceedings of the 5th International Conference on Signal Processing, Beijing, pp. 697-701.
  • 20
    • Kabal, P. (2003). "Time windows for linear prediction of speech," Version 2, Technical Report, Department of Electrical and Computer Engineering, McGill University, Montreal.
  • 22
    • Kakumanu, P., Esposito, A., Garcia, O., and Gutierrez-Osuna, R. (2006). "A comparison of acoustic coding models for speech-driven facial animation," Speech Commun. 48, 598-615.
  • 24
    • Logan, J. S., Greene, B. G., and Pisoni, D. B. (1989). "Segmental intelligibility of synthetic speech produced by rule," J. Acoust. Soc. Am. 86, 566-581.
  • 26
    • Savran, A., Arslan, L., and Akarun, L. (2006). "Speaker-independent 3D face synthesis driven by speech and text," Signal Process. 86, 2932-2951.
  • 27
    • Schroeder, M. R. (1967). "Determination of the geometry of the human vocal tract by acoustic measurements," J. Acoust. Soc. Am. 41, 1002-1010.
  • 28
    • Stone, M., and Vatikiotis-Bateson, E. (1995). "Trade-offs in tongue, jaw and palate contributions to speech production," J. Phonetics 23, 81-100.
  • 29
    • Summerfield, Q. (1992). "Lipreading and audio-visual speech perception," Philos. Trans. R. Soc. London, Ser. B 335(1273), 71-78.
  • 30
    • Xie, L., and Liu, Z.-Q. (2007). "A coupled HMM approach to video-realistic speech animation," Pattern Recogn. Lett. 40, 2325-2340.
  • 31
    • Yamamoto, E., Nakamura, S., and Shikano, K. (1998). "Lip movement synthesis from speech based on hidden Markov models," Speech Commun. 26, 105-115.
  • 32
    • Yehia, H., Rubin, P., and Vatikiotis-Bateson, E. (1998). "Quantitative association of vocal-tract and facial behavior," Speech Commun. 26, 23-43.
  • 33
    • Yehia, H., Kuratate, T., and Vatikiotis-Bateson, E. (2001). "Linking facial animation, head motion and speech acoustics," J. Phonetics 30, 555-568.
  • 34
    • Zelezny, M., Krnoul, Z., Cisar, P., and Matousek, J. (2006). "Design, implementation and evaluation of the Czech realistic audio-visual speech synthesis," Signal Process. 86, 3657-3673.


* This information was extracted by KISTI through analysis of Elsevier's SCOPUS database.