SCOPUS Information Search Platform

IEEE Transactions on Audio, Speech and Language Processing

Volume 17, Issue 3, 2009, Pages 411-422

Face active appearance modeling and speech acoustic information to recover articulation

(3) Katsamanis, Athanassios (a); Papandreou, George (a); Maragos, Petros (a)

(a) National Technical University of Athens (Greece)

Author keywords

Active appearance models (AAMs); Audiovisual to articulatory speech inversion; Canonical correlation analysis (CCA); Multimodal fusion

Indexed keywords

ACTIVE APPEARANCE MODELS; ACTIVE APPEARANCE MODELS (AAMS); APPEARANCE MODELING; AUDIO FEATURES; AUDIOVISUAL-TO-ARTICULATORY SPEECH INVERSION; CANONICAL CORRELATION ANALYSIS; CANONICAL CORRELATION ANALYSIS (CCA); DYNAMIC INFORMATION; ELECTROMAGNETIC ARTICULOGRAPHY; FACE TRACKING; FACIAL ANALYSIS; ILL POSED; ILL-POSEDNESS; INVERSION PROCESS; INVERSION SCHEME; LINE SPECTRAL FREQUENCIES; LINEAR MAPPING; MARKOVIAN; MEL-FREQUENCY CEPSTRAL COEFFICIENTS; MODEL SWITCHING; MULTI-MODAL; MULTI-STREAM HIDDEN MARKOV MODEL; MULTIMODAL FUSION; PIECEWISE LINEAR MODELS; POINTS OF INTEREST; SPEECH ACOUSTICS; SPEECH INVERSION; SPEECH PRODUCTION; VISUAL FEATURE EXTRACTION; VISUAL INFORMATION; VISUAL MODALITIES; VOCAL-TRACTS;

FACE RECOGNITION; FEATURE EXTRACTION; FREQUENCY ESTIMATION; HIDDEN MARKOV MODELS; PIECEWISE LINEAR TECHNIQUES; SPEECH RECOGNITION; VISUAL COMMUNICATION;

AUDIO ACOUSTICS;

EID: 70350574658 PISSN: 15587916 EISSN: None Source Type: Journal
DOI: 10.1109/TASL.2008.2008740 Document Type: Article

Times cited: (36)

References (45)

1
- 4544290191
- Recent advances in the automatic recognition of audio-visual speech
- Sep
- G. Potamianos, C. Neti, G. Gravier, and A. Garg, "Recent advances in the automatic recognition of audio-visual speech," Proc. IEEE, vol. 91, no. 9, pp. 1306-1326, Sep. 2003.
- (2003) Proc. IEEE , vol.91 , Issue.9 , pp. 1306-1326
- Potamianos, G.¹ Neti, C.² Gravier, G.³ Garg, A.⁴

2
- 0142216141
- Audiovisual speech synthesis
- Oct
- G. Bailly, M. Bérar, F. Elisei, and M. Odisio, "Audiovisual speech synthesis," Int. J. Speech Technol., vol. 6, no. 4, pp. 331-346, Oct. 2003.
- (2003) Int. J. Speech Technol. , vol.6 , Issue.4 , pp. 331-346
- Bailly, G.¹ Bérar, M.² Elisei, F.³ Odisio, M.⁴

3
- 0032178592
- Quantitative association of vocal-tract and facial behavior
- H. Yehia, P. Rubin, and E. Vatikiotis-Bateson, "Quantitative association of vocal-tract and facial behavior," Speech Commun., vol. 26, pp. 23-43, 1998.
- (1998) Speech Commun , vol.26 , pp. 23-43
- Yehia, H.¹ Rubin, P.² Vatikiotis-Bateson, E.³

4
- 0017199877
- Hearing lips and seeing voices
- H. McGurk and J. MacDonald, "Hearing lips and seeing voices," Nature, vol. 264, pp. 746-748, 1976.
- (1976) Nature , vol.264 , pp. 746-748
- Mcgurk, H.¹ Macdonald, J.²

5
- 0028259480
- Techniques for estimating vocal-tract shapes from the speech signal
- Jan
- J. Schroeter and M. Sondhi, "Techniques for estimating vocal-tract shapes from the speech signal," IEEE Trans. Speech Audio Process., vol. 2, no. 1, pp. 133-150, Jan. 1994.
- (1994) IEEE Trans. Speech Audio Process , vol.2 , Issue.1 , pp. 133-150
- Schroeter, J.¹ Sondhi, M.²

6
- 33846680938
- Speech production knowledge in automatic speech recognition
- Feb
- S. King, J. Frankel, K. Livescu, E. McDermott, K. Richmond, and M. Wester, "Speech production knowledge in automatic speech recognition," J. Acoust. Soc. Amer., vol. 121, no. 2, pp. 723-742, Feb. 2007.
- (2007) J. Acoust. Soc. Amer. , vol.121 , Issue.2 , pp. 723-742
- King, S.¹ Frankel, J.² Livescu, K.³ Mcdermott, E.⁴ Richmond, K.⁵ Wester, M.⁶

7
- 0001736204
- Speech coding based on physiological models of speech production
- S. Furui and M. M. Sondhi, Eds. New York: Marcel Dekker
- J. Schroeter and M. M. Sondhi, "Speech coding based on physiological models of speech production," in Advances in Speech Signal Processing, S. Furui and M. M. Sondhi, Eds. New York: Marcel Dekker, 1992.
- (1992) Advances in Speech Signal Processing
- Schroeter, J.¹ Sondhi, M.M.²

8
- 84894560828
- Designing the user interface of the computer-based speech training system artur based on early user tests
- O. Engwall, O. Bälter, A.-M. Öster, and H. Sidenbladh-Kjellström, "Designing the user interface of the computer-based speech training system ARTUR based on early user tests," J. Behavior Inf. Technol., vol. 25, no. 4, pp. 353-365, 2006.
- (2006) J. Behavior Inf. Technol. , vol.25 , Issue.4 , pp. 353-365
- Engwall, O.¹ Bälter, O.² Öster, A.-M.³ Sidenbladh-Kjellström, H.⁴

9
- 22144465830
- Modeling the articulatory space using a hypercube codebook for acoustic-to-articulatory inversion
- S. Ouni and Y. Laprie, "Modeling the articulatory space using a hypercube codebook for acoustic-to-articulatory inversion," J. Acoust. Soc. Amer., vol. 118, no. 1, pp. 444-460, 2005.
- (2005) J. Acoust. Soc. Amer. , vol.118 , Issue.1 , pp. 444-460
- Ouni, S.¹ Laprie, Y.²

10
- 0038359547
- Modelling the uncertainty in recovering articulation from acoustics
- K. Richmond, S. King, and P. Taylor, "Modelling the uncertainty in recovering articulation from acoustics," Comput. Speech Lang., vol. 17, pp. 153-172, 2003.
- (2003) Comput. Speech Lang , vol.17 , pp. 153-172
- Richmond, K.¹ King, S.² Taylor, P.³

11
- 38649140222
- Statistical mapping between articulatory movements and acoustic spectrum using a gaussian mixture model
- T. Toda, A. W. Black, and K. Tokuda, "Statistical mapping between articulatory movements and acoustic spectrum using a Gaussian mixture model," Speech Commun., vol. 50, pp. 215-227, 2008.
- (2008) Speech Commun. , vol.50 , pp. 215-227
- Toda, T.¹ Black, A.W.² Tokuda, K.³

12
- 2142659020
- Estimation of articulatory movements from speech acoustics using an hmm-based speech production model
- S. Hiroya and M. Honda, "Estimation of articulatory movements from speech acoustics using an HMM-based speech production model," IEEE Trans. Speech Audio Process., vol. 12, no. 2, pp. 175-185, Mar. 2004.
- (2004) IEEE Trans. Speech Audio Process , vol.12 , Issue.2 , pp. 175-185
- Hiroya, S.¹ Honda, M.²

13
- 85032752352
- Audiovisual speech processing
- Jan
- T. Chen, "Audiovisual speech processing," IEEE Signal Process. Mag., vol. 18, no. 1, pp. 9-21, Jan. 2001.
- (2001) IEEE Signal Process. Mag. , vol.18 , Issue.1 , pp. 9-21
- Chen, T.¹

14
- 0032179320
- Lip movement synthesis from speech based on hidden markov models
- E. Yamamoto, S. Nakamura, and K. Shikano, "Lip movement synthesis from speech based on hidden Markov models," Speech Commun., vol. 26, pp. 105-115, 1998.
- (1998) Speech Commun , vol.26 , pp. 105-115
- Yamamoto, E.¹ Nakamura, S.² Shikano, K.³

15
- 85162060060
- A probabilistic model for generating realistic speech movements from speech
- G. Englebienne, T. Cootes, and M. Rattray, "A probabilistic model for generating realistic speech movements from speech," in Proc. Adv. Neural Inf. Process. Syst., 2007.
- (2007) Proc. Adv. Neural Inf. Process. Syst.
- Englebienne, G.¹ Cootes, T.² Rattray, M.³

16
- 0035426641
- Hidden markov model inversion for audio-to-visual conversion in an mpeg-4 facial animation system
- K. Choi, Y. Luo, and J.-N. Hwang, "Hidden Markov model inversion for audio-to-visual conversion in an MPEG-4 facial animation system," J. VLSI Signal Process., vol. 29, pp. 51-61, 2001.
- (2001) J. VLSI Signal Process. , vol.29 , pp. 51-61
- Choi, K.¹ Luo, Y.² Hwang, J.-N.³

17
- 33947583073
- Realistic mouth-synching for speech-driven talking face using articulatory modeling
- Apr
- L. Xie and Z.-Q. Liu, "Realistic mouth-synching for speech-driven talking face using articulatory modeling," IEEE Trans. Multimedia, vol. 9, no. 3, pp. 500-510, Apr. 2007.
- (2007) IEEE Trans. Multimedia , vol.9 , Issue.3 , pp. 500-510
- Xie, L.¹ Liu, Z.-Q.²

18
- 0036874551
- On the relationship between face movements, tongue movements, and speech acoustics
- J. Jiang, A. Alwan, P. A. Keating, E. T. Auer, and L. E. Bernstein, "On the relationship between face movements, tongue movements, and speech acoustics," EURASIP J. Appl. Signal Process., vol. 11, pp. 1174-1188, 2002.
- (2002) EURASIP J. Appl. Signal Process , vol.11 , pp. 1174-1188
- Jiang, J.¹ Alwan, A.² Keating, P.A.³ Auer, E.T.⁴ Bernstein, L.E.⁵

19
- 33745183111
- Introducing visual cues in acoustic-to-articulatory inversion
- O. Engwall, "Introducing visual cues in acoustic-to-articulatory inversion," in Proc. Int. Conf. Spoken Lang. Process., 2005, pp. 3205-3208.
- (2005) Proc. Int. Conf. Spoken Lang. Process , pp. 3205-3208
- Engwall, O.¹

20
- 34548378893
- Reconstructing tongue movements from audio and video
- H. Kjellström, O. Engwall, and O. Bälter, "Reconstructing tongue movements from audio and video," in Proc. Int. Conf. Spoken Lang. Process., 2006, pp. 2238-2241.
- (2006) Proc. Int. Conf. Spoken Lang. Process , pp. 2238-2241
- Kjellström, H.¹ Engwall, O.² Bälter, O.³

21
- 48149084421
- Audiovisual-to-articulatory speech inversion using hmms
- A. Katsamanis, G. Papandreou, and P. Maragos, "Audiovisual-to-articulatory speech inversion using HMMs," in Proc. Int. Workshop Multimedia Signal Process. (MMSP), 2007, pp. 457-460.
- (2007) Proc. Int. Workshop Multimedia Signal Process. (MMSP) , pp. 457-460
- Katsamanis, A.¹ Papandreou, G.² Maragos, P.³

22
- 51449089369
- Audiovisual-to-articulatory speech inversion using active appearance models for the face and hidden markov models for the dynamics
- A. Katsamanis, G. Papandreou, and P. Maragos, "Audiovisual-to-articulatory speech inversion using active appearance models for the face and hidden Markov models for the dynamics," in Proc. Int. Conf. Acoust., Speech, Signal Process., 2008, pp. 2237-2240.
- (2008) Proc. Int. Conf. Acoust., Speech, Signal Process , pp. 2237-2240
- Katsamanis, A.¹ Papandreou, G.² Maragos, P.³

23
- 0035363218
- Active appearance models
- Jun
- T. F. Cootes, G. J. Edwards, and C. J. Taylor, "Active appearance models," IEEE Trans. Pattern Anal. Mach. Intell., vol. 23, no. 6, pp. 681-685, Jun. 2001.
- (2001) IEEE Trans. Pattern Anal. Mach. Intell , vol.23 , Issue.6 , pp. 681-685
- Cootes, T.F.¹ Edwards, G.J.² Taylor, C.J.³

24
- 0035680116
- Rapid object detection using a boosted cascade of simple features
- P. Viola and M. Jones, "Rapid object detection using a boosted cascade of simple features," in Proc. IEEE Int. Conf. Comp. Vision Pattern Recog., 2001, vol. I, pp. 511-518.
- (2001) Proc. IEEE Int. Conf. Comp. Vision Pattern Recog , vol.I , pp. 511-518
- Viola, P.¹ Jones, M.²

25
- 0010424152
- Acoustic-to-articulatory inversion using dynamical and phonological constraints
- S. Dusan and L. Deng, "Acoustic-to-articulatory inversion using dynamical and phonological constraints," in Proc. Seminar Speech Production, 2000, pp. 237-240.
- (2000) Proc. Seminar Speech Production , pp. 237-240
- Dusan, S.¹ Deng, L.²

26
- 0034270644
- Audio-visual speech modeling for continuous speech recognition
- Sep
- S. Dupont and J. Luettin, "Audio-visual speech modeling for continuous speech recognition," IEEE Trans. Multimedia, vol. 2, no. 3, pp. 141-151, Sep. 2000.
- (2000) IEEE Trans. Multimedia , vol.2 , Issue.3 , pp. 141-151
- Dupont, S.¹ Luettin, J.²

27
- 0037503670
- A multichannel articulatory speech database and its application for automatic speech recognition
- A. Wrench and W. Hardcastle, "A multichannel articulatory speech database and its application for automatic speech recognition," in Proc. 5th Seminar Speech Production, Kloster Seeon, Bavaria, 2000, pp. 305-308.
- (2000) Proc. 5th Seminar Speech Production , pp. 305-308
- Wrench, A.¹ Hardcastle, W.²

28
- 48149088768
- Resynthesis of 3d tongue movements from facial data
- O. Engwall and J. Beskow, "Resynthesis of 3D tongue movements from facial data," in Proc. Eur. Conf. Speech Commun. Technol., 2003, pp. 2261-2264.
- (2003) Proc. Eur. Conf. Speech Commun. Technol. , pp. 2261-2264
- Engwall, O.¹ Beskow, J.²

29
- 0003607151
- New York: Academic
- K. V. Mardia, J. T. Kent, and J. M. Bibby, Multivariate Analysis. New York: Academic, 1979.
- (1979) Multivariate Analysis.
- Mardia, K.V.¹ Kent, J.T.² Bibby, J.M.³

30
- 0032023788
- Wiener filters in canonical coordinates for transform coding, filtering, and quantizing
- May
- L. L. Scharf and J. K. Thomas, "Wiener filters in canonical coordinates for transform coding, filtering, and quantizing," IEEE Trans. Signal Process., vol. 46, no. 3, pp. 647-654, May 1998.
- (1998) IEEE Trans. Signal Process. , vol.46 , Issue.3 , pp. 647-654
- Scharf, L.L.¹ Thomas, J.K.²

31
- 0000927638
- Predicting multivariate responses in multiple linear regression
- L. Breiman and J. H. Friedman, "Predicting multivariate responses in multiple linear regression," J. Roy. Statist. Soc. (B), vol. 59, no. 1, pp. 3-54, 1997.
- (1997) J. Roy. Statist. Soc. (B) , vol.59 , Issue.1 , pp. 3-54
- Breiman, L.¹ Friedman, J.H.²

32
- 33846516584
- New York: Springer
- C. Bishop, Pattern Recognition and Machine Learning. New York: Springer, 2006.
- (2006) Pattern Recognition and Machine Learning
- Bishop, C.¹

33
- 84863731362
- Audiovisual speech inversion by switching dynamical modeling governed by a hidden markov process
- CD-ROM
- A. Katsamanis, G. Ananthakrishnan, G. Papandreou, P. Maragos, and O. Engwall, "Audiovisual speech inversion by switching dynamical modeling governed by a hidden Markov process," in Proc. Eur. Signal Process. Conf. (EUSIPCO), 2008, CD-ROM.
- (2008) Proc. Eur. Signal Process. Conf. (EUSIPCO)
- Katsamanis, A.¹ Ananthakrishnan, G.² Papandreou, G.³ Maragos, P.⁴ Engwall, O.⁵

34
- 0004244302
- Englewood Cliffs, NJ: Prentice-Hall
- L. Rabiner and B. Juang, Fundamentals of Speech Recognition. Englewood Cliffs, NJ: Prentice-Hall, 1993.
- (1993) Fundamentals of Speech Recognition
- Rabiner, L.¹ Juang, B.²

35
- 0001237218
- A maximum likelihood methodology for clusterwise linear regression
- W. DeSarbo and W. Cron, "A maximum likelihood methodology for clusterwise linear regression," J. Classification, vol. 5, pp. 249-282, 1988.
- (1988) J. Classification , vol.5 , pp. 249-282
- Desarbo, W.¹ Cron, W.²

36
- 0003544881
- D. Stork and M. E. Hennecke, Eds. Berlin, Germany: Springer
- Speechreading by Humans and Machines, D. Stork and M. E. Hennecke, Eds. Berlin, Germany: Springer, 1996.
- (1996) Speechreading by Humans and Machines

37
- 0034842342
- Asynchronous stream modeling for large vocabulary audio-visual speech recognition
- J. Luettin, G. Potamianos, and C. Neti, "Asynchronous stream modeling for large vocabulary audio-visual speech recognition," in Proc. Int. Conf. Acoust., Speech, Signal Process., 2001, pp. 169-172.
- (2001) Proc. Int. Conf. Acoust., Speech, Signal Process , pp. 169-172
- Luettin, J.¹ Potamianos, G.² Neti, C.³

38
- 51949086284
- Adaptive and constrained algorithms for inverse compositional active appearance model fitting
- G. Papandreou and P. Maragos, "Adaptive and constrained algorithms for inverse compositional active appearance model fitting," in Proc. IEEE Int. Conf. Comp. Vision and Pattern Recog., 2008.
- (2008) Proc. IEEE Int. Conf. Comp. Vision and Pattern Recog.
- Papandreou, G.¹ Maragos, P.²

39
- 0032660758
- Direct least square fitting of ellipses
- May
- A. Fitzgibbon, M. Pilu, and R. Fisher, "Direct least square fitting of ellipses," IEEE Trans. Pattern Anal. Mach. Intell., vol. 21, no. 5, pp. 476-480, May 1999.
- (1999) IEEE Trans. Pattern Anal. Mach. Intell. , vol.21 , Issue.5 , pp. 476-480
- Fitzgibbon, A.¹ Pilu, M.² Fisher, R.³

40
- 57549101447
- Audiovisual synchronization and fusion using canonical correlation analysis
- Nov
- M. E. Sargin, Y. Yemez, E. Erzin, and M. Tekalp, "Audiovisual synchronization and fusion using canonical correlation analysis," IEEE Trans. Multimedia, vol. 9, no. 7, pp. 1396-1403, Nov. 2007.
- (2007) IEEE Trans. Multimedia , vol.9 , Issue.7 , pp. 1396-1403
- Sargin, M.E.¹ Yemez, Y.² Erzin, E.³ Tekalp, M.⁴

41
- 0003822743
- for HTK version 3.2, Cambridge Univ. Eng. Dept. Tech. Rep
- S. Young, G. Evermann, D. Kershaw, G. Moore, J. Odell, D. Ollason, D. Povey, V. Valtchev, and P. Woodland, The HTK Book (for HTK version 3.2) Cambridge Univ. Eng. Dept., Tech. Rep, 2002.
- (2002) The HTK Book
- Young, S.¹ Evermann, G.² Kershaw, D.³ Moore, G.⁴ Odell, J.⁵ Ollason, D.⁶ Povey, D.⁷ Valtchev, V.⁸ Woodland, P.⁹

42
- 0002557614
- Line spectrum pair and speech data compression
- F. K. Soong and B.-H. Juang, "Line spectrum pair and speech data compression," in Proc. Int. Conf. Acoust., Speech, Signal Process., 1984, vol. 9, pp. 37-40.
- (1984) Proc. Int. Conf. Acoust., Speech Signal Process , vol.9 , pp. 37-40
- Soong, F.K.¹ Juang, B.-H.²

43
- 68149181313
- A comparison of acoustic features for articulatory inversion
- C. Qin and M. Carreira-Perpinan, "A comparison of acoustic features for articulatory inversion," in Proc. Int. Conf. Spoken Lang. Process., 2007, pp. 2469-2472.
- (2007) Proc. Int. Conf. Spoken Lang. Process , pp. 2469-2472
- Qin, C.¹ Carreira-Perpinan, M.²

44
- 34047263009
- Visual model structures and synchrony constraints for audio-visual speech recognition
- May
- T. J. Hazen, "Visual model structures and synchrony constraints for audio-visual speech recognition," IEEE Trans. Speech Audio Process., vol. 14, no. 3, pp. 1082-1089, May 2006.
- (2006) IEEE Trans. Speech Audio Process , vol.14 , Issue.3 , pp. 1082-1089
- Hazen, T.J.¹

45
- 0000807171
- Reduced-rank regression and canonical analysis
- M.-S. Tso, "Reduced-rank regression and canonical analysis," J. R. Statist. Soc. (B), vol. 43, pp. 183-189, 1981.
- (1981) J. R. Statist. Soc. (B) , vol.43 , pp. 183-189
- Tso, M.-S.¹

* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.