Volume 17, Issue 3, 2009, Pages 411-422

Face active appearance modeling and speech acoustic information to recover articulation

Author keywords

Active appearance models (AAMs); Audiovisual to articulatory speech inversion; Canonical correlation analysis (CCA); Multimodal fusion

Indexed keywords

ACTIVE APPEARANCE MODELS; ACTIVE APPEARANCE MODELS (AAMS); APPEARANCE MODELING; AUDIO FEATURES; AUDIOVISUAL-TO-ARTICULATORY SPEECH INVERSION; CANONICAL CORRELATION ANALYSIS; CANONICAL CORRELATION ANALYSIS (CCA); DYNAMIC INFORMATION; ELECTROMAGNETIC ARTICULOGRAPHY; FACE TRACKING; FACIAL ANALYSIS; ILL POSED; ILL-POSEDNESS; INVERSION PROCESS; INVERSION SCHEME; LINE SPECTRAL FREQUENCIES; LINEAR MAPPING; MARKOVIAN; MEL-FREQUENCY CEPSTRAL COEFFICIENTS; MODEL SWITCHING; MULTI-MODAL; MULTI-STREAM HIDDEN MARKOV MODEL; MULTIMODAL FUSION; PIECEWISE LINEAR MODELS; POINTS OF INTEREST; SPEECH ACOUSTICS; SPEECH INVERSION; SPEECH PRODUCTION; VISUAL FEATURE EXTRACTION; VISUAL INFORMATION; VISUAL MODALITIES; VOCAL-TRACTS;

EID: 70350574658; PISSN: 1558-7916; EISSN: None; Source Type: Journal
DOI: 10.1109/TASL.2008.2008740; Document Type: Article
Times cited: 36

References (45)
  • 1
    • Scopus EID: 4544290191
    • G. Potamianos, C. Neti, G. Gravier, and A. Garg, "Recent advances in the automatic recognition of audio-visual speech," Proc. IEEE, vol. 91, no. 9, pp. 1306-1326, Sep. 2003.
  • 3
    • Scopus EID: 0032178592
    • H. Yehia, P. Rubin, and E. Vatikiotis-Bateson, "Quantitative association of vocal-tract and facial behavior," Speech Commun., vol. 26, pp. 23-43, 1998.
  • 4
    • Scopus EID: 0017199877
    • H. McGurk and J. MacDonald, "Hearing lips and seeing voices," Nature, vol. 264, pp. 746-748, 1976.
  • 5
    • Scopus EID: 0028259480
    • J. Schroeter and M. Sondhi, "Techniques for estimating vocal-tract shapes from the speech signal," IEEE Trans. Speech Audio Process., vol. 2, no. 1, pp. 133-150, Jan. 1994.
  • 7
    • Scopus EID: 0001736204
    • J. Schroeter and M. M. Sondhi, "Speech coding based on physiological models of speech production," in Advances in Speech Signal Processing, S. Furui and M. M. Sondhi, Eds. New York: Marcel Dekker, 1992.
  • 8
    • Scopus EID: 84894560828
    • O. Engwall, O. Bälter, A.-M. Öster, and H. Sidenbladh-Kjellström, "Designing the user interface of the computer-based speech training system ARTUR based on early user tests," J. Behavior Inf. Technol., vol. 25, no. 4, pp. 353-365, 2006.
  • 9
    • Scopus EID: 22144465830
    • S. Ouni and Y. Laprie, "Modeling the articulatory space using a hypercube codebook for acoustic-to-articulatory inversion," J. Acoust. Soc. Amer., vol. 118, no. 1, pp. 444-460, 2005.
  • 10
    • Scopus EID: 0038359547
    • K. Richmond, S. King, and P. Taylor, "Modelling the uncertainty in recovering articulation from acoustics," Comput. Speech Lang., vol. 17, pp. 153-172, 2003.
  • 11
    • Scopus EID: 38649140222
    • T. Toda, A. W. Black, and K. Tokuda, "Statistical mapping between articulatory movements and acoustic spectrum using a Gaussian mixture model," Speech Commun., vol. 50, pp. 215-227, 2008.
  • 12
    • Scopus EID: 2142659020
    • S. Hiroya and M. Honda, "Estimation of articulatory movements from speech acoustics using an HMM-based speech production model," IEEE Trans. Speech Audio Process., vol. 12, no. 2, pp. 175-185, Mar. 2004.
  • 13
    • Scopus EID: 85032752352
    • T. Chen, "Audiovisual speech processing," IEEE Signal Process. Mag., vol. 18, no. 1, pp. 9-21, Jan. 2001.
  • 14
    • Scopus EID: 0032179320
    • E. Yamamoto, S. Nakamura, and K. Shikano, "Lip movement synthesis from speech based on hidden Markov models," Speech Commun., vol. 26, pp. 105-115, 1998.
  • 16
    • Scopus EID: 0035426641
    • K. Choi, Y. Luo, and J.-N. Hwang, "Hidden Markov model inversion for audio-to-visual conversion in an MPEG-4 facial animation system," J. VLSI Signal Process., vol. 29, pp. 51-61, 2001.
  • 17
    • Scopus EID: 33947583073
    • L. Xie and Z.-Q. Liu, "Realistic mouth-synching for speech-driven talking face using articulatory modeling," IEEE Trans. Multimedia, vol. 9, no. 3, pp. 500-510, Apr. 2007.
  • 18
    • Scopus EID: 0036874551
    • J. Jiang, A. Alwan, P. A. Keating, E. T. Auer, and L. E. Bernstein, "On the relationship between face movements, tongue movements, and speech acoustics," EURASIP J. Appl. Signal Process., vol. 11, pp. 1174-1188, 2002.
  • 19
    • Scopus EID: 33745183111
    • O. Engwall, "Introducing visual cues in acoustic-to-articulatory inversion," in Proc. Int. Conf. Spoken Lang. Process., 2005, pp. 3205-3208.
  • 22
    • Scopus EID: 51449089369
    • A. Katsamanis, G. Papandreou, and P. Maragos, "Audiovisual-to-articulatory speech inversion using active appearance models for the face and hidden Markov models for the dynamics," in Proc. Int. Conf. Acoust., Speech, Signal Process., 2008, pp. 2237-2240.
  • 24
    • Scopus EID: 0035680116
    • P. Viola and M. Jones, "Rapid object detection using a boosted cascade of simple features," in Proc. IEEE Int. Conf. Comp. Vision Pattern Recog., 2001, vol. I, pp. 511-518.
  • 25
    • Scopus EID: 0010424152
    • S. Dusan and L. Deng, "Acoustic-to-articulatory inversion using dynamical and phonological constraints," in Proc. Seminar Speech Production, 2000, pp. 237-240.
  • 26
    • Scopus EID: 0034270644
    • S. Dupont and J. Luettin, "Audio-visual speech modeling for continuous speech recognition," IEEE Trans. Multimedia, vol. 2, no. 3, pp. 141-151, Sep. 2000.
  • 27
    • Scopus EID: 0037503670
    • A. Wrench and W. Hardcastle, "A multichannel articulatory speech database and its application for automatic speech recognition," in Proc. 5th Seminar Speech Production, Kloster Seeon, Bavaria, 2000, pp. 305-308.
  • 30
    • Scopus EID: 0032023788
    • L. L. Scharf and J. K. Thomas, "Wiener filters in canonical coordinates for transform coding, filtering, and quantizing," IEEE Trans. Speech Audio Process., vol. 46, no. 3, pp. 647-654, May 1998.
  • 31
    • Scopus EID: 0000927638
    • L. Breiman and J. H. Friedman, "Predicting multivariate responses in multiple linear regression," J. Roy. Statist. Soc. (B), vol. 59, no. 1, pp. 3-54, 1997.
  • 35
    • Scopus EID: 0001237218
    • W. DeSarbo and W. Cron, "A maximum likelihood methodology for clusterwise linear regression," J. Classification, vol. 5, pp. 249-282, 1988.
  • 36
    • Scopus EID: 0003544881
    • D. Stork and M. E. Hennecke, Eds., Speechreading by Humans and Machines. Berlin, Germany: Springer, 1996.
  • 40
    • Scopus EID: 57549101447
    • M. E. Sargin, Y. Yemez, E. Erzin, and M. Tekalp, "Audiovisual synchronization and fusion using canonical correlation analysis," IEEE Trans. Multimedia, vol. 9, no. 7, pp. 1396-1403, Nov. 2007.
  • 44
    • Scopus EID: 34047263009
    • T. J. Hazen, "Visual model structures and synchrony constraints for audio-visual speech recognition," IEEE Trans. Speech Audio Process., vol. 14, no. 3, pp. 1082-1089, May 2006.
  • 45
    • Scopus EID: 0000807171
    • M.-S. Tso, "Reduced-rank regression and canonical analysis," J. Roy. Statist. Soc. (B), vol. 43, pp. 183-189, 1981.


* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.