Volume 91, Issue 9, 2003, Pages 1306-1325

Recent advances in the automatic recognition of audiovisual speech

Author keywords

Adaptation; Audiovisual fusion; Audiovisual speech recognition (ASR); Face tracking; Hidden Markov models (HMM); Multimedia databases; Multistream hidden Markov models; Product hidden Markov models; Speechreading; Stream reliability; Visual feature extraction

Indexed keywords

ALGORITHMS; COMPUTER SIMULATION; DATABASE SYSTEMS; FACE RECOGNITION; FEATURE EXTRACTION; HUMAN COMPUTER INTERACTION; IMAGE PROCESSING; MARKOV PROCESSES; RELIABILITY; SPEECH ANALYSIS; VISUAL COMMUNICATION

EID: 4544290191     PISSN: 00189219     EISSN: None     Source Type: Journal
DOI: 10.1109/JPROC.2003.817150     Document Type: Conference Paper
Times cited: 700

References (134)
  • 1
    • R. P. Lippmann, "Speech recognition by machines and humans," Speech Commun., vol. 22, pp. 1-15, 1997.
  • 2
    • B. H. Juang, "Speech recognition in adverse environments," Comput. Speech Lang., vol. 5, pp. 275-294, 1991.
  • 3
    • R. Stern, A. Acero, F.-H. Liu, and Y. Ohshima, "Signal processing for robust speech recognition," in Automatic Speech and Speaker Recognition. Advanced Topics, C.-H. Lee, F. K. Soong, and Y. Ohshima, Eds. Norwell, MA: Kluwer, 1997, ch. 15, pp. 357-384.
  • 6
    • W. H. Sumby and I. Pollack, "Visual contribution to speech intelligibility in noise," J. Acoust. Soc. Amer., vol. 26, pp. 212-215, 1954.
  • 7
    • H. McGurk and J. MacDonald, "Hearing lips and seeing voices," Nature, vol. 264, pp. 746-748, 1976.
  • 8
    • M. Marschark, D. LePoutre, and L. Bement, "Mouth movement and signed communication," in Hearing by Eye II, R. Campbell, B. Dodd, and D. Burnham, Eds. Hove, U.K.: Psychology, 1998, ch. 13, pp. 245-266.
  • 9
    • L. E. Bernstein, M. E. Demorest, and P. E. Tucker, "What makes a good speechreader? First you have to find one," in Hearing by Eye II, R. Campbell, B. Dodd, and D. Burnham, Eds. Hove, U.K.: Psychology, 1998, ch. 11, pp. 211-227.
  • 10
    • A. Q. Summerfield, "Some preliminaries to a comprehensive account of audio visual speech perception," in Hearing by Eye: The Psychology of Lip-Reading, R. Campbell and B. Dodd, Eds. London, U.K.: Lawrence Erlbaum, 1987, pp. 3-51.
  • 11
    • D. W. Massaro and D. G. Stork, "Speech recognition and sensory integration," Amer. Sci., vol. 86, pp. 236-244, 1998.
  • 12
    • H. Yehia, P. Rubin, and E. Vatikiotis-Bateson, "Quantitative association of vocal-tract and facial behavior," Speech Commun., vol. 26, pp. 23-43, 1998.
  • 13
    • J. P. Barker and F. Berthommier, "Estimation of speech acoustics from visual speech features: A comparison of linear and nonlinear models," in Proc. Conf. Audio-Visual Speech Processing, 1999, pp. 112-117.
  • 15
    • Q. Summerfield, A. MacLeod, M. McGrath, and M. Brooke, "Lips, teeth, and the benefits of lipreading," in Handbook of Research on Face Processing, H. D. Ellis and A. W. Young, Eds. Amsterdam, The Netherlands: Elsevier, 1989, pp. 223-233.
  • 16
    • P. M. T. Smeele, "Psychology of human speechreading," in Speechreading by Humans and Machines, D. G. Stork and M. E. Hennecke, Eds. Berlin, Germany: Springer-Verlag, 1996, pp. 3-15.
  • 17
    • M. E. Hennecke, D. G. Stork, and K. V. Prasad, "Visionary speech: Looking ahead to practical speechreading systems," in Speechreading by Humans and Machines, D. G. Stork and M. E. Hennecke, Eds. Berlin, Germany: Springer-Verlag, 1996, pp. 331-349.
  • 18
    • E. D. Petajan, "Automatic lipreading to enhance speech recognition," in Proc. Global Telecommunications Conf., 1984, pp. 265-272.
  • 21
    • A. Adjoudani and C. Benoît, "On the integration of auditory and visual parameters in an HMM-based ASR," in Speechreading by Humans and Machines, D. G. Stork and M. E. Hennecke, Eds. Berlin, Germany: Springer-Verlag, 1996, pp. 461-471.
  • 22
    • Q. Su and P. L. Silsbee, "Robust audiovisual integration using semicontinuous hidden Markov models," in Proc. Int. Conf. Spoken Language Processing, 1996, pp. 42-45.
  • 23
    • J. R. Movellan and G. Chadderdon, "Channel separability in the audio visual integration of speech: A Bayesian approach," in Speechreading by Humans and Machines, D. G. Stork and M. E. Hennecke, Eds. Berlin, Germany: Springer-Verlag, 1996, pp. 473-487.
  • 26
    • S. Dupont and J. Luettin, "Audio-visual speech modeling for continuous speech recognition," IEEE Trans. Multimedia, vol. 2, pp. 141-151, Sept. 2000.
  • 29
    • A. V. Nefian, L. Liang, X. Pi, X. Liu, and K. Murphy, "Dynamic Bayesian networks for audio-visual speech recognition," EURASIP J. Appl. Signal Processing, vol. 2002, pp. 1274-1288, Nov. 2002.
  • 32
    • A. J. Goldschen, O. N. Garcia, and E. D. Petajan, "Rationale for phoneme-viseme mapping and feature selection in visual speech recognition," in Speechreading by Humans and Machines, D. G. Stork and M. E. Hennecke, Eds. Berlin, Germany: Springer-Verlag, 1996, pp. 505-515.
  • 36
    • S. Nakamura, H. Ito, and K. Shikano, "Stream weight optimization of speech and lip image sequence for audio-visual speech recognition," in Proc. Int. Conf. Spoken Language Processing, vol. 3, 2000, pp. 20-23.
  • 40
    • G. Potamianos, H. P. Graf, and E. Cosatto, "An image transform approach for HMM based automatic lipreading," in Proc. Int. Conf. Image Processing, vol. 1, 1998, pp. 173-177.
  • 44
    • G. Potamianos, C. Neti, G. Iyengar, A. W. Senior, and A. Verma, "A cascade visual front end for speaker independent automatic speechreading," Int. J. Speech Technol., vol. 4, pp. 193-208, July/Oct. 2001.
  • 45
    • P. Jourlin, "Word dependent acoustic-labial weights in HMM-based speech recognition," in Proc. Eur. Workshop Audio-Visual Speech Processing, 1997, pp. 69-72.
  • 46
    • P. Teissier, J. Robert-Ribes, and J. L. Schwartz, "Comparing models for audiovisual fusion in a noisy-vowel recognition task," IEEE Trans. Speech Audio Processing, vol. 7, pp. 629-642, Nov. 1999.
  • 47
    • M. Heckmann, F. Berthommier, and K. Kroschel, "Noise adaptive stream weighting in audio-visual speech recognition," EURASIP J. Appl. Signal Processing, vol. 2002, pp. 1260-1273, Nov. 2002.
  • 51
    • B. Dalton, R. Kaucic, and A. Blake, "Automatic speechreading using dynamic contours," in Speechreading by Humans and Machines, D. G. Stork and M. E. Hennecke, Eds. Berlin, Germany: Springer-Verlag, 1996, pp. 373-382.
  • 53
    • M. T. Chan, "HMM based audio-visual speech recognition integrating geometric- and appearance-based visual features," in Proc. Workshop Multimedia Signal Processing, 2001, pp. 9-14.
  • 57
    • A. L. Yuille, P. W. Hallinan, and D. S. Cohen, "Feature extraction from faces using deformable templates," Int. J. Comput. Vision, vol. 8, pp. 99-111, 1992.
  • 60
    • X. Zhang, C. C. Broun, R. M. Mersereau, and M. Clements, "Automatic speechreading with applications to human-computer interfaces," EURASIP J. Appl. Signal Processing, vol. 2002, pp. 1228-1247, Nov. 2002.
  • 61
    • P. Daubias and P. Deléglise, "Automatically building and evaluating statistical models for lipreading," EURASIP J. Appl. Signal Processing, vol. 2002, pp. 1202-1212, Nov. 2002.
  • 63
    • K.-K. Sung and T. Poggio, "Example-based learning for view-based human face detection," IEEE Trans. Pattern Anal. Machine Intell., vol. 20, pp. 39-51, Jan. 1998.
  • 67
    • G. Potamianos and C. Neti, "Improved ROI and within frame discriminant features for lipreading," in Proc. Int. Conf. Image Processing, vol. 3, 2001, pp. 250-253.
  • 69
    • I. Daubechies, Wavelets. Philadelphia, PA: SIAM, 1992.
  • 71
    • M. S. Gray, J. R. Movellan, and T. J. Sejnowski, "Dynamic features for visual speech-reading: A systematic comparison," in Advances in Neural Information Processing Systems, M. C. Mozer, M. I. Jordan, and T. Petsche, Eds. Cambridge, MA: MIT Press, 1997, vol. 9, pp. 751-757.
  • 73
    • L. D. Rosenblum and H. M. Saldana, "Time-varying information for visual speech perception," in Hearing by Eye II, R. Campbell, B. Dodd, and D. Burnham, Eds. Hove, U.K.: Psychology, 1998, ch. 3, pp. 61-81.
  • 78
    • T. Chen, "Audiovisual speech processing. Lip reading and lip synchronization," IEEE Signal Processing Mag., vol. 18, pp. 9-21, Jan. 2001.
  • 79
    • M. Gordan, C. Kotropoulos, and I. Pitas, "A support vector machine-based dynamic network for visual speech recognition applications," EURASIP J. Appl. Signal Processing, vol. 2002, pp. 1248-1259, Nov. 2002.
  • 81
    • E. K. Patterson, S. Gurbuz, Z. Tufekci, and J. N. Gowdy, "Moving talker, speaker-independent feature study, and baseline results using the CUAVE multimodal speech corpus," EURASIP J. Appl. Signal Processing, vol. 2002, pp. 1189-1201, Nov. 2002.
  • 82
    • A. Rogozan, "Discriminative learning of visual data for audiovisual speech recognition," Int. J. Artif. Intell. Tools, vol. 8, pp. 43-52, 1999.
  • 83
    • P. L. Silsbee and A. C. Bovik, "Computer lipreading for improved accuracy in automatic speech recognition," IEEE Trans. Speech Audio Processing, vol. 4, pp. 337-351, Sept. 1996.
  • 84
    • A. P. Dempster, N. M. Laird, and D. B. Rubin, "Maximum likelihood from incomplete data via the EM algorithm," J. Royal Statist. Soc., vol. 39, pp. 1-38, 1977.
  • 88
    • L. Girin, J.-L. Schwartz, and G. Feng, "Audio-visual enhancement of speech in noise," J. Acoust. Soc. Amer., vol. 109, pp. 3007-3020, 2001.
  • 90
    • S. Deligne, G. Potamianos, and C. Neti, "Audio-visual speech enhancement with AVCDCN (audio-visual codebook dependent cepstral normalization)," in Proc. Int. Conf. Spoken Language Processing, 2002, pp. 1449-1452.
  • 91
    • L. Xu, A. Krzyzak, and C. Y. Suen, "Methods of combining multiple classifiers and their applications in handwritten recognition," IEEE Trans. Syst., Man, Cybern., vol. 22, pp. 418-435, May/June 1992.
  • 95
    • H. Bourlard and S. Dupont, "A new ASR approach based on independent processing and recombination of partial frequency bands," in Proc. Int. Conf. Spoken Language Processing, 1996, pp. 426-429.
  • 96
    • H. Glotin and F. Berthommier, "Test of several external posterior weighting functions for multiband full combination ASR," in Proc. Int. Conf. Spoken Language Processing, vol. 1, 2000, pp. 333-336.
  • 97
    • C. Miyajima, K. Tokuda, and T. Kitamura, "Audio-visual speech recognition using MCE-based HMM's and model-dependent stream weights," in Proc. Int. Conf. Spoken Language Processing, vol. 2, 2000, pp. 1023-1026.
  • 102
    • K. W. Grant and S. Greenberg, "Speech intelligibility derived from asynchronous processing of auditory-visual information," in Proc. Conf. Audio-Visual Speech Processing, 2001, pp. 132-137.
  • 105
    • S. Nakamura, "Fusion of audio-visual information for integrated speech processing," in Audio- and Video-Based Biometric Person Authentication, J. Bigun and F. Smeraldi, Eds. Berlin, Germany: Springer-Verlag, 2001, pp. 127-143.
  • 107
    • J. A. Nelder and R. Mead, "A simplex method for function minimization," Comput. J., vol. 7, pp. 308-313, 1965.
  • 108
    • F. Berthommier and H. Glotin, "A new SNR-feature mapping for robust multistream speech recognition," in Proc. Int. Congress Phonetic Sciences, 1999, pp. 711-715.
  • 109
    • G. Potamianos and C. Neti, "Stream confidence estimation for audio-visual speech recognition," in Proc. Int. Conf. Spoken Language Processing, vol. 3, 2000, pp. 746-749.
  • 110
    • J.-L. Gauvain and C.-H. Lee, "Maximum a posteriori estimation for multivariate Gaussian mixture observations of Markov chains," IEEE Trans. Speech Audio Processing, vol. 2, pp. 291-298, Apr. 1994.
  • 111
    • C. J. Leggetter and P. C. Woodland, "Maximum likelihood linear regression for speaker adaptation of continuous density hidden Markov models," Comput. Speech Lang., vol. 9, pp. 171-185, 1995.
  • 116
    • S. Pigeon and L. Vandendorpe, "The M2VTS multimodal face database," in Audio- and Video-Based Biometric Person Authentication, J. Bigün, G. Chollet, and G. Borgefors, Eds. Berlin, Germany: Springer-Verlag, 1997, pp. 403-409.
  • 120
    • T. Chen and R. R. Rao, "Audio-visual integration in multimodal communication," Proc. IEEE, vol. 86, pp. 837-852, May 1998.
  • 123
    • B. Maison, C. Neti, and A. Senior, "Audio-visual speaker recognition for broadcast news: Some fusion techniques," in Proc. Workshop Multimedia Signal Processing, 1999, pp. 161-167.
  • 124
    • M. M. Cohen and D. W. Massaro, "What can visual speech synthesis tell visual speech recognition?," presented at the Asilomar Conf. Signals, Systems, Computers, Pacific Grove, CA, 1994.
  • 125
    • T. Chen, H. P. Graf, and K. Wang, "Lip synchronization using speech-assisted video processing," IEEE Signal Processing Lett., vol. 2, pp. 57-59, Apr. 1995.
  • 126
    • E. Cosatto, G. Potamianos, and H. P. Graf, "Audio-visual unit selection for the synthesis of photo-realistic talking-heads," in Proc. Int. Conf. Multimedia Expo, 2000, pp. 1097-1100.
  • 127
    • E. Cosatto and H. P. Graf, "Photo-realistic talking-heads from image samples," IEEE Trans. Multimedia, vol. 2, pp. 152-163, Sept. 2000.
  • 131
    • E. Foucher, L. Girin, and G. Feng, "Audiovisual speech coder: Using vector quantization to exploit the audio/video correlation," in Proc. Conf. Audio-Visual Speech Processing, 1998, pp. 67-71.
  • 132
    • D. Sodoyer, J.-L. Schwartz, L. Girin, J. Klinkisch, and C. Jutten, "Separation of audio-visual speech sources: A new approach exploiting the audio-visual coherence of speech stimuli," EURASIP J. Appl. Signal Processing, vol. 2002, pp. 1165-1173, Nov. 2002.


* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.