Volume 98, Issue 10, 2010, Pages 1692-1715

Audiovisual information fusion in human-computer interfaces and intelligent environments: A survey

Author keywords

Audiovisual fusion; dynamic Bayesian networks (DBNs); hidden Markov models; human activity analysis; human activity modeling; information fusion; machine learning; multimodal systems

Indexed keywords

BAYESIAN NETWORKS; HIDDEN MARKOV MODELS; HIERARCHICAL SYSTEMS; INFORMATION FUSION; INTELLIGENT SYSTEMS; KNOWLEDGE BASED SYSTEMS; SEMANTICS; SPEECH RECOGNITION; SURVEYS;

EID: 77956978396     PISSN: 00189219     EISSN: None     Source Type: Journal    
DOI: 10.1109/JPROC.2010.2057231     Document Type: Review
Times cited : (104)

References (152)
  • 1
    • 0002988210
    • Computing machinery and intelligence
    • LIX
    • A. M. Turing, "Computing machinery and intelligence" Mind, vol.LIX, no.236, pp. 433-460, 1950.
    • (1950) Mind , Issue.236 , pp. 433-460
    • Turing, A.M.1
  • 2
    • 85032752352
    • Audiovisual speech processing
    • Jan.
    • T. Chen, "Audiovisual speech processing" IEEE Signal Process. Mag., vol.18, no.1, pp. 9-21, Jan. 2001.
    • (2001) IEEE Signal Process. Mag. , vol.18 , Issue.1 , pp. 9-21
    • Chen, T.1
  • 3
    • 23044432651
    • Grounding words in perception and action: Computational insights
    • Aug.
    • D. Roy, "Grounding words in perception and action: Computational insights" Trends Cogn. Sci., vol.9, no.8, pp. 389-396, Aug. 2005.
    • (2005) Trends Cogn. Sci. , vol.9 , Issue.8 , pp. 389-396
    • Roy, D.1
  • 4
    • 34249663433
    • New trends in cognitive science: Integrative approaches to learning and development
    • Jan.
    • G. O. Deák, M. S. Bartlett, and T. Jebara, "New trends in cognitive science: Integrative approaches to learning and development" Neurocomputing, vol.70, no.13-15, pp. 2139-2147, Jan. 2007.
    • (2007) Neurocomputing , vol.70 , Issue.13-15 , pp. 2139-2147
    • Deák, G.O.1    Bartlett, M.S.2    Jebara, T.3
  • 5
    • 84867456688
    • A multimodal learning interface for grounding spoken language in sensory perceptions
    • C. Yu and D. H. Ballard, "A multimodal learning interface for grounding spoken language in sensory perceptions" ACM Trans. Appl. Perception, vol.1, no.1, pp. 57-80, 2004.
    • (2004) ACM Trans. Appl. Perception , vol.1 , Issue.1 , pp. 57-80
    • Yu, C.1    Ballard, D.H.2
  • 6
    • 0029756565
    • Information combination operators for data fusion: A comparative review with classification
    • Jan.
    • I. Bloch, "Information combination operators for data fusion: A comparative review with classification" IEEE Trans. Syst. Man Cybern. A, Syst. Humans, vol.26, no.1, pp. 52-67, Jan. 1996.
    • (1996) IEEE Trans. Syst. Man Cybern. A, Syst. Humans , vol.26 , Issue.1 , pp. 52-67
    • Bloch, I.1
  • 7
    • 34548206204
    • Multimodal human-computer interaction: A survey
    • A. Jaimes and N. Sebe, "Multimodal human-computer interaction: A survey" Comput. Vis. Image Understand., vol.108, no.1-2, pp. 116-134, 2007.
    • (2007) Comput. Vis. Image Understand. , vol.108 , Issue.1-2 , pp. 116-134
    • Jaimes, A.1    Sebe, N.2
  • 10
    • 47749150338
    • Multimodal Technologies for Perception of Humans: International Evaluation Workshops CLEAR 2007 and RT 2007
    • R. Stiefelhagen, R. Bowers, and J. Fiscus, Eds. Berlin, Germany: Springer-Verlag, 2008
    • R. Stiefelhagen, R. Bowers, and J. Fiscus, Eds. Multimodal Technologies for Perception of Humans: International Evaluation Workshops CLEAR 2007 and RT 2007. Berlin, Germany: Springer-Verlag, 2008, ser. Lecture Notes in Computer Science, vol.4625.
    • Ser. Lecture Notes in Computer Science , vol.4625
  • 13
    • 77956509104
    • Towards a vision-based system exploring 3D driver posture dynamics for driver assistance: Issues and possibilities
    • C. Tran and M. M. Trivedi, "Towards a vision-based system exploring 3D driver posture dynamics for driver assistance: Issues and possibilities" in Proc. IEEE Intell. Veh. Symp., 2010, pp. 179-184.
    • (2010) Proc. IEEE Intell. Veh. Symp. , pp. 179-184
    • Tran, C.1    Trivedi, M.M.2
  • 14
    • 77956549360
    • Contextual framework for speech based emotion recognition in driver assistance system
    • A. Tawari and M. M. Trivedi, "Contextual framework for speech based emotion recognition in driver assistance system" in Proc. IEEE Intell. Veh. Symp., 2010, pp. 174-178.
    • (2010) Proc. IEEE Intell. Veh. Symp. , pp. 174-178
    • Tawari, A.1    Trivedi, M.M.2
  • 15
    • 84890017127
    • K. Takeda, H. Erdogan, J. H. L. Hansen, and H. Abut, Eds. New York: Springer-Verlag
    • K. Takeda, H. Erdogan, J. H. L. Hansen, and H. Abut, Eds., In-Vehicle Corpus and Signal Processing for Driver Behavior. New York: Springer-Verlag, 2009.
    • (2009) In-Vehicle Corpus and Signal Processing for Driver Behavior
  • 16
    • 4544290191
    • Recent advances in the automatic recognition of audiovisual speech
    • Sep.
    • G. Potamianos, C. Neti, G. Gravier, A. Garg, and A. W. Senior, "Recent advances in the automatic recognition of audiovisual speech" Proc. IEEE, vol.91, no.9, pp. 1306-1326, Sep. 2003.
    • (2003) Proc. IEEE , vol.91 , Issue.9 , pp. 1306-1326
    • Potamianos, G.1    Neti, C.2    Gravier, G.3    Garg, A.4    Senior, A.W.5
  • 17
    • 70449556249
    • Hierarchical audio-visual cue integration framework for activity analysis in intelligent meeting rooms
    • S. T. Shivappa, M. M. Trivedi, and B. D. Rao, "Hierarchical audio-visual cue integration framework for activity analysis in intelligent meeting rooms" in Proc. IEEE Comput. Vis. Pattern Recognit. Workshop, 2009, pp. 107-114.
    • (2009) Proc. IEEE Comput. Vis. Pattern Recognit. Workshop , pp. 107-114
    • Shivappa, S.T.1    Trivedi, M.M.2    Rao, B.D.3
  • 19
    • 40249089621
    • Speech enhancement and recognition in meetings with an audio-visual sensor array
    • Nov.
    • H. K. Maganti, D. Gatica-Perez, and I. McCowan, "Speech enhancement and recognition in meetings with an audio-visual sensor array" IEEE Trans. Audio Speech Lang. Process., vol.15, no.8, pp. 2257-2269, Nov. 2007.
    • (2007) IEEE Trans. Audio Speech Lang. Process. , vol.15 , Issue.8 , pp. 2257-2269
    • Maganti, H.K.1    Gatica-Perez, D.2    McCowan, I.3
  • 21
    • 85069232505
    • Ten years after Summerfield: A taxonomy of models for audio-visual fusion in speech perception
    • London, U.K.: Psychology Press
    • J. L. Schwartz, J. Robert-Ribes, and P. Escudier, "Ten years after Summerfield: A taxonomy of models for audio-visual fusion in speech perception," Hearing by Eye II. London, U.K.: Psychology Press, 1998, pp. 85-108.
    • (1998) Hearing by Eye II , pp. 85-108
    • Schwartz, J.L.1    Robert-Ribes, J.2    Escudier, P.3
  • 25
    • 34948889993
    • Microphone arrays as generalized cameras for integrated audio visual processing
    • DOI: 10.1109/CVPR.2007.383345
    • A. O'Donovan and R. Duraiswami, "Microphone arrays as generalized cameras for integrated audio visual processing" in Proc. IEEE Int. Conf. Comput. Vis. Pattern Recognit., 2007, DOI: 10.1109/CVPR.2007.383345.
    • (2007) Proc. IEEE Int. Conf. Comput. Vis. Pattern Recognit.
    • O'Donovan, A.1    Duraiswami, R.2
  • 27
    • 37849004875
    • Frame-dependent multi-stream reliability indicators for audio-visual speech recognition
    • A. Garg, G. Potamianos, C. Neti, and T. S. Huang, "Frame-dependent multi-stream reliability indicators for audio-visual speech recognition" in Proc. Int. Conf. Multimedia Expo, 2003, vol.III, pp. 605-608.
    • (2003) Proc. Int. Conf. Multimedia Expo , vol.3 , pp. 605-608
    • Garg, A.1    Potamianos, G.2    Neti, C.3    Huang, T.S.4
  • 28
    • 85133343575
    • Speech intelligibility derived from asynchronous processing of auditory-visual information
    • K. W. Grant and S. Greenberg, "Speech intelligibility derived from asynchronous processing of auditory-visual information," Proc. Workshop Audio-Visual Speech Process., 2001, pp. 132-137.
    • (2001) Proc. Workshop Audio-Visual Speech Process. , pp. 132-137
    • Grant, K.W.1    Greenberg, S.2
  • 29
    • 0034270644
    • Audio-visual speech modeling for continuous speech recognition
    • Sep.
    • S. Dupont and J. Luettin, "Audio-visual speech modeling for continuous speech recognition" IEEE Trans. Multimedia, vol.2, no.3, pp. 141-151, Sep. 2000.
    • (2000) IEEE Trans. Multimedia , vol.2 , Issue.3 , pp. 141-151
    • Dupont, S.1    Luettin, J.2
  • 30
    • 0034825241
    • Multi-stream adaptive evidence combination for noise robust ASR
    • Apr.
    • A. Morris, A. Hagen, H. Glotin, and H. Bourlard, "Multi-stream adaptive evidence combination for noise robust ASR" Speech Commun., vol.34, no.1-2, pp. 25-40, Apr. 2001.
    • (2001) Speech Commun. , vol.34 , Issue.1-2 , pp. 25-40
    • Morris, A.1    Hagen, A.2    Glotin, H.3    Bourlard, H.4
  • 33
    • 51449122700
    • Multimodal information fusion using the iterative decoding algorithm and its application to audio-visual speech recognition
    • S. T. Shivappa, B. D. Rao, and M. M. Trivedi, "Multimodal information fusion using the iterative decoding algorithm and its application to audio-visual speech recognition" in Proc. IEEE Int. Conf. Acoust. Speech Signal Process., 2008, pp. 2241-2244.
    • (2008) Proc. IEEE Int. Conf. Acoust. Speech Signal Process. , pp. 2241-2244
    • Shivappa, S.T.1    Rao, B.D.2    Trivedi, M.M.3
  • 34
    • 33646007315
    • Multimodal person recognition for human-vehicle interaction
    • Apr./Jun.
    • E. Erzin, Y. Yemez, A. M. Tekalp, A. Ercil, H. Erdogan, and H. Abut, "Multimodal person recognition for human-vehicle interaction" IEEE Multimedia Mag., vol.13, no.2, pp. 18-31, Apr./Jun. 2006.
    • (2006) IEEE Multimedia Mag. , vol.13 , Issue.2 , pp. 18-31
    • Erzin, E.1    Yemez, Y.2    Tekalp, A.M.3    Ercil, A.4    Erdogan, H.5    Abut, H.6
  • 35
    • 36248953326
    • Audiovisual head orientation estimation with particle filtering in multisensor scenarios
    • Jan. article no. 32
    • C. Canton-Ferrer, C. Segura, J. R. Casas, M. Pardàs, and J. Hernando, "Audiovisual head orientation estimation with particle filtering in multisensor scenarios" EURASIP J. Adv. Signal Process., vol.2008, Jan. 2008, article no. 32.
    • (2008) EURASIP J. Adv. Signal Process. , vol.2008
    • Canton-Ferrer, C.1    Segura, C.2    Casas, J.R.3    Pardàs, M.4    Hernando, J.5
  • 39
    • 77956964676
    • Multi-level particle filter fusion of features and cues for audio-visual person tracking
    • Berlin, Germany: Springer-Verlag
    • K. Bernardin, T. Gehrig, and R. Stiefelhagen, "Multi-level particle filter fusion of features and cues for audio-visual person tracking," Multimodal Technologies for Perception of Humans, vol. 4625. Berlin, Germany: Springer-Verlag, 2007, ser. Lecture Notes in Computer Science.
    • (2007) Ser. Lecture Notes in Computer Science , vol.4625
    • Bernardin, K.1    Gehrig, T.2    Stiefelhagen, R.3
  • 41
    • 38749151899
    • An iterative decoding algorithm for fusion of multi-modal information
    • Special Issue on Human-Activity Analysis in Multimedia Data, 2008, DOI: 10.1155/2008/478396, article id 478396
    • S. T. Shivappa, B. D. Rao, and M. M. Trivedi, "An iterative decoding algorithm for fusion of multi-modal information" Eurasip J. Adv. Signal Process., vol. 2008, Special Issue on Human-Activity Analysis in Multimedia Data, 2008, DOI: 10.1155/2008/478396, article id 478396.
    • (2008) Eurasip J. Adv. Signal Process.
    • Shivappa, S.T.1    Rao, B.D.2    Trivedi, M.M.3
  • 43
    • 0027297425
    • Near Shannon limit error-correcting coding and decoding: Turbo-codes
    • May
    • C. Berrou, A. Glavieux, and P. Thitimajshima, "Near Shannon limit error-correcting coding and decoding: Turbo-codes" in Proc. IEEE Int. Conf. Commun., May 1993, vol.2, pp. 1064-1070.
    • (1993) Proc. IEEE Int. Conf. Commun. , vol.2 , pp. 1064-1070
    • Berrou, C.1    Glavieux, A.2    Thitimajshima, P.3
  • 44
    • 33750368310
    • An audio-visual corpus for speech perception and automatic speech recognition
    • Nov.
    • M. Cooke, J. Barker, S. Cunningham, and X. Shao, "An audio-visual corpus for speech perception and automatic speech recognition" J. Acoust. Soc. Amer., vol.120, no.5, pp. 2421-2424, Nov. 2006.
    • (2006) J. Acoust. Soc. Amer. , vol.120 , Issue.5 , pp. 2421-2424
    • Cooke, M.1    Barker, J.2    Cunningham, S.3    Shao, X.4
  • 46
    • 38049165243
    • A decision fusion system across time and classifiers for audio-visual person identification
    • Berlin, Germany: Springer-Verlag, 2006, ser. Lecture Notes in Computer Science
    • A. Stergiou, A. Pnevmatikakis, and L. Polymenakos, "A decision fusion system across time and classifiers for audio-visual person identification," Multimodal Technologies for Perception of Humans, vol. 4122. Berlin, Germany: Springer-Verlag, 2006, ser. Lecture Notes in Computer Science, pp. 223-232.
    • Multimodal Technologies for Perception of Humans , vol.4122 , pp. 223-232
    • Stergiou, A.1    Pnevmatikakis, A.2    Polymenakos, L.3
  • 48
    • 57549101447
    • Audiovisual synchronization and fusion using canonical correlation analysis
    • Nov.
    • M. E. Sargin, Y. Yemez, E. Erzin, and A. M. Tekalp, "Audiovisual synchronization and fusion using canonical correlation analysis" IEEE Trans. Multimedia, vol.9, no.7, pp. 1396-1403, Nov. 2007.
    • (2007) IEEE Trans. Multimedia , vol.9 , Issue.7 , pp. 1396-1403
    • Sargin, M.E.1    Yemez, Y.2    Erzin, E.3    Tekalp, A.M.4
  • 49
    • 73049098610
    • Audio-visual group recognition using diffusion maps
    • Jan.
    • Y. Keller, S. Lafon, R. Coifman, and S. Zucker, "Audio-visual group recognition using diffusion maps" IEEE Trans. Signal Process., vol.58, no.1, pp. 403-413, Jan. 2009.
    • (2009) IEEE Trans. Signal Process. , vol.58 , Issue.1 , pp. 403-413
    • Keller, Y.1    Lafon, S.2    Coifman, R.3    Zucker, S.4
  • 50
    • 0037475135
    • A survey of socially interactive robots
    • Mar.
    • T. Fong, I. Nourbakhsh, and K. Dautenhahn, "A survey of socially interactive robots" Robot. Autonom. Syst., vol.42, no.3-4, pp. 143-166, Mar. 2002.
    • (2002) Robot. Autonom. Syst. , vol.42 , Issue.3-4 , pp. 143-166
    • Fong, T.1    Nourbakhsh, I.2    Dautenhahn, K.3
  • 51
    • 38049168159
    • Towards communicating agents and avatars in virtual worlds
    • A. Nijholt and G. Hondorp, "Towards communicating agents and avatars in virtual worlds" in Proc. Eurographics, 2000, pp. 91-95.
    • (2000) Proc. Eurographics , pp. 91-95
    • Nijholt, A.1    Hondorp, G.2
  • 57
    • 11844267204
    • Dynamic context capture and distributed video arrays for intelligent spaces
    • Jan.
    • M. M. Trivedi, K. S. Huang, and I. Mikic, "Dynamic context capture and distributed video arrays for intelligent spaces" IEEE Trans. Syst. Man Cybern. A, Syst. Humans, vol.35, no.1, pp. 145-163, Jan. 2005.
    • (2005) IEEE Trans. Syst. Man Cybern. A, Syst. Humans , vol.35 , Issue.1 , pp. 145-163
    • Trivedi, M.M.1    Huang, K.S.2    Mikic, I.3
  • 58
    • 57849101125
    • Detecting small group activities from multimodal observations
    • O. Brdiczka, J. Maisonnasse, P. Reignier, and J. Crowley, "Detecting small group activities from multimodal observations" Appl. Intell., vol.30, no.1, pp. 47-57, 2007.
    • (2007) Appl. Intell. , vol.30 , Issue.1 , pp. 47-57
    • Brdiczka, O.1    Maisonnasse, J.2    Reignier, P.3    Crowley, J.4
  • 64
    • 21244492850
    • Real-time speaker tracking using particle filter sensor fusion
    • Mar.
    • Y. Chen and Y. Rui, "Real-time speaker tracking using particle filter sensor fusion" Proc. IEEE, vol.92, no.3, pp. 485-494, Mar. 2004.
    • (2004) Proc. IEEE , vol.92 , Issue.3 , pp. 485-494
    • Chen, Y.1    Rui, Y.2
  • 65
    • 0034507915
    • Look who's talking: Speaker detection using video and audio correlation
    • R. Cutler and L. S. Davis, "Look who's talking: Speaker detection using video and audio correlation" in Proc. IEEE Int. Conf. Multimedia Expo (III), 2000, vol.3, pp. 1589-1592.
    • (2000) Proc. IEEE Int. Conf. Multimedia Expo (III) , vol.3 , pp. 1589-1592
    • Cutler, R.1    Davis, L.S.2
  • 66
    • 0042349407
    • A graphical model for audiovisual object tracking
    • Jul.
    • M. Beal, N. Jojic, and H. Attias, "A graphical model for audiovisual object tracking" IEEE Trans. Pattern Anal. Mach. Intell., vol.25, no.7, pp. 828-836, Jul. 2003.
    • (2003) IEEE Trans. Pattern Anal. Mach. Intell. , vol.25 , Issue.7 , pp. 828-836
    • Beal, M.1    Jojic, N.2    Attias, H.3
  • 67
    • 0017199877
    • Hearing lips and seeing voices
    • Dec. DOI: 10.1038/264746a0
    • H. McGurk and J. MacDonald, "Hearing lips and seeing voices" Lett. Nature, vol.264, pp. 746-748, Dec. 1976, DOI: 10.1038/264746a0.
    • (1976) Lett. Nature , vol.264 , pp. 746-748
    • McGurk, H.1    MacDonald, J.2
  • 68
    • 77956960200
    • D. G. Stork and M. E. Hennecke, Eds., Berlin, Germany: Springer-Verlag
    • D. G. Stork and M. E. Hennecke, Eds., Speechreading by Humans and Machines. Berlin, Germany: Springer-Verlag, 1996.
    • (1996) Speechreading by Humans and Machines
  • 71
    • 0024767890
    • Integration of acoustic and visual speech signals using neural networks
    • Nov.
    • B. Yuhas, M. Goldstein, and T. Sejnowski, "Integration of acoustic and visual speech signals using neural networks" IEEE Commun. Mag., vol.27, no.11, pp. 65-71, Nov. 1989.
    • (1989) IEEE Commun. Mag. , vol.27 , Issue.11 , pp. 65-71
    • Yuhas, B.1    Goldstein, M.2    Sejnowski, T.3
  • 72
    • 84943272400
    • Bimodal sensor integration on the example of speech-reading
    • C. Bregler, S. Manke, and A. Waibel, "Bimodal sensor integration on the example of speech-reading" in Proc. IEEE Int. Conf. Neural Netw., 1993, vol.2, pp. 667-671.
    • (1993) Proc. IEEE Int. Conf. Neural Netw. , vol.2 , pp. 667-671
    • Bregler, C.1    Manke, S.2    Waibel, A.3
  • 73
    • 0001432664
    • On the integration of auditory and visual parameters in an HMM-based ASR
    • D. G. Stork and M. E. Hennecke, Eds. Berlin, Germany: Springer-Verlag
    • A. Adjoudani and C. Benoit, "On the integration of auditory and visual parameters in an HMM-based ASR," Speechreading by Humans and Machines: Systems and Applications, D. G. Stork and M. E. Hennecke, Eds. Berlin, Germany: Springer-Verlag, 1996, pp. 461-472.
    • (1996) Speechreading by Humans and Machines: Systems and Applications , pp. 461-472
    • Adjoudani, A.1    Benoit, C.2
  • 77
    • 0030355935
    • A new ASR approach based on independent processing and recombination of partial frequency bands
    • H. Bourlard and S. Dupont, "A new ASR approach based on independent processing and recombination of partial frequency bands" in Proc. Int. Conf. Spoken Lang., 1996, vol.1, pp. 426-429.
    • (1996) Proc. Int. Conf. Spoken Lang. , vol.1 , pp. 426-429
    • Bourlard, H.1    Dupont, S.2
  • 79
    • 0031624666
    • Discriminative training of HMM stream exponents for audio-visual speech recognition
    • G. Potamianos and H. Graf, "Discriminative training of HMM stream exponents for audio-visual speech recognition" in Proc. IEEE Int. Conf. Acoust. Speech Signal Process., 1998, vol.6, pp. 3733-3736.
    • (1998) Proc. IEEE Int. Conf. Acoust. Speech Signal Process. , vol.6 , pp. 3733-3736
    • Potamianos, G.1    Graf, H.2
  • 80
    • 85009154155
    • Stream weight optimization of speech and lip image sequence for audio-visual speech recognition
    • S. Nakamura, H. Ito, and K. Shikano, "Stream weight optimization of speech and lip image sequence for audio-visual speech recognition" in Proc. Int. Conf. Spoken Lang., 2000, vol.3, pp. 20-24.
    • (2000) Proc. Int. Conf. Spoken Lang. , vol.3 , pp. 20-24
    • Nakamura, S.1    Ito, H.2    Shikano, K.3
  • 84
    • 85009135204
    • Automatic speech recognition using dynamic Bayesian networks with both acoustic and articulatory variables
    • T. Stephenson, H. Bourlard, S. Bengio, and A. Morris, "Automatic speech recognition using dynamic Bayesian networks with both acoustic and articulatory variables" in Proc. Int. Conf. Spoken Lang., 2000, vol.II, pp. 951-954.
    • (2000) Proc. Int. Conf. Spoken Lang. , vol.2 , pp. 951-954
    • Stephenson, T.1    Bourlard, H.2    Bengio, S.3    Morris, A.4
  • 87
    • 79958751716
    • Audio-visual speech recognition with a hybrid SVM-HMM system
    • Antalya, Turkey, Sep.
    • M. Gurban and J. Thiran, "Audio-visual speech recognition with a hybrid SVM-HMM system" in Proc. 13th Eur. Signal Process. Conf., Antalya, Turkey, Sep. 2005.
    • (2005) Proc. 13th Eur. Signal Process. Conf.
    • Gurban, M.1    Thiran, J.2
  • 89
    • 10244242647
    • Detection and separation of speech event using audio and video information fusion and its application to robust speech interface
    • F. Asano, K. Yamamoto, I. Hara, J. Ogata, T. Yoshimura, Y. Motomura, N. Ichimura, and H. Asoh, "Detection and separation of speech event using audio and video information fusion and its application to robust speech interface" EURASIP J. Appl. Signal Process., vol.2004, pp. 1727-1738, 2004.
    • (2004) EURASIP J. Appl. Signal Process. , vol.2004 , pp. 1727-1738
    • Asano, F.1    Yamamoto, K.2    Hara, I.3    Ogata, J.4    Yoshimura, T.5    Motomura, Y.6    Ichimura, N.7    Asoh, H.8
  • 90
    • 0346707503
    • Source localization in reverberant environments: Modeling and statistical analysis
    • Nov.
    • T. Gustafsson, B. D. Rao, and M. M. Trivedi, "Source localization in reverberant environments: Modeling and statistical analysis" IEEE Trans. Speech Audio Process., vol. 11, no. 6, pp. 791-803, Nov. 2003.
    • (2003) IEEE Trans. Speech Audio Process. , vol.11 , Issue.6 , pp. 791-803
    • Gustafsson, T.1    Rao, B.D.2    Trivedi, M.M.3
  • 91
    • 85132038963
    • Neural network lipreading system for improved speech recognition
    • Jun.
    • D. G. Stork, G. Wolff, and E. Levine, "Neural network lipreading system for improved speech recognition" in Proc. Int. Joint Conf. Neural Netw., Jun. 1992, vol.2, pp. 289-295.
    • (1992) Proc. Int. Joint Conf. Neural Netw. , vol.2 , pp. 289-295
    • Stork, D.G.1    Wolff, G.2    Levine, E.3
  • 92
    • 0030376248
    • Robust audiovisual integration using semicontinuous hidden Markov models
    • Q. Su and P. L. Silsbee, "Robust audiovisual integration using semicontinuous hidden Markov models" in Proc. Int. Conf. Spoken Lang., 1996, vol.1, pp. 42-45.
    • (1996) Proc. Int. Conf. Spoken Lang. , vol.1 , pp. 42-45
    • Su, Q.1    Silsbee, P.L.2
  • 93
    • 33846013241
    • Object tracking: A survey
    • article no. 13
    • A. Yilmaz, O. Javed, and M. Shah, "Object tracking: A survey" ACM Comput. Surv., vol.38, no.4, 2006, article no. 13.
    • (2006) ACM Comput. Surv. , vol.38 , Issue.4
    • Yilmaz, A.1    Javed, O.2    Shah, M.3
  • 96
    • 0035458007
    • Robust sound localization using multi-source audiovisual information fusion
    • Sep.
    • S. G. Z. P. Aarabi, "Robust sound localization using multi-source audiovisual information fusion" Inf. Fusion, vol.2, no.3, pp. 209-223, Sep. 2001.
    • (2001) Inf. Fusion , vol.2 , Issue.3 , pp. 209-223
    • Aarabi, S.G.Z.P.1
  • 98
    • 0037774471
    • Audiovisual localization of multiple speakers in a video teleconferencing setting
    • B. Kapralos, M. R. M. Jenkin, and E. Milios, "Audiovisual localization of multiple speakers in a video teleconferencing setting," Int. J. Imag. Syst. Technol., vol. 13, pp. 95-105, 2002.
    • (2002) Int. J. Imag. Syst. Technol. , vol.13 , pp. 95-105
    • Kapralos, B.1    Jenkin, M.R.M.2    Milios, E.3
  • 99
    • 77956766546
    • Audio visual fusion and tracking with multilevel iterative decoding: Framework and experimental evaluation
    • DOI: 10.1109/JSTSP.2010.2057890
    • S. T. Shivappa, B. D. Rao, and M. M. Trivedi, "Audio visual fusion and tracking with multilevel iterative decoding: Framework and experimental evaluation" IEEE J. Sel. Top. Signal Process., 2010, DOI: 10.1109/JSTSP.2010.2057890.
    • (2010) IEEE J. Sel. Top. Signal Process.
    • Shivappa, S.T.1    Rao, B.D.2    Trivedi, M.M.3
  • 100
    • 57349117033
    • Coordinate-free calibration of an acoustically driven camera pointing system
    • DOI: 10.1109/ICDSC.2008.4635685
    • E. Ettinger and Y. Freund, "Coordinate-free calibration of an acoustically driven camera pointing system" in Proc. Int. Conf. Distrib. Smart Cameras, 2008, DOI: 10.1109/ICDSC.2008.4635685.
    • (2008) Proc. Int. Conf. Distrib. Smart Cameras
    • Ettinger, E.1    Freund, Y.2
  • 102
    • 84899028297
    • Audio vision: Using audiovisual synchrony to locate sounds
    • J. Hershey and J. Movellan, "Audio vision: Using audiovisual synchrony to locate sounds" in Proc. Neural Inf. Process. Syst., 2000, pp. 813-819.
    • (2000) Proc. Neural Inf. Process. Syst. , pp. 813-819
    • Hershey, J.1    Movellan, J.2
  • 109
    • 0034512820
    • Emotional expressions in audiovisual human computer interaction
    • L. S. Chen and T. S. Huang, "Emotional expressions in audiovisual human computer interaction" in Proc. IEEE Int. Conf. Multimedia Expo, 2000, vol.1, pp. 423-426.
    • (2000) Proc. IEEE Int. Conf. Multimedia Expo , vol.1 , pp. 423-426
    • Chen, L.S.1    Huang, T.S.2
  • 114
    • 34147125327
    • Emotion recognition from the facial image and speech signal
    • Aug.
    • H. Go, K. Kwak, D. Lee, and M. Chun, "Emotion recognition from the facial image and speech signal" in Proc. SICE Annu. Conf., Aug. 2003, vol.3, pp. 2890-2895.
    • (2003) Proc. SICE Annu. Conf. , vol.3 , pp. 2890-2895
    • Go, H.1    Kwak, K.2    Lee, D.3    Chun, M.4
  • 115
    • 44049099067
    • Audio-visual affective expression recognition through multistream fused HMM
    • Jun.
    • Z. Zeng, J. Tu, B. M. Pianfetti, and T. S. Huang, "Audio-visual affective expression recognition through multistream fused HMM" IEEE Trans. Multimedia, vol. 10, no. 4, pp. 570-577, Jun. 2008.
    • (2008) IEEE Trans. Multimedia , vol.10 , Issue.4 , pp. 570-577
    • Zeng, Z.1    Tu, J.2    Pianfetti, B.M.3    Huang, T.S.4
  • 116
    • 78049362386
    • Decision level combination of multiple modalities for recognition and analysis of emotional expression
    • A. Metallinou, S. Lee, and S. Narayanan, "Decision level combination of multiple modalities for recognition and analysis of emotional expression" in IEEE Int. Conf. Acoust. Speech Signal Process., 2010, pp. 2462-2465.
    • (2010) IEEE Int. Conf. Acoust. Speech Signal Process. , pp. 2462-2465
    • Metallinou, A.1    Lee, S.2    Narayanan, S.3
  • 118
    • 33947384963
    • Audio-visual biometrics
    • Nov.
    • P. S. Aleksic and A. K. Katsaggelos, "Audio-visual biometrics" Proc. IEEE, vol.94, no.11, pp. 2025-2044, Nov. 2006.
    • (2006) Proc. IEEE , vol.94 , Issue.11 , pp. 2025-2044
    • Aleksic, P.S.1    Katsaggelos, A.K.2
  • 119
    • 70350111306
    • Biometric person authentication with liveness detection based on audio visual fusion
    • G. Chetty and M. Wagner, "Biometric person authentication with liveness detection based on audio visual fusion" Int. J. Biometrics, vol.1, no.4, pp. 463-478, 2009.
    • (2009) Int. J. Biometrics , vol.1 , Issue.4 , pp. 463-478
    • Chetty, G.1    Wagner, M.2
  • 120
    • 0035394653
    • Adaptive fusion of speech and lip information for robust speaker identification
    • Jul.
    • T. Wark and S. Sridharan, "Adaptive fusion of speech and lip information for robust speaker identification" Digital Signal Process., vol.11, no.3, pp. 169-186, Jul. 2001.
    • (2001) Digital Signal Process. , vol.11 , Issue.3 , pp. 169-186
    • Wark, T.1    Sridharan, S.2
  • 121
    • 4544228318
    • Identity verification using speech and face information
    • Sep.
    • C. Sanderson and K. Paliwal, "Identity verification using speech and face information" Digital Signal Process., vol.14, no.5, pp. 449-480, Sep. 2004.
    • (2004) Digital Signal Process. , vol.14 , Issue.5 , pp. 449-480
    • Sanderson, C.1    Paliwal, K.2
  • 122
    • 15044355748
    • Large-scale evaluation of multimodal biometric authentication using state-of-the-art systems
    • Mar.
    • R. Snelick, U. Uludag, A. Mink, M. Indovina, and A. Jain, "Large-scale evaluation of multimodal biometric authentication using state-of-the-art systems" IEEE Trans. Pattern Anal. Mach. Intell., vol.27, no.3, pp. 450-455, Mar. 2005.
    • (2005) IEEE Trans. Pattern Anal. Mach. Intell. , vol.27 , Issue.3 , pp. 450-455
    • Snelick, R.1    Uludag, U.2    Mink, A.3    Indovina, M.4    Jain, A.5
  • 125
  • 128
    • 26844533276
    • Multimodal speaker identification using an adaptive classifier cascade based on modality reliability
    • Oct.
    • E. Erzin, Y. Yemez, and A. M. Tekalp, "Multimodal speaker identification using an adaptive classifier cascade based on modality reliability" IEEE Trans. Multimedia, vol.7, no.5, pp. 840-852, Oct. 2005.
    • (2005) IEEE Trans. Multimedia , vol.7 , Issue.5 , pp. 840-852
    • Erzin, E.1    Yemez, Y.2    Tekalp, A.M.3
  • 129
  • 134
    • 33846227904
    • Automatic meeting segmentation using dynamic Bayesian networks
    • Jan.
    • A. Dielmann and S. Renals, "Automatic meeting segmentation using dynamic Bayesian networks" IEEE Trans. Multimedia, vol.9, no.1, pp. 25-36, Jan. 2007.
    • (2007) IEEE Trans. Multimedia , vol.9 , Issue.1 , pp. 25-36
    • Dielmann, A.1    Renals, S.2
  • 135
    • 48149111986
    • Robust multi-modal group action recognition in meetings from disturbed videos with the asynchronous hidden Markov model
    • M. Al-Hames, C. Lenz, S. Reiter, J. Schenk, F. Wallhoff, and G. Rigoll, "Robust multi-modal group action recognition in meetings from disturbed videos with the asynchronous hidden Markov model" in Proc. IEEE Int. Conf. Image Process., 2007, vol.2, pp. 213-216.
    • (2007) Proc. IEEE Int. Conf. Image Process. , vol.2 , pp. 213-216
    • Al-Hames, M.1    Lenz, C.2    Reiter, S.3    Schenk, J.4    Wallhoff, F.5    Rigoll, G.6
  • 136
    • 77956957378
    • Multimodal corpora: From models of natural interaction to systems and applications
    • Berlin, Germany: Springer-Verlag
    • M. Kipp, J. C. Martin, P. Paggio, and D. Heylen, Multimodal Corpora: From Models of Natural Interaction to Systems and Applications. Berlin, Germany: Springer-Verlag, 2009, ser. Lecture Notes on Artificial Intelligence.
    • (2009) Ser. Lecture Notes on Artificial Intelligence
    • Kipp, M.1    Martin, J.C.2    Paggio, P.3    Heylen, D.4
  • 137
    • 85009275134
    • The ISL meeting corpus: The impact of meeting type on speech style
    • Denver, CO, Sep.
    • S. Burger, V. MacLaren, and H. Yu, "The ISL meeting corpus: The impact of meeting type on speech style" in Proc. Int. Conf. Spoken Lang., Denver, CO, Sep. 16-20, 2002.
    • (2002) Proc. Int. Conf. Spoken Lang.
    • Burger, S.1    MacLaren, V.2    Yu, H.3
  • 142
    • 50449105545
    • Interpretation of multiparty meetings: The AMI and AMIDA projects
    • Trento, Italy, May
    • S. Renals, T. Hain, and H. Bourlard, "Interpretation of multiparty meetings: The AMI and AMIDA projects" in Proc. Hands-Free Speech Commun. Microphone Arrays, Trento, Italy, May 6-8, 2008, pp. 115-118.
    • (2008) Proc. Hands-Free Speech Commun. Microphone Arrays , pp. 115-118
    • Renals, S.1    Hain, T.2    Bourlard, H.3
  • 144
    • 56749181028
    • Robust multimodal audio-visual processing for advanced context awareness in smart spaces
    • Jan.
    • A. Pnevmatikakis, J. Soldatos, F. Talantzis, and L. Polymenakos, "Robust multimodal audio-visual processing for advanced context awareness in smart spaces" Personal Ubiquitous Comput., vol.13, no.1, pp. 3-14, Jan. 2009.
    • (2009) Personal Ubiquitous Comput. , vol.13 , Issue.1 , pp. 3-14
    • Pnevmatikakis, A.1    Soldatos, J.2    Talantzis, F.3    Polymenakos, L.4
  • 147
    • 0034245149
    • A Bayesian computer vision system for modeling human interactions
    • Aug.
    • N. M. Oliver, B. Rosario, and A. Pentland, "A Bayesian computer vision system for modeling human interactions" IEEE Trans. Pattern Anal. Mach. Intell., vol. 22, no. 8, pp. 831-843, Aug. 2000.
    • (2000) IEEE Trans. Pattern Anal. Mach. Intell. , vol.22 , Issue.8 , pp. 831-843
    • Oliver, N.M.1    Rosario, B.2    Pentland, A.3
  • 151
    • 34249701400
    • A unified model of early word learning: Integrating statistical and social cues
    • C. Yu and D. H. Ballard, "A unified model of early word learning: Integrating statistical and social cues" Neurocomputing, vol.70, no.13-15, pp. 2149-2165, 2007.
    • (2007) Neurocomputing , vol.70 , Issue.13-15 , pp. 2149-2165
    • Yu, C.1    Ballard, D.H.2
  • 152
    • 34247550023
    • Visually-guided attention enhances target identification in a complex auditory scene
    • V. Best, E. J. Ozmeral, and B. G. Shinn-Cunningham, "Visually-guided attention enhances target identification in a complex auditory scene" J. Assoc. Res. Otolaryngol., vol.8, no.2, pp. 294-304, 2007.
    • (2007) J. Assoc. Res. Otolaryngol. , vol.8 , Issue.2 , pp. 294-304
    • Best, V.1    Ozmeral, E.J.2    Shinn-Cunningham, B.G.3


* This information was extracted and analyzed by KISTI from Elsevier's SCOPUS database.