SCOPUS Information Retrieval Platform

Volume 98, Issue 10, 2010, Pages 1692-1715

Audiovisual information fusion in human-computer interfaces and intelligent environments: A survey

(3) Shivappa, Shankar T a Trivedi, Mohan Manubhai a Rao, Bhaskar D a

a UNIVERSITY OF CALIFORNIA (United States)

Author keywords

Audiovisual fusion; dynamic Bayesian networks (DBNs); hidden Markov models; human activity analysis; human activity modeling; information fusion; machine learning; multimodal systems

Indexed keywords

BAYESIAN NETWORKS; HIDDEN MARKOV MODELS; HIERARCHICAL SYSTEMS; INFORMATION FUSION; INTELLIGENT SYSTEMS; KNOWLEDGE BASED SYSTEMS; SEMANTICS; SPEECH RECOGNITION; SURVEYS;

AUDIO-VISUAL FUSION; DYNAMIC BAYESIAN NETWORKS; HUMAN ACTIVITIES; HUMAN ACTIVITY ANALYSIS; MULTIMODAL SYSTEM;

LEARNING SYSTEMS;

EID: 77956978396 PISSN: 00189219 EISSN: None Source Type: Journal
DOI: 10.1109/JPROC.2010.2057231 Document Type: Review

Times cited : (104)

References (152)

1
- 0002988210
- Computing machinery and intelligence
- LIX
- A. M. Turing, "Computing machinery and intelligence" Mind, vol.LIX, no.236, pp. 433-460, 1950.
- (1950) Mind , Issue.236 , pp. 433-460
- Turing, A.M.¹

2
- 85032752352
- Audiovisual speech processing
- Jan.
- T. Chen, "Audiovisual speech processing" IEEE Signal Process. Mag., vol.18, no.1, pp. 9-21, Jan. 2001.
- (2001) IEEE Signal Process. Mag. , vol.18 , Issue.1 , pp. 9-21
- Chen, T.¹

3
- 23044432651
- Grounding words in perception and action: Computational insights
- Aug.
- D. Roy, "Grounding words in perception and action: Computational insights" Trends Cogn. Sci., vol.9, no.8, pp. 389-396, Aug. 2005.
- (2005) Trends Cogn. Sci. , vol.9 , Issue.8 , pp. 389-396
- Roy, D.¹

4
- 34249663433
- New trends in cognitive science: Integrative approaches to learning and development
- Jan.
- G. O. Deák, M. S. Bartlett, and T. Jebara, "New trends in cognitive science: Integrative approaches to learning and development" Neurocomputing, vol.70, no.13-15, pp. 2139-2147, Jan. 2007.
- (2007) Neurocomputing , vol.70 , Issue.13-15 , pp. 2139-2147
- Deák, G.O.¹ Bartlett, M.S.² Jebara, T.³

5
- 84867456688
- A multimodal learning interface for grounding spoken language in sensory perceptions
- C. Yu and D. H. Ballard, "A multimodal learning interface for grounding spoken language in sensory perceptions" ACM Trans. Appl. Perception, vol.1, no.1, pp. 57-80, 2004.
- (2004) ACM Trans. Appl. Perception , vol.1 , Issue.1 , pp. 57-80
- Yu, C.¹ Ballard, D.H.²

6
- 0029756565
- Information combination operators for data fusion: A comparative review with classification
- Jan.
- I. Bloch, "Information combination operators for data fusion: A comparative review with classification" IEEE Trans. Syst. Man Cybern. A, Syst. Humans, vol.26, no.1, pp. 52-67, Jan. 1996.
- (1996) IEEE Trans. Syst. Man Cybern. A, Syst. Humans , vol.26 , Issue.1 , pp. 52-67
- Bloch, I.¹

7
- 34548206204
- Multimodal human-computer interaction: A survey
- A. Jaimes and N. Sebe, "Multimodal human-computer interaction: A survey" Comput. Vis. Image Understand., vol.108, no.1-2, pp. 116-134, 2007.
- (2007) Comput. Vis. Image Understand. , vol.108 , Issue.1-2 , pp. 116-134
- Jaimes, A.¹ Sebe, N.²

8
- 65349116659
- Forensic audio and visual evidence 2004-2007: A review
- Lyon, France, Oct.
- J. Bijhold, A. Ruifrok, M. Jessen, Z. Geradts, and S. Ehrhardt, "Forensic audio and visual evidence 2004-2007: A review" in Proc. 15th INTERPOL Forens. Sci. Symp., Lyon, France, Oct. 2007, pp. 372-413.
- (2007) Proc. 15th INTERPOL Forens. Sci. Symp. , pp. 372-413
- Bijhold, J.¹ Ruifrok, A.² Jessen, M.³ Geradts, Z.⁴ Ehrhardt, S.⁵

9
- 35348847678
- "The CLEAR 2006 evaluation," Multimodal technologies for perception of humans
- Berlin, Germany: Springer-Verlag
- R. Stiefelhagen, K. Bernardin, R. Bowers, J. S. Garofolo, D. Mostefa, and P. Soundararajan, "The CLEAR 2006 evaluation," Multimodal Technologies for Perception of Humans, vol. 4122. Berlin, Germany: Springer-Verlag, 2007, ser. Lecture Notes in Computer Science, pp. 1-44.
- (2007) Ser. Lecture Notes in Computer Science , vol.4122 , pp. 1-44
- Stiefelhagen, R.¹ Bernardin, K.² Bowers, R.³ Garofolo, J.S.⁴ Mostefa, D.⁵ Soundararajan, P.⁶

10
- 47749150338
- Multimodal Technologies for Perception of Humans: International Evaluation Workshops CLEAR 2007 and RT 2007
- R. Stiefelhagen, R. Bowers, and J. Fiscus, Eds. Berlin, Germany: Springer-Verlag, 2008
- R. Stiefelhagen, R. Bowers, and J. Fiscus, Eds. Multimodal Technologies for Perception of Humans: International Evaluation Workshops CLEAR 2007 and RT 2007. Berlin, Germany: Springer-Verlag, 2008, ser. Lecture Notes in Computer Science, vol.4625.
- Ser. Lecture Notes in Computer Science , vol.4625

11
- 79960357998
- Capturing multimodal interaction at medical meetings in a hospital setting: Opportunities and challenges
- B. Kane, S. Luz, and J. Su, "Capturing multimodal interaction at medical meetings in a hospital setting: Opportunities and challenges" in Proc. LREC Workshop Multimodal Corpora: Adv. Capturing, Coding Analyzing Multimodality, 2010, pp. 140-145.
- Proc. LREC Workshop Multimodal Corpora: Adv. Capturing, Coding Analyzing Multimodality , vol.2010 , pp. 140-145
- Kane, B.¹ Luz, S.² Su, J.³

12
- 80054934252
- A multimodal corpus recorded in a health smart home
- A. Fleury, M. Vacher, F. Portet, P. Chahuara, and N. Noury, "A multimodal corpus recorded in a health smart home" in Proc. LREC Workshop Multimodal Corpora: Adv. Capturing, Coding Analyzing Multimodality, 2010, pp. 99-105.
- Proc. LREC Workshop Multimodal Corpora: Adv. Capturing, Coding Analyzing Multimodality , vol.2010 , pp. 99-105
- Fleury, A.¹ Vacher, M.² Portet, F.³ Chahuara, P.⁴ Noury, N.⁵

13
- 77956509104
- Towards a vision-based system exploring 3D driver posture dynamics for driver assistance: Issues and possibilities
- C. Tran and M. M. Trivedi, "Towards a vision-based system exploring 3D driver posture dynamics for driver assistance: Issues and possibilities" in Proc. IEEE Intell. Veh. Symp., 2010, pp. 179-184.
- Proc. IEEE Intell. Veh. Symp. , vol.2010 , pp. 179-184
- Tran, C.¹ Trivedi, M.M.²

14
- 77956549360
- Contextual framework for speech based emotion recognition in driver assistance system
- A. Tawari and M. M. Trivedi, "Contextual framework for speech based emotion recognition in driver assistance system" in Proc. IEEE Intell. Veh. Symp., 2010, pp. 174-178.
- Proc. IEEE Intell. Veh. Symp. , vol.2010 , pp. 174-178
- Tawari, A.¹ Trivedi, M.M.²

15
- 84890017127
- K. Takeda, H. Erdogan, J. H. L. Hansen, and H. Abut, Eds. New York: Springer-Verlag
- K. Takeda, H. Erdogan, J. H. L. Hansen, and H. Abut, Eds., In-Vehicle Corpus and Signal Processing for Driver Behavior. New York: Springer-Verlag, 2009.
- (2009) In-Vehicle Corpus and Signal Processing for Driver Behavior

16
- 4544290191
- Recent advances in the automatic recognition of audiovisual speech
- Sep.
- G. Potamianos, C. Neti, G. Gravier, A. Garg, and A. W. Senior, "Recent advances in the automatic recognition of audiovisual speech" Proc. IEEE, vol.91, no.9, pp. 1306-1326, Sep. 2003.
- (2003) Proc. IEEE , vol.91 , Issue.9 , pp. 1306-1326
- Potamianos, G.¹ Neti, C.² Gravier, G.³ Garg, A.⁴ Senior, A.W.⁵

17
- 70449556249
- Hierarchical audio-visual cue integration framework for activity analysis in intelligent meeting rooms
- S. T. Shivappa, M. M. Trivedi, and B. D. Rao, "Hierarchical audio-visual cue integration framework for activity analysis in intelligent meeting rooms" in Proc. IEEE Comput. Vis. Pattern Recognit. Workshop, 2009, pp. 107-114.
- (2009) Proc. IEEE Comput. Vis. Pattern Recognit. Workshop , pp. 107-114
- Shivappa, S.T.¹ Trivedi, M.M.² Rao, B.D.³

18
- 70349205633
- Role of head pose estimation in speech acquisition from distant microphones
- S. T. Shivappa, B. D. Rao, and M. M. Trivedi, "Role of head pose estimation in speech acquisition from distant microphones" in Proc. IEEE Int. Conf. Acoust. Speech Signal Process., 2009, pp. 3557-3560.
- (2009) Proc. IEEE Int. Conf. Acoust. Speech Signal Process. , pp. 3557-3560
- Shivappa, S.T.¹ Rao, B.D.² Trivedi, M.M.³

19
- 40249089621
- Speech enhancement and recognition in meetings with an audio-visual sensor array
- Nov.
- H. K. Maganti, D. Gatica-Perez, and I. McCowan, "Speech enhancement and recognition in meetings with an audio-visual sensor array" IEEE Trans. Audio Speech Lang. Process., vol.15, no.8, pp. 2257-2269, Nov. 2007.
- (2007) IEEE Trans. Audio Speech Lang. Process. , vol.15 , Issue.8 , pp. 2257-2269
- Maganti, H.K.¹ Gatica-Perez, D.² McCowan, I.³

20
- 0003770986
- Comparing models for audiovisual fusion in a noisy-vowel recognition task
- Nov.
- P. Teissier, J. Robert-Ribes, J. L. Schwartz, and A. Guerin-Dugue, "Comparing models for audiovisual fusion in a noisy-vowel recognition task" IEEE Trans. Speech Audio Process., vol.7, no.6, pp. 629-642, Nov. 1999.
- (1999) IEEE Trans. Speech Audio Process. , vol.7 , Issue.6 , pp. 629-642
- Teissier, P.¹ Robert-Ribes, J.² Schwartz, J.L.³ Guerin-Dugue, A.⁴

21
- 85069232505
- Ten years after Summerfield: A taxonomy of models for audio-visual fusion in speech perception
- London, U.K.: Psychology Press
- J. L. Schwartz, J. Robert-Ribes, and P. Escudier, "Ten years after Summerfield: A taxonomy of models for audio-visual fusion in speech perception," Hearing by Eye II. London, U.K.: Psychology Press, 1998, pp. 85-108.
- (1998) Hearing by Eye II , pp. 85-108
- Schwartz, J.L.¹ Robert-Ribes, J.² Escudier, P.³

22
- 0035248924
- PCA versus LDA
- Feb.
- A. M. Martinez and A. C. Kak, "PCA versus LDA" IEEE Trans. Pattern Anal. Mach. Intell., vol.23, no.2, pp. 228-233, Feb. 2001.
- (2001) IEEE Trans. Pattern Anal. Mach. Intell. , vol.23 , Issue.2 , pp. 228-233
- Martinez, A.M.¹ Kak, A.C.²

23
- 0035790960
- Large-vocabulary audio-visual speech recognition: A summary of the Johns Hopkins summer 2000 workshop
- C. Neti, G. Potamianos, J. Luettin, I. Matthews, H. Glotin, and D. Vergyri, "Large-vocabulary audio-visual speech recognition: A summary of the Johns Hopkins summer 2000 workshop" in Proc. IEEE Workshop Multimedia Signal Process., 2001, pp. 619-624.
- (2001) Proc. IEEE Workshop Multimedia Signal Process. , pp. 619-624
- Neti, C.¹ Potamianos, G.² Luettin, J.³ Matthews, I.⁴ Glotin, H.⁵ Vergyri, D.⁶

24
- 15044354466
- Automatic analysis of multimodal group actions in meetings
- Mar.
- I. McCowan, D. Gatica-Perez, S. Bengio, G. Lathoud, M. Barnard, and D. Zhang, "Automatic analysis of multimodal group actions in meetings" IEEE Trans. Pattern Anal. Mach. Intell., vol.27, no.3, pp. 305-317, Mar. 2005.
- (2005) IEEE Trans. Pattern Anal. Mach. Intell. , vol.27 , Issue.3 , pp. 305-317
- McCowan, I.¹ Gatica-Perez, D.² Bengio, S.³ Lathoud, G.⁴ Barnard, M.⁵ Zhang, D.⁶

25
- 34948889993
- Microphone arrays as generalized cameras for integrated audio visual processing
- DOI: 10.1109/CVPR.2007.383345
- A. O'Donovan and R. Duraiswami, "Microphone arrays as generalized cameras for integrated audio visual processing" in Proc. IEEE Int. Conf. Comput. Vis. Pattern Recognit., 2007, DOI: 10.1109/CVPR.2007.383345.
- (2007) Proc. IEEE Int. Conf. Comput. Vis. Pattern Recognit.
- O'Donovan, A.¹ Duraiswami, R.²

26
- 33645672078
- Kalman filters for audio-video source localization
- T. Gehrig, K. Nickel, H. K. Ekenel, U. Klee, and J. McDonough, "Kalman filters for audio-video source localization" in Proc. IEEE Workshop Appl. Signal Process. Audio Acoust., Oct. 2005, pp. 118-121.
- (2005) Proc. IEEE Workshop Appl. Signal Process. Audio Acoust., Oct. , pp. 118-121
- Gehrig, T.¹ Nickel, K.² Ekenel, H.K.³ Klee, U.⁴ McDonough, J.⁵

27
- 37849004875
- Frame-dependent multi-stream reliability indicators for audio-visual speech recognition
- A. Garg, G. Potamianos, C. Neti, and T. S. Huang, "Frame-dependent multi-stream reliability indicators for audio-visual speech recognition" in Proc. Int. Conf. Multimedia Expo, 2003, vol.III, pp. 605-608.
- (2003) Proc. Int. Conf. Multimedia Expo , vol.3 , pp. 605-608
- Garg, A.¹ Potamianos, G.² Neti, C.³ Huang, T.S.⁴

28
- 85133343575
- Speech intelligibility derived from asynchronous processing of auditory-visual information
- K. W. Grant and S. Greenberg, "Speech intelligibility derived from asynchronous processing of auditory-visual information," Proc. Workshop Audio-Visual Speech Process., 2001, pp. 132-137.
- (2001) Proc. Workshop Audio-Visual Speech Process. , pp. 132-137
- Grant, K.W.¹ Greenberg, S.²

29
- 0034270644
- Audio-visual speech modeling for continuous speech recognition
- Sep.
- S. Dupont and J. Luettin, "Audio-visual speech modeling for continuous speech recognition" IEEE Trans. Multimedia, vol.2, no.3, pp. 141-151, Sep. 2000.
- (2000) IEEE Trans. Multimedia , vol.2 , Issue.3 , pp. 141-151
- Dupont, S.¹ Luettin, J.²

30
- 0034825241
- Multi-stream adaptive evidence combination for noise robust ASR
- Apr.
- A. Morris, A. Hagen, H. Glotin, and H. Bourlard, "Multi-stream adaptive evidence combination for noise robust ASR" Speech Commun., vol.34, no.1-2, pp. 25-40, Apr. 2001.
- (2001) Speech Commun. , vol.34 , Issue.1-2 , pp. 25-40
- Morris, A.¹ Hagen, A.² Glotin, H.³ Bourlard, H.⁴

31
- 0025681008
- Hidden Markov model decomposition of speech and noise
- Apr.
- A. P. Varga and R. K. Moore, "Hidden Markov model decomposition of speech and noise" in Proc. IEEE Int. Conf. Acoust. Speech Signal Process., Apr. 1990, vol.2, pp. 845-848.
- (1990) Proc. IEEE Int. Conf. Acoust. Speech Signal Process. , vol.2 , pp. 845-848
- Varga, A.P.¹ Moore, R.K.²

32
- 0036297183
- A coupled HMM for audio-visual speech recognition
- A. V. Nefian, L. Liang, X. Pi, L. Xiaoxiang, C. Mao, and K. Murphy, "A coupled HMM for audio-visual speech recognition" in Proc. IEEE Int. Conf. Acoust. Speech Signal Process., 2002, vol.2, pp. 2013-2016.
- (2002) Proc. IEEE Int. Conf. Acoust. Speech Signal Process. , vol.2 , pp. 2013-2016
- Nefian, A.V.¹ Liang, L.² Pi, X.³ Xiaoxiang, L.⁴ Mao, C.⁵ Murphy, K.⁶

33
- 51449122700
- Multimodal information fusion using the iterative decoding algorithm and its application to audio-visual speech recognition
- S. T. Shivappa, B. D. Rao, and M. M. Trivedi, "Multimodal information fusion using the iterative decoding algorithm and its application to audio-visual speech recognition" in Proc. IEEE Int. Conf. Acoust. Speech Signal Process., 2008, pp. 2241-2244.
- (2008) Proc. IEEE Int. Conf. Acoust. Speech Signal Process. , pp. 2241-2244
- Shivappa, S.T.¹ Rao, B.D.² Trivedi, M.M.³

34
- 33646007315
- Multimodal person recognition for human-vehicle interaction
- Apr./Jun.
- E. Erzin, Y. Yemez, A. M. Tekalp, A. Ercil, H. Erdogan, and H. Abut, "Multimodal person recognition for human-vehicle interaction" IEEE Multimedia Mag., vol.13, no.2, pp. 18-31, Apr./Jun. 2006.
- (2006) IEEE Multimedia Mag. , vol.13 , Issue.2 , pp. 18-31
- Erzin, E.¹ Yemez, Y.² Tekalp, A.M.³ Ercil, A.⁴ Erdogan, H.⁵ Abut, H.⁶

35
- 36248953326
- Audiovisual head orientation estimation with particle filtering in multisensor scenarios
- Jan. article no. 32
- C. Canton-Ferrer, C. Segura, J. R. Casas, M. Pardàs, and J. Hernando, "Audiovisual head orientation estimation with particle filtering in multisensor scenarios" EURASIP J. Adv. Signal Process., vol.2008, Jan. 2008, article no. 32.
- (2008) EURASIP J. Adv. Signal Process. , vol.2008
- Canton-Ferrer, C.¹ Segura, C.² Casas, J.R.³ Pardàs, M.⁴ Hernando, J.⁵

36
- 35348860213
- Enabling multimodal human-robot interaction for the Karlsruhe humanoid robot
- Oct.
- R. Stiefelhagen, H. K. Ekenel, C. Fugen, P. Gieselmann, H. Holzapfel, F. Kraft, K. Nickel, M. Voit, and A. Waibel, "Enabling multimodal human-robot interaction for the Karlsruhe humanoid robot" IEEE Trans. Robot., vol.23, no.5, pp. 840-851, Oct. 2007.
- (2007) IEEE Trans. Robot. , vol.23 , Issue.5 , pp. 840-851
- Stiefelhagen, R.¹ Ekenel, H.K.² Fugen, C.³ Gieselmann, P.⁴ Holzapfel, H.⁵ Kraft, F.⁶ Nickel, K.⁷ Voit, M.⁸ Waibel, A.⁹

37
- 60849111400
- Person tracking with audio-visual cues using the iterative decoding framework
- S. T. Shivappa, M. M. Trivedi, and B. D. Rao, "Person tracking with audio-visual cues using the iterative decoding framework" in Proc. 5th IEEE Int. Conf. Adv. Video Signal Based Surveillance, 2008, pp. 260-267.
- (2008) Proc. 5th IEEE Int. Conf. Adv. Video Signal Based Surveillance , pp. 260-267
- Shivappa, S.T.¹ Trivedi, M.M.² Rao, B.D.³

38
- 32344434992
- A joint particle filter for audio-visual speaker tracking
- K. Nickel, T. Gehrig, R. Stiefelhagen, and J. McDonough, "A joint particle filter for audio-visual speaker tracking" in Proc. 7th Int. Conf. Multimodal Interfaces, 2005, pp. 61-68.
- (2005) Proc. 7th Int. Conf. Multimodal Interfaces , pp. 61-68
- Nickel, K.¹ Gehrig, T.² Stiefelhagen, R.³ McDonough, J.⁴

39
- 77956964676
- "Multi-level particle filter fusion of features and cues for audio-visual person tracking," Multimodal Technologies for Perception of Humans
- Berlin, Germany: Springer-Verlag
- K. Bernardin, T. Gehrig, and R. Stiefelhagen, "Multi-level particle filter fusion of features and cues for audio-visual person tracking," Multimodal Technologies for Perception of Humans, vol. 4625. Berlin, Germany: Springer-Verlag, 2007, ser. Lecture Notes in Computer Science.
- (2007) Ser. Lecture Notes in Computer Science , vol.4625
- Bernardin, K.¹ Gehrig, T.² Stiefelhagen, R.³

40
- 44849121369
- CASSANDRA: Audio-video sensor fusion for aggression detection
- W. Zajdel, J. Krijnders, T. Andringa, and D. Gavrila, "CASSANDRA: Audio-video sensor fusion for aggression detection" in Proc. 4th IEEE Int. Conf. Adv. Video Signal Based Surveillance, 2007, pp. 200-205.
- (2007) Proc. 4th IEEE Int. Conf. Adv. Video Signal Based Surveillance , pp. 200-205
- Zajdel, W.¹ Krijnders, J.² Andringa, T.³ Gavrila, D.⁴

41
- 38749151899
- An iterative decoding algorithm for fusion of multi-modal information
- Special Issue on Human-Activity Analysis in Multimedia Data, 2008, DOI: 10.1155/2008/478396, article id 478396
- S. T. Shivappa, B. D. Rao, and M. M. Trivedi, "An iterative decoding algorithm for fusion of multi-modal information" Eurasip J. Adv. Signal Process., vol. 2008, Special Issue on Human-Activity Analysis in Multimedia Data, 2008, DOI: 10.1155/2008/478396, article id 478396.
- (2008) Eurasip J. Adv. Signal Process.
- Shivappa, S.T.¹ Rao, B.D.² Trivedi, M.M.³

42
- 63449120701
- Dynamic modality weighting for multi-stream HMMs in audio-visual speech recognition
- M. Gurban, J.-P. Thiran, T. Drugman, and T. Dutoit, "Dynamic modality weighting for multi-stream HMMs in audio-visual speech recognition" in Proc. 10th Int. Conf. Multimodal Interfaces, 2008, pp. 237-240.
- (2008) Proc. 10th Int. Conf. Multimodal Interfaces , pp. 237-240
- Gurban, M.¹ Thiran, J.-P.² Drugman, T.³ Dutoit, T.⁴

43
- 0027297425
- Near Shannon limit error-correcting coding and decoding: Turbo-codes
- May
- C. Berrou, A. Glavieux, and P. Thitimajshima, "Near Shannon limit error-correcting coding and decoding: Turbo-codes" in Proc. IEEE Int. Conf. Commun., May 1993, vol.2, pp. 1064-1070.
- (1993) Proc. IEEE Int. Conf. Commun. , vol.2 , pp. 1064-1070
- Berrou, C.¹ Glavieux, A.² Thitimajshima, P.³

44
- 33750368310
- An audio-visual corpus for speech perception and automatic speech recognition
- Nov.
- M. Cooke, J. Barker, S. Cunningham, and X. Shao, "An audio-visual corpus for speech perception and automatic speech recognition" J. Acoust. Soc. Amer., vol.120, no.5, pp. 2421-2424, Nov. 2006.
- (2006) J. Acoust. Soc. Amer. , vol.120 , Issue.5 , pp. 2421-2424
- Cooke, M.¹ Barker, J.² Cunningham, S.³ Shao, X.⁴

45
- 38049168628
- Berlin, Germany: Springer-Verlag ser. Lecture Notes in Computer Science
- M. Liu, H. Tang, H. Ning, and T. S. Huang, "Person identification based on multichannel and multimodality fusion," Multimodal Technologies for Perception of Humans, vol. 4122. Berlin, Germany: Springer-Verlag, 2006, ser. Lecture Notes in Computer Science, pp. 241-248.
- (2006) Person Identification Based on Multichannel and Multimodality Fusion, Multimodal Technologies for Perception of Humans , vol.4122 , pp. 241-248
- Liu, M.¹ Tang, H.² Ning, H.³ Huang, T.S.⁴

46
- 38049165243
- A decision fusion system across time and classifiers for audio-visual person identification
- Berlin, Germany: Springer-Verlag, 2006, ser. Lecture Notes in Computer Science
- A. Stergiou, A. Pnevmatikakis, and L. Polymenakos, "A decision fusion system across time and classifiers for audio-visual person identification," Multimodal Technologies for Perception of Humans, vol. 4122. Berlin, Germany: Springer-Verlag, 2006, ser. Lecture Notes in Computer Science, pp. 223-232.
- Multimodal Technologies for Perception of Humans , vol.4122 , pp. 223-232
- Stergiou, A.¹ Pnevmatikakis, A.² Polymenakos, L.³

47
- 33750541017
- Audio-visual affect recognition in activation-evaluation space
- Jul. DOI: 10.1109/ICME.2005.1521551
- Z. Zeng, Z. Zhang, B. Pianfetti, J. Tu, and T. S. Huang, "Audio-visual affect recognition in activation-evaluation space" in Proc. IEEE Int. Conf. Multimedia Expo, Jul. 2005, DOI: 10.1109/ICME.2005.1521551.
- (2005) Proc. IEEE Int. Conf. Multimedia Expo
- Zeng, Z.¹ Zhang, Z.² Pianfetti, B.³ Tu, J.⁴ Huang, T.S.⁵

48
- 57549101447
- Audiovisual synchronization and fusion using canonical correlation analysis
- Nov.
- M. E. Sargin, Y. Yemez, E. Erzin, and A. M. Tekalp, "Audiovisual synchronization and fusion using canonical correlation analysis" IEEE Trans. Multimedia, vol.9, no.7, pp. 1396-1403, Nov. 2007.
- (2007) IEEE Trans. Multimedia , vol.9 , Issue.7 , pp. 1396-1403
- Sargin, M.E.¹ Yemez, Y.² Erzin, E.³ Tekalp, A.M.⁴

49
- 73049098610
- Audio-visual group recognition using diffusion maps
- Jan.
- Y. Keller, S. Lafon, R. Coifman, and S. Zucker, "Audio-visual group recognition using diffusion maps" IEEE Trans. Signal Process., vol.58, no.1, pp. 403-413, Jan. 2009.
- (2009) IEEE Trans. Signal Process. , vol.58 , Issue.1 , pp. 403-413
- Keller, Y.¹ Lafon, S.² Coifman, R.³ Zucker, S.⁴

50
- 0037475135
- A survey of socially interactive robots
- Mar.
- T. Fong, I. Nourbakhsh, and K. Dautenhahn, "A survey of socially interactive robots" Robot. Autonom. Syst., vol.42, no.3-4, pp. 143-166, Mar. 2002.
- (2002) Robot. Autonom. Syst. , vol.42 , Issue.3-4 , pp. 143-166
- Fong, T.¹ Nourbakhsh, I.² Dautenhahn, K.³

51
- 38049168159
- Towards communicating agents and avatars in virtual worlds
- A. Nijholt and G. Hondorp, "Towards communicating agents and avatars in virtual worlds" in Proc. Eurographics, 2000, pp. 91-95.
- (2000) Proc. Eurographics , pp. 91-95
- Nijholt, A.¹ Hondorp, G.²

52
- 77956966274
- Towards multi-modal interactions in virtual environments: A case study
- A. Nijholt, "Towards multi-modal interactions in virtual environments: A case study" in Actas-I, VI Simposio Internacional de Communicacion Social, 1999, pp. 25-28.
- (1999) Actas-I, VI Simposio Internacional de Communicacion Social , pp. 25-28
- Nijholt, A.¹

53
- 85032751879
- Sensitive talking heads
- T. S. Huang, M. A. Hasegawa-Johnson, S. M. Chu, and Z. Zeng, "Sensitive talking heads" IEEE Signal Process. Mag., vol.26, no.4, pp. 67-72, 2009.
- (2009) IEEE Signal Process. Mag. , vol.26 , Issue.4 , pp. 67-72
- Huang, T.S.¹ Hasegawa-Johnson, M.A.² Chu, S.M.³ Zeng, Z.⁴

54
- 0004000494
- Cambridge, MA: MIT Press, May
- C. Breazeal, Designing Sociable Robots. Cambridge, MA: MIT Press, May 2002.
- (2002) Designing Sociable Robots
- Breazeal, C.¹

55
- 0035439178
- Active vision for sociable robots
- Sep.
- C. Breazeal, A. Edsinger, P. Fitzpatrick, and B. Scassellati, "Active vision for sociable robots" IEEE Trans. Syst. Man Cybern. A, Syst. Humans, vol.31, no.5, pp. 443-453, Sep. 2001.
- (2001) IEEE Trans. Syst. Man Cybern. A, Syst. Humans , vol.31 , Issue.5 , pp. 443-453
- Breazeal, C.¹ Edsinger, A.² Fitzpatrick, P.³ Scassellati, B.⁴

56
- 34548725499
- A survey on context-aware systems
- Jun.
- F. R. M. Baldauf and S. Dustdar, "A survey on context-aware systems" Int. J. Ad Hoc Ubiquitous Comput., vol.2, no.4, pp. 263-277, Jun. 2007.
- (2007) Int. J. Ad Hoc Ubiquitous Comput. , vol.2 , Issue.4 , pp. 263-277
- Baldauf, F.R.M.¹ Dustdar, S.²

57
- 11844267204
- Dynamic context capture and distributed video arrays for intelligent spaces
- Jan.
- M. M. Trivedi, K. S. Huang, and I. Mikic, "Dynamic context capture and distributed video arrays for intelligent spaces" IEEE Trans. Syst. Man Cybern. A, Syst. Humans, vol.35, no.1, pp. 145-163, Jan. 2005.
- (2005) IEEE Trans. Syst. Man Cybern. A, Syst. Humans , vol.35 , Issue.1 , pp. 145-163
- Trivedi, M.M.¹ Huang, K.S.² Mikic, I.³

58
- 57849101125
- Detecting small group activities from multimodal observations
- O. Brdiczka, J. Maisonnasse, P. Reignier, and J. Crowley, "Detecting small group activities from multimodal observations" Appl. Intell., vol.30, no.1, pp. 47-57, 2007.
- (2007) Appl. Intell. , vol.30 , Issue.1 , pp. 47-57
- Brdiczka, O.¹ Maisonnasse, J.² Reignier, P.³ Crowley, J.⁴

59
- 32444450538
- Automatic detection of interaction groups
- O. Brdiczka, J. Maisonnasse, and P. Reignier, "Automatic detection of interaction groups" in Proc. 7th Int. Conf. Multimodal Interfaces, 2005, pp. 32-36.
- (2005) Proc. 7th Int. Conf. Multimodal Interfaces , pp. 32-36
- Brdiczka, O.¹ Maisonnasse, J.² Reignier, P.³

60
- 0032298772
- Integrating natural language and gesture in a robotics domain
- Sep.
- D. Perzanowski, A. C. Schultz, and W. Adams, "Integrating natural language and gesture in a robotics domain" in Proc. IEEE Int. Symp. Comput. Intell. Robot. Autom., Sep. 1998, pp. 247-252.
- (1998) Proc. IEEE Int. Symp. Comput. Intell. Robot. Autom. , pp. 247-252
- Perzanowski, D.¹ Schultz, A.C.² Adams, W.³

61
- 0035558140
- Multi-modal human robot interaction for map generation
- S. S. Ghidary, Y. Nakata, H. Saito, M. Hattori, and T. Takamori, "Multi-modal human robot interaction for map generation" in Proc. IEEE/RSJ Int. Conf. Intell. Robots Syst., 2001, vol.4, pp. 2246-2251.
- (2001) Proc. IEEE/RSJ Int. Conf. Intell. Robots Syst. , vol.4 , pp. 2246-2251
- Ghidary, S.S.¹ Nakata, Y.² Saito, H.³ Hattori, M.⁴ Takamori, T.⁵

62
- 64149093817
- Audiovisual probabilistic tracking of multiple speakers in meetings
- Feb.
- D. Gatica-Perez, G. Lathoud, J. Odobez, and I. McCowan, "Audiovisual probabilistic tracking of multiple speakers in meetings" IEEE Trans. Audio Speech Lang. Process., vol.15, no.2, pp. 601-616, Feb. 2007.
- (2007) IEEE Trans. Audio Speech Lang. Process. , vol.15 , Issue.2 , pp. 601-616
- Gatica-Perez, D.¹ Lathoud, G.² Odobez, J.³ McCowan, I.⁴

63
- 4544347587
- Multiple person and speaker activity tracking with a particle filter
- N. Checka, K. W. Wilson, M. R. Siracusa, and T. Darrell, "Multiple person and speaker activity tracking with a particle filter" in Proc. IEEE Int. Conf. Acoust. Speech Signal Process., 2004, vol.5, pp. 881-884.
- (2004) Proc. IEEE Int. Conf. Acoust. Speech Signal Process. , vol.5 , pp. 881-884
- Checka, N.¹ Wilson, K.W.² Siracusa, M.R.³ Darrell, T.⁴

64
- 21244492850
- Real-time speaker tracking using particle filter sensor fusion
- Mar.
- Y. Chen and Y. Rui, "Real-time speaker tracking using particle filter sensor fusion" Proc. IEEE, vol.92, no.3, pp. 485-494, Mar. 2004.
- (2004) Proc. IEEE , vol.92 , Issue.3 , pp. 485-494
- Chen, Y.¹ Rui, Y.²

65
- 0034507915
- Look who's talking: Speaker detection using video and audio correlation
- R. Cutler and L. S. Davis, "Look who's talking: Speaker detection using video and audio correlation" in Proc. IEEE Int. Conf. Multimedia Expo (III), 2000, vol.3, pp. 1589-1592.
- (2000) Proc. IEEE Int. Conf. Multimedia Expo (III) , vol.3 , pp. 1589-1592
- Cutler, R.¹ Davis, L.S.²

66
- 0042349407
- A graphical model for audiovisual object tracking
- Jul.
- M. Beal, N. Jojic, and H. Attias, "A graphical model for audiovisual object tracking" IEEE Trans. Pattern Anal. Mach. Intell., vol.25, no.7, pp. 828-836, Jul. 2003.
- (2003) IEEE Trans. Pattern Anal. Mach. Intell. , vol.25 , Issue.7 , pp. 828-836
- Beal, M.¹ Jojic, N.² Attias, H.³

67
- 0017199877
- Hearing lips and seeing voices
- Dec. DOI: 10.1038/264746a0
- H. McGurk and J. MacDonald, "Hearing lips and seeing voices" Lett. Nature, vol.264, pp. 746-748, Dec. 1976, DOI: 10.1038/264746a0.
- (1976) Lett. Nature , vol.264 , pp. 746-748
- McGurk, H.¹ MacDonald, J.²

68
- 77956960200
- D. G. Stork and M. E. Hennecke, Eds., Berlin, Germany: Springer-Verlag
- D. G. Stork and M. E. Hennecke, Eds., Speechreading by Machines and Humans. Berlin, Germany: Springer-Verlag, 1996.
- (1996) Speechreading by Machines and Humans

69
- 0003699540
- Ph.D. dissertation, Dept. Electr. Comput. Eng., Univ. Illinois at Urbana-Champaign, Urbana, IL
- E. D. Petajan, "Automatic lipreading to enhance speech recognition (speech reading)" Ph.D. dissertation, Dept. Electr. Comput. Eng., Univ. Illinois at Urbana-Champaign, Urbana, IL, 1984.
- (1984) Automatic Lipreading to Enhance Speech Recognition (Speech Reading)
- Petajan, E.D.¹

70
- 84893671339
- An improved automatic lipreading system to enhance speech recognition
- E. Petajan, B. Bischoff, D. Bodoff, and N. M. Brooke, "An improved automatic lipreading system to enhance speech recognition" in Proc. SIGCHI Conf. Human Factors Comput. Syst., 1988, pp. 19-25.
- (1988) Proc. SIGCHI Conf. Human Factors Comput. Syst. , pp. 19-25
- Petajan, E.¹ Bischoff, B.² Bodoff, D.³ Brooke, N.M.⁴

71
- 0024767890
- Integration of acoustic and visual speech signals using neural networks
- Nov.
- B. Yuhas, M. Goldstein, and T. Sejnowski, "Integration of acoustic and visual speech signals using neural networks" IEEE Commun. Mag., vol.27, no.11, pp. 65-71, Nov. 1989.
- (1989) IEEE Commun. Mag. , vol.27 , Issue.11 , pp. 65-71
- Yuhas, B.¹ Goldstein, M.² Sejnowski, T.³

72
- 84943272400
- Bimodal sensor integration on the example of speech-reading
- C. Bregler, S. Manke, and A. Waibel, "Bimodal sensor integration on the example of speech-reading" in Proc. IEEE Int. Conf. Neural Netw., 1993, vol.2, pp. 667-671.
- (1993) Proc. IEEE Int. Conf. Neural Netw. , vol.2 , pp. 667-671
- Bregler, C.¹ Manke, S.² Waibel, A.³

73
- 0001432664
- On the integration of auditory and visual parameters in an HMM-based ASR
- D. G. Stork and M. E. Hennecke, Eds. Berlin, Germany: Springer-Verlag
- A. Adjoudani and C. Benoit, "On the integration of auditory and visual parameters in an HMM-based ASR," Speechreading by Humans and Machines: Systems and Applications, D. G. Stork and M. E. Hennecke, Eds. Berlin, Germany: Springer-Verlag, 1996, pp. 461-472.
- (1996) Speechreading by Humans and Machines: Systems and Applications , pp. 461-472
- Adjoudani, A.¹ Benoit, C.²

74
- 0030362752
- Speechreading using shape and intensity information
- Oct.
- J. Luettin, N. A. Thacker, and S. W. Beet, "Speechreading using shape and intensity information" in Proc. 4th Int. Conf. Spoken Lang., Oct. 1996, vol.1, pp. 58-61.
- (1996) Proc. 4th Int. Conf. Spoken Lang. , vol.1 , pp. 58-61
- Luettin, J.¹ Thacker, N.A.² Beet, S.W.³

75
- 85133531952
- Speaker independent audio-visual database for bimodal ASR
- G. Potamianos, E. Cosatto, H. Graf, and D. Roe, "Speaker independent audio-visual database for bimodal ASR" in Proc. Eur. Tut. Work. Audio-Visual Speech Process., 1997, pp. 65-68.
- (1997) Proc. Eur. Tut. Work. Audio-Visual Speech Process. , pp. 65-68
- Potamianos, G.¹ Cosatto, E.² Graf, H.³ Roe, D.⁴

76
- 0004244302
- Englewood Cliffs, NJ: Prentice-Hall
- L. Rabiner and B. Juang, Fundamentals of Speech Recognition. Englewood Cliffs, NJ: Prentice-Hall, 1993.
- (1993) Fundamentals of Speech Recognition
- Rabiner, L.¹ Juang, B.²

77
- 0030355935
- A new ASR approach based on independent processing and recombination of partial frequency bands
- H. Bourlard and S. Dupont, "A new ASR approach based on independent processing and recombination of partial frequency bands" in Proc. Int. Conf. Spoken Lang., 1996, vol.1, pp. 426-429.
- (1996) Proc. Int. Conf. Spoken Lang. , vol.1 , pp. 426-429
- Bourlard, H.¹ Dupont, S.²

78
- 0031625499
- Discriminative model combination
- May
- P. Beyerlein, "Discriminative model combination" in Proc. IEEE Int. Conf. Acoust. Speech Signal Process., May 1998, vol.1, pp. 481-484.
- (1998) Proc. IEEE Int. Conf. Acoust. Speech Signal Process. , vol.1 , pp. 481-484
- Beyerlein, P.¹

79
- 0031624666
- Discriminative training of HMM stream exponents for audio-visual speech recognition
- G. Potamianos and H. Graf, "Discriminative training of HMM stream exponents for audio-visual speech recognition" in Proc. IEEE Int. Conf. Acoust. Speech Signal Process., 1998, vol.6, pp. 3733-3736.
- (1998) Proc. IEEE Int. Conf. Acoust. Speech Signal Process. , vol.6 , pp. 3733-3736
- Potamianos, G.¹ Graf, H.²

80
- 85009154155
- Stream weight optimization of speech and lip image sequence for audio-visual speech recognition
- S. Nakamura, H. Ito, and K. Shikano, "Stream weight optimization of speech and lip image sequence for audio-visual speech recognition" in Proc. Int. Conf. Spoken Lang., 2000, vol.3, pp. 20-24.
- (2000) Proc. Int. Conf. Spoken Lang. , vol.3 , pp. 20-24
- Nakamura, S.¹ Ito, H.² Shikano, K.³

81
- 17344376380
- Maximum entropy and MCE based HMM stream weight estimation for audio-visual ASR
- G. Gravier, S. Axelrod, G. Potamianos, and C. Neti, "Maximum entropy and MCE based HMM stream weight estimation for audio-visual ASR" in Proc. IEEE Int. Conf. Acoust. Speech Signal Process., 2002, vol.1, pp. 853-1-853-6.
- (2002) Proc. IEEE Int. Conf. Acoust. Speech Signal Process. , vol.1 , pp. 8531-8536
- Gravier, G.¹ Axelrod, S.² Potamianos, G.³ Neti, C.⁴

82
- 0036874999
- Dynamic Bayesian networks for audio-visual speech recognition
- A. V. Nefian, L. Liang, X. Pi, L. Xiaoxiang, and K. Murphy, "Dynamic Bayesian networks for audio-visual speech recognition" EURASIP J. Appl. Signal Process., vol.2002, no.1, pp. 1274-1288, 2002.
- (2002) EURASIP J. Appl. Signal Process. , vol.2002 , Issue.1 , pp. 1274-1288
- Nefian, A.V.¹ Liang, L.² Pi, X.³ Xiaoxiang, L.⁴ Murphy, K.⁵

83
- 0004158153
- Ph.D. dissertation, Dept. Comput. Sci., Univ. California, Berkeley, CA
- G. G. Zweig, "Speech recognition with dynamic Bayesian networks" Ph.D. dissertation, Dept. Comput. Sci., Univ. California, Berkeley, CA, 1998.
- (1998) Speech Recognition with Dynamic Bayesian Networks
- Zweig, G.G.¹

84
- 85009135204
- Automatic speech recognition using dynamic Bayesian networks with both acoustic and articulatory variables
- T. Stephenson, H. Bourlard, S. Bengio, and A. Morris, "Automatic speech recognition using dynamic Bayesian networks with both acoustic and articulatory variables" in Proc. Int. Conf. Spoken Lang., 2000, vol.II, pp. 951-954.
- (2000) Proc. Int. Conf. Spoken Lang. , vol.2 , pp. 951-954
- Stephenson, T.¹ Bourlard, H.² Bengio, S.³ Morris, A.⁴

85
- 4544343002
- DBN based multi-stream models for audio-visual speech recognition
- May
- J. Gowdy, A. Subramanya, C. Bartels, and J. Bilmes, "DBN based multi-stream models for audio-visual speech recognition" in Proc. IEEE Int. Conf. Acoust. Speech Signal Process., May 2004, pp. 993-996.
- (2004) Proc. IEEE Int. Conf. Acoust. Speech Signal Process. , pp. 993-996
- Gowdy, J.¹ Subramanya, A.² Bartels, C.³ Bilmes, J.⁴

86
- 10444270547
- Boosted audio-visual HMM for speech reading
- Oct.
- P. Yin, I. Essa, and J. M. Rehg, "Boosted audio-visual HMM for speech reading" in Proc. IEEE Int. Workshop Anal. Model. Faces Gestures, Oct. 2003, pp. 68-73.
- (2003) Proc. IEEE Int. Workshop Anal. Model. Faces Gestures , pp. 68-73
- Yin, P.¹ Essa, I.² Rehg, J.M.³

87
- 79958751716
- Audio-visual speech recognition with a hybrid SVM-HMM system
- Antalya, Turkey, Sep.
- M. Gurban and J. Thiran, "Audio-visual speech recognition with a hybrid SVM-HMM system" in Proc. 13th Eur. Signal Process. Conf., Antalya, Turkey, Sep. 2005.
- (2005) Proc. 13th Eur. Signal Process. Conf.
- Gurban, M.¹ Thiran, J.²

88
- 84963813655
- Audiovisual arrays for untethered spoken interfaces
- K. Wilson, V. Rangarajan, N. Checka, and T. Darrell, "Audiovisual arrays for untethered spoken interfaces" in Proc. 4th IEEE Int. Conf. Multimodal Interfaces, 2002, pp. 389-394.
- (2002) Proc. 4th IEEE Int. Conf. Multimodal Interfaces , pp. 389-394
- Wilson, K.¹ Rangarajan, V.² Checka, N.³ Darrell, T.⁴

89
- 10244242647
- Detection and separation of speech event using audio and video information fusion and its application to robust speech interface
- F. Asano, K. Yamamoto, I. Hara, J. Ogata, T. Yoshimura, Y. Motomura, N. Ichimura, and H. Asoh, "Detection and separation of speech event using audio and video information fusion and its application to robust speech interface" EURASIP J. Appl. Signal Process., vol.2004, pp. 1727-1738, 2004.
- (2004) EURASIP J. Appl. Signal Process. , vol.2004 , pp. 1727-1738
- Asano, F.¹ Yamamoto, K.² Hara, I.³ Ogata, J.⁴ Yoshimura, T.⁵ Motomura, Y.⁶ Ichimura, N.⁷ Asoh, H.⁸

90
- 0346707503
- Source localization in reverberant environments: Modeling and statistical analysis
- Nov.
- T. Gustafsson, B. D. Rao, and M. M. Trivedi, "Source localization in reverberant environments: Modeling and statistical analysis" IEEE Trans. Speech Audio Process., vol. 11, no. 6, pp. 791-803, Nov. 2003.
- (2003) IEEE Trans. Speech Audio Process. , vol.11 , Issue.6 , pp. 791-803
- Gustafsson, T.¹ Rao, B.D.² Trivedi, M.M.³

91
- 85132038963
- Neural network lipreading system for improved speech recognition
- Jun.
- D. G. Stork, G. Wolff, and E. Levine, "Neural network lipreading system for improved speech recognition" in Proc. Int. Joint Conf. Neural Netw., Jun. 1992, vol.2, pp. 289-295.
- (1992) Proc. Int. Joint Conf. Neural Netw. , vol.2 , pp. 289-295
- Stork, D.G.¹ Wolff, G.² Levine, E.³

92
- 0030376248
- Robust audiovisual integration using semicontinuous hidden Markov models
- Q. Su and P. L. Silsbee, "Robust audiovisual integration using semicontinuous hidden Markov models" in Proc. Int. Conf. Spoken Lang., 1996, vol.1, pp. 42-45.
- (1996) Proc. Int. Conf. Spoken Lang. , vol.1 , pp. 42-45
- Su, Q.¹ Silsbee, P.L.²

93
- 33846013241
- Object tracking: A survey
- article no. 13
- A. Yilmaz, O. Javed, and M. Shah, "Object tracking: A survey" ACM Comput. Surv., vol.38, no.4, 2006, article no. 13.
- (2006) ACM Comput. Surv. , vol.38 , Issue.4
- Yilmaz, A.¹ Javed, O.² Shah, M.³

94
- 0003343412
- Robust localization in reverberant rooms
- New York: Springer-Verlag
- J. H. DiBiase, H. F. Silverman, and M. S. Brandstein, "Robust localization in reverberant rooms," Microphone Arrays: Signal Processing Techniques and Applications. New York: Springer-Verlag, 2001.
- (2001) Microphone Arrays: Signal Processing Techniques and Applications
- DiBiase, J.H.¹ Silverman, H.F.² Brandstein, M.S.³

95
- 0033279153
- Audio-visual tracking for natural interactivity
- G. Pingali, G. Tunali, and I. Carlbom, "Audio-visual tracking for natural interactivity" in Proc. 7th ACM Int. Conf. Multimedia (Part 1), 1999, pp. 373-382.
- (1999) Proc. 7th ACM Int. Conf. Multimedia (Part 1) , pp. 373-382
- Pingali, G.¹ Tunali, G.² Carlbom, I.³

96
- 0035458007
- Robust sound localization using multi-source audiovisual information fusion
- Sep.
- S. G. Z. P. Aarabi, "Robust sound localization using multi-source audiovisual information fusion" Inf. Fusion, vol.2, no.3, pp. 209-223, Sep. 2001.
- (2001) Inf. Fusion , vol.2 , Issue.3 , pp. 209-223
- Aarabi, S.G.Z.P.¹

97
- 84880877816
- Real-time auditory and visual multiple-object tracking for humanoids
- K. Nakadai, K. Hidai, H. Mizoguchi, H. G. Okuno, and H. Kitano, "Real-time auditory and visual multiple-object tracking for humanoids" in Proc. Int. Joint. Conf. Artif. Intell., 2001, pp. 1425-1432.
- (2001) Proc. Int. Joint. Conf. Artif. Intell. , pp. 1425-1432
- Nakadai, K.¹ Hidai, K.² Mizoguchi, H.³ Okuno, H.G.⁴ Kitano, H.⁵

98
- 0037774471
- Audiovisual localization of multiple speakers in a video teleconferencing setting
- B. Kapralos, M. R. M. Jenkin, and E. Milios, "Audiovisual localization of multiple speakers in a video teleconferencing setting," Int. J. Imag. Syst. Technol., vol. 13, pp. 95-105, 2002.
- (2002) Int. J. Imag. Syst. Technol. , vol.13 , pp. 95-105
- Kapralos, B.¹ Jenkin, M.R.M.² Milios, E.³

99
- 77956766546
- Audio visual fusion and tracking with multilevel iterative decoding: Framework and experimental evaluation
- DOI: 10.1109/JSTSP.2010.2057890
- S. T. Shivappa, B. D. Rao, and M. M. Trivedi, "Audio visual fusion and tracking with multilevel iterative decoding: Framework and experimental evaluation" IEEE J. Sel. Top. Signal Process., 2010, DOI: 10.1109/JSTSP.2010.2057890.
- (2010) IEEE J. Sel. Top. Signal Process.
- Shivappa, S.T.¹ Rao, B.D.² Trivedi, M.M.³

100
- 57349117033
- Coordinate-free calibration of an acoustically driven camera pointing system
- DOI: 10.1109/ICDSC.2008.4635685
- E. Ettinger and Y. Freund, "Coordinate-free calibration of an acoustically driven camera pointing system" in Proc. Int. Conf. Distrib. Smart Cameras, 2008, DOI: 10.1109/ICDSC.2008.4635685.
- (2008) Proc. Int. Conf. Distrib. Smart Cameras
- Ettinger, E.¹ Freund, Y.²

101
- 0009622481
- Learning joint statistical models for audio-visual fusion and segregation
- J. W. Fisher, T. Darrell, W. T. Freeman, and P. A. Viola, "Learning joint statistical models for audio-visual fusion and segregation" in Proc. Neural Inf. Process. Syst., 2000, pp. 772-778.
- (2000) Proc. Neural Inf. Process. Syst. , pp. 772-778
- Fisher, J.W.¹ Darrell, T.² Freeman, W.T.³ Viola, P.A.⁴

102
- 84899028297
- Audio vision: Using audiovisual synchrony to locate sounds
- J. Hershey and J. Movellan, "Audio vision: Using audiovisual synchrony to locate sounds" in Proc. Neural Inf. Process. Syst., 2000, pp. 813-819.
- (2000) Proc. Neural Inf. Process. Syst. , pp. 813-819
- Hershey, J.¹ Movellan, J.²

103
- 0034844366
- Sequential Monte Carlo fusion of sound and vision for speaker tracking
- J. Vermaak, M. Gangnet, A. Blake, and P. Perez, "Sequential Monte Carlo fusion of sound and vision for speaker tracking" in Proc. 8th IEEE Int. Conf. Comput. Vis., vol.1, pp. 741-746.
- Proc. 8th IEEE Int. Conf. Comput. Vis. , vol.1 , pp. 741-746
- Vermaak, J.¹ Gangnet, M.² Blake, A.³ Perez, P.⁴

104
- 0036874485
- Joint audio-visual tracking using particle filters
- Jan.
- D. N. Zotkin, R. Duraiswami, and L. S. Davis, "Joint audio-visual tracking using particle filters" EURASIP J. Appl. Signal Process., vol.2002, pp. 1154-1164, Jan. 2002.
- (2002) EURASIP J. Appl. Signal Process. , vol.2002 , pp. 1154-1164
- Zotkin, D.N.¹ Duraiswami, R.² Davis, L.S.³

105
- 0345565782
- Audio-visual speaker tracking with importance particle filters
- D. G. Perez, G. Lathoud, I. McCowan, J. M. Odobez, and D. Moore, "Audio-visual speaker tracking with importance particle filters," Proc. Int. Conf. Image Process., 2003, vol. 2, pp. 25-28.
- (2003) Proc. Int. Conf. Image Process. , vol.2 , pp. 25-28
- Perez, D.G.¹ Lathoud, G.² McCowan, I.³ Odobez, J.M.⁴ Moore, D.⁵

106
- 84905387860
- Multimodal human emotion/expression recognition
- Apr.
- L. S. Chen, T. S. Huang, T. Miyasato, and R. Nakatsu, "Multimodal human emotion/expression recognition" in Proc. 3rd IEEE Int. Conf. Autom. Face Gesture Recognit., Apr. 1998, pp. 366-371.
- (1998) Proc. 3rd IEEE Int. Conf. Autom. Face Gesture Recognit. , pp. 366-371
- Chen, L.S.¹ Huang, T.S.² Miyasato, T.³ Nakatsu, R.⁴

107
- 84905386639
- Bimodal emotion recognition
- L. C. De Silva and P. C. Ng, "Bimodal emotion recognition" in Proc. 4th IEEE Int. Conf. Autom. Face Gesture Recognit., 2000, pp. 332-335.
- (2000) Proc. 4th IEEE Int. Conf. Autom. Face Gesture Recognit. , pp. 332-335
- De Silva, L.C.¹ Ng, P.C.²

108
- 0034586727
- Effect of sensor fusion for recognition of emotional states using voice, face image and thermal image of face
- Y. Yoshitomi, K. Sung-Ill, T. Kawano, and T. Kilazoe, "Effect of sensor fusion for recognition of emotional states using voice, face image and thermal image of face" in Proc. 9th IEEE Int. Workshop Robot Human Interactive Commun., 2000, pp. 178-183.
- (2000) Proc. 9th IEEE Int. Workshop Robot Human Interactive Commun. , pp. 178-183
- Yoshitomi, Y.¹ Sung-Ill, K.² Kawano, T.³ Kilazoe, T.⁴

109
- 0034512820
- Emotional expressions in audiovisual human computer interaction
- L. S. Chen and T. S. Huang, "Emotional expressions in audiovisual human computer interaction" in Proc. IEEE Int. Conf. Multimedia Expo, 2000, vol.1, pp. 423-426.
- (2000) Proc. IEEE Int. Conf. Multimedia Expo , vol.1 , pp. 423-426
- Chen, L.S.¹ Huang, T.S.²

110
- 14944351245
- Analysis of emotion recognition using facial expressions, speech and multimodal information
- C. Busso, Z. Deng, S. Yildirim, M. Bulut, C. M. Lee, A. Kazemzadeh, S. Lee, U. Neumann, and S. Narayanan, "Analysis of emotion recognition using facial expressions, speech and multimodal information" in Proc. 6th Int. Conf. Multimodal Interfaces, 2004, pp. 205-211.
- (2004) Proc. 6th Int. Conf. Multimodal Interfaces , pp. 205-211
- Busso, C.¹ Deng, Z.² Yildirim, S.³ Bulut, M.⁴ Lee, C.M.⁵ Kazemzadeh, A.⁶ Lee, S.⁷ Neumann, U.⁸ Narayanan, S.⁹

111
- 33646820042
- Bimodal fusion of emotional data in an automotive environment
- Mar.
- S. Hoch, F. Althoff, G. McGlaun, and G. Rigoll, "Bimodal fusion of emotional data in an automotive environment" in Proc. IEEE Int. Conf. Acoust. Speech Signal Process., Mar. 2005, vol.2, pp. 1085-1088.
- (2005) Proc. IEEE Int. Conf. Acoust. Speech Signal Process. , vol.2 , pp. 1085-1088
- Hoch, S.¹ Althoff, F.² McGlaun, G.³ Rigoll, G.⁴

112
- 34047198092
- Emotion recognition based on joint visual and audio cues
- N. Sebe, I. Cohen, T. Gevers, and T. S. Huang, "Emotion recognition based on joint visual and audio cues" in Proc. 18th Int. Conf. Pattern Recognit., 2006, pp. 1136-1139.
- (2006) Proc. 18th Int. Conf. Pattern Recognit. , pp. 1136-1139
- Sebe, N.¹ Cohen, I.² Gevers, T.³ Huang, T.S.⁴

113
- 0031378649
- Facial emotion recognition using multi-modal information
- Sep.
- L. C. D. Silva, T. Miyasato, and R. Nakatsu, "Facial emotion recognition using multi-modal information" in Proc. Int. Conf. Inf. Commun. Signal Process., Sep. 1997, vol.1, pp. 397-401.
- (1997) Proc. Int. Conf. Inf. Commun. Signal Process. , vol.1 , pp. 397-401
- Silva, L.C.D.¹ Miyasato, T.² Nakatsu, R.³

114
- 34147125327
- Emotion recognition from the facial image and speech signal
- Aug.
- H. Go, K. Kwak, D. Lee, and M. Chun, "Emotion recognition from the facial image and speech signal" in Proc. SICE Annu. Conf., Aug. 2003, vol.3, pp. 2890-2895.
- (2003) Proc. SICE Annu. Conf. , vol.3 , pp. 2890-2895
- Go, H.¹ Kwak, K.² Lee, D.³ Chun, M.⁴

115
- 44049099067
- Audio-visual affective expression recognition through multistream fused HMM
- Jun.
- Z. Zeng, J. Tu, B. M. Pianfetti, and T. S. Huang, "Audio-visual affective expression recognition through multistream fused HMM" IEEE Trans. Multimedia, vol. 10, no. 4, pp. 570-577, Jun. 2008.
- (2008) IEEE Trans. Multimedia , vol.10 , Issue.4 , pp. 570-577
- Zeng, Z.¹ Tu, J.² Pianfetti, B.M.³ Huang, T.S.⁴

116
- 78049362386
- Decision level combination of multiple modalities for recognition and analysis of emotional expression
- A. Metallinou, S. Lee, and S. Narayanan, "Decision level combination of multiple modalities for recognition and analysis of emotional expression" in IEEE Int. Conf. Acoust. Speech Signal Process., 2010, pp. 2462-2465.
- (2010) IEEE Int. Conf. Acoust. Speech Signal Process. , pp. 2462-2465
- Metallinou, A.¹ Lee, S.² Narayanan, S.³

117
- 0742290133
- An introduction to biometric recognition
- Jan.
- A. K. Jain, A. Ross, and S. Prabhakar, "An introduction to biometric recognition" IEEE Trans. Circuits Syst. Video Technol., vol.14, no.1, pp. 4-20, Jan. 2004.
- (2004) IEEE Trans. Circuits Syst. Video Technol. , vol.14 , Issue.1 , pp. 4-20
- Jain, A.K.¹ Ross, A.² Prabhakar, S.³

118
- 33947384963
- Audio-visual biometrics
- Nov.
- P. S. Aleksic and A. K. Katsaggelos, "Audio-visual biometrics" Proc. IEEE, vol.94, no.11, pp. 2025-2044, Nov. 2006.
- (2006) Proc. IEEE , vol.94 , Issue.11 , pp. 2025-2044
- Aleksic, P.S.¹ Katsaggelos, A.K.²

119
- 70350111306
- Biometric person authentication with liveness detection based on audio visual fusion
- G. Chetty and M. Wagner, "Biometric person authentication with liveness detection based on audio visual fusion" Int. J. Biometrics, vol.1, no.4, pp. 463-478, 2009.
- (2009) Int. J. Biometrics , vol.1 , Issue.4 , pp. 463-478
- Chetty, G.¹ Wagner, M.²

120
- 0035394653
- Adaptive fusion of speech and lip information for robust speaker identification
- Jul.
- T. Wark and S. Sridharan, "Adaptive fusion of speech and lip information for robust speaker identification" Digital Signal Process., vol.11, no.3, pp. 169-186, Jul. 2001.
- (2001) Digital Signal Process. , vol.11 , Issue.3 , pp. 169-186
- Wark, T.¹ Sridharan, S.²

121
- 4544228318
- Identity verification using speech and face information
- Sep.
- C. Sanderson and K. Paliwal, "Identity verification using speech and face information" Digital Signal Process., vol.14, no.5, pp. 449-480, Sep. 2004.
- (2004) Digital Signal Process. , vol.14 , Issue.5 , pp. 449-480
- Sanderson, C.¹ Paliwal, K.²

122
- 15044355748
- Large-scale evaluation of multimodal biometric authentication using state-of-the-art systems
- Mar.
- R. Snelick, U. Uludag, A. Mink, M. Indovina, and A. Jain, "Large-scale evaluation of multimodal biometric authentication using state-of-the-art systems" IEEE Trans. Pattern Anal. Mach. Intell., vol.27, no.3, pp. 450-455, Mar. 2005.
- (2005) IEEE Trans. Pattern Anal. Mach. Intell. , vol.27 , Issue.3 , pp. 450-455
- Snelick, R.¹ Uludag, U.² Mink, A.³ Indovina, M.⁴ Jain, A.⁵

123
- 33947381924
- Voice and facial image integration for person recognition
- Southampton, U.K., Apr.
- C. Chibelushi, F. Deravi, and J. Mason, "Voice and facial image integration for person recognition" in Proc. IEEE Int. Symp. Multimedia Technol. Future Appl., Southampton, U.K., Apr. 21-23, 1993.
- (1993) Proc. IEEE Int. Symp. Multimedia Technol. Future Appl.
- Chibelushi, C.¹ Deravi, F.² Mason, J.³

124
- 0029393187
- Person identification using multiple cues
- Oct.
- R. Brunelli and D. Falavigna, "Person identification using multiple cues" IEEE Trans. Pattern Anal. Mach. Intell., vol.17, no.10, pp. 955-966, Oct. 1995.
- (1995) IEEE Trans. Pattern Anal. Mach. Intell. , vol.17 , Issue.10 , pp. 955-966
- Brunelli, R.¹ Falavigna, D.²

125
- 84870683671
- Integrating acoustic and labial information for speaker identification and verification
- P. Jourlin, J. Luettin, D. Genoud, and H. Wassner, "Integrating acoustic and labial information for speaker identification and verification" in Proc. Eur. Conf. Speech Commun. Technol., 1997, pp. 1603-1606.
- (1997) Proc. Eur. Conf. Speech Commun. Technol. , pp. 1603-1606
- Jourlin, P.¹ Luettin, J.² Genoud, D.³ Wassner, H.⁴

126
- 0032638088
- Robust speaker verification via fusion of speech and lip modalities
- Mar.
- T. Wark, S. Sridharan, and V. Chandran, "Robust speaker verification via fusion of speech and lip modalities" in Proc. IEEE Int. Conf. Acoust. Speech Signal Process., Mar. 1999, vol.6, pp. 3061-3064.
- (1999) Proc. IEEE Int. Conf. Acoust. Speech Signal Process. , vol.6 , pp. 3061-3064
- Wark, T.¹ Sridharan, S.² Chandran, V.³

127
- 0032675797
- Audio-visual person verification
- DOI: 10.1109/CVPR.1999.786997
- S. Ben-Yacoub, J. Luettin, K. Jonsson, J. Matas, and J. Kittler, "Audio-visual person verification" in Proc. Conf. Comput. Vis. Pattern Recognit., 1999, DOI: 10.1109/CVPR.1999.786997.
- (1999) Proc. Conf. Comput. Vis. Pattern Recognit.
- Ben-Yacoub, S.¹ Luettin, J.² Jonsson, K.³ Matas, J.⁴ Kittler, J.⁵

128
- 26844533276
- Multimodal speaker identification using an adaptive classifier cascade based on modality reliability
- Oct.
- E. Erzin, Y. Yemez, and A. M. Tekalp, "Multimodal speaker identification using an adaptive classifier cascade based on modality reliability" IEEE Trans. Multimedia, vol.7, no.5, pp. 840-852, Oct. 2005.
- (2005) IEEE Trans. Multimedia , vol.7 , Issue.5 , pp. 840-852
- Erzin, E.¹ Yemez, Y.² Tekalp, A.M.³

129
- 84962200855
- Activity monitoring and summarization for an intelligent meeting room
- M. M. Trivedi, K. S. Huang, and I. Mikic, "Activity monitoring and summarization for an intelligent meeting room" in Proc. IEEE Int. Workshop Human Motion, 2000, pp. 107-112.
- (2000) Proc. IEEE Int. Workshop Human Motion , pp. 107-112
- Trivedi, M.M.¹ Huang, K.S.² Mikic, I.³

130
- 77956953323
- Multimodal meeting tracker
- Paris, France, Apr.
- M. Bett, R. Gross, H. Yu, X. Zhu, Y. Pan, J. Yang, and A. Waibel, "Multimodal meeting tracker" in Proc. Int. Conf. Adapt. Personalization Fusion Heterogeneous Inf., Paris, France, Apr. 12-14, 2000.
- (2000) Proc. Int. Conf. Adapt. Personalization Fusion Heterogeneous Inf.
- Bett, M.¹ Gross, R.² Yu, H.³ Zhu, X.⁴ Pan, Y.⁵ Yang, J.⁶ Waibel, A.⁷

131
- 0034515443
- Towards a multimodal meeting record
- R. Gross, M. Bett, H. Yue, X. J. Zhu, Y. Pan, J. Yang, and A. Waibel, "Towards a multimodal meeting record," Proc. Int. Conf. Multimedia Expo, 2000, vol. 3, pp. 1593-1596.
- (2000) Proc. Int. Conf. Multimedia Expo , vol.3 , pp. 1593-1596
- Gross, R.¹ Bett, M.² Yue, H.³ Zhu, X.J.⁴ Pan, Y.⁵ Yang, J.⁶ Waibel, A.⁷

132
- 0033698728
- Gesture, speech, and gaze cues for discourse segmentation
- F. Quek, D. McNeill, R. Bryll, C. Kirbas, H. Arslan, K. E. McCullough, N. Furuyama, and R. Ansari, "Gesture, speech, and gaze cues for discourse segmentation" in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., 2000, vol.2, pp. 247-254.
- (2000) Proc. IEEE Conf. Comput. Vis. Pattern Recognit. , vol.2 , pp. 247-254
- Quek, F.¹ McNeill, D.² Bryll, R.³ Kirbas, C.⁴ Arslan, H.⁵ McCullough, K.E.⁶ Furuyama, N.⁷ Ansari, R.⁸

133
- 84932605936
- Modeling individual and group actions in meetings: A two-layer HMM framework
- Jun. DOI: 10.1109/CVPR.2004.125
- D. Zhang, D. Gatica-Perez, S. Bengio, I. McCowan, and G. Lathoud, "Modeling individual and group actions in meetings: A two-layer HMM framework" in Proc. IEEE Int. Conf. Comput. Vis. Pattern Recognit., Jun. 2004, DOI: 10.1109/CVPR.2004.125.
- (2004) Proc. IEEE Int. Conf. Comput. Vis. Pattern Recognit.
- Zhang, D.¹ Gatica-Perez, D.² Bengio, S.³ McCowan, I.⁴ Lathoud, G.⁵

134
- 33846227904
- Automatic meeting segmentation using dynamic Bayesian networks
- Jan.
- A. Dielmann and S. Renals, "Automatic meeting segmentation using dynamic Bayesian networks" IEEE Trans. Multimedia, vol.9, no.1, pp. 25-36, Jan. 2007.
- (2007) IEEE Trans. Multimedia , vol.9 , Issue.1 , pp. 25-36
- Dielmann, A.¹ Renals, S.²

135
- 48149111986
- Robust multi-modal group action recognition in meetings from disturbed videos with the asynchronous hidden Markov model
- M. Al-Hames, C. Lenz, S. Reiter, J. Schenk, F. Wallhoff, and G. Rigoll, "Robust multi-modal group action recognition in meetings from disturbed videos with the asynchronous hidden Markov model" in Proc. IEEE Int. Conf. Image Process., 2007, vol.2, pp. 213-216.
- (2007) Proc. IEEE Int. Conf. Image Process. , vol.2 , pp. 213-216
- Al-Hames, M.¹ Lenz, C.² Reiter, S.³ Schenk, J.⁴ Wallhoff, F.⁵ Rigoll, G.⁶

136
- 77956957378
- Multimodal corpora: From models of natural interaction to systems and applications
- Berlin, Germany: Springer-Verlag
- M. Kipp, J. C. Martin, P. Paggio, and D. Heylen, Multimodal Corpora: From Models of Natural Interaction to Systems and Applications. Berlin, Germany: Springer-Verlag, 2009, ser. Lecture Notes on Artificial Intelligence.
- (2009) Ser. Lecture Notes on Artificial Intelligence
- Kipp, M.¹ Martin, J.C.² Paggio, P.³ Heylen, D.⁴

137
- 85009275134
- The ISL meeting corpus: The impact of meeting type on speech style
- Denver, CO, Sep.
- S. Burger, V. MacLaren, and H. Yu, "The ISL meeting corpus: The impact of meeting type on speech style" in Proc. Int. Conf. Spoken Lang., Denver, CO, Sep. 16-20, 2002.
- (2002) Proc. Int. Conf. Spoken Lang.
- Burger, S.¹ MacLaren, V.² Yu, H.³

138
- 0141856326
- Meetings about meetings: Research at ICSI on speech in multiparty conversations
- Apr.
- N. Morgan, D. Baron, S. Bhagat, H. Carvey, R. Dhillon, J. Edwards, D. Gelbart, A. Janin, A. Krupski, B. Peskin, T. Pfau, E. Shriberg, A. Stolcke, and C. Wooters, "Meetings about meetings: Research at ICSI on speech in multiparty conversations" in Proc. IEEE Int. Conf. Acoust. Speech Signal Process., Apr. 2003, vol. 4, pp. 740-743.
- (2003) Proc. IEEE Int. Conf. Acoust. Speech Signal Process. , vol.4 , pp. 740-743
- Morgan, N.¹ Baron, D.² Bhagat, S.³ Carvey, H.⁴ Dhillon, R.⁵ Edwards, J.⁶ Gelbart, D.⁷ Janin, A.⁸ Krupski, A.⁹ Peskin, B.¹⁰ Pfau, T.¹¹ Shriberg, E.¹² Stolcke, A.¹³ Wooters, C.¹⁴

139
- 84924134587
- The NIST meeting room pilot corpus
- Lisbon, Portugal, May 26-28
- J. Garofolo, C. Laprum, M. Michel, V. Stanford, and E. Tabassi, "The NIST meeting room pilot corpus" in Proc. Lang. Resource Evaluat. Conf., Lisbon, Portugal, May 26-28, 2004.
- (2004) Proc. Lang. Resource Evaluat. Conf.
- Garofolo, J.¹ Laprum, C.² Michel, M.³ Stanford, V.⁴ Tabassi, E.⁵

140
- 41349114281
- The CHIL audiovisual corpus for lecture and meeting analysis inside smart rooms
- Dec.
- D. Mostefa, N. Moreau, K. Choukri, G. Potamianos, S. M. Chu, J. R. Casas, J. Turmo, L. Cristoferetti, F. Tobia, A. Pnevmatikakis, V. Mylonakis, F. Talantzis, S. Burger, R. Stiefelhagen, K. Bernardin, and C. Rochet, "The CHIL audiovisual corpus for lecture and meeting analysis inside smart rooms" J. Lang. Resources Evaluat., vol.41, no.3-4, pp. 389-407, Dec. 2007.
- (2007) J. Lang. Resources Evaluat. , vol.41 , Issue.3-4 , pp. 389-407
- Mostefa, D.¹ Moreau, N.² Choukri, K.³ Potamianos, G.⁴ Chu, S.M.⁵ Casas, J.R.⁶ Turmo, J.⁷ Cristoferetti, L.⁸ Tobia, F.⁹ Pnevmatikakis, A.¹⁰ Mylonakis, V.¹¹ Talantzis, F.¹² Burger, S.¹³ Stiefelhagen, R.¹⁴ Bernardin, K.¹⁵ Rochet, C.¹⁶

141
- 44449098665
- VACE multimodal meeting corpus
- Edinburgh, U.K., Jul.
- L. Chen, R. T. Rose, Y. Qiao, I. Kimbara, F. Parrill, T. X. Han, J. Tu, Z. Huang, M. Harper, Y. Xiong, D. McNeill, R. Tuttle, and T. Huang, "VACE multimodal meeting corpus" in Proc. Workshop Mach. Learn. Multimodal Interaction, Edinburgh, U.K., Jul. 11-13, 2005.
- (2005) Proc. Workshop Mach. Learn. Multimodal Interaction
- Chen, L.¹ Rose, R.T.² Qiao, Y.³ Kimbara, I.⁴ Parrill, F.⁵ Han, T.X.⁶ Tu, J.⁷ Huang, Z.⁸ Harper, M.⁹ Xiong, Y.¹⁰ McNeill, D.¹¹ Tuttle, R.¹² Huang, T.¹³

142
- 50449105545
- Interpretation of multiparty meetings: The AMI and AMIDA projects
- Trento, Italy, May
- S. Renals, T. Hain, and H. Bourlard, "Interpretation of multiparty meetings: The AMI and AMIDA projects" in Proc. Hands-Free Speech Commun. Microphone Arrays, Trento, Italy, May 6-8, 2008, pp. 115-118.
- (2008) Proc. Hands-Free Speech Commun. Microphone Arrays , pp. 115-118
- Renals, S.¹ Hain, T.² Bourlard, H.³

143
- 74049104832
- Detecting, tracking and interacting with people in a public space
- S. Cheamanunkul, E. Ettinger, M. Jacobsen, P. Lai, and Y. Freund, "Detecting, tracking and interacting with people in a public space" in Proc. Int. Conf. Multimodal Interfaces, 2009, pp. 79-86.
- (2009) Proc. Int. Conf. Multimodal Interfaces , pp. 79-86
- Cheamanunkul, S.¹ Ettinger, E.² Jacobsen, M.³ Lai, P.⁴ Freund, Y.⁵

144
- 56749181028
- Robust multimodal audio-visual processing for advanced context awareness in smart spaces
- Jan.
- A. Pnevmatikakis, J. Soldatos, F. Talantzis, and L. Polymenakos, "Robust multimodal audio-visual processing for advanced context awareness in smart spaces" Personal Ubiquitous Comput., vol.13, no.1, pp. 3-14, Jan. 2009.
- (2009) Personal Ubiquitous Comput. , vol.13 , Issue.1 , pp. 3-14
- Pnevmatikakis, A.¹ Soldatos, J.² Talantzis, F.³ Polymenakos, L.⁴

145
- 74049139967
- A speaker diarization method based on the probabilistic fusion of audio-visual location information
- K. Ishizuka, S. Araki, K. Otsuka, T. Nakatani, and M. Fujimoto, "A speaker diarization method based on the probabilistic fusion of audio-visual location information" in Proc. Int. Conf. Multimodal Interfaces, 2009, pp. 55-62.
- (2009) Proc. Int. Conf. Multimodal Interfaces , pp. 55-62
- Ishizuka, K.¹ Araki, S.² Otsuka, K.³ Nakatani, T.⁴ Fujimoto, M.⁵

146
- 84963812624
- Layered representations for human activity recognition
- Oct.
- N. Oliver, E. Horvitz, and A. Garg, "Layered representations for human activity recognition" in Proc. Int. Conf. Multimodal Interfaces, Oct. 2002, p. 3.
- (2002) Proc. Int. Conf. Multimodal Interfaces , pp. 3
- Oliver, N.¹ Horvitz, E.² Garg, A.³

147
- 0034245149
- A Bayesian computer vision system for modeling human interactions
- Aug.
- N. M. Oliver, B. Rosario, and A. Pentland, "A Bayesian computer vision system for modeling human interactions" IEEE Trans. Pattern Anal. Mach. Intell., vol. 22, no. 8, pp. 831-843, Aug. 2000.
- (2000) IEEE Trans. Pattern Anal. Mach. Intell. , vol.22 , Issue.8 , pp. 831-843
- Oliver, N.M.¹ Rosario, B.² Pentland, A.³

148
- 57549113338
- Context-aware computing for assistive meeting system
- article no. 4
- P. Dai and G. Xu, "Context-aware computing for assistive meeting system" in Proc. 1st Int. Conf. Pervasive Technol. Related Assistive Environ., 2008, article no. 4.
- (2008) Proc. 1st Int. Conf. Pervasive Technol. Related Assistive Environ.
- Dai, P.¹ Xu, G.²

149
- 70350645426
- Probabilistic integration of sparse audio-visual cues for identity tracking
- K. Bernardin, R. Stiefelhagen, and A. Waibel, "Probabilistic integration of sparse audio-visual cues for identity tracking" in Proc. 16th ACM Int. Conf. Multimedia, 2008, pp. 151-158.
- (2008) Proc. 16th ACM Int. Conf. Multimedia , pp. 151-158
- Bernardin, K.¹ Stiefelhagen, R.² Waibel, A.³

150
- 77956956457
- Feedback in head gestures and speech
- Malta, May
- P. Paggio and C. Navarretta, "Feedback in head gestures and speech" in Proc. LREC Workshop Multimodal Corpora: Adv. Capturing, Coding Analyzing Multimodality, Malta, May 18, 2010.
- (2010) Proc. LREC Workshop Multimodal Corpora: Adv. Capturing, Coding Analyzing Multimodality
- Paggio, P.¹ Navarretta, C.²

151
- 34249701400
- A unified model of early word learning: Integrating statistical and social cues
- C. Yu and D. H. Ballard, "A unified model of early word learning: Integrating statistical and social cues" Neurocomputing, vol.70, no.13-15, pp. 2149-2165, 2007.
- (2007) Neurocomputing , vol.70 , Issue.13-15 , pp. 2149-2165
- Yu, C.¹ Ballard, D.H.²

152
- 34247550023
- Visually-guided attention enhances target identification in a complex auditory scene
- V. Best, E. J. Ozmeral, and B. G. Shinn-Cunningham, "Visually-guided attention enhances target identification in a complex auditory scene" J. Assoc. Res. Otolaryngol., vol.8, no.2, pp. 294-304, 2007.
- (2007) J. Assoc. Res. Otolaryngol. , vol.8 , Issue.2 , pp. 294-304
- Best, V.¹ Ozmeral, E.J.² Shinn-Cunningham, B.G.³

* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS DB.