SCOPUS Information Retrieval Platform

IEEE Journal on Selected Topics in Signal Processing

Volume 4, Issue 5, 2010, Pages 882-894

Audio-visual fusion and tracking with multilevel iterative decoding: Framework and experimental evaluation

Authors (3): Shivappa, Shankar T.ᵃ; Rao, Bhaskar D.ᵃ; Trivedi, Mohan Manubhaiᵃ

ᵃ University of California (United States)

Author keywords

Hierarchical frameworks; human activity analysis; human-computer interaction; iterative decoding; person tracking

Indexed keywords

AUDIO AND VISUAL CUES; AUDIO AND VISUAL INFORMATION; AUDIO APPLICATIONS; AUDIO-VISUAL; AUDIO-VISUAL FUSION; DATA SETS; EXPERIMENTAL EVALUATION; HIERARCHICAL FRAMEWORKS; HUMAN ACTIVITY ANALYSIS; HUMAN COMMUNICATIONS; HUMAN COMPUTER INTERFACES; INTELLIGENT SPACES; MICROPHONE ARRAYS; MICROPHONE CALIBRATION; NATURAL INTERFACES; NEW APPROACHES; PARTICLE FILTER; PARTICLE FILTERING; PERSON TRACKING; SENSOR CALIBRATION; SENSOR CONFIGURATIONS; VIDEO INFORMATION;

CALIBRATION; CAMERAS; COMPUTER VISION; MICROPHONES; NONLINEAR FILTERING; SENSORS; SPEECH COMMUNICATION; SPEECH RECOGNITION; VISUAL COMMUNICATION;

ITERATIVE DECODING;

EID: 77956766546 PISSN: 19324553 EISSN: None Source Type: Journal
DOI: 10.1109/JSTSP.2010.2057890 Document Type: Article

Times cited: 37

References (44)

1
- 11844267204
- Dynamic context capture and distributed video arrays for intelligent spaces
- Jan
- M. M. Trivedi, K. S. Huang, and I. Mikic, "Dynamic context capture and distributed video arrays for intelligent spaces," IEEE Trans. Syst., Man, Cybern., A, vol.35, no.1, pp. 145-163, Jan. 2005.
- (2005) IEEE Trans. Syst., Man, Cybern., A , vol.35 , Issue.1 , pp. 145-163
- Trivedi, M.M.¹ Huang, K.S.² Mikic, I.³

2
- 4944224241
- Active camera networks and semantic event databases for intelligent environments
- M. M. Trivedi, I. Mikic, and S. Bhonsle, "Active camera networks and semantic event databases for intelligent environments," in Proc. IEEE CVPR Workshop Human Modeling, Anal., Synth., 2000.
- (2000) Proc. IEEE CVPR Workshop Human Modeling, Anal., Synth
- Trivedi, M.M.¹ Mikic, I.² Bhonsle, S.³

3
- 60849111400
- Person tracking with audio-visual cues using the iterative decoding framework
- S. T. Shivappa, M. M. Trivedi, and B. D. Rao, "Person tracking with audio-visual cues using the iterative decoding framework," in Proc. 5th IEEE Int. Conf. Adv. Video Signal Based Surveill., 2008, pp. 260-267.
- (2008) Proc. 5th IEEE Int. Conf. Adv. Video Signal Based Surveill , pp. 260-267
- Shivappa, S.T.¹ Trivedi, M.M.² Rao, B.D.³

4
- 4944221356
- Layered representations for learning and inferring office activity from multiple sensory channels
- N. Oliver, A. Garg, and E. Horvitz, "Layered representations for learning and inferring office activity from multiple sensory channels," Comput. Vis. Image Understand., vol.96, no.2, pp. 163-180, 2004.
- (2004) Comput. Vis. Image Understand. , vol.96 , Issue.2 , pp. 163-180
- Oliver, N.¹ Garg, A.² Horvitz, E.³

5
- 84932605936
- Modeling individual and group actions in meetings: A two-layer HMM framework
- Jun
- D. Zhang, D. Gatica-Perez, S. Bengio, I. McCowan, and G. Lathoud, "Modeling individual and group actions in meetings: A two-layer HMM framework," in Proc. IEEE Int. Conf. Comput. Vis. Pattern Recognition, Jun. 2004, p. 117.
- (2004) Proc. IEEE Int. Conf. Comput. Vis. Pattern Recognition , pp. 117
- Zhang, D.¹ Gatica-Perez, D.² Bengio, S.³ McCowan, I.⁴ Lathoud, G.⁵

6
- 0034245149
- A Bayesian computer vision system for modeling human interactions
- Aug
- N. M. Oliver, B. Rosario, and A. Pentland, "A Bayesian computer vision system for modeling human interactions," IEEE Trans. Pattern Anal. Mach. Intell., vol.22, no.8, pp. 831-843, Aug. 2000.
- (2000) IEEE Trans. Pattern Anal. Mach. Intell. , vol.22 , Issue.8 , pp. 831-843
- Oliver, N.M.¹ Rosario, B.² Pentland, A.³

7
- 57549113338
- Context-aware computing for assistive meeting system
- P. Dai and G. Xu, "Context-aware computing for assistive meeting system," in Proc. 1st Int. Conf. Pervasive Technol. Rel. Assist. Environ., 2008.
- (2008) Proc. 1st Int. Conf. Pervasive Technol. Rel. Assist. Environ
- Dai, P.¹ Xu, G.²

8
- 70350645426
- Probabilistic integration of sparse audio-visual cues for identity tracking
- K. Bernardin, R. Stiefelhagen, and A. Waibel, "Probabilistic integration of sparse audio-visual cues for identity tracking," in Proc. 16th ACM Int. Conf. Multimedia, 2008.
- (2008) Proc. 16th ACM Int. Conf. Multimedia
- Bernardin, K.¹ Stiefelhagen, R.² Waibel, A.³

9
- 70349205633
- Role of head pose estimation in speech acquisition from distant microphones
- S. T. Shivappa, B. D. Rao, and M. M. Trivedi, "Role of head pose estimation in speech acquisition from distant microphones," in Proc. IEEE Int. Conf. Acoust., Speech, Signal Process. (ICASSP), 2009, pp. 3557-3560.
- (2009) Proc. IEEE Int. Conf. Acoust., Speech, Signal Process. (ICASSP) , pp. 3557-3560
- Shivappa, S.T.¹ Rao, B.D.² Trivedi, M.M.³

10
- 40249089621
- Speech enhancement and recognition in meetings with an audio-visual sensor array
- Nov
- H. K. Maganti, D. Gatica-Perez, and I. McCowan, "Speech enhancement and recognition in meetings with an audio-visual sensor array," IEEE Trans. Audio, Speech, Lang. Process., vol.15, no.8, pp. 2257-2269, Nov. 2007.
- (2007) IEEE Trans. Audio, Speech, Lang. Process. , vol.15 , Issue.8 , pp. 2257-2269
- Maganti, H.K.¹ Gatica-Perez, D.² McCowan, I.³

11
- 70449556249
- Hierarchical audio-visual cue integration framework for activity analysis in intelligent meeting rooms
- S. T. Shivappa, M. Trivedi, and B. D. Rao, "Hierarchical audio-visual cue integration framework for activity analysis in intelligent meeting rooms," in Proc. IEEE CVPR Workshop: ViSU'09, 2009, pp. 107-114.
- (2009) Proc. IEEE CVPR Workshop: ViSU'09 , pp. 107-114
- Shivappa, S.T.¹ Trivedi, M.² Rao, B.D.³

12
- 33846013241
- Object tracking: A survey
- A. Yilmaz, O. Javed, and M. Shah, "Object tracking: A survey," ACM Comput. Surv., 2006.
- (2006) ACM Comput. Surv
- Yilmaz, A.¹ Javed, O.² Shah, M.³

13
- 0009590598
- Microphone Arrays
- New York: Springer
- M. Brandstein and D. Ward, Microphone Arrays. New York: Springer, 2001.
- (2001) Microphone Arrays
- Brandstein, M.¹ Ward, D.²

14
- 3042551886
- Video arrays for real-time tracking of person, head, and face in an intelligent room
- K. S. Huang and M. M. Trivedi, "Video arrays for real-time tracking of person, head, and face in an intelligent room," Mach. Vis. Applicat., 2003.
- (2003) Mach. Vis. Applicat
- Huang, K.S.¹ Trivedi, M.M.²

15
- 0346707503
- Source localization in reverberant environments: Modeling and statistical analysis
- Nov
- T. Gustafsson, B. D. Rao, and M. M. Trivedi, "Source localization in reverberant environments: Modeling and statistical analysis," IEEE Trans. Speech Audio Process., vol.11, no.6, pp. 791-803, Nov. 2003.
- (2003) IEEE Trans. Speech Audio Process. , vol.11 , Issue.6 , pp. 791-803
- Gustafsson, T.¹ Rao, B.D.² Trivedi, M.M.³

16
- 0003343412
- Robust localization in reverberant rooms
- New York: Springer
- J. H. DiBiase, H. F. Silverman, and M. S. Brandstein, "Robust localization in reverberant rooms," in Microphone Arrays: Signal Processing Techniques and Applications. New York: Springer, 2001.
- (2001) Microphone Arrays: Signal Processing Techniques and Applications
- Dibiase, J.H.¹ Silverman, H.F.² Brandstein, M.S.³

17
- 0033279153
- Audio-visual tracking for natural interactivity
- G. Pingali, G. Tunali, and I. Carlbom, "Audio-visual tracking for natural interactivity," in Proc. 7th ACM Int. Conf. Multimedia (Part 1), 1999.
- (1999) Proc. 7th ACM Int. Conf. Multimedia (Part 1)
- Pingali, G.¹ Tunali, G.² Carlbom, I.³

18
- 0035458007
- Robust sound localization using multi-source audiovisual information fusion
- P. Aarabi and S. G. Zaky, "Robust sound localization using multi-source audiovisual information fusion," Information Fusion, 2001.
- (2001) Information Fusion
- Aarabi, P.¹ Zaky, S.G.²

19
- 0034844366
- Sequential Monte Carlo fusion of sound and vision for speaker tracking
- J. Vermaak, M. Gangnet, A. Blake, and P. Perez, "Sequential Monte Carlo fusion of sound and vision for speaker tracking," in Proc. 8th IEEE Int. Conf. Comput. Vis., 2001, pp. 741-746.
- (2001) Proc. 8th IEEE Int. Conf. Comput. Vis , pp. 741-746
- Vermaak, J.¹ Gangnet, M.² Blake, A.³ Perez, P.⁴

20
- 84880877816
- Real-time auditory and visual multiple-object tracking for humanoids
- K. Nakadai, K. Hidai, H. Mizoguchi, H. G. Okuno, and H. Kitano, "Real-time auditory and visual multiple-object tracking for humanoids," in Proc. IJCAI, 2001.
- (2001) Proc. IJCAI
- Nakadai, K.¹ Hidai, K.² Mizoguchi, H.³ Okuno, H.G.⁴ Kitano, H.⁵

21
- 0042349407
- A graphical model for audiovisual object tracking
- Jul
- M. Beal, N. Jojic, and H. Attias, "A graphical model for audiovisual object tracking," IEEE Trans. Pattern Anal. Mach. Intell., vol.25, no.7, pp. 828-836, Jul. 2003.
- (2003) IEEE Trans. Pattern Anal. Mach. Intell. , vol.25 , Issue.7 , pp. 828-836
- Beal, M.¹ Jojic, N.² Attias, H.³

22
- 0034507915
- Look who's talking: Speaker detection using video and audio correlation
- R. Cutler and L. S. Davis, "Look who's talking: Speaker detection using video and audio correlation," in Proc. IEEE Int. Conf. Multimedia Expo (III), 2000.
- (2000) Proc. IEEE Int. Conf. Multimedia Expo (III)
- Cutler, R.¹ Davis, L.S.²

23
- 0009622481
- Learning joint statistical models for audio-visual fusion and segregation
- J. W. Fisher, T. Darrell, W. T. Freeman, and P. A. Viola, "Learning joint statistical models for audio-visual fusion and segregation," in Proc. NIPS, 2000.
- (2000) Proc. NIPS
- Fisher, J.W.¹ Darrell, T.² Freeman, W.T.³ Viola, P.A.⁴

24
- 84899028297
- Audio vision: Using audiovisual synchrony to locate sounds
- J. Hershey and J. Movellan, "Audio vision: Using audiovisual synchrony to locate sounds," in Proc. NIPS, 2000.
- (2000) Proc. NIPS
- Hershey, J.¹ Movellan, J.²

25
- 0036874485
- Joint audio-visual tracking using particle filters
- D. N. Zotkin, R. Duraiswami, and L. S. Davis, "Joint audio-visual tracking using particle filters," EURASIP J. Appl. Signal Process., vol.2002, no.11, pp. 1154-1164, 2002.
- (2002) EURASIP J. Appl. Signal Process. , vol.2002 , Issue.11 , pp. 1154-1164
- Zotkin, D.N.¹ Duraiswami, R.² Davis, L.S.³

26
- 0345565782
- Audio-Visual Speaker Tracking with Importance Particle Filters
- D. Gatica-Perez, G. Lathoud, I. McCowan, J. M. Odobez, and D. Moore, "Audio-Visual Speaker Tracking With Importance Particle Filters," in Proc. ICIP, 2003.
- (2003) Proc. ICIP
- Gatica-Perez, D.¹ Lathoud, G.² McCowan, I.³ Odobez, J.M.⁴ Moore, D.⁵

27
- 21244492850
- Real-time speaker tracking using particle filter sensor fusion
- Mar
- Y. Chen and Y. Rui, "Real-time speaker tracking using particle filter sensor fusion," Proc. IEEE, vol.92, no.3, pp. 485-494, Mar. 2004.
- (2004) Proc. IEEE , vol.92 , Issue.3 , pp. 485-494
- Chen, Y.¹ Rui, Y.²

28
- 4544347587
- Multiple person and speaker activity tracking with a particle filter
- N. Checka, K. W. Wilson, M. R. Siracusa, and T. Darrell, "Multiple person and speaker activity tracking with a particle filter," in Proc. IEEE Int. Conf. Acoust., Speech, Signal Process., 2004, vol.V, pp. 881-884.
- (2004) Proc. IEEE Int. Conf. Acoust., Speech, Signal Process , vol.5 , pp. 881-884
- Checka, N.¹ Wilson, K.W.² Siracusa, M.R.³ Darrell, T.⁴

29
- 32344434992
- A joint particle filter for audio-visual speaker tracking
- K. Nickel, T. Gehrig, R. Stiefelhagen, and J. McDonough, "A joint particle filter for audio-visual speaker tracking," in Proc. 7th Int. Conf. Multimodal Interfaces, 2005.
- (2005) Proc. 7th Int. Conf. Multimodal Interfaces
- Nickel, K.¹ Gehrig, T.² Stiefelhagen, R.³ McDonough, J.⁴

30
- 33645672078
- Kalman filters for audio-video source localization
- Oct
- T. Gehrig, K. Nickel, H. K. Ekenel, U. Klee, and J. McDonough, "Kalman filters for audio-video source localization," in Proc. IEEE Workshop Applicat. Signal Process. Audio Acoust., Oct. 2005, pp. 118-121.
- (2005) Proc. IEEE Workshop Applicat. Signal Process. Audio Acoust. , pp. 118-121
- Gehrig, T.¹ Nickel, K.² Ekenel, H.K.³ Klee, U.⁴ McDonough, J.⁵

31
- 64149093817
- Audiovisual probabilistic tracking of multiple speakers in meetings
- Feb
- D. Gatica-Perez, G. Lathoud, J. Odobez, and I. McCowan, "Audiovisual probabilistic tracking of multiple speakers in meetings," IEEE Trans. Audio, Speech, Lang. Process., vol.15, no.2, pp. 601-616, Feb. 2007.
- (2007) IEEE Trans. Audio, Speech, Lang. Process. , vol.15 , Issue.2 , pp. 601-616
- Gatica-Perez, D.¹ Lathoud, G.² Odobez, J.³ McCowan, I.⁴

32
- 37849022114
- Audio-visual multi-person tracking and identification for smart environments
- K. Bernardin and R. Stiefelhagen, "Audio-visual multi-person tracking and identification for smart environments," in Proc. ACM Int. Conf. Multimedia, 2007.
- (2007) Proc. ACM Int. Conf. Multimedia
- Bernardin, K.¹ Stiefelhagen, R.²

33
- 35348860213
- Enabling multimodal human-robot interaction for the Karlsruhe humanoid robot
- Oct
- R. Stiefelhagen, H. K. Ekenel, C. Fugen, P. Gieselmann, H. Holzapfel, F. Kraft, K. Nickel, M. Voit, and A. Waibel, "Enabling multimodal human-robot interaction for the Karlsruhe humanoid robot," IEEE Trans. Robotics, vol.23, no.5, pp. 840-851, Oct. 2007.
- (2007) IEEE Trans. Robotics , vol.23 , Issue.5 , pp. 840-851
- Stiefelhagen, R.¹ Ekenel, H.K.² Fugen, C.³ Gieselmann, P.⁴ Holzapfel, H.⁵ Kraft, F.⁶ Nickel, K.⁷ Voit, M.⁸ Waibel, A.⁹

34
- 77956753881
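- Multimodal Technologies for Perception of Humans: International Evaluation Workshops CLEAR 2007 and RT 2007 (Lecture Notes in Computer Science)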
- R. Stiefelhagen, R. Bowers, and J. Fiscus, in Proc. Multimodal Technologies for Perception of Humans: International Evaluation Workshops CLEAR 2007 and RT 2007 (Lecture Notes in Computer Science).
- Proc. Multimodal Technologies for Perception of Humans: International Evaluation Workshops CLEAR 2007 and RT 2007
- Stiefelhagen, R.¹ Bowers, R.² Fiscus, J.³

35
- 77956738116
- UPC audio, video and multimodal person tracking systems in the CLEAR evaluation campaign
- A. Abad, C. Canton-Ferrer, C. Segura, J. L. Landabaso, D. Macho, J. R. Casas, J. Hernando, M. Pardas, and C. Nadeu, "UPC audio, video and multimodal person tracking systems in the CLEAR evaluation campaign," in Proc. 1st Int. CLEAR Evaluation Workshop-Multimodal Technologies for Perception of Humans, 2007.
- (2007) Proc. 1st Int. CLEAR Evaluation Workshop-Multimodal Technologies for Perception of Humans
- Abad, A.¹ Canton-Ferrer, C.² Segura, C.³ Landabaso, J.L.⁴ Macho, D.⁵ Casas, J.R.⁶ Hernando, J.⁷ Pardas, M.⁸ Nadeu, C.⁹

36
- 44949210324
- The AIT 3D audio/visual person tracker for CLEAR 2007
- N. Katsarakis, F. Talantzis, A. Pnevmatikakis, and L. Polymenakos, "The AIT 3D audio/visual person tracker for CLEAR 2007," in Proc. 1st Int. CLEAR Evaluation Workshop-Multimodal Technologies for Perception of Humans, 2007.
- (2007) Proc. 1st Int. CLEAR Evaluation Workshop-Multimodal Technologies for Perception of Humans
- Katsarakis, N.¹ Talantzis, F.² Pnevmatikakis, A.³ Polymenakos, L.⁴

37
- 38749151899
- An iterative decoding algorithm for fusion of multi-modal information
- S. T. Shivappa, B. D. Rao, and M. M. Trivedi, "An iterative decoding algorithm for fusion of multi-modal information," EURASIP J. Adv. Signal Process.-Special Iss. Human-Activity Analysis in Multimedia Data, 2008.
- (2008) EURASIP J. Adv. Signal Process.-Special Iss. Human-Activity Analysis in Multimedia Data
- Shivappa, S.T.¹ Rao, B.D.² Trivedi, M.M.³

38
- 0004235293
- Tracking and Data Association
- New York: Academic
- Y. Bar-Shalom and T. E. Fortmann, Tracking and Data Association. New York: Academic, 1988.
- (1988) Tracking and Data Association
- Bar-Shalom, Y.¹ Fortmann, T.E.²

39
- 0016990291
- The generalized correlation method for estimation of time delay
- Aug
- C. H. Knapp and G. C. Carter, "The generalized correlation method for estimation of time delay," IEEE Trans. Acoust., Speech, Signal Process., vol.ASSP-24, no.4, pp. 320-327, Aug. 1976.
- (1976) IEEE Trans. Acoust., Speech, Signal Process. , vol.ASSP-24 , Issue.4 , pp. 320-327
- Knapp, C.H.¹ Carter, G.C.²

40
- 0030701369
- A robust method for speech signal time-delay estimation in reverberant rooms
- M. Brandstein and H. Silverman, "A robust method for speech signal time-delay estimation in reverberant rooms," in Proc. ICASSP, 1997, pp. 375-378.
- (1997) Proc. ICASSP , pp. 375-378
- Brandstein, M.¹ Silverman, H.²

41
- 34948889993
- Microphone arrays as generalized cameras for integrated audio visual processing
- A. O'Donovan and R. Duraiswami, "Microphone arrays as generalized cameras for integrated audio visual processing," in Proc. IEEE Int. Conf. Comput. Vis. Pattern Recognition, 2007, pp. 1-8.
- (2007) Proc. IEEE Int. Conf. Comput. Vis. Pattern Recognition , pp. 1-8
- O'Donovan, A.¹ Duraiswami, R.²

42
- 0016037512
- Optimal decoding of linear codes for minimizing symbol error rate
- Mar
- L. Bahl, J. Cocke, F. Jelinek, and J. Raviv, "Optimal decoding of linear codes for minimizing symbol error rate," IEEE Trans. Inf. Theory, vol.IT-20, no.2, pp. 284-287, Mar. 1974.
- (1974) IEEE Trans. Inf. Theory , vol.IT-20 , Issue.2 , pp. 284-287
- Bahl, L.¹ Cocke, J.² Jelinek, F.³ Raviv, J.⁴

43
- 67650122797
- Random projection trees for vector quantization
- Jul
- S. Dasgupta and Y. Freund, "Random projection trees for vector quantization," IEEE Trans. Inf. Theory, vol.55, no.7, pp. 3229-3242, Jul. 2009.
- (2009) IEEE Trans. Inf. Theory , vol.55 , Issue.7 , pp. 3229-3242
- Dasgupta, S.¹ Freund, Y.²

44
- 41349114281
- The CHIL audiovisual corpus for lecture and meeting analysis inside smart rooms
- Dec
- D. Mostefa, N. Moreau, K. Choukri, G. Potamianos, S. M. Chu, J. R. Casas, J. Turmo, L. Cristoforetti, F. Tobia, A. Pnevmatikakis, V. Mylonakis, F. Talantzis, S. Burger, R. Stiefelhagen, K. Bernardin, and C. Rochet, "The CHIL audiovisual corpus for lecture and meeting analysis inside smart rooms," J. Lang. Res. Eval., vol.41, no.3-4, pp. 389-407, Dec. 2007.
- (2007) J. Lang. Res. Eval. , vol.41 , Issue.3-4 , pp. 389-407
- Mostefa, D.¹ Moreau, N.² Choukri, K.³ Potamianos, G.⁴ Chu, S.M.⁵ Casas, J.R.⁶ Turmo, J.⁷ Cristoforetti, L.⁸ Tobia, F.⁹ Pnevmatikakis, A.¹⁰ Mylonakis, V.¹¹ Talantzis, F.¹² Burger, S.¹³ Stiefelhagen, R.¹⁴ Bernardin, K.¹⁵ Rochet, C.¹⁶

* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS DB.