SCOPUS Information Search Platform

IEEE Transactions on Multimedia

Volume 10, Issue 8, 2008, Pages 1541-1551

Boosting-based multimodal speaker detection for distributed meeting videos

(8) Zhang, Cha (a); Yin, Pei (b); Rui, Yong (a, c); Cutler, Ross (d); Viola, Paul (a); Sun, Xinding (d); Pinto, Nelson (d); Zhang, Zhengyou (a)

a MICROSOFT RESEARCH (United States)

b Georgia Institute of Technology (United States)

c MICROSOFT RESEARCH ASIA (China)

d MICROSOFT (United States)

Author keywords

Audiovisual fusion; Boosting; Speaker detection

Indexed keywords

FEATURE EXTRACTION; MAGNETOPLASMA; SPEECH RECOGNITION; VISUAL COMMUNICATION;

AUDIO AND VISUAL INFORMATIONS; AUDIOVISUAL FUSION; BOOSTING; COMMERCIAL PRODUCTS; DISTRIBUTED MEETINGS; ERROR RATES; FEATURE LEVELS; HIGH EFFICIENCIES; HIGH RESOLUTIONS; MICROSOFT; MULTI MODALS; SOUND SOURCE LOCALIZATIONS; SPEAKER DETECTION; VISUAL FEATURES;

AUDIO ACOUSTICS;

EID: 57849134738 PISSN: 15209210 EISSN: None Source Type: Journal
DOI: 10.1109/TMM.2008.2007344 Document Type: Article

Times cited : (37)

References (37)

1
- 33646779170
- Smart room: Participant and speaker localization and identification
- C. Busso, S. Hernanz, C. Chu, S. Kwon, S. Lee, P. Georgiou, I. Cohen, and S. Narayanan, "Smart room: Participant and speaker localization and identification", in Proc. IEEE ICASSP, 2005.
- (2005) Proc. IEEE ICASSP
- Busso, C.¹ Hernanz, S.² Chu, C.³ Kwon, S.⁴ Lee, S.⁵ Georgiou, P.⁶ Cohen, I.⁷ Narayanan, S.⁸

2
- 0038715064
- Distributed meetings: A meeting capture and broadcasting system
- R. Cutler, Y. Rui, A. Gupta, J. Cadiz, I. Tashev, L. He, A. Colburn, Z. Zhang, Z. Liu, and S. Silverberg, "Distributed meetings: A meeting capture and broadcasting system", in Proc. ACM Conf. Multimedia, 2002.
- (2002) Proc. ACM Conf. Multimedia
- Cutler, R.¹ Rui, Y.² Gupta, A.³ Cadiz, J.⁴ Tashev, I.⁵ He, L.⁶ Colburn, A.⁷ Zhang, Z.⁸ Liu, Z.⁹ Silverberg, S.¹⁰

3
- 34250699813
- Audio-Visual Localization of Multiple Speakers in a Video Teleconferencing Setting
- B. Kapralos, M. Jenkin, and E. Milios, "Audio-Visual Localization of Multiple Speakers in a Video Teleconferencing Setting", Tech. Rep. York University, Canada, 2002.
- (2002) Tech. Rep. York University
- Kapralos, B.¹ Jenkin, M.² Milios, E.³

4
- 0038038521
- A multimodal speaker detection and tracking system for teleconferencing
- B. Yoshimi and G. Pingali, "A multimodal speaker detection and tracking system for teleconferencing", in Proc. ACM Conf. Multimedia, 2002.
- (2002) Proc. ACM Conf. Multimedia
- Yoshimi, B.¹ Pingali, G.²

5
- 2142812371
- Robust real-time face detection
- P. Viola and M. Jones, "Robust real-time face detection", Int. J. Comput. Vis., vol. 57, no. 2, pp. 137-154, 2004.
- (2004) Int. J. Comput. Vis , vol.57 , Issue.2 , pp. 137-154
- Viola, P.¹ Jones, M.²

6
- 0031385284
- Voice source localization for automatic camera pointing system in videoconferencing
- H. Wang and P. Chu, "Voice source localization for automatic camera pointing system in videoconferencing", in Proc. IEEE ICASSP, 1997.
- (1997) Proc. IEEE ICASSP
- Wang, H.¹ Chu, P.²

7
- 33646794986
- Sound source localization for circular arrays of directional microphones
- Y. Rui, D. Florencio, W. Lam, and J. Su, "Sound source localization for circular arrays of directional microphones", in Proc. IEEE ICASSP, 2005.
- (2005) Proc. IEEE ICASSP
- Rui, Y.¹ Florencio, D.² Lam, W.³ Su, J.⁴

8
- 0001432664
- On the integration of auditory and visual parameters in an HMM-based ASR
- Berlin, Germany: Springer
- A. Adjoudani and C. Benoit, "On the integration of auditory and visual parameters in an HMM-based ASR", in Speechreading by Humans and Machines. Berlin, Germany: Springer, 1996, pp. 461-471.
- (1996) Speechreading by Humans and Machines , pp. 461-471
- Adjoudani, A.¹ Benoit, C.²

9
- 0034270644
- Audio-visual speech modeling for continuous speech recognition
- S. Dupont and J. Luettin, "Audio-visual speech modeling for continuous speech recognition", IEEE Trans. Multimedia, vol. 2, no. 3, pp. 141-151, Sep. 2000.
- (2000) IEEE Trans. Multimedia , vol.2 , Issue.3 , pp. 141-151
- Dupont, S.¹ Luettin, J.²

10
- 8844259704
- Discovery and fusion of salient multi-modal features towards news story segmentation
- W. Hsu, S.-F. Chang, C.-W. Huang, L. Kennedy, C.-Y. Lin, and G. Iyengar, "Discovery and fusion of salient multi-modal features towards news story segmentation", in SPIE Electronic Imaging, 2004.
- (2004) SPIE Electronic Imaging
- Hsu, W.¹ Chang, S.-F.² Huang, C.-W.³ Kennedy, L.⁴ Lin, C.-Y.⁵ Iyengar, G.⁶

11
- 0030685285
- Coupled hidden Markov models for complex action recognition
- M. Brand, N. Oliver, and A. Pentland, "Coupled hidden Markov models for complex action recognition", in Proc. IEEE CVPR, 1997.
- (1997) Proc. IEEE CVPR
- Brand, M.¹ Oliver, N.² Pentland, A.³

12
- 84908294933
- Duration dependent input output Markov models for audio-visual event detection
- M. Naphade, A. Garg, and T. Huang, "Duration dependent input output Markov models for audio-visual event detection", in Proc. IEEE ICME, 2001.
- (2001) Proc. IEEE ICME
- Naphade, M.¹ Garg, A.² Huang, T.³

13
- 57849091620
- Speaker change detection using joint audiovisual statistics
- G. Iyengar and C. Neti, "Speaker change detection using joint audiovisual statistics", in Int. RIAO Conf., 2000.
- (2000) Int. RIAO Conf
- Iyengar, G.¹ Neti, C.²

14
- 34250714316
- Information Theoretic Optimization of Audio Features for Multimodal Speaker Detection
- EPFL, Lausanne, Switzerland
- P. Besson and M. Kunt, "Information Theoretic Optimization of Audio Features for Multimodal Speaker Detection", Tech. Rep. Signal Processing Institute, EPFL, Lausanne, Switzerland, 2005.
- (2005) Tech. Rep. Signal Processing Institute
- Besson, P.¹ Kunt, M.²

15
- 0009622481
- Learning joint statistical models for audio-visual fusion and segregation
- J. Fisher, III, T. Darrell, W. Freeman, and P. Viola, "Learning joint statistical models for audio-visual fusion and segregation", in NIPS, 2000, pp. 772-778.
- (2000) NIPS , pp. 772-778
- Fisher III, J.¹ Darrell, T.² Freeman, W.³ Viola, P.⁴

16
- 84899028297
- Audio vision: Using audio-visual synchrony to locate sounds
- J. Hershey and J. Movellan, "Audio vision: Using audio-visual synchrony to locate sounds", in Advances in Neural Information Processing Systems, 2000.
- (2000) Advances in Neural Information Processing Systems
- Hershey, J.¹ Movellan, J.²

17
- 34250762971
- Multimodal speaker detection using error feedback dynamic Bayesian networks
- V. Pavlović, A. Garg, J. Rehg, and T. Huang, "Multimodal speaker detection using error feedback dynamic Bayesian networks", in Proc. IEEE CVPR, 2001.
- (2001) Proc. IEEE CVPR
- Pavlović, V.¹ Garg, A.² Rehg, J.³ Huang, T.⁴

18
- 0036874485
- Joint audio-visual tracking using particle filters
- D. N. Zotkin, R. Duraiswami, and L. S. Davis, "Joint audio-visual tracking using particle filters", EURASIP J. Appl. Signal Process., vol. 2002, no. 11, pp. 1154-1164, 2002.
- (2002) EURASIP J. Appl. Signal Process , vol.2002 , Issue.11 , pp. 1154-1164
- Zotkin, D.N.¹ Duraiswami, R.² Davis, L.S.³

19
- 0034844366
- Sequential Monte Carlo fusion of sound and vision for speaker tracking
- J. Vermaak, M. Gangnet, A. Blake, and P. Pérez, "Sequential Monte Carlo fusion of sound and vision for speaker tracking", in Proc. IEEE ICCV, 2001.
- (2001) Proc. IEEE ICCV
- Vermaak, J.¹ Gangnet, M.² Blake, A.³ Pérez, P.⁴

20
- 20444478554
- Speaker localisation using audiovisual synchrony: An empirical study
- H. Nock, G. Iyengar, and C. Neti, "Speaker localisation using audiovisual synchrony: An empirical study", in Proc. CIVR, 2003.
- (2003) Proc. CIVR
- Nock, H.¹ Iyengar, G.² Neti, C.³

21
- 0034507915
- Look who's talking: Speaker detection using video and audio correlation
- R. Cutler and L. Davis, "Look who's talking: Speaker detection using video and audio correlation", in Proc. IEEE ICME, 2000.
- (2000) Proc. IEEE ICME
- Cutler, R.¹ Davis, L.²

22
- 0344044776
- Audio-video sensor fusion with probabilistic graphical models
- M. Beal, H. Attias, and N. Jojic, "Audio-video sensor fusion with probabilistic graphical models", in Proc. ECCV, 2002.
- (2002) Proc. ECCV
- Beal, M.¹ Attias, H.² Jojic, N.³

23
- 21244492850
- Real-time speaker tracking using particle filter sensor fusion
- Y. Chen and Y. Rui, "Real-time speaker tracking using particle filter sensor fusion", Proc. IEEE, vol. 92, pp. 485-494, 2004.
- (2004) Proc. IEEE , vol.92 , pp. 485-494
- Chen, Y.¹ Rui, Y.²

24
- 32344434992
- A joint particle filter for audio-visual speaker tracking
- K. Nickel, T. Gehrig, R. Stiefelhagen, and J. McDonough, "A joint particle filter for audio-visual speaker tracking", in ICMI, 2005.
- (2005) ICMI
- Nickel, K.¹ Gehrig, T.² Stiefelhagen, R.³ McDonough, J.⁴

25
- 34547516010
- Maximum likelihood sound source localization for multiple directional microphones
- C. Zhang, Z. Zhang, and D. Florêncio, "Maximum likelihood sound source localization for multiple directional microphones", in ICASSP, 2007.
- (2007) ICASSP
- Zhang, C.¹ Zhang, Z.² Florêncio, D.³

26
- 4644273800
- Source localization in reverberant environments: Performance bounds and ml estimation
- T. Gustafsson, B. Rao, and M. Trivedi, "Source localization in reverberant environments: Performance bounds and ml estimation", in Proc. ICASSP, 2001.
- (2001) Proc. ICASSP
- Gustafsson, T.¹ Rao, B.² Trivedi, M.³

27
- 0030701369
- A robust method for speech signal time-delay estimation in reverberant rooms
- M. Brandstein and H. Silverman, "A robust method for speech signal time-delay estimation in reverberant rooms", in Proc. ICASSP, 1997.
- (1997) Proc. ICASSP
- Brandstein, M.¹ Silverman, H.²

28
- 0003660631
- Additive Logistic Regression: A Statistical View of Boosting
- J. Friedman, T. Hastie, and R. Tibshirani, "Additive Logistic Regression: A Statistical View of Boosting", Tech. Rep. Stanford Univ., Dept. Statistics, Stanford, CA, 1998.
- (1998) Tech. Rep. Stanford Univ., Dept. Statistics, Stanford, CA
- Friedman, J.¹ Hastie, T.² Tibshirani, R.³

29
- 0033281701
- Improved boosting algorithms using confidence-rated predictions
- R. Schapire and Y. Singer, "Improved boosting algorithms using confidence-rated predictions", Mach. Learn., vol. 37, no. 3, pp. 297-336, 1999.
- (1999) Mach. Learn , vol.37 , Issue.3 , pp. 297-336
- Schapire, R.¹ Singer, Y.²

30
- 0344983340
- Detecting pedestrians using patterns of motion and appearance
- P. Viola, M. Jones, and D. Snow, "Detecting pedestrians using patterns of motion and appearance", in Proc. IEEE ICCV, 2003.
- (2003) Proc. IEEE ICCV
- Viola, P.¹ Jones, M.² Snow, D.³

31
- 0003396283
- A System for Video Surveillance and Monitoring
- Robotics Inst, Pittsburgh, PA
- R. Collins, A. Lipton, T. Kanade, H. Fujiyoshi, D. Duggins, Y. Tsin, D. Tolliver, N. Enomoto, and O. Hasegawa, "A System for Video Surveillance and Monitoring", Tech. Rep. Carnegie Mellon Univ., Robotics Inst., Pittsburgh, PA, 2000.
- (2000) Tech. Rep. Carnegie Mellon Univ
- Collins, R.¹ Lipton, A.² Kanade, T.³ Fujiyoshi, H.⁴ Duggins, D.⁵ Tsin, Y.⁶ Tolliver, D.⁷ Enomoto, N.⁸ Hasegawa, O.⁹

32
- 0031187308
- Pfinder: Realtime tracking of the human body
- C. Wren, A. Azarbayejani, T. Darrell, and A. Pentland, "Pfinder: Realtime tracking of the human body", IEEE Trans. Pattern Anal. Mach. Intell., vol. 19, no. 7, pp. 780-785, Jul. 1997.
- (1997) IEEE Trans. Pattern Anal. Mach. Intell , vol.19 , Issue.7 , pp. 780-785
- Wren, C.¹ Azarbayejani, A.² Darrell, T.³ Pentland, A.⁴

33
- 0003486467
- Multimedia sensor fusion for intelligent camera control and human computer interaction
- Ph.D. dissertation, Dept. Elect. Eng, North Carolina State Univ, Raleigh
- S. Goodridge, "Multimedia sensor fusion for intelligent camera control and human computer interaction", Ph.D. dissertation, Dept. Elect. Eng., North Carolina State Univ., Raleigh, 1997.
- (1997)
- Goodridge, S.¹

34
- 0036643072
- Logistic regression, Adaboost and Bregman distances
- M. Collins, R. Schapire, and Y. Singer, "Logistic regression, Adaboost and Bregman distances", Mach. Learn., vol. 48, no. 1-3, pp. 253-285, 2002.
- (2002) Mach. Learn , vol.48 , Issue.1-3 , pp. 253-285
- Collins, M.¹ Schapire, R.² Singer, Y.³

35
- 84898978212
- Boosting algorithms as gradient descent
- L. Mason, J. Baxter, P. Bartlett, and M. Frean, "Boosting algorithms as gradient descent", in NIPS, 2000.
- (2000) NIPS
- Mason, L.¹ Baxter, J.² Bartlett, P.³ Frean, M.⁴

36
- 70450187477
- Multiple-instance pruning for learning efficient cascade detectors
- C. Zhang and P. Viola, "Multiple-instance pruning for learning efficient cascade detectors", in NIPS, 2007.
- (2007) NIPS
- Zhang, C.¹ Viola, P.²

37
- 33645146449
- Histograms of oriented gradients for human detection
- N. Dalal and B. Triggs, "Histograms of oriented gradients for human detection", in Proc. CVPR, 2005.
- (2005) Proc. CVPR
- Dalal, N.¹ Triggs, B.²

* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.