SCOPUS Information Retrieval Platform

IEEE Transactions on Pattern Analysis and Machine Intelligence

Volume 34, Issue 1, 2012, Pages 79-93

Multimodal Speaker Diarization

Authors (3): Noulas, Athanasios (a); Englebienne, Gwenn (a); Kröse, Ben J. A. (a)

(a) University of Amsterdam (Netherlands)

Author keywords

audiovisual fusion; dynamic Bayesian networks; speaker diarization

Indexed keywords

AUDIO AND VIDEO; AUDIO STREAM; AUDIO-BASED; AUDIO-VISUAL FUSION; BROADCAST VIDEO; DATA SETS; DYNAMIC BAYESIAN NETWORK; DYNAMIC BAYESIAN NETWORKS; EXPECTATION-MAXIMIZATION ALGORITHMS; LABELED TRAINING DATA; MEETING VIDEO; MODALITY ANALYSIS; MODEL PARAMETERS; MULTI-MODAL; MULTIMODAL FRAMEWORKS; PROBABILISTIC FRAMEWORK; RECORDING EQUIPMENT; SPEAKER DIARIZATION; VIDEO STREAMS

ALGORITHMS; HIDDEN MARKOV MODELS; SPEECH RECOGNITION

BAYESIAN NETWORKS

EID: 81855191839; PISSN: 0162-8828; EISSN: None; Source Type: Journal
DOI: 10.1109/TPAMI.2011.47; Document Type: Article

Times cited: 60

References (36)

1
- 33646380923
- D.A. Reynolds and P.A. Torres-Carrasquillo, "Approaches and Applications of Speaker Diarization," Proc. IEEE Int'l Conf. Acoustics, Speech, and Signal Processing, pp. 953-956, 2010.

2
- 47749152568
- J.G. Fiscus, J. Ajot, and J.S. Garofolo, "The Rich Transcription 2007 Meeting Recognition Evaluation," Multimodal Technologies for Perception of Humans, pp. 373-389, Springer-Verlag, 2008.

3
- 0002606824
- R. Bakis, S. Chen, P. Gopalakrishnan, R. Gopinath, L. Polymenakos, and M. Franz, "Transcription of Broadcast News Shows with the IBM Large Vocabulary Speech Recognition System," Proc. Speech Recognition Workshop, pp. 67-72, 1997.

4
- 85128356454
- J.-L. Gauvain, L. Lamel, and G. Adda, "Partitioning and Transcription of Broadcast News Data," Proc. Int'l Conf. Spoken Language Processing, pp. 1335-1338, 1998.

5
- 85073258179
- J. Pelecanos and S. Sridharan, "Feature Warping for Robust Speaker Verification," Proc. Int'l Speech Comm. Assoc. Workshop Speaker Recognition: A Speaker Odyssey, 2001.

6
- 33745196256
- M. Yamaguchi, M. Yamashita, and S. Matsunaga, "Spectral Cross-Correlation Features for Audio Indexing of Broadcast News and Meetings," Proc. Ninth European Conf. Speech Comm. and Technology, pp. 613-616, 2005.

7
- 0034273195
- P. Delacourt, D. Kryze, and C.J. Wellekens, "DISTBIC: A Speaker-Based Segmentation for Audio Data Indexing," Speech Comm., vol. 32, pp. 111-126, 2000.

8
- 84875953283
- S.S. Chen and P. Gopalakrishnan, "Clustering via the Bayesian Information Criterion with Applications in Speech Recognition," Proc. IEEE Int'l Conf. Acoustics, Speech, and Signal Processing, vol. 2, pp. 645-648, 1998.

9
- 0034857759
- K. Mori and S. Nakagawa, "Speaker Change Detection and Speaker Clustering Using VQ Distortion for Broadcast News Speech Recognition," Systems and Computers in Japan, vol. 34, pp. 413-416, 2001.
- Scopus-matched source: ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings, vol. 1, pp. 413-416, 2001.

10
- 85009142161
- R. Gangadharaiah, B. Narayanaswamy, and Narayanaswamy, "A Novel Method for Two-Speaker Segmentation," Proc. IEEE Int'l Conf. Acoustics, Speech, and Signal Processing, 2004.

11
- 47749119617
- C. Wooters and M. Huijbregts, "The ICSI RT07s Speaker Diarization System," Multimodal Technologies for Perception of Humans, pp. 509-519, Springer-Verlag, 2008.

12
- 34948829598
- Z. Barzelay and Y.Y. Schechner, "Harmony in Motion," Proc. IEEE CS Conf. Computer Vision and Pattern Recognition, 2007.

13
- 24644451644
- E. Kidron, Y.Y. Schechner, and M. Elad, "Pixels That Sound," Proc. IEEE CS Conf. Computer Vision and Pattern Recognition, pp. 88-95, 2005.

14
- 84899028297
- J. Hershey and J. Movellan, "Using Audio-Visual Synchrony to Locate Sounds," Advances in Neural Information Processing Systems, vol. 12, pp. 813-819, MIT Press, 1999.

15
- 84947720445
- T. Darrell, J.W. Fisher III, and P. Viola, "Audiovisual Segmentation and the Cocktail Party Effect," Proc. Int'l Conf. Multimodal Interfaces, pp. 32-40, 2000.

16
- 4243096131
- H.J. Nock, G. Iyengar, and C. Neti, "Multimodal Processing by Finding Common Cause," Comm. ACM, vol. 47, no. 1, pp. 51-56, 2004.

17
- 84908470296
- G. Iyengar, H.J. Nock, and C. Neti, "Audio-Visual Synchrony for Detection of Monologues in Video Archives," Proc. IEEE Int'l Conf. Multimedia and Expo, pp. 329-332, 2003.

18
- 84949961905
- J.W. Fisher III and T. Darrell, "Probabilistic Models and Informative Subspaces for Audiovisual Correspondence," Proc. European Conf. Computer Vision, Part III, pp. 592-603, 2002.

19
- 0038715064
- R. Cutler, Y. Rui, A. Gupta, J. Cadiz, I. Tashev, L.-W. He, A. Colburn, Z.Z.Z. Liu, and S. Silverberg, "Distributed Meetings: A Meeting Capture and Broadcasting System," Proc. 10th ACM Int'l Conf. Multimedia, 2002.

20
- 21244492850
- DOI: 10.1109/JPROC.2003.823146 (special issue: Sequential State Estimation: From Kalman Filters to Particle Filters)
- Y. Chen and Y. Rui, "Real-Time Speaker Tracking Using Particle Filter Sensor Fusion," Proc. IEEE, vol. 92, no. 3, pp. 485-494, Mar. 2004.

21
- 4544347587
- N. Checka, K. Wilson, M. Siracusa, and T. Darrell, "Multiple Person and Speaker Activity Tracking with a Particle Filter," Proc. IEEE Int'l Conf. Acoustics, Speech, and Signal Processing, May 2004.

22
- 81855192139
- D. Gatica-Perez, G. Lathoud, J.-M. Odobez, and I. McCowan, "Multimodal Multispeaker Probabilistic Tracking in Meetings," IDIAP-RR 66, IDIAP, 2004.

23
- 0344044776
- M.J. Beal, H. Attias, and N. Jojic, "Audio-Video Sensor Fusion with Probabilistic Graphical Models," Proc. European Conf. Computer Vision, 2002.

24
- 0031268341
- Z. Ghahramani, M.I. Jordan, and P. Smyth, "Factorial Hidden Markov Models," Machine Learning, MIT Press, 1997.

25
- 0024610919
- L.R. Rabiner, "A Tutorial on Hidden Markov Models and Selected Applications in Speech Recognition," Proc. IEEE, vol. 77, no. 2, pp. 257-286, Feb. 1989.

26
- 33644532974
- P. Viola and M. Jones, "Robust Real-Time Object Detection," Proc. Second Int'l Workshop Statistical and Computational Theories of Vision-Modeling, Learning, Computing, and Sampling, 2001.

27
- 24644524200
- C. Dance, J. Willamowski, L. Fan, C. Bray, and G. Csurka, "Visual Categorization with Bags of Keypoints," Proc. European Conf. Computer Vision Int'l Workshop Statistical Learning in Computer Vision, 2004.

28
- 0009622481
- T. Darrell, J.W. Fisher III, W.T. Freeman, and P. Viola, "Learning Joint Statistical Models for Audio-Visual Fusion and Segregation," Advances in Neural Information Processing Systems, vol. 13, pp. 772-778, MIT Press, 2000.

29
- 0001185873
- T. Bayes, "An Essay Towards Solving a Problem in the Doctrine of Chances," Philosophical Trans. Royal Soc., vol. 53, pp. 370-418, 1763.

30
- 78650455882
- A.K. Noulas, N. Vlassis, and B.J.A. Kröse, "Cross Entropy for Learning in Multi-Modal Streams," Proc. Joint Workshop Multi-Modal Interaction and Related Machine Learning Algorithms, 2007.

31
- 34547231084
- DOI: 10.1145/1180995.1181037
- A.K. Noulas and B.J.A. Kröse, "EM Detection of Common Origin of Multi-Modal Cues," Proc. Int'l Conf. Multimodal Interfaces, pp. 201-208, 2006.

32
- 57849156992
- A.K. Noulas and B.J.A. Kröse, "A Hybrid Generative-Discriminative Approach to Speaker Diarization," Proc. Fifth Int'l Workshop Machine Learning for Multimodal Interaction, pp. 98-109, 2008.

33
- 81855171435
- A. Noulas, "Audiovisual Fusion for Speaker Diarization," PhD dissertation, Univ. of Amsterdam, 2010.

34
- 58049136519
- J. Carletta, "Announcing the AMI Meeting Corpus," The ELRA Newsletter, vol. 1, no. 1, pp. 3-5, Jan.-Mar. 2006.

35
- 24144486697
- D.C. Moore, "The IDIAP Smart Meeting Room," IDIAP, IDIAPCOM 07, 2002.

36
- 34548346846
- X. Anguera, C. Wooters, and J. Hernando, "Automatic Cluster Complexity and Quantity Selection: Towards Robust Speaker Diarization," Proc. Third Joint Workshop Multimodal Interaction and Related Machine Learning Algorithms, pp. 248-256, 2006.

* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.