



Volume 20, Issue 2, 2012, Pages 356-370

Speaker Diarization: A Review of Recent Research

Author keywords

[No Author keywords available]

Indexed keywords


EID: 85008530405     PISSN: 15587916     EISSN: 15587924     Source Type: Journal    
DOI: 10.1109/TASL.2011.2125954     Document Type: Article
Times cited : (702)

References (130)
  • 1
    • 85008554815
    • The NIST Rich Transcription 2009 (RT′09) Evaluation
    • [Online]. Available: http://www.itl.nist.gov/iad/mig/tests/rt/2009/docs/rt09-meeting-eval-plan-v2.pdf
    • “The NIST Rich Transcription 2009 (RT′09) Evaluation,” NIST, 2009 [Online]. Available: http://www.itl.nist.gov/iad/mig/tests/rt/2009/docs/rt09-meeting-eval-plan-v2.pdf.
    • (2009) NIST
  • 2
    • 34047261805
    • An overview of automatic speaker diarization systems
    • Sep.
    • S. Tranter and D. Reynolds, “An overview of automatic speaker diarization systems,” IEEE Trans. Audio, Speech, Lang. Process., vol. 14, no. 5, pp. 1557–1565, Sep. 2006.
    • (2006) IEEE Trans. Audio, Speech, Lang. Process. , vol.14 , Issue.5 , pp. 1557-1565
    • Tranter, S.1    Reynolds, D.2
  • 3
    • 33947685454
    • Nuts and flakes: A study of data characteristics in speaker diarization
    • N. Mirghafori and C. Wooters, “Nuts and flakes: A study of data characteristics in speaker diarization,” in Proc. ICASSP, 2006.
    • (2006) Proc. ICASSP
    • Mirghafori, N.1    Wooters, C.2
  • 4
    • 44849123928
    • Robust speaker diarization for meetings
    • Ph.D. dissertation, Univ. Politecnica de Catalunya, Barcelona, Spain
    • X. Anguera, “Robust speaker diarization for meetings,” Ph.D. dissertation, Univ. Politecnica de Catalunya, Barcelona, Spain, 2006.
    • (2006)
    • Anguera, X.1
  • 5
    • 66149116378
    • Computationally efficient and robust BIC-based speaker segmentation
    • Jul.
    • M. Kotti, E. Benetos, and C. Kotropoulos, “Computationally efficient and robust BIC-based speaker segmentation,” IEEE Trans. Audio, Speech, Lang. Process., vol. 16, no. 5, pp. 920–933, Jul. 2008.
    • (2008) IEEE Trans. Audio, Speech, Lang. Process. , vol.16 , Issue.5 , pp. 920-933
    • Kotti, M.1    Benetos, E.2    Kotropoulos, C.3
  • 6
    • 47749123507
    • Multi-stage speaker diarization for conference and lecture meetings
    • Baltimore, MD, May 8-11, 2007, Revised Selected Papers, Berlin, Heidelberg: Springer-Verlag
    • X. Zhu, C. Barras, L. Lamel, and J.-L. Gauvain, “Multi-stage speaker diarization for conference and lecture meetings,” in Proc. Multimodal Technol. Perception of Humans: Int. Eval. Workshops CLEAR 2007 and RT 2007, Baltimore, MD, May 8-11, 2007, Revised Selected Papers, Berlin, Heidelberg: Springer-Verlag, 2008, pp. 533–542.
    • (2008) Proc. Multimodal Technol. Perception of Humans: Int. Eval. Workshops CLEAR 2007 and RT 2007 , pp. 533-542
    • Zhu, X.1    Barras, C.2    Lamel, L.3    Gauvain, J.-L.4
  • 7
    • 67349120575
    • Speaker diarization using autoassociative neural networks
    • S. Jothilakshmi, V. Ramalingam, and S. Palanivel, “Speaker diarization using autoassociative neural networks,” Eng. Applicat. Artif. Intell., vol. 22, no. 4-5, pp. 667–675, 2009.
    • (2009) Eng. Applicat. Artif. Intell. , vol.22 , Issue.4-5 , pp. 667-675
    • Jothilakshmi, S.1    Ramalingam, V.2    Palanivel, S.3
  • 8
    • 44949197897
    • Robust speaker diarization for meetings: ICSI RT06s evaluation system
    • Sep.
    • X. Anguera, C. Wooters, and J. Hernando, “Robust speaker diarization for meetings: ICSI RT06s evaluation system,” in Proc. ICSLP, Pittsburgh, PA, Sep. 2006.
    • (2006) Proc. ICSLP, Pittsburgh, PA
    • Anguera, X.1    Wooters, C.2    Hernando, J.3
  • 10
    • 33947630340
    • Fast incremental clustering of Gaussian mixture speaker models for scaling up retrieval in on-line broadcast
    • May
    • J. Rougui, M. Rziza, D. Aboutajdine, M. Gelgon, and J. Martinez, “Fast incremental clustering of Gaussian mixture speaker models for scaling up retrieval in on-line broadcast,” in Proc. ICASSP, May 2006, vol. 5, pp. 521–524.
    • (2006) Proc. ICASSP , vol.5 , pp. 521-524
    • Rougui, J.1    Rziza, M.2    Aboutajdine, D.3    Gelgon, M.4    Martinez, J.5
  • 13
    • 79959827767
    • The IIR-NTU speaker diarization systems for RT 2009
    • Melbourne, FL
    • T. Nguyen, et al., “The IIR-NTU speaker diarization systems for RT 2009,” in Proc. RT′09, NIST Rich Transcription Workshop, Melbourne, FL, 2009.
    • (2009) Proc. RT′09, NIST Rich Transcription Workshop
    • Nguyen, T.1
  • 15
  • 17
    • 78049378635
    • The LIA-EURECOM RT′09 speaker diarization system: Enhancements in speaker modelling and cluster purification
    • Mar. 14-19
    • S. Bozonnet, N. W. D. Evans, and C. Fredouille, “The LIA-EURECOM RT′09 speaker diarization system: Enhancements in speaker modelling and cluster purification,” in Proc. ICASSP, Dallas, TX, Mar. 14-19, 2010, pp. 4958–4961.
    • (2010) Proc. ICASSP, Dallas, TX , pp. 4958-4961
    • Bozonnet, S.1    Evans, N.W.D.2    Fredouille, C.3
  • 18
    • 44849112917
    • Agglomerative information bottleneck for speaker diarization of meetings data
    • Dec.
    • D. Vijayasenan, F. Valente, and H. Bourlard, “Agglomerative information bottleneck for speaker diarization of meetings data,” in Proc. ASRU, Dec. 2007, pp. 250–255.
    • (2007) Proc. ASRU , pp. 250-255
    • Vijayasenan, D.1    Valente, F.2    Bourlard, H.3
  • 19
    • 68649087212
    • An information theoretic approach to speaker diarization of meeting data
    • Sep.
    • D. Vijayasenan, F. Valente, and H. Bourlard, “An information theoretic approach to speaker diarization of meeting data,” IEEE Trans. Audio, Speech, Lang. Process., vol. 17, no. 7, pp. 1382–1393, Sep. 2009.
    • (2009) IEEE Trans. Audio, Speech, Lang. Process. , vol.17 , Issue.7 , pp. 1382-1393
    • Vijayasenan, D.1    Valente, F.2    Bourlard, H.3
  • 20
    • 84972808999
    • Estimating normal means with a conjugate style Dirichlet process prior
    • S. McEachern, “Estimating normal means with a conjugate style Dirichlet process prior,” in Proc. Commun. Statist.: Simul. Comput., 1994, vol. 23, pp. 727–741.
    • (1994) Proc. Commun. Statist.: Simul. Comput. , vol.23 , pp. 727-741
    • McEachern, S.1
  • 21
    • 0027803368
    • Keeping the neural networks simple by minimizing the description length of the weights
    • COLT ′93
    • G. E. Hinton and D. van Camp, “Keeping the neural networks simple by minimizing the description length of the weights,” in Proc. 6th Annu. Conf. Comput. Learn. Theory, New York, 1993, COLT ′93, pp. 5–13.
    • (1993) Proc. 6th Annu. Conf. Comput. Learn. Theory, New York , pp. 5-13
    • Hinton, G.E.1    van Camp, D.2
  • 23
    • 70450171620
    • Variational Bayesian methods for audio indexing
    • Ph.D. dissertation, Eurecom Inst., Sophia-Antipolis, France
    • F. Valente, “Variational Bayesian methods for audio indexing,” Ph.D. dissertation, Eurecom Inst., Sophia-Antipolis, France, 2005.
    • (2005)
    • Valente, F.1
  • 25
    • 70450151829
    • Bayesian Analysis of Speaker Diarization with Eigenvoice Priors
    • Technical Report. Montreal, QC, Canada: CRIM
    • P. Kenny, “Bayesian Analysis of Speaker Diarization with Eigenvoice Priors,” Technical Report. Montreal, QC, Canada: CRIM, 2008.
    • (2008)
    • Kenny, P.1
  • 26
    • 79959818340
    • A novel speaker binary key derived from anchor models
    • X. Anguera and J.-F. Bonastre, “A novel speaker binary key derived from anchor models,” in Proc. Interspeech, 2010.
    • (2010) Proc. Interspeech
    • Anguera, X.1    Bonastre, J.-F.2
  • 27
    • 80051641843
    • Fast speaker diarization based on binary keys
    • X. Anguera and J.-F. Bonastre, “Fast speaker diarization based on binary keys,” in Proc. ICASSP, 2011.
    • (2011) Proc. ICASSP
    • Anguera, X.1    Bonastre, J.-F.2
  • 30
    • 34548352841
    • Friends and enemies: A novel initialization for speaker diarization
    • Sep.
    • X. Anguera, C. Wooters, and J. Hernando, “Friends and enemies: A novel initialization for speaker diarization,” in Proc. ICSLP, Pittsburgh, PA, Sep. 2006.
    • (2006) Proc. ICSLP, Pittsburgh, PA
    • Anguera, X.1    Wooters, C.2    Hernando, J.3
  • 31
    • 84946742526
    • A robust speaker clustering algorithm
    • J. Ajmera, “A robust speaker clustering algorithm,” in Proc. ASRU, 2003, pp. 411–416.
    • (2003) Proc. ASRU , pp. 411-416
    • Ajmera, J.1
  • 32
    • 33947706786
    • Purity algorithms for speaker diarization of meetings data
    • May
    • X. Anguera, C. Wooters, and J. Hernando, “Purity algorithms for speaker diarization of meetings data,” in Proc. ICASSP, Toulouse, France, May 2006, pp. 1025–1028.
    • (2006) Proc. ICASSP, Toulouse, France , pp. 1025-1028
    • Anguera, X.1    Wooters, C.2    Hernando, J.3
  • 34
    • 0028516097
    • Text independent speaker identification
    • Oct.
    • H. Gish and M. Schmidt, “Text independent speaker identification,” IEEE Signal Process. Mag., vol. 11, no. 4, pp. 18–32, Oct. 1994.
    • (1994) IEEE Signal Process. Mag. , vol.11 , Issue.4 , pp. 18-32
    • Gish, H.1    Schmidt, M.2
  • 38
    • 33846209880
    • The NIST 2004 spring rich transcription evaluation: Two-axis merging strategy in the context of multiple distant microphone based meeting speaker segmentation
    • Montreal, QC, Canada
    • C. Fredouille, D. Moraru, S. Meignier, L. Besacier, and J.-F. Bonastre, “The NIST 2004 spring rich transcription evaluation: Two-axis merging strategy in the context of multiple distant microphone based meeting speaker segmentation,” in Proc. NIST 2004 Spring Rich Transcript. Eval. Workshop, Montreal, QC, Canada, 2004.
    • (2004) Proc. NIST 2004 Spring Rich Transcript. Eval. Workshop
    • Fredouille, C.1    Moraru, D.2    Meignier, S.3    Besacier, L.4    Bonastre, J.-F.5
  • 42
    • 50449086237
    • Acoustic beamforming for speaker diarization of meetings
    • Sep.
    • X. Anguera, C. Wooters, and J. Hernando, “Acoustic beamforming for speaker diarization of meetings,” IEEE Trans. Audio, Speech, Lang. Process., vol. 15, no. 7, pp. 2011–2023, Sep. 2007.
    • (2007) IEEE Trans. Audio, Speech, Lang. Process. , vol.15 , Issue.7 , pp. 2011-2023
    • Anguera, X.1    Wooters, C.2    Hernando, J.3
  • 46
    • 4344607755
    • Likelihood maximizing beam-forming for robust hands-free speech recognition
    • Sep.
    • M. L. Seltzer, B. Raj, and R. M. Stern, “Likelihood maximizing beam-forming for robust hands-free speech recognition,” IEEE Trans. Speech Audio Process., vol. 12, no. 5, pp. 489–498, Sep. 2004.
    • (2004) IEEE Trans. Speech Audio Process. , vol.12 , Issue.5 , pp. 489-498
    • Seltzer, M.L.1    Raj, B.2    Stern, R.M.3
  • 47
    • 0019928857
    • An alternative approach to linearly constrained adaptive beamforming
    • Jan.
    • L. J. Griffiths and C. W. Jim, “An alternative approach to linearly constrained adaptive beamforming,” IEEE Trans. Antennas Propagat., vol. AP-30, no. 1, pp. 27–34, Jan. 1982.
    • (1982) IEEE Trans. Antennas Propagat. , vol.AP-30 , Issue.1 , pp. 27-34
    • Griffiths, L.J.1    Jim, C.W.2
  • 52
    • 47749103773
    • Progress in the AMIDA speaker diarization system for meeting data
    • Baltimore, MD, May 8-11, 2007, Revised Selected Papers, Berlin, Heidelberg: Springer-Verlag
    • D. A. V. Leeuwen and M. Konečný, “Progress in the AMIDA speaker diarization system for meeting data,” in Proc. Multimodal Technol. for Percept. of Humans: Int. Eval Workshops CLEAR 2007 and RT 2007, Baltimore, MD, May 8-11, 2007, Revised Selected Papers, Berlin, Heidelberg: Springer-Verlag, 2008, pp. 475–483.
    • (2008) Proc. Multimodal Technol. for Percept. of Humans: Int. Eval Workshops CLEAR 2007 and RT 2007 , pp. 475-483
    • Leeuwen, D.A.V.1    Konečný, M.2
  • 53
  • 54
    • 34547526911
    • Enhanced SVM training for robust speech activity detection
    • A. Temko, D. Macho, and C. Nadeu, “Enhanced SVM training for robust speech activity detection,” in Proc. ICASSP, Honolulu, HI, 2007, pp. 1025–1028.
    • (2007) Proc. ICASSP, Honolulu, HI , pp. 1025-1028
    • Temko, A.1    Macho, D.2    Nadeu, C.3
  • 59
    • 0036816475
    • Content analysis for audio classification and segmentation
    • Oct.
    • L. Lu, H.-J. Zhang, and H. Jiang, “Content analysis for audio classification and segmentation,” IEEE Trans. Speech Audio Process., vol. 10, no. 7, pp. 504–516, Oct. 2002.
    • (2002) IEEE Trans. Speech Audio Process. , vol.10 , Issue.7 , pp. 504-516
    • Lu, L.1    Zhang, H.-J.2    Jiang, H.3
  • 60
    • 85008578854
    • Improving speaker segmentation via speaker identification and text segmentation
    • Sep.
    • R. Li, Q. Jin, and T. Schultz, “Improving speaker segmentation via speaker identification and text segmentation,” in Proc. Interspeech, Sep. 2009, pp. 3073–3076.
    • (2009) Proc. Interspeech , pp. 3073-3076
    • Li, R.1    Jin, Q.2    Schultz, T.3
  • 61
    • 77951283289
    • Speaker diarization using bottom-up clustering based on a parameter-derived distance between adapted GMMs
    • M. Ben, M. Betser, F. Bimbot, and G. Gravier, “Speaker diarization using bottom-up clustering based on a parameter-derived distance between adapted GMMs,” in Proc. ICSLP, Jeju Island, Korea, 2004.
    • (2004) Proc. ICSLP, Jeju Island, Korea
    • Ben, M.1    Betser, M.2    Bimbot, F.3    Gravier, G.4
  • 62
    • 77249176190
    • The AMI speaker diarization system for NIST RT06s meeting data
    • Berlin, Germany: Springer-Verlag, Lecture Notes in Computer Science
    • D. Van Leeuwen and M. Huijbregts, “The AMI speaker diarization system for NIST RT06s meeting data,” in Machine Learning for Multimodal Interaction. Berlin, Germany: Springer-Verlag, 2007, vol. 4299, Lecture Notes in Computer Science, pp. 371–384.
    • (2007) Machine Learning for Multimodal Interaction , vol.4299 , pp. 371-384
    • Van Leeuwen, D.1    Huijbregts, M.2
  • 64
    • 0034857759
    • Speaker change detection and speaker clustering using VQ distortion for broadcast news speech recognition
    • K. Mori and S. Nakagawa, “Speaker change detection and speaker clustering using VQ distortion for broadcast news speech recognition,” in Proc. ICASSP, 2001, pp. 413–416.
    • (2001) Proc. ICASSP , pp. 413-416
    • Mori, K.1    Nakagawa, S.2
  • 65
    • 3543144948
    • Robust speaker change detection
    • J. Ajmera and I. McCowan, “Robust speaker change detection,” IEEE Signal Process. Lett., vol. 11, pp. 649–651, 2004.
    • (2004) IEEE Signal Process. Lett. , vol.11 , pp. 649-651
    • Ajmera, J.1    McCowan, I.2
  • 66
    • 33645326073
    • Real-time unsupervised speaker change detection
    • L. Lu and H.-J. Zhang, “Real-time unsupervised speaker change detection,” in 16th Int. Conf. Pattern Recognit., 2002, vol. 2, pp. 358–361.
    • (2002) 16th Int. Conf. Pattern Recognit. , vol.2 , pp. 358-361
    • Lu, L.1    Zhang, H.-J.2
  • 67
    • 85008578884
    • Evolutive speaker segmentation using a repository system
    • X. Anguera and J. Hernando, “Evolutive speaker segmentation using a repository system,” in Proc. Interspeech, 2004.
    • (2004) Proc. Interspeech
    • Anguera, X.1    Hernando, J.2
  • 68
    • 33846242627
    • Speaker diarization for multi-party meetings using acoustic fusion
    • Nov.
    • X. Anguera, C. Wooters, and J. Hernando, “Speaker diarization for multi-party meetings using acoustic fusion,” in Proc. ASRU, Nov. 2005, pp. 426–431.
    • (2005) Proc. ASRU , pp. 426-431
    • Anguera, X.1    Wooters, C.2    Hernando, J.3
  • 69
    • 33746354301
    • Unsupervised speaker change detection using probabilistic pattern matching
    • Aug.
    • A. Malegaonkar, A. Ariyaeeinia, P. Sivakumaran, and J. Fortuna, “Unsupervised speaker change detection using probabilistic pattern matching,” IEEE Signal Process. Lett., vol. 13, no. 8, pp. 509–512, Aug. 2006.
    • (2006) IEEE Signal Process. Lett. , vol.13 , Issue.8 , pp. 509-512
    • Malegaonkar, A.1    Ariyaeeinia, A.2    Sivakumaran, P.3    Fortuna, J.4
  • 70
    • 0026400244
    • Segregation of speakers for speech recognition and speaker identification
    • M.-H. Siu, G. Yu, and H. Gish, “Segregation of speakers for speech recognition and speaker identification,” in Proc. ICASSP′91, 1991, pp. 873–876.
    • (1991) Proc. ICASSP′91 , pp. 873-876
    • Siu, M.-H.1    Yu, G.2    Gish, H.3
  • 71
    • 0034273195
    • DISTBIC: A speaker-based segmentation for audio data indexing
    • P. Delacourt and C. Wellekens, “DISTBIC: A speaker-based segmentation for audio data indexing,” Speech Commun., pp. 111–126, 2000.
    • (2000) Speech Commun. , pp. 111-126
    • Delacourt, P.1    Wellekens, C.2
  • 72
    • 84867210169
    • Agglomerative hierarchical speaker clustering using incremental Gaussian mixture cluster modeling
    • S. S. Han and K. J. Narayanan, “Agglomerative hierarchical speaker clustering using incremental Gaussian mixture cluster modeling,” in Proc. Interspeech′08, Brisbane, Australia, 2008, pp. 20–23.
    • (2008) Proc. Interspeech′08, Brisbane, Australia , pp. 20-23
    • Han, S.S.1    Narayanan, K.J.2
  • 74
    • 85119434191
    • Fast speaker change detection for broadcast news transcription and indexing
    • Sep.
    • D. Liu and F. Kubala, “Fast speaker change detection for broadcast news transcription and indexing,” in Proc. Eurospeech′99, Sep. 1999, pp. 1031–1034.
    • (1999) Proc. Eurospeech′99 , pp. 1031-1034
    • Liu, D.1    Kubala, F.2
  • 75
    • 0002782496
    • Automatic segmentation, classification and clustering of broadcast news audio
    • M. A. Siegler, U. Jain, B. Raj, and R. M. Stern, “Automatic segmentation, classification and clustering of broadcast news audio,” in Proc. DARPA Speech Recognit. Workshop, 1997, pp. 97–99.
    • (1997) Proc. DARPA Speech Recognit. Workshop , pp. 97-99
    • Siegler, M.A.1    Jain, U.2    Raj, B.3    Stern, R.M.4
  • 77
    • 70350349017
    • Speaker diarization: From broadcast news to lectures
    • X. Zhu, C. Barras, L. Lamel, and J.-L. Gauvain, “Speaker diarization: From broadcast news to lectures,” in Proc. MLMI, 2006, pp. 396–406.
    • (2006) Proc. MLMI , pp. 396-406
    • Zhu, X.1    Barras, C.2    Lamel, L.3    Gauvain, J.-L.4
  • 78
    • 51449100003
    • Novel inter-cluster distance measure combining GLR and ICR for improved agglomerative hierarchical speaker clustering
    • Apr.
    • K. Han and S. Narayanan, “Novel inter-cluster distance measure combining GLR and ICR for improved agglomerative hierarchical speaker clustering,” in Proc. ICASSP, Apr. 2008, pp. 4373–4376.
    • (2008) Proc. ICASSP , pp. 4373-4376
    • Han, K.1    Narayanan, S.2
  • 79
    • 77956276107
    • Experiments on speaker tracking and segmentation in radio broadcast news
    • D. Moraru, M. Ben, and G. Gravier, “Experiments on speaker tracking and segmentation in radio broadcast news,” in Proc. ICSLP, 2005.
    • (2005) Proc. ICSLP
    • Moraru, D.1    Ben, M.2    Gravier, G.3
  • 81
    • 64249126167
    • Trainable speaker diarization
    • Aug.
    • H. Aronowitz, “Trainable speaker diarization,” in Proc. Interspeech, Aug. 2007, pp. 1861–1864.
    • (2007) Proc. Interspeech , pp. 1861-1864
    • Aronowitz, H.1
  • 85
    • 47749127366
    • Speaker diarization for conference room: The UPC RT07s evaluation system
    • Baltimore, MD, May 8-11, 2007, Revised Selected Papers, Berlin, Heidelberg: Springer-Verlag
    • J. Luque, X. Anguera, A. Temko, and J. Hernando, “Speaker diarization for conference room: The UPC RT07s evaluation system,” in Proc. Multimodal Technol. Perception of Humans: Int. Eval. Workshops CLEAR 2007 and RT 2007, Baltimore, MD, May 8-11, 2007, Revised Selected Papers, Berlin, Heidelberg: Springer-Verlag, 2008, pp. 543–553.
    • (2008) Proc. Multimodal Technol. Perception of Humans: Int. Eval. Workshops CLEAR 2007 and RT 2007 , pp. 543-553
    • Luque, J.1    Anguera, X.2    Temko, A.3    Hernando, J.4
  • 86
    • 34548351229
    • Speaker diarization for multiple distant microphone meetings: Mixing acoustic features and interchannel time differences
    • J. Pardo, X. Anguera, and C. Wooters, “Speaker diarization for multiple distant microphone meetings: Mixing acoustic features and interchannel time differences,” in Proc. Interspeech, 2006.
    • (2006) Proc. Interspeech
    • Pardo, J.1    Anguera, X.2    Wooters, C.3
  • 87
    • 0141591540
    • Location based speaker segmentation
    • G. Lathoud and I. McCowan, “Location based speaker segmentation,” in Proc. ICASSP, 2003, vol. 1, pp. 176–179.
    • (2003) Proc. ICASSP , vol.1 , pp. 176-179
    • Lathoud, G.1    McCowan, I.2
  • 88
    • 33746619064
    • Speaker turn detection based on between-channels differences
    • D. Ellis and J. C. Liu, “Speaker turn detection based on between-channels differences,” in Proc. ICASSP, 2004.
    • (2004) Proc. ICASSP
    • Ellis, D.1    Liu, J.C.2
  • 89
    • 4544339441
    • Clustering and segmenting speakers and their locations in meetings
    • J. Ajmera, G. Lathoud, and I. McCowan, “Clustering and segmenting speakers and their locations in meetings,” in Proc. ICASSP, 2004, vol. 1, pp. 605–608.
    • (2004) Proc. ICASSP , vol.1 , pp. 605-608
    • Ajmera, J.1    Lathoud, G.2    McCowan, I.3
  • 90
    • 34548351229
    • Speaker diarization for multiple distant microphone meetings: Mixing acoustic features and inter-channel time differences
    • J. M. Pardo, X. Anguera, and C. Wooters, “Speaker diarization for multiple distant microphone meetings: Mixing acoustic features and inter-channel time differences,” in Proc. Interspeech, 2006.
    • (2006) Proc. Interspeech
    • Pardo, J.M.1    Anguera, X.2    Wooters, C.3
  • 91
    • 34548310397
    • Speaker diarization for multiple-distant-microphone meetings using several sources of information
    • Sep.
    • J. Pardo, X. Anguera, and C. Wooters, “Speaker diarization for multiple-distant-microphone meetings using several sources of information,” IEEE Trans. Comput., vol. 56, no. 9, pp. 1212–1224, Sep. 2007.
    • (2007) IEEE Trans. Comput. , vol.56 , Issue.9 , pp. 1212-1224
    • Pardo, J.1    Anguera, X.2    Wooters, C.3
  • 92
    • 70349220969
    • Speaker diarization using unsupervised discriminant analysis of inter-channel delay features
    • Apr.
    • N. W. D. Evans, C. Fredouille, and J.-F. Bonastre, “Speaker diarization using unsupervised discriminant analysis of inter-channel delay features,” in Proc. ICASSP, Apr. 2009, pp. 4061–4064.
    • (2009) Proc. ICASSP , pp. 4061-4064
    • Evans, N.W.D.1    Fredouille, C.2    Bonastre, J.-F.3
  • 93
    • 70450179598
    • Speaker identification using warped MVDR cepstral features
    • M. Wölfel, Q. Yang, Q. Jin, and T. Schultz, “Speaker identification using warped MVDR cepstral features,” in Proc. Interspeech, 2009.
    • (2009) Proc. Interspeech
    • Wölfel, M.1    Yang, Q.2    Jin, Q.3    Schultz, T.4
  • 94
    • 63649094710
    • Higher-level features in speaker recognition
    • C. Müller, Ed. Berlin, Heidelberg, Germany: Springer, Lecture Notes in Artificial Intelligence
    • E. Shriberg, “Higher-level features in speaker recognition,” in Speaker Classification I, C. Müller, Ed. Berlin, Heidelberg, Germany: Springer, 2007, vol. 4343, Lecture Notes in Artificial Intelligence.
    • (2007) Speaker Classification I , vol.4343
    • Shriberg, E.1
  • 95
    • 77956543877
    • Tuning-robust initialization methods for speaker diarization
    • Nov.
    • D. Imseng and G. Friedland, “Tuning-robust initialization methods for speaker diarization,” IEEE Trans. Audio, Speech, Lang. Process., vol. 18, no. 8, pp. 2028–2037, Nov. 2010.
    • (2010) IEEE Trans. Audio, Speech, Lang. Process. , vol.18 , Issue.8 , pp. 2028-2037
    • Imseng, D.1    Friedland, G.2
  • 97
    • 85009145345
    • Observations on overlap: Findings and implications for automatic processing of multi-party conversations
    • E. Shriberg, A. Stolcke, and D. Baron, “Observations on overlap: Findings and implications for automatic processing of multi-party conversations,” in Proc. Eurospeech′01, Aalborg, Denmark, 2001, pp. 1359–1362.
    • (2001) Proc. Eurospeech′01, Aalborg, Denmark , pp. 1359-1362
    • Shriberg, E.1    Stolcke, A.2    Baron, D.3
  • 98
    • 33947640630
    • Speaker overlaps and ASR errors in meetings: Effects before, during, and after the overlap
    • O. Çetin and E. Shriberg, “Speaker overlaps and ASR errors in meetings: Effects before, during, and after the overlap,” in Proc. ICASSP, Toulouse, France, 2006, pp. 357–360.
    • (2006) Proc. ICASSP, Toulouse, France , pp. 357-360
    • Çetin, O.1    Shriberg, E.2
  • 99
    • 51449111990
    • Overlapped speech detection for improved speaker diarization in multiparty meetings
    • K. Boakye, B. Trueba-Hornero, O. Vinyals, and G. Friedland, “Overlapped speech detection for improved speaker diarization in multiparty meetings,” in Proc. ICASSP, 2008, pp. 4353–4356.
    • (2008) Proc. ICASSP , pp. 4353-4356
    • Boakye, K.1    Trueba-Hornero, B.2    Vinyals, O.3    Friedland, G.4
  • 100
    • 78649290108
    • Handling overlapped speech in speaker diarization
    • M.S. thesis, Univ. Politecnica de Catalunya, Barcelona, Spain
    • B. Trueba-Hornero, “Handling overlapped speech in speaker diarization,” M.S. thesis, Univ. Politecnica de Catalunya, Barcelona, Spain, 2008.
    • (2008)
    • Trueba-Hornero, B.1
  • 101
    • 84890528960
    • Audio segmentation for meetings speech processing
    • Ph.D. dissertation, Univ. of California, Berkeley
    • K. Boakye, “Audio segmentation for meetings speech processing,” Ph.D. dissertation, Univ. of California, Berkeley, 2008.
    • (2008)
    • Boakye, K.1
  • 102
    • 44849101173
    • Efficient use of overlap information in speaker diarization
    • S. Otterson and M. Ostendorf, “Efficient use of overlap information in speaker diarization,” in Proc. ASRU, Kyoto, Japan, 2007, pp. 686–686.
    • (2007) Proc. ASRU, Kyoto, Japan , pp. 686
    • Otterson, S.1    Ostendorf, M.2
  • 103
    • 0032136330
    • Robust speech recognition using the modulation spectrogram
    • B. E. D. Kingsbury, N. Morgan, and S. Greenberg, “Robust speech recognition using the modulation spectrogram,” Speech Commun., vol. 25, no. 1-3, pp. 117–132, 1998.
    • (1998) Speech Commun. , vol.25 , Issue.1-3 , pp. 117-132
    • Kingsbury, B.E.D.1    Morgan, N.2    Greenberg, S.3
  • 104
    • 35248827017
    • Speaker localization using audiovisual synchrony: An empirical study
    • H. J. Nock, G. Iyengar, and C. Neti, “Speaker localization using audiovisual synchrony: An empirical study,” Lecture Notes in Comput. Sci., vol. 2728, pp. 565–570, 2003.
    • (2003) Lecture Notes in Comput. Sci. , vol.2728 , pp. 565-570
    • Nock, H.J.1    Iyengar, G.2    Neti, C.3
  • 107
    • 0031268341
    • Factorial hidden Markov models
    • Nov.
    • Z. Ghahramani and M. I. Jordan, “Factorial hidden Markov models,” Mach. Learn., vol. 29, pp. 245–273, Nov. 1997.
    • (1997) Mach. Learn. , vol.29 , pp. 245-273
    • Ghahramani, Z.1    Jordan, M.I.2
  • 109
    • 1542572925
    • Multi-modal speech recognition using optical-flow analysis for lip images
    • S. Tamura, K. Iwano, and S. Furui, “Multi-modal speech recognition using optical-flow analysis for lip images,” Real World Speech Process., vol. 36, no. 2-3, pp. 117–124, 2004.
    • (2004) Real World Speech Process. , vol.36 , Issue.2-3 , pp. 117-124
    • Tamura, S.1    Iwano, K.2    Furui, S.3
  • 110
    • 0029746565
    • Cross-modal prediction in audio-visual communication
    • T. Chen and R. Rao, “Cross-modal prediction in audio-visual communication,” in Proc. ICASSP, 1996, vol. 4, pp. 2056–2059.
    • (1996) Proc. ICASSP , vol.4 , pp. 2056-2059
    • Chen, T.1    Rao, R.2
  • 111
    • 0009622481
    • Learning joint statistical models for audio-visual fusion and segregation
    • J. W. Fisher, T. Darrell, W. T. Freeman, and P. A. Viola, “Learning joint statistical models for audio-visual fusion and segregation,” in Proc. NIPS, 2000, pp. 772–778.
    • (2000) Proc. NIPS , pp. 772-778
    • Fisher, J.W.1    Darrell, T.2    Freeman, W.T.3    Viola, P.A.4
  • 112
    • 2642562769
    • Speaker association with signal-level audiovisual fusion
    • Jun.
    • J. W. Fisher and T. Darrell, “Speaker association with signal-level audiovisual fusion,” IEEE Trans. Multimedia, vol. 6, no. 3, pp. 406–413, Jun. 2004.
    • (2004) IEEE Trans. Multimedia , vol.6 , Issue.3 , pp. 406-413
    • Fisher, J.W.1    Darrell, T.2
  • 113
    • 41549121431
    • Exploiting audio-visual correlation in coding of talking head sequences
    • Mar.
    • R. Rao and T. Chen, “Exploiting audio-visual correlation in coding of talking head sequences,” in Proc. Int. Picture Coding Symp., Mar. 1996.
    • (1996) Proc. Int. Picture Coding Symp.
    • Rao, R.1    Chen, T.2
  • 114
    • 34547527871
    • Dynamic dependency tests for audio-visual speaker association
    • Apr.
    • M. Siracusa and J. Fisher, “Dynamic dependency tests for audio-visual speaker association,” in Proc. ICASSP, Apr. 2007, pp. 457–460.
    • (2007) Proc. ICASSP , pp. 457-460
    • Siracusa, M.1    Fisher, J.2
  • 115
    • 0036299249
    • CUAVE: A new audio-visual database for multimodal human-computer interface research
    • E. K. Patterson, S. Gurbuz, Z. Tufekci, and J. N. Gowdy, “CUAVE: A new audio-visual database for multimodal human-computer interface research,” in Proc. ICASSP, 2002, pp. 2017–2020.
    • (2002) Proc. ICASSP , pp. 2017-2020
    • Patterson, E.K.1    Gurbuz, S.2    Tufekci, Z.3    Gowdy, J.N.4
  • 119
    • 72449135653
    • Working with very sparse data to detect speaker and listener participation in a meetings corpus
    • May
    • N. Campbell and N. Suzuki, “Working with very sparse data to detect speaker and listener participation in a meetings corpus,” in Proc. Workshop Programme, May 2006, vol. 10.
    • (2006) Proc. Workshop Programme , vol.10
    • Campbell, N.1    Suzuki, N.2
  • 120
    • 70349214881
    • Multimodal speaker diarization of real-world meetings using compressed-domain video features
    • Apr.
    • G. Friedland, H. Hung, and C. Yeo, “Multimodal speaker diarization of real-world meetings using compressed-domain video features,” in Proc. ICASSP, Apr. 2009, pp. 4069–4072.
    • (2009) Proc. ICASSP , pp. 4069-4072
    • Friedland, G.1    Hung, H.2    Yeo, C.3
  • 123
    • 51449095036
    • Combination of agglomerative and sequential clustering for speaker diarization
    • D. Vijayasenan, F. Valente, and H. Bourlard, “Combination of agglomerative and sequential clustering for speaker diarization,” in Proc. ICASSP, Las Vegas, NV, 2008, pp. 4361–4364.
    • (2008) Proc. ICASSP, Las Vegas, NV , pp. 4361-4364
    • Vijayasenan, D.1    Valente, F.2    Bourlard, H.3
  • 124
    • 79959849996
    • Speaker diarization: Combination of the LIUM and IRIT systems
    • E. El-Khoury, C. Senac, and S. Meignier, “Speaker diarization: Combination of the LIUM and IRIT systems,” in Internal Report, 2008.
    • (2008) Internal Report
    • El-Khoury, E.1    Senac, C.2    Meignier, S.3
  • 125
    • 36749008026
    • Combining Gaussianized/non-Gaussianized features to improve speaker diarization of telephone conversations
    • Dec.
    • V. Gupta, P. Kenny, P. Ouellet, G. Boulianne, and P. Dumouchel, “Combining Gaussianized/non-Gaussianized features to improve speaker diarization of telephone conversations,” in IEEE Signal Process. Lett., Dec. 2007, vol. 14, no. 12, pp. 1040–1043.
    • (2007) IEEE Signal Process. Lett. , vol.14 , Issue.12 , pp. 1040-1043
    • Gupta, V.1    Kenny, P.2    Ouellet, P.3    Boulianne, G.4    Dumouchel, P.5
  • 126
    • 0001120413
    • A Bayesian analysis of some nonparametric problems
    • T. S. Ferguson, “A Bayesian analysis of some nonparametric problems,” Ann. Statist., vol. 1, no. 2, pp. 209–230, 1973.
    • (1973) Ann. Statist. , vol.1 , Issue.2 , pp. 209-230
    • Ferguson, T.S.1
  • 130
    • 84869766113
    • The blame game: Performance analysis of speaker diarization system components
    • Aug.
    • M. Huijbregts and C. Wooters, “The blame game: Performance analysis of speaker diarization system components,” in Proc. Interspeech, Aug. 2007, pp. 1857–1860.
    • (2007) Proc. Interspeech , pp. 1857-1860
    • Huijbregts, M.1    Wooters, C.2


* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.