Volume 14, Issue 5, 2006, Pages 1557-1565

An overview of automatic speaker diarization systems

Author keywords

Speaker diarization; Speaker segmentation and clustering

Indexed keywords

INPUT AUDIO CHANNELS; SIGNAL ENERGY; SPEAKER DIARIZATION; SPEAKER SEGMENTATION AND CLUSTERING;

EID: 34047261805     PISSN: 15587916     EISSN: None     Source Type: Journal    
DOI: 10.1109/TASL.2006.878256     Document Type: Review
Times cited: 542

References (58)
  • 2
    • 0029765670
    • Real-time discrimination of broadcast speech/music
    • Atlanta, GA, May
    • J. Saunders, "Real-time discrimination of broadcast speech/music," in Proc. IEEE Int. Conf. Acoust., Speech, Signal Process., vol. II, Atlanta, GA, May 1996, pp. 993-996.
    • (1996) Proc. IEEE Int. Conf. Acoust., Speech, Signal Process , vol.2 , pp. 993-996
    • Saunders, J.1
  • 3
    • 0032181880
    • Audio feature extraction and analysis for scene segmentation and classification
    • Oct
    • Z. Liu, Y. Wang, and T. Chen, "Audio feature extraction and analysis for scene segmentation and classification," J. VLSI Signal Process. Syst., vol. 20, no. 1-2, pp. 61-79, Oct. 1998.
    • (1998) J. VLSI Signal Process. Syst , vol.20 , Issue.1-2 , pp. 61-79
    • Liu, Z.1    Wang, Y.2    Chen, T.3
  • 4
    • 0033677117
    • A method for direct audio search with applications to indexing and retrieval
    • Istanbul, Turkey, Jun
    • S. E. Johnson and P. C. Woodland, "A method for direct audio search with applications to indexing and retrieval," in Proc. IEEE Int. Conf. Acoust., Speech, Signal Process., vol. 3, Istanbul, Turkey, Jun. 2000, pp. 1427-1430.
    • (2000) Proc. IEEE Int. Conf. Acoust., Speech, Signal Process , vol.3 , pp. 1427-1430
    • Johnson, S.E.1    Woodland, P.C.2
  • 9
    • 85009110150
    • Speaker recognition in a multi-speaker environment
    • Aalborg, Denmark, Sep
    • A. Martin and M. Przybocki, "Speaker recognition in a multi-speaker environment," in Proc. Eur. Conf. Speech Commun. Technol., vol. 2, Aalborg, Denmark, Sep. 2001, pp. 787-790.
    • (2001) Proc. Eur. Conf. Speech Commun. Technol , vol.2 , pp. 787-790
    • Martin, A.1    Przybocki, M.2
  • 11
    • 34047268275
    • Toward Robust speaker segmentation: The ICSI-SRI Fall 2004 Diarization System
    • Palisades, NY, Nov
    • C. Wooters, J. Fung, B. Peskin, and X. Anguera, "Toward Robust speaker segmentation: The ICSI-SRI Fall 2004 Diarization System," in Proc. Fall 2004 Rich Transcription Workshop (RT-04), Palisades, NY, Nov. 2004, [Online]. Available: http://www.icsi.berkeley.edu/cgi-bin/pubs/publication.pl?ID=000100.
    • (2004) Proc. Fall 2004 Rich Transcription Workshop (RT-04)
    • Wooters, C.1    Fung, J.2    Peskin, B.3    Anguera, X.4
  • 12
    • 34047267089
    • P. Nguyen, L. Rigazio, Y. Moh, and J. C. Junqua. Rich transcription 2002 site report. Panasonic speech technology laboratory (PSTL). presented at Proc. Rich Transcription Workshop (RT-02). [Online]. Available: http://www.nist.gov/speech/tests/rt/rt2002/presentations/rt02.pdf
  • 13
    • 85128356454
    • Partitioning and transcription of broadcast news data
    • Sydney, Australia, Dec
    • J.-L. Gauvain, L. Lamel, and G. Adda, "Partitioning and transcription of broadcast news data," in Proc. Int. Conf. Spoken Lang. Process., vol. 4, Sydney, Australia, Dec. 1998, pp. 1335-1338.
    • (1998) Proc. Int. Conf. Spoken Lang. Process , vol.4 , pp. 1335-1338
    • Gauvain, J.-L.1    Lamel, L.2    Adda, G.3
  • 14
    • 33745185104
    • Combining speaker identification and BIC for speaker diarization
    • Lisbon, Portugal, Sep
    • X. Zhu, C. Barras, S. Meignier, and J.-L. Gauvain, "Combining speaker identification and BIC for speaker diarization," in Proc. Eur. Conf. Speech Commun. Technol., Lisbon, Portugal, Sep. 2005, pp. 2441-2444.
    • (2005) Proc. Eur. Conf. Speech Commun. Technol , pp. 2441-2444
    • Zhu, X.1    Barras, C.2    Meignier, S.3    Gauvain, J.-L.4
  • 15
    • 34047264090
    • The MIT Lincoln Laboratory RT-04F diarization systems: Applications to broadcast audio and telephone conversations
    • Palisades, NY, Nov
    • D. A. Reynolds and P. Torres-Carrasquillo, "The MIT Lincoln Laboratory RT-04F diarization systems: Applications to broadcast audio and telephone conversations," in Proc. Fall 2004 Rich Transcription Workshop (RT-04), Palisades, NY, Nov. 2004.
    • (2004) Proc. Fall 2004 Rich Transcription Workshop (RT-04)
    • Reynolds, D.A.1    Torres-Carrasquillo, P.2
  • 18
    • 85119434191
    • Fast speaker change detection for broadcast news transcription and indexing
    • Budapest, Hungary, Sep
    • D. Liu and F. Kubala, "Fast speaker change detection for broadcast news transcription and indexing," in Proc. Eur. Conf. Speech Commun. Technol., vol. III, Budapest, Hungary, Sep. 1999, pp. 1031-1034.
    • (1999) Proc. Eur. Conf. Speech Commun. Technol , vol.3 , pp. 1031-1034
    • Liu, D.1    Kubala, F.2
  • 19
    • 29044442235
    • Step-by-Step and integrated approaches in broadcast news speaker diarization
    • to be published, Sep
    • S. Meignier, D. Moraru, C. Fredouille, J.-F. Bonastre, and L. Besacier, "Step-by-Step and integrated approaches in broadcast news speaker diarization," Comput. Speech Lang., no. 20, pp. 303-330, Sep. 2005, to be published.
    • (2005) Comput. Speech Lang , Issue.20 , pp. 303-330
    • Meignier, S.1    Moraru, D.2    Fredouille, C.3    Bonastre, J.-F.4    Besacier, L.5
  • 21
    • 4544280424
    • Generating and evaluating segmentations for automatic speech recognition of conversational telephone speech
    • Montreal, QC, Canada, May
    • S. E. Tranter, K. Yu, G. Evermann, and P. C. Woodland, "Generating and evaluating segmentations for automatic speech recognition of conversational telephone speech," in Proc. ICASSP, vol. I, Montreal, QC, Canada, May 2004, pp. 753-756.
    • (2004) Proc. ICASSP , vol.1 , pp. 753-756
    • Tranter, S.E.1    Yu, K.2    Evermann, G.3    Woodland, P.C.4
  • 22
    • 4544259164
    • A cross-channel modeling approach for automatic segmentation of conversational telephone speech
    • St. Thomas, U.S. Virgin Islands, Dec
    • D. Liu and F. Kubala, "A cross-channel modeling approach for automatic segmentation of conversational telephone speech," in Proc. IEEE ASRU Workshop, St. Thomas, U.S. Virgin Islands, Dec. 2003, pp. 333-338.
    • (2003) Proc. IEEE ASRU Workshop , pp. 333-338
    • Liu, D.1    Kubala, F.2
  • 26
    • 0141469852
    • Multispeaker speech activity detection for the ICSI meeting recorder
    • Trento, Italy, Dec
    • T. Pfau, D. Ellis, and A. Stolcke, "Multispeaker speech activity detection for the ICSI meeting recorder," in Proc. IEEE ASRU Workshop, Trento, Italy, Dec. 2001, pp. 107-110.
    • (2001) Proc. IEEE ASRU Workshop , pp. 107-110
    • Pfau, T.1    Ellis, D.2    Stolcke, A.3
  • 29
    • 85009089453
    • Unsupervised audio stream segmentation and clustering via the Bayesian information criterion
    • Beijing, China, Oct
    • B. Zhou and J. Hansen, "Unsupervised audio stream segmentation and clustering via the Bayesian information criterion," in Proc. Int. Conf. Spoken Language Process., vol. 3, Beijing, China, Oct. 2000, pp. 714-717.
    • (2000) Proc. Int. Conf. Spoken Language Process , vol.3 , pp. 714-717
    • Zhou, B.1    Hansen, J.2
  • 30
    • 0002782496
    • Automatic segmentation, classification and clustering of broadcast news
    • Chantilly, VA, Feb
    • M. A. Siegler, U. Jain, B. Raj, and R. M. Stern, "Automatic segmentation, classification and clustering of broadcast news," in Proc. DARPA Speech Recognition Workshop, Chantilly, VA, Feb. 1997, pp. 97-99.
    • (1997) Proc. DARPA Speech Recognition Workshop , pp. 97-99
    • Siegler, M.A.1    Jain, U.2    Raj, B.3    Stern, R.M.4
  • 35
    • 34047268274
    • D. Moraru, S. Meignier, L. Besacier, J.-F. Bonastre, and I. Magrin-Chagnolleau. The ELISA consortium approaches in speaker segmentation during the NIST 2002 speaker recognition evaluation, presented at Proc. IEEE Int. Conf. Acoust., Speech, Signal Process. [Online]. Available: http://www.lia.univ-avignon.fr/fich.art/339-moricassp2003.pdf
  • 36
    • 34047263722
    • Segmentation, classification and clustering of an Italian corpus
    • Paris, France, Apr
    • M. Cettolo, "Segmentation, classification and clustering of an Italian corpus," in Proc. Recherche d'Information Assisté par Ordinateur (RIAO), Paris, France, Apr. 2000, [Online]. Available: http://munst.itc.it/people/cettolo/papers/riao00a.ps.gz.
    • (2000) Proc. Recherche d'Information Assisté par Ordinateur (RIAO)
    • Cettolo, M.1
  • 37
    • 84946742526
    • J. Ajmera and C. Wooters, "A Robust Speaker Clustering Algorithm," in Proc. IEEE ASRU Workshop, St Thomas, U.S. Virgin Islands, Nov. 2003, pp. 411-416.
  • 38
    • 33745219648
    • The development of the Cambridge University RT-04 diarization system
    • Palisades, NY, Nov
    • S. E. Tranter, M. J. F. Gales, R. Sinha, S. Umesh, and P. C. Woodland, "The development of the Cambridge University RT-04 diarization system," in Proc. Fall 2004 Rich Transcription Workshop (RT-04), Palisades, NY, Nov. 2004, [Online]. Available: http://mi.eng.cam.ac.uk/reports/abstracts/tranter_rt04.html.
    • (2004) Proc. Fall 2004 Rich Transcription Workshop (RT-04)
    • Tranter, S.E.1    Gales, M.J.F.2    Sinha, R.3    Umesh, S.4    Woodland, P.C.5
  • 40
    • 77951283289
    • Speaker diarization using bottom-up clustering based on a parameter-derived distance between adapted GMMs
    • Jeju Island, Korea, Oct
    • M. Ben, M. Betser, F. Bimbot, and G. Gravier, "Speaker diarization using bottom-up clustering based on a parameter-derived distance between adapted GMMs," in Proc. Int. Conf. Spoken Language Processing, Jeju Island, Korea, Oct. 2004, pp. 2329-2332.
    • (2004) Proc. Int. Conf. Spoken Language Processing , pp. 2329-2332
    • Ben, M.1    Betser, M.2    Bimbot, F.3    Gravier, G.4
  • 45
    • 0141702107
    • Feature and score normalization for speaker verification of cellular data
    • Hong Kong, China, Apr
    • C. Barras and J.-L. Gauvain, "Feature and score normalization for speaker verification of cellular data," in Proc. IEEE Int. Conf. Acoust., Speech, Signal Process., vol. II, Hong Kong, China, Apr. 2003, pp. 49-52.
    • (2003) Proc. IEEE Int. Conf. Acoust., Speech, Signal Process , vol.2 , pp. 49-52
    • Barras, C.1    Gauvain, J.-L.2
  • 47
    • 33947677676
    • Who really spoke when? - Finding speaker turns and identities in audio
    • Toulouse, France, May
    • S. E. Tranter, "Who really spoke when? - Finding speaker turns and identities in audio," in Proc. IEEE Int. Conf. Acoust., Speech, Signal Process., vol. I, Toulouse, France, May 2006, pp. 1013-1016.
    • (2006) Proc. IEEE Int. Conf. Acoust., Speech, Signal Process , vol.1 , pp. 1013-1016
    • Tranter, S.E.1
  • 49
    • 4544361649
    • The ELISA consortium approaches in speaker segmentation during the NIST 2003 Rich Transcription evaluation
    • Montreal, QC, Canada, May
    • D. Moraru, S. Meignier, C. Fredouille, L. Besacier, and J.-F. Bonastre, "The ELISA consortium approaches in speaker segmentation during the NIST 2003 Rich Transcription evaluation," in Proc. IEEE Int. Conf. Acoust., Speech, Signal Process., vol. 1, Montreal, QC, Canada, May 2004, pp. 373-376.
    • (2004) Proc. IEEE Int. Conf. Acoust., Speech, Signal Process , vol.1 , pp. 373-376
    • Moraru, D.1    Meignier, S.2    Fredouille, C.3    Besacier, L.4    Bonastre, J.-F.5
  • 50
    • 33646790196
    • Two-way cluster voting to improve speaker diarization performance
    • Philadelphia, PA, Mar
    • S. E. Tranter, "Two-way cluster voting to improve speaker diarization performance," in Proc. IEEE Int. Conf. Acoust., Speech, Signal Process., vol. I, Philadelphia, PA, Mar. 2005, pp. 753-756.
    • (2005) Proc. IEEE Int. Conf. Acoust., Speech, Signal Process , vol.1 , pp. 753-756
    • Tranter, S.E.1
  • 51
    • 33745212266
    • Online speaker adaptation and tracking for real-time speech recognition
    • Lisbon, Portugal, Sep
    • D. Liu, D. Kiecza, A. Srivastava, and F. Kubala, "Online speaker adaptation and tracking for real-time speech recognition," in Proc. Eur. Conf. Speech Commun. Technol., Lisbon, Portugal, Sep. 2005, pp. 281-284.
    • (2005) Proc. Eur. Conf. Speech Commun. Technol , pp. 281-284
    • Liu, D.1    Kiecza, D.2    Srivastava, A.3    Kubala, F.4
  • 53
    • 85009080849
    • Speaker segmentation and clustering in meetings
    • Montreal, QC, Canada, May
    • Q. Jin, K. Laskowski, T. Schultz, and A. Waibel, "Speaker segmentation and clustering in meetings," in Proc. ICASSP Meeting Recognition Workshop, Montreal, QC, Canada, May 2004, [Online]. Available: http://isl.ira.uka.de/publications/SchultzJin_NIST04.pdf.
    • (2004) Proc. ICASSP Meeting Recognition Workshop
    • Jin, Q.1    Laskowski, K.2    Schultz, T.3    Waibel, A.4
  • 55
    • 0002871462
    • Integrated technologies for indexing spoken language
    • Feb
    • F. Kubala, S. Colbath, D. Liu, A. Srivastava, and J. Makhoul, "Integrated technologies for indexing spoken language," Commun. ACM, vol. 43, no. 2, pp. 48-56, Feb. 2000.
    • (2000) Commun. ACM , vol.43 , Issue.2 , pp. 48-56
    • Kubala, F.1    Colbath, S.2    Liu, D.3    Srivastava, A.4    Makhoul, J.5


* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS DB.