Volume 88, Issue 5, 2008, Pages 1091-1124

Speaker segmentation and clustering

Author keywords

Diarization; Speaker clustering; Speaker segmentation

Indexed keywords

CLUSTERING ALGORITHMS; PROBABILISTIC LOGICS;

EID: 38949122754     PISSN: 01651684     EISSN: None     Source Type: Journal    
DOI: 10.1016/j.sigpro.2007.11.017     Document Type: Review
Times cited: 87

References (103)
  • 1
    • 84889324982
    • A. Solomonoff, A. Mielke, M. Schmidt, H. Gish, Clustering speakers by their voices, in: Proceedings of the 1998 IEEE International Conference on Acoustics, Speech, and Signal Processing, vol. 2, Seattle, USA, May 1998, pp. 757-760.
  • 2
    • 33745200276
    • R. Sinha, S.E. Tranter, M.J.F. Gales, P.C. Woodland, The Cambridge University March 2005 speaker diarisation system, in: Proceedings of the European Conference on Speech Communication and Technology, Lisbon, Portugal, September 2005, pp. 2437-2440.
  • 3
    • 38949205073
    • ISO/IEC 15938-4:2001, Multimedia content description interface-part 4: audio, Version 1.0, 2001.
  • 5
    • 4544361760
    • H.G. Kim, T. Sikora, Comparison of MPEG-7 audio spectrum projection features and MFCC applied to speaker recognition, sound classification and audio segmentation, in: Proceedings of the 2004 IEEE International Conference on Acoustics, Speech, and Signal Processing, vol. 5, Montreal, Canada, May 2004, pp. 925-928.
  • 6
    • 84979955147
    • H.G. Kim, T. Sikora, Audio spectrum projection based on several basis decomposition algorithms applied to general sound recognition and audio segmentation, in: Proceedings of the 12th European Signal Processing Conference, Vienna, Austria, September 2004, pp. 1047-1050.
  • 7
    • 34547324377
    • M. Kotti, E. Benetos, C. Kotropoulos, Automatic speaker change detection with the Bayesian information criterion using MPEG-7 features and a fusion scheme, in: Proceedings of the 2006 IEEE International Symposium on Circuits and Systems, Kos, Greece, May 2006.
  • 8
    • 34247559206
    • M. Kotti, L.G.P.M. Martins, E. Benetos, J.S. Cardoso, C. Kotropoulos, Automatic speaker segmentation using multiple features and distance measures: a comparison of three approaches, in: Proceedings of the 2006 IEEE International Conference on Multimedia and Expo, Toronto, Canada, July 2006, pp. 1101-1104.
  • 9
    • 64149092838
    • W.H. Tsai, S.S. Cheng, H.M. Wang, Speaker clustering of speech utterances using a voice characteristic reference space, in: Proceedings of the International Conference on Spoken Language Processing, Jeju Island, Korea, October 2004.
  • 10
    • 4544247119
    • D. Liu, F. Kubala, Online speaker clustering, in: Proceedings of the 2004 IEEE International Conference on Acoustics, Speech, and Signal Processing, vol. 1, Montreal, Canada, May 2004, pp. 333-336.
  • 11
    • 84875953283
    • S.S. Chen, P.S. Gopalakrishnan, Clustering via the Bayesian information criterion with applications in speech recognition, in: Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing, vol. 2, Seattle, USA, May 1998, pp. 645-648.
  • 12
    • 0141809272
    • S. Meignier, J.F. Bonastre, S. Igounet, E-HMM approach for learning and adapting sound models for speaker indexing, in: Proceedings of the Odyssey Speaker and Language Recognition Workshop, Crete, Greece, June 2001, pp. 175-180.
  • 13
    • 85009289298
    • J. Ajmera, H. Bourlard, I. Lapidot, I. McCowan, Unknown-multiple speaker clustering using HMM, in: Proceedings of the International Conference on Spoken Language Processing, CO, USA, September 2002, pp. 573-576.
  • 14
    • 33745185104
    • X. Zhu, C. Barras, S. Meignier, J.-L. Gauvain, Combining speaker identification and BIC for speaker diarization, in: Proceedings of the InterSpeech, Lisbon, Portugal, September 2005, pp. 2441-2444.
  • 15
  • 18
    • 0031233424
    • Speaker recognition: a tutorial
    • Campbell J.P. Speaker recognition: a tutorial. Proc. IEEE 85 9 (September 1997) 1437-1462
    • (1997) Proc. IEEE , vol.85 , Issue.9 , pp. 1437-1462
    • Campbell, J.P.1
  • 19
    • 0034505639
    • V. Wan, W.M. Campbell, Support vector machines for speaker verification and identification, in: Proceedings of the Neural Networks for Signal Processing, vol. 10, Sydney, Australia, December 2000, pp. 775-784.
  • 20
    • 0033884858
    • Speaker verification using adapted Gaussian mixture models
    • Reynolds D.A., Quatieri T.F., and Dunn R.B. Speaker verification using adapted Gaussian mixture models. Digital Signal Process. 10 (October 2000) 19-41
    • (2000) Digital Signal Process. , vol.10 , pp. 19-41
    • Reynolds, D.A.1    Quatieri, T.F.2    Dunn, R.B.3
  • 22
    • 38949191461
    • National Institute of Standards and Technology (NIST)-The Segmentation Task: Find the Story Boundaries 〈http://www.nist.gov/speech/tests/tdt/tdt99/presentations/NIST_segmentation/index.htm〉.
  • 23
    • 38949110169
    • The Center for Spoken Language Research of the Colorado University (CSLR) 〈http://cslr.colorado.edu/〉.
  • 24
    • 38949163851
    • International Computer Science Institute-Speech Research Group Berkeley 〈http://www.icsi.berkeley.edu/groups/speech/〉.
  • 25
    • 38949198872
    • Speech Analysis and Interpretation Laboratory (SAIL) at the University of Southern California 〈http://sail.usc.edu/projectsIntro.php〉.
  • 26
    • 38949096946
    • International Speech Technology and Research (STAR) Laboratory at Stanford research institute (SRI) 〈http://www.speech.sri.com/projects/sieve/〉.
  • 27
    • 38949095530
    • Microsoft Audio Projects 〈http://research.microsoft.com/users/llu/Audioprojects.aspx〉.
  • 28
    • 38949093086
    • The Institut Dalle Molle d'Intelligence Artificielle Perceptive (IDIAP) Research Institute 〈http://www.idiap.ch/speech_processing.php〉.
  • 29
    • 38949141761
    • The Laboratoire d'Informatique pour la Mécanique et les Sciences de l'Ingénieur (LIMSI) Spoken Language Processing Group 〈http://www.limsi.fr/TLP〉.
  • 30
    • 38949115082
    • The Department of Speech, Music and Hearing of the Royal Institute of Technology (KTH) at Stockholm 〈http://www.speech.kth.se〉.
  • 31
    • 38949152506
    • The Chair of Computer Science VI, Computer Science Department, Aachen University 〈http://www-i6.informatik.rwth-aachen.de〉.
  • 32
    • 38949134678
    • The Infant Speech Segmentation Project at Berkeley University 〈http://www-gse.berkeley.edu/research/completed/InfantSpeech.html〉.
  • 33
    • 38949101288
    • Language Science Research Group, Washington University 〈http://lsrg.cs.wustl.edu〉.
  • 34
    • 38949182068
    • The University College of London Psychology Speech Group, speech segmentation issues 〈http://www.speech.psychol.ucl.ac.uk〉.
  • 36
    • 0037700756
    • L. Lu, H. Zhang, Speaker change detection and tracking in real-time news broadcast analysis, in: Proceedings of the ACM Multimedia 2002, Juan-les-Pins, France, December 2002, pp. 602-610.
  • 37
    • 17444365032
    • Unsupervised speaker segmentation and tracking in real-time audio content analysis
    • Lu L., and Zhang H. Unsupervised speaker segmentation and tracking in real-time audio content analysis. Multimedia Systems 10 4 (April 2005) 332-343
    • (2005) Multimedia Systems , vol.10 , Issue.4 , pp. 332-343
    • Lu, L.1    Zhang, H.2
  • 38
    • 38949090534
    • A. Tritschler, R. Gopinath, Improved speaker segmentation and segments clustering using the Bayesian information criterion, in: Proceedings of the 6th European Conference on Speech Communication and Technology, Budapest, Hungary, September 1999, pp. 679-682.
  • 39
    • 0034273195
    • DISTBIC: a speaker-based segmentation for audio data indexing
    • Delacourt P., and Wellekens C.J. DISTBIC: a speaker-based segmentation for audio data indexing. Speech Comm. 32 (September 2000) 111-126
    • (2000) Speech Comm. , vol.32 , pp. 111-126
    • Delacourt, P.1    Wellekens, C.J.2
  • 40
    • 85009282223
    • S. Kwon, S. Narayanan, Speaker change detection using a new weighted distance measure, in: Proceedings of the International Conference on Spoken Language, vol. 4, CO, USA, September 2002, pp. 2537-2540.
  • 41
    • 85143189670
    • T. Wu, L. Lu, K. Chen, H. Zhang, UBM-based real-time speaker segmentation for broadcasting news, in: Proceedings of the 2003 IEEE International Conference on Acoustics, Speech, and Signal Processing, vol. 2, Hong Kong, April 2003, pp. 193-196.
  • 42
    • 85009212151
    • S.S. Cheng, H.M. Wang, A sequential metric-based audio segmentation method via the Bayesian information criterion, in: Proceedings of the 8th European Conference on Speech Communication and Technology, Geneva, Switzerland, September 2003, pp. 945-948.
  • 44
    • 33646789869
    • H. Kim, D. Elter, T. Sikora, Hybrid speaker-based segmentation system using model-level clustering, in: Proceedings of the 2005 IEEE International Conference on Acoustics, Speech, and Signal Processing, vol. I, Philadelphia, USA, March 2005, pp. 745-748.
  • 46
    • 27644599375
    • Unsupervised speaker indexing using generic models
    • Kwon S., and Narayanan S. Unsupervised speaker indexing using generic models. IEEE Trans. Speech Audio Process. 13 5 (September 2005) 1004-1013
    • (2005) IEEE Trans. Speech Audio Process. , vol.13 , Issue.5 , pp. 1004-1013
    • Kwon, S.1    Narayanan, S.2
  • 47
    • 33745000055
    • Automatic segmentation and identification of mixed-language speech using delta-BIC and LSA-based GMMs
    • Wu C.H., Chiu Y.H., Shia C.J., and Lin C.Y. Automatic segmentation and identification of mixed-language speech using delta-BIC and LSA-based GMMs. IEEE Trans. Audio Speech Language Process. 14 1 (January 2006) 266-276
    • (2006) IEEE Trans. Audio Speech Language Process. , vol.14 , Issue.1 , pp. 266-276
    • Wu, C.H.1    Chiu, Y.H.2    Shia, C.J.3    Lin, C.Y.4
  • 48
    • 38949101287
    • T. Wu, L. Lu, K. Chen, H. Zhang, Universal background models for real-time speaker change detection, in: Proceedings of the 9th International Conference on Multimedia Modeling, Tamshui, Taiwan, January 2003, pp. 135-149.
  • 49
    • 4544280424
    • S.E. Tranter, K. Yu, G. Evermann, P.C. Woodland, Generating and evaluating segmentations for automatic speech recognition of conversational telephone speech, in: Proceedings of the 2004 IEEE International Conference on Acoustics, Speech, and Signal Processing, Montreal, Canada, May 2004, pp. 433-477.
  • 50
    • 0141814632
    • D. Wang, L. Lu, H.J. Zhang, Speech segmentation without speech recognition, in: Proceedings of the 2003 IEEE International Conference on Acoustics, Speech, and Signal Processing, vol. 1, Hong Kong, April 2003, pp. 468-471.
  • 51
    • 33947127409
    • Multiple change-point audio segmentation and classification using an MDL-based Gaussian model
    • Wu C.H., and Hsieh C.H. Multiple change-point audio segmentation and classification using an MDL-based Gaussian model. IEEE Trans. Audio Speech Language Process. 14 2 (March 2006) 647-657
    • (2006) IEEE Trans. Audio Speech Language Process. , vol.14 , Issue.2 , pp. 647-657
    • Wu, C.H.1    Hsieh, C.H.2
  • 52
    • 4544369704
    • R. Huang, J.H.L. Hansen, Advances in unsupervised audio segmentation for the broadcast news and ngsw corpora, in: Proceedings of the 2004 IEEE International Conference on Acoustics, Speech, and Signal Processing, Montreal, Canada, May 2004, pp. 741-744.
  • 56
  • 57
    • 0002038020
    • Pitch and voicing determination
    • Furui S., and Sondhi M.M. (Eds), Marcel Dekker Inc., New York
    • Hess W.J. Pitch and voicing determination. In: Furui S., and Sondhi M.M. (Eds). Advances in Speech Signal Processing (1991), Marcel Dekker Inc., New York
    • (1991) Advances in Speech Signal Processing
    • Hess, W.J.1
  • 58
    • 33746410556
    • Emotional speech recognition: resources, features, and methods
    • Ververidis D., and Kotropoulos C. Emotional speech recognition: resources, features, and methods. Speech Comm. 48 9 (September 2006) 1162-1181
    • (2006) Speech Comm. , vol.48 , Issue.9 , pp. 1162-1181
    • Ververidis, D.1    Kotropoulos, C.2
  • 59
    • 84990950602
    • B. Li, Y. Li, C. Wang, C. Zhang, A new efficient pitch-tracking algorithm, in: Proceedings of the 2003 IEEE International Conference on Robotics, Intelligent Systems and Signal Processing, vol. 2, Hunan, China, October 2003, pp. 1102-1107.
  • 60
    • 0033692969
    • T. Kemp, M. Schmidt, M. Westphal, A. Waibel, Strategies for automatic segmentation of audio data, in: Proceedings of the 2000 IEEE International Conference on Acoustics, Speech, and Signal Processing, vol. 3, Istanbul, Turkey, June 2000, pp. 1423-1426.
  • 61
    • 33646769986
    • M. Collet, D. Charlet, F. Bimbot, A correlation metric for speaker tracking using anchor models, in: Proceedings of the 2003 IEEE International Conference on Acoustics, Speech, and Signal Processing, vol. 1, Hong Kong, April 2003, pp. 713-716.
  • 62
    • 0032139769
    • Automatic segmentation of speech recorded in unknown noisy channel characteristics
    • Pellom B.L., and Hansen J.H.L. Automatic segmentation of speech recorded in unknown noisy channel characteristics. Speech Comm. 25 1-3 (August 1998) 97-116
    • (1998) Speech Comm. , vol.25 , Issue.1-3 , pp. 97-116
    • Pellom, B.L.1    Hansen, J.H.L.2
  • 63
    • 0037401304
    • Speech/music segmentation using entropy and dynamism features in a HMM classification framework
    • Ajmera J., McCowan I., and Bourlard H. Speech/music segmentation using entropy and dynamism features in a HMM classification framework. Speech Comm. 40 3 (May 2003) 351-363
    • (2003) Speech Comm. , vol.40 , Issue.3 , pp. 351-363
    • Ajmera, J.1    McCowan, I.2    Bourlard, H.3
  • 64
    • 4544303183
    • N. Mesgarani, S. Shamma, M. Slaney, Speech discrimination based on multiscale spectro-temporal modulations, in: Proceedings of the 2004 IEEE International Conference on Acoustics, Speech, and Signal Processing, vol. 1, Montreal, Canada, May 2004, pp. 601-604.
  • 65
    • 84863671030
    • J.A. Arias, J. Pinquier, R. André-Obrecht, Evaluation of classification techniques for audio indexing, in: Proceedings of the 13th European Signal Processing Conference, Antalya, Turkey, September 2005.
  • 66
    • 33644539859
    • Audio-based description and structuring of videos
    • Harb H., and Chen L. Audio-based description and structuring of videos. Internat. J. Digital Libraries 6 1 (February 2006) 70-81
    • (2006) Internat. J. Digital Libraries , vol.6 , Issue.1 , pp. 70-81
    • Harb, H.1    Chen, L.2
  • 67
    • 0029352294
    • Second-order statistical measures for text-independent speaker identification
    • Bimbot F., Magrin-Chagnolleau I., and Mathan L. Second-order statistical measures for text-independent speaker identification. Speech Comm. 17 1-2 (August 1995) 177-192
    • (1995) Speech Comm. , vol.17 , Issue.1-2 , pp. 177-192
    • Bimbot, F.1    Magrin-Chagnolleau, I.2    Mathan, L.3
  • 69
    • 85143190520
    • M. Cettolo, M. Vescovi, Efficient audio segmentation algorithms based on the BIC, in: Proceedings of the 2003 IEEE International Conference on Acoustics, Speech, and Signal Processing, vol. 6, Hong Kong, April 2003, pp. 537-540.
  • 70
    • 85009210477
    • M. Vescovi, M. Cettolo, R. Rizzi, A DP algorithm for speaker change detection, in: Proceedings of the 8th European Conference on Speech Communication and Technology, Geneva, Switzerland, September 2003, pp. 2997-3000.
  • 71
    • 38949102855
    • Q. Jin, K. Laskowski, T. Schultz, A. Waibel, Speaker segmentation and clustering in meetings, in: Proceedings of the NIST Meeting Recognition Workshop, Montreal, Canada, May 2004, pp. 112-117.
  • 72
    • 10844275417
    • Evaluation of BIC-based algorithms for audio segmentation
    • Cettolo M., Vescovi M., and Rizzi R. Evaluation of BIC-based algorithms for audio segmentation. Comput. Speech Language 19 (April 2005) 1004-1013
    • (2005) Comput. Speech Language , vol.19 , pp. 1004-1013
    • Cettolo, M.1    Vescovi, M.2    Rizzi, R.3
  • 73
    • 0001011286
    • Robust procedures in multivariate analysis I: robust covariance estimation
    • Campbell N.A. Robust procedures in multivariate analysis I: robust covariance estimation. Appl. Statist. 29 3 (1980) 231-237
    • (1980) Appl. Statist. , vol.29 , Issue.3 , pp. 231-237
    • Campbell, N.A.1
  • 74
    • 85009128756
    • S. Cheng, H. Wang, Metric SEQDAC: a hybrid approach for audio segmentation, in: Proceedings of the 8th International Conference on Spoken Language Processing, Jeju, Korea, October 2004, pp. 1617-1620.
  • 75
    • 0026400244
    • H. Gish, M.H. Siu, R. Rohlicek, Segregation of speakers for speech recognition and speaker identification, in: Proceedings of the 1991 IEEE International Conference on Acoustics, Speech, and Signal Processing, Toronto, Canada, April 1991, pp. 873-876.
  • 76
    • 4544339441
    • J. Ajmera, G. Lathoud, I. McCowan, Clustering and segmenting speakers and their locations in meetings, in: Proceedings of the 2004 IEEE International Conference on Acoustics, Speech, and Signal Processing, vol. 1, Montreal, Canada, May 2004, pp. 605-608.
  • 77
    • 38949206466
    • D.P.W. Ellis, J.C. Liu, Speaker turn segmentation based on between-channel differences, in: Proceedings of the NIST Meeting Recognition Workshop, Montreal, Canada, May 2004, pp. 112-117.
  • 78
    • 38949203922
    • J. Alabiso, R. MacIntyre, D. Graff, 1997 English Broadcast News Transcripts (HUB4), Linguistic Data Consortium, Philadelphia, 1998.
  • 81
    • 85021249401
    • M. Federico, D. Giordani, P. Coletti, Development and evaluation of an Italian broadcast news corpus, in: Proceedings of the 2nd International Conference on Language Resources and Evaluation, Athens, Greece, May-June 2000, pp. 921-924.
  • 82
    • 38949190789
    • S. Chen, P. Gopalakrishnan, Speaker, environment and channel change detection and clustering via the Bayesian information criterion, in: Proceedings of the DARPA Broadcast News Transcription Understanding Workshop, Lansdowne, VA, February 1998, pp. 127-132.
  • 84
    • 38949211653
    • Linguistic Data Consortium, Philadelphia
    • Graff D. TDT3 Mandarin Audio (2001), Linguistic Data Consortium, Philadelphia
    • (2001) TDT3 Mandarin Audio
    • Graff, D.1
  • 85
    • 0242323752
    • Unified fusion rules for multisensor multihypothesis network decision systems
    • Zhu Y., and Rong X. Unified fusion rules for multisensor multihypothesis network decision systems. IEEE Trans. System Man Cybernet. 33 4 (July 2003) 502-513
    • (2003) IEEE Trans. System Man Cybernet. , vol.33 , Issue.4 , pp. 502-513
    • Zhu, Y.1    Rong, X.2
  • 86
    • 38949122539
    • M. Kotti, E. Benetos, C. Kotropoulos, Computationally efficient and robust BIC-based speaker segmentation, IEEE Trans. Audio Speech Language Process., in revision.
  • 87
    • 38949110862
    • The Linguistic Data Consortium 〈http://www.ldc.upenn.edu/〉.
  • 88
    • 35348882681
    • Phonemic segmentation using the generalised Gamma distribution and small sample Bayesian information criterion
    • Almpanidis G., and Kotropoulos C. Phonemic segmentation using the generalised Gamma distribution and small sample Bayesian information criterion. Speech Comm. 50 1 (January 2008) 38-55
    • (2008) Speech Comm. , vol.50 , Issue.1 , pp. 38-55
    • Almpanidis, G.1    Kotropoulos, C.2
  • 89
    • 33745190484
    • W.-H. Tsai, H.-M. Wang, Speaker clustering of unknown utterances based on maximum purity estimation, in: Proceedings of the European Conference on Speech Communication and Technology, Lisbon, Portugal, September 2005.
  • 90
    • 38949136524
    • J.-L. Gauvain, L. Lamel, G. Adda, Partitioning and transcription of broadcast news data, in: Proceedings of the International Conference on Spoken Language Processing, Sydney, Australia, December 1998, pp. 1335-1338.
  • 92
    • 84946742526
    • J. Ajmera, C. Wooters, A robust speaker clustering algorithm, in: Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, Virgin Islands, November 2003, pp. 411-416.
  • 93
    • 38949104255
    • I. Voitovetsky, H. Guterman, A. Cohen, Validity criterion for unsupervised speaker recognition, in: Proceedings of the First Workshop Text, Speech, and Dialogue, Brno, Czech Republic, September 1998, pp. 321-326.
  • 94
    • 0031331636
    • I. Voitovetsky, H. Guterman, A. Cohen, Unsupervised speaker classification using self-organizing maps, in: Proceedings of the IEEE Workshop Neural Networks for Signal Processing, Amelia Island, USA, September 1997, pp. 578-587.
  • 95
    • 84864281086
    • I. Lapidot, H. Guterman, Resolution limitation in speakers clustering and segmentation problems, in: Proceedings of the 2001: A Speaker Odyssey, The Speaker Recognition Workshop, Chania, Greece, June 18-22, 2001, pp. 169-173.
  • 96
    • 0036650810
    • Lapidot I., Guterman H., and Cohen A. Unsupervised speaker recognition based on competition between self-organizing maps. IEEE Trans. Neural Networks 13 (4) (July 2002) 877-887
  • 97
    • 38949193377
    • 1998 HUB4 Broadcast News Evaluation English Test Material, Linguistic Data Consortium, Philadelphia, 2000.
  • 99
    • 38949107492
    • M. Przybocki, A. Martin, 2001 NIST Speaker Recognition Evaluation Corpus, Linguistic Data Consortium, Philadelphia, 2002.
  • 100
    • 38949099187
    • H. Jin, F. Kubala, R. Schwartz, Automatic speaker clustering, in: Proceedings of the Speech Recognition Workshop, Chantilly, Virginia, 1997, pp. 108-111.
  • 102
    • 38949156053
    • C. Barras, X. Zhu, S. Meignier, J.-L. Gauvain, Improving speaker diarization, in: Proceedings of the Fall Rich Transcription Workshop (RT-04), Palisades, NY, November 2004 [Online]. Available: 〈http://www.limsi.fr/Individu/barras/publis/rt04f_diarization.pdf〉.


* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.