SCOPUS Information Retrieval Platform

Signal Processing

Volume 88, Issue 5, 2008, Pages 1091-1124

Speaker segmentation and clustering

Authors (3): Kotti, Margarita (a); Moschou, Vassiliki (a); Kotropoulos, Constantine (a)

(a) Aristotle University of Thessaloniki (Greece)

Author keywords

Diarization; Speaker clustering; Speaker segmentation

Indexed keywords

CLUSTERING ALGORITHMS; PROBABILISTIC LOGICS; MOVIE ANALYSIS; SPEAKER CLUSTERING; SPEAKER SEGMENTATION; SPEECH PROCESSING

EID: 38949122754 PISSN: 01651684 EISSN: None Source Type: Journal
DOI: 10.1016/j.sigpro.2007.11.017 Document Type: Review

Times cited : (87)

References (103)

1
- 84889324982
- A. Solomonoff, A. Mielke, M. Schmidt, H. Gish, Clustering speakers by their voices, in: Proceedings of the 1998 IEEE International Conference on Acoustics, Speech, and Signal Processing, vol. 2, Seattle, USA, May 1998, pp. 757-760.

2
- 33745200276
- R. Sinha, S.E. Tranter, M.J.F. Gales, P.C. Woodland, The Cambridge University March 2005 speaker diarisation system, in: Proceedings of the European Conference on Speech Communication and Technology, Lisbon, Portugal, September 2005, pp. 2437-2440.

3
- 38949205073
- ISO/IEC 15938-4:2001, Multimedia content description interface-part 4: audio, Version 1.0, 2001.

4
- 0003769779
- Wiley, West Sussex, England
- Manjunath B.S., Salembier P., and Sikora T. Introduction to MPEG 7: Multimedia Content Description Language (2002), Wiley, West Sussex, England
- (2002) Introduction to MPEG 7: Multimedia Content Description Language
- Manjunath, B.S.¹ Salembier, P.² Sikora, T.³

5
- 4544361760
- H.G. Kim, T. Sikora, Comparison of MPEG-7 audio spectrum projection features and MFCC applied to speaker recognition, sound classification and audio segmentation, in: Proceedings of the 2004 IEEE International Conference on Acoustics, Speech, and Signal Processing, vol. 5, Montreal, Canada, May 2004, pp. 925-928.

6
- 84979955147
- H.G. Kim, T. Sikora, Audio spectrum projection based on several basis decomposition algorithms applied to general sound recognition and audio segmentation, in: Proceedings of the 12th European Signal Processing Conference, Vienna, Austria, September 2004, pp. 1047-1050.

7
- 34547324377
- M. Kotti, E. Benetos, C. Kotropoulos, Automatic speaker change detection with the Bayesian information criterion using MPEG-7 features and a fusion scheme, in: Proceedings of the 2006 IEEE International Symposium on Circuits and Systems, Kos, Greece, May 2006.

8
- 34247559206
- M. Kotti, L.G.P.M. Martins, E. Benetos, J.S. Cardoso, C. Kotropoulos, Automatic speaker segmentation using multiple features and distance measures: a comparison of three approaches, in: Proceedings of the 2006 IEEE International Conference on Multimedia and Expo, Toronto, Canada, July 2006, pp. 1101-1104.

9
- 64149092838
- W.H. Tsai, S.S. Cheng, H.M. Wang, Speaker clustering of speech utterances using a voice characteristic reference space, in: Proceedings of the International Conference on Spoken Language Processing, Jeju Island, Korea, October 2004.

10
- 4544247119
- D. Liu, F. Kubala, Online speaker clustering, in: Proceedings of the 2004 IEEE International Conference on Acoustics, Speech, and Signal Processing, vol. 1, Montreal, Canada, May 2004, pp. 333-336.

11
- 84875953283
- S.S. Chen, P.S. Gopalakrishnan, Clustering via the Bayesian information criterion with applications in speech recognition, in: Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing, vol. 2, Seattle, USA, May 1998, pp. 645-648.

12
- 0141809272
- S. Meignier, J.F. Bonastre, S. Igounet, E-HMM approach for learning and adapting sound models for speaker indexing, in: Proceedings of the Odyssey Speaker and Language Recognition Workshop, Crete, Greece, June 2001, pp. 175-180.

13
- 85009289298
- J. Ajmera, H. Bourlard, I. Lapidot, I. McCowan, Unknown-multiple speaker clustering using HMM, in: Proceedings of the International Conference on Spoken Language Processing, CO, USA, September 2002, pp. 573-576.

14
- 33745185104
- X. Zhu, C. Barras, S. Meignier, J.-L. Gauvain, Combining speaker identification and BIC for speaker diarization, in: Proceedings of the InterSpeech, Lisbon, Portugal, September 2005, pp. 2441-2444.

15
- 29044442235
- Step-by-step and integrated approaches in broadcast news speaker diarization
- Meignier S., Moraru D., Fredouille C., Bonastre J.F., and Besacier L. Step-by-step and integrated approaches in broadcast news speaker diarization. Comput. Speech Language 20 2-3 (April-July 2006) 303-330
- (2006) Comput. Speech Language , vol.20 , Issue.2-3 , pp. 303-330
- Meignier, S.¹ Moraru, D.² Fredouille, C.³ Bonastre, J.F.⁴ Besacier, L.⁵

16
- 34047266609
- Multistage speaker diarization of broadcast news
- Barras C., Zhu X., Meignier S., and Gauvain J.L. Multistage speaker diarization of broadcast news. IEEE Trans. Audio Speech Language Process. 14 5 (September 2006) 1505-1512
- (2006) IEEE Trans. Audio Speech Language Process. , vol.14 , Issue.5 , pp. 1505-1512
- Barras, C.¹ Zhu, X.² Meignier, S.³ Gauvain, J.L.⁴

17
- 34047261805
- An overview of automatic speaker diarization systems
- Tranter S.E., and Reynolds D.A. An overview of automatic speaker diarization systems. IEEE Trans. Audio Speech Language Process. 14 5 (September 2006) 1557-1565
- (2006) IEEE Trans. Audio Speech Language Process. , vol.14 , Issue.5 , pp. 1557-1565
- Tranter, S.E.¹ Reynolds, D.A.²

18
- 0031233424
- Speaker recognition: a tutorial
- Campbell J.P. Speaker recognition: a tutorial. Proc. IEEE 85 9 (September 1997) 1437-1462
- (1997) Proc. IEEE , vol.85 , Issue.9 , pp. 1437-1462
- Campbell, J.P.¹

19
- 0034505639
- V. Wan, W.M. Campbell, Support vector machines for speaker verification and identification, in: Proceedings of the Neural Networks for Signal Processing, vol. 10, Sydney, Australia, December 2000, pp. 775-784.

20
- 0033884858
- Speaker verification using adapted Gaussian mixture models
- Reynolds D.A., Quatieri T.F., and Dunn R.B. Speaker verification using adapted Gaussian mixture models. Digital Signal Process. 10 (October 2000) 19-41
- (2000) Digital Signal Process. , vol.10 , pp. 19-41
- Reynolds, D.A.¹ Quatieri, T.F.² Dunn, R.B.³

21
- 34047266379
- Progress in the CU-HTK broadcast news transcription system
- Gales M.J.F., Kim D.Y., Woodland P.C., Chan H.Y., Mrva D., Sinha R., and Tranter S.E. Progress in the CU-HTK broadcast news transcription system. IEEE Trans. Speech Audio Process. 14 5 (September 2006) 1513-1525
- (2006) IEEE Trans. Speech Audio Process. , vol.14 , Issue.5 , pp. 1513-1525
- Gales, M.J.F.¹ Kim, D.Y.² Woodland, P.C.³ Chan, H.Y.⁴ Mrva, D.⁵ Sinha, R.⁶ Tranter, S.E.⁷

22
- 38949191461
- National Institute of Standards and Technology (NIST)-The Segmentation Task: Find the Story Boundaries 〈http://www.nist.gov/speech/tests/tdt/tdt99/presentations/NIST_segmentation/index.htm〉.

23
- 38949110169
- The Center for Spoken Language Research of the Colorado University (CSLR) 〈http://cslr.colorado.edu/〉.

24
- 38949163851
- International Computer Science Institute-Speech Research Group Berkeley 〈http://www.icsi.berkeley.edu/groups/speech/〉.

25
- 38949198872
- Speech Analysis and Interpretation Laboratory (SAIL) at the University of Southern California 〈http://sail.usc.edu/projectsIntro.php〉.

26
- 38949096946
- International Speech Technology and Research (STAR) Laboratory at Stanford Research Institute (SRI) 〈http://www.speech.sri.com/projects/sieve/〉.

27
- 38949095530
- Microsoft Audio Projects 〈http://research.microsoft.com/users/llu/Audioprojects.aspx〉.

28
- 38949093086
- The Institut Dalle Molle d'Intelligence Artificielle Perceptive (IDIAP) Research Institute 〈http://www.idiap.ch/speech_processing.php〉.

29
- 38949141761
- The Laboratoire d'Informatique pour la Mécanique et les Sciences de l'Ingénieur (LIMSI) Spoken Language Processing Group 〈http://www.limsi.fr/TLP〉.

30
- 38949115082
- The Department of Speech, Music and Hearing of the Royal Institute of Technology (KTH) at Stockholm 〈http://www.speech.kth.se〉.

31
- 38949152506
- The Chair of Computer Science VI, Computer Science Department, Aachen University 〈http://www-i6.informatik.rwth-aachen.de〉.

32
- 38949134678
- The Infant Speech Segmentation Project at Berkeley University 〈http://www-gse.berkeley.edu/research/completed/InfantSpeech.html〉.

33
- 38949101288
- Language Science Research Group, Washington University 〈http://lsrg.cs.wustl.edu〉.

34
- 38949182068
- The University College of London Psychology Speech Group, speech segmentation issues 〈http://www.speech.psychol.ucl.ac.uk〉.

35
- 0003793552
- Prentice-Hall, Englewood Cliffs, NJ
- Oppenheim A.V., and Schafer R.W. Digital Signal Processing (1975), Prentice-Hall, Englewood Cliffs, NJ
- (1975) Digital Signal Processing
- Oppenheim, A.V.¹ Schafer, R.W.²

36
- 0037700756
- L. Lu, H. Zhang, Speaker change detection and tracking in real-time news broadcast analysis, in: Proceedings of the ACM Multimedia 2002, Juan-les-Pins, France, December 2002, pp. 602-610.

37
- 17444365032
- Unsupervised speaker segmentation and tracking in real-time audio content analysis
- Lu L., and Zhang H. Unsupervised speaker segmentation and tracking in real-time audio content analysis. Multimedia Systems 10 4 (April 2005) 332-343
- (2005) Multimedia Systems , vol.10 , Issue.4 , pp. 332-343
- Lu, L.¹ Zhang, H.²

38
- 38949090534
- A. Tritschler, R. Gopinath, Improved speaker segmentation and segments clustering using the Bayesian information criterion, in: Proceedings of the 6th European Conference on Speech Communication and Technology, Budapest, Hungary, September 1999, pp. 679-682.

39
- 0034273195
- DISTBIC: a speaker-based segmentation for audio data indexing
- Delacourt P., and Wellekens C.J. DISTBIC: a speaker-based segmentation for audio data indexing. Speech Comm. 32 (September 2000) 111-126
- (2000) Speech Comm. , vol.32 , pp. 111-126
- Delacourt, P.¹ Wellekens, C.J.²

40
- 85009282223
- S. Kwon, S. Narayanan, Speaker change detection using a new weighted distance measure, in: Proceedings of the International Conference on Spoken Language, vol. 4, CO, USA, September 2002, pp. 2537-2540.

41
- 85143189670
- T. Wu, L. Lu, K. Chen, H. Zhang, UBM-based real-time speaker segmentation for broadcasting news, in: Proceedings of the 2003 IEEE International Conference on Acoustics, Speech, and Signal Processing, vol. 2, Hong Kong, April 2003, pp. 193-196.

42
- 85009212151
- S.S. Cheng, H.M. Wang, A sequential metric-based audio segmentation method via the Bayesian information criterion, in: Proceedings of the 8th European Conference on Speech Communication and Technology, Geneva, Switzerland, September 2003, pp. 945-948.

43
- 3543144948
- Robust speaker change detection
- Ajmera J., McCowan I., and Bourlard H. Robust speaker change detection. IEEE Signal Process. Lett. 11 8 (August 2004) 649-651
- (2004) IEEE Signal Process. Lett. , vol.11 , Issue.8 , pp. 649-651
- Ajmera, J.¹ McCowan, I.² Bourlard, H.³

44
- 33646789869
- H. Kim, D. Elter, T. Sikora, Hybrid speaker-based segmentation system using model-level clustering, in: Proceedings of the 2005 IEEE International Conference on Acoustics, Speech, and Signal Processing, vol. I, Philadelphia, USA, March 2005, pp. 745-748.

45
- 22544475615
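- Efficient audio stream segmentation via the combined T² statistic and the Bayesian information criterion
- Zhou B., and Hansen J.H.L. Efficient audio stream segmentation via the combined T² statistic and the Bayesian information criterion. IEEE Trans. Speech Audio Process. 13 4 (July 2005) 467-474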
- (2005) IEEE Trans. Speech Audio Process. , vol.13 , Issue.4 , pp. 467-474
- Zhou, B.¹ Hansen, J.H.L.²

46
- 27644599375
- Unsupervised speaker indexing using generic models
- Kwon S., and Narayanan S. Unsupervised speaker indexing using generic models. IEEE Trans. Speech Audio Process. 13 5 (September 2005) 1004-1013
- (2005) IEEE Trans. Speech Audio Process. , vol.13 , Issue.5 , pp. 1004-1013
- Kwon, S.¹ Narayanan, S.²

47
- 33745000055
- Automatic segmentation and identification of mixed-language speech using delta-BIC and LSA-based GMMs
- Wu C.H., Chiu Y.H., Shia C.J., and Lin C.Y. Automatic segmentation and identification of mixed-language speech using delta-BIC and LSA-based GMMs. IEEE Trans. Audio Speech Language Process. 14 1 (January 2006) 266-276
- (2006) IEEE Trans. Audio Speech Language Process. , vol.14 , Issue.1 , pp. 266-276
- Wu, C.H.¹ Chiu, Y.H.² Shia, C.J.³ Lin, C.Y.⁴

48
- 38949101287
- T. Wu, L. Lu, K. Chen, H. Zhang, Universal background models for real-time speaker change detection, in: Proceedings of the 9th International Conference on Multimedia Modeling, Tamshui, Taiwan, January 2003, pp. 135-149.

49
- 4544280424
- S.E. Tranter, K. Yu, G. Evermann, P.C. Woodland, Generating and evaluating segmentations for automatic speech recognition of conversational telephone speech, in: Proceedings of the 2004 IEEE International Conference on Acoustics, Speech, and Signal Processing, Montreal, Canada, May 2004, pp. 433-477.

50
- 0141814632
- D. Wang, L. Lu, H.J. Zhang, Speech segmentation without speech recognition, in: Proceedings of the 2003 IEEE International Conference on Acoustics, Speech, and Signal Processing, vol. 1, Hong Kong, April 2003, pp. 468-471.

51
- 33947127409
- Multiple change-point audio segmentation and classification using an MDL-based Gaussian model
- Wu C.H., and Hsieh C.H. Multiple change-point audio segmentation and classification using an MDL-based Gaussian model. IEEE Trans. Audio Speech Language Process. 14 2 (March 2006) 647-657
- (2006) IEEE Trans. Audio Speech Language Process. , vol.14 , Issue.2 , pp. 647-657
- Wu, C.H.¹ Hsieh, C.H.²

52
- 4544369704
- R. Huang, J.H.L. Hansen, Advances in unsupervised audio segmentation for the broadcast news and ngsw corpora, in: Proceedings of the 2004 IEEE International Conference on Acoustics, Speech, and Signal Processing, Montreal, Canada, May 2004, pp. 741-744.

53
- 84889435599
- Wiley, West Sussex, England
- Kim H.-G., Moreau N., and Sikora T. MPEG-7 Audio and Beyond: Audio Content Indexing and Retrieval (2005), Wiley, West Sussex, England
- (2005) MPEG-7 Audio and Beyond: Audio Content Indexing and Retrieval
- Kim, H.-G.¹ Moreau, N.² Sikora, T.³

54
- 0003424145
- Wiley, IEEE, New York
- Deller J.R., Hansen J.H.L., and Proakis J.G. Discrete-Time Processing of Speech Signals (1999), Wiley, IEEE, New York
- (1999) Discrete-Time Processing of Speech Signals
- Deller, J.R.¹ Hansen, J.H.L.² Proakis, J.G.³

55
- 0004056285
- Pearson Education, Prentice-Hall, Upper Saddle River, NJ
- Huang X.D., Acero A., and Hon H.-S. Spoken Language Processing: A Guide to Theory, Algorithm, and System Development (2001), Pearson Education, Prentice-Hall, Upper Saddle River, NJ
- (2001) Spoken Language Processing: A Guide to Theory, Algorithm, and System Development
- Huang, X.D.¹ Acero, A.² Hon, H.-S.³

56
- 0000618817
- New methods of pitch extraction
- Sondhi M.M. New methods of pitch extraction. IEEE Trans. Audio Electroacoustics 16 2 (June 1968) 262-266
- (1968) IEEE Trans. Audio Electroacoustics , vol.16 , Issue.2 , pp. 262-266
- Sondhi, M.M.¹

57
- 0002038020
- Pitch and voicing determination
- Furui S., and Sondhi M.M. (Eds), Marcel Dekker Inc., New York
- Hess W.J. Pitch and voicing determination. In: Furui S., and Sondhi M.M. (Eds). Advances in Speech Signal Processing (1991), Marcel Dekker Inc., New York
- (1991) Advances in Speech Signal Processing
- Hess, W.J.¹

58
- 33746410556
- Emotional speech recognition: resources, features, and methods
- Ververidis D., and Kotropoulos C. Emotional speech recognition: resources, features, and methods. Speech Comm. 48 9 (September 2006) 1162-1181
- (2006) Speech Comm. , vol.48 , Issue.9 , pp. 1162-1181
- Ververidis, D.¹ Kotropoulos, C.²

59
- 84990950602
- B. Li, Y. Li, C. Wang, C. Zhang, A new efficient pitch-tracking algorithm, in: Proceedings of the 2003 IEEE International Conference on Robotics, Intelligent Systems and Signal Processing, vol. 2, Hunan, China, October 2003, pp. 1102-1107.

60
- 0033692969
- T. Kemp, M. Schmidt, M. Westphal, A. Waibel, Strategies for automatic segmentation of audio data, in: Proceedings of the 2000 IEEE International Conference on Acoustics, Speech, and Signal Processing, vol. 3, Istanbul, Turkey, June 2000, pp. 1423-1426.

61
- 33646769986
- M. Collet, D. Charlet, F. Bimbot, A correlation metric for speaker tracking using anchor models, in: Proceedings of the 2003 IEEE International Conference on Acoustics, Speech, and Signal Processing, vol. 1, Hong Kong, April 2003, pp. 713-716.

62
- 0032139769
- Automatic segmentation of speech recorded in unknown noisy channel characteristics
- Pellom B.L., and Hansen J.H.L. Automatic segmentation of speech recorded in unknown noisy channel characteristics. Speech Comm. 25 1-3 (August 1998) 97-116
- (1998) Speech Comm. , vol.25 , Issue.1-3 , pp. 97-116
- Pellom, B.L.¹ Hansen, J.H.L.²

63
- 0037401304
- Speech/music segmentation using entropy and dynamism features in a HMM classification framework
- Ajmera J., McCowan I., and Bourlard H. Speech/music segmentation using entropy and dynamism features in a HMM classification framework. Speech Comm. 40 3 (May 2003) 351-363
- (2003) Speech Comm. , vol.40 , Issue.3 , pp. 351-363
- Ajmera, J.¹ McCowan, I.² Bourlard, H.³

64
- 4544303183
- N. Mesgarani, S. Shamma, M. Slaney, Speech discrimination based on multiscale spectro-temporal modulations, in: Proceedings of the 2004 IEEE International Conference on Acoustics, Speech, and Signal Processing, vol. 1, Montreal, Canada, May 2004, pp. 601-604.

65
- 84863671030
- J.A. Arias, J. Pinquier, R. André-Obrecht, Evaluation of classification techniques for audio indexing, in: Proceedings of the 13th European Signal Processing Conference, Antalya, Turkey, September 2005.

66
- 33644539859
- Audio-based description and structuring of videos
- Harb H., and Chen L. Audio-based description and structuring of videos. Internat. J. Digital Libraries 6 1 (February 2006) 70-81
- (2006) Internat. J. Digital Libraries , vol.6 , Issue.1 , pp. 70-81
- Harb, H.¹ Chen, L.²

67
- 0029352294
- Second-order statistical measures for text-independent speaker identification
- Bimbot F., Magrin-Chagnolleau I., and Mathan L. Second-order statistical measures for text-independent speaker identification. Speech Comm. 17 1-2 (August 1995) 177-192
- (1995) Speech Comm. , vol.17 , Issue.1-2 , pp. 177-192
- Bimbot, F.¹ Magrin-Chagnolleau, I.² Mathan, L.³

68
- 85008020310
- SpeechFind: advances in spoken document retrieval for a national gallery of the spoken word
- Hansen J.H.L., Huang R., Zhou B., Seadle M., Deller J.R., Gurijala A.R., Kurimo M., and Angkititrakul P. SpeechFind: advances in spoken document retrieval for a national gallery of the spoken word. IEEE Trans. Speech Audio Process. 13 5 (September 2005) 712-730
- (2005) IEEE Trans. Speech Audio Process. , vol.13 , Issue.5 , pp. 712-730
- Hansen, J.H.L.¹ Huang, R.² Zhou, B.³ Seadle, M.⁴ Deller, J.R.⁵ Gurijala, A.R.⁶ Kurimo, M.⁷ Angkititrakul, P.⁸

69
- 85143190520
- M. Cettolo, M. Vescovi, Efficient audio segmentation algorithms based on the BIC, in: Proceedings of the 2003 IEEE International Conference on Acoustics, Speech, and Signal Processing, vol. 6, Hong Kong, April 2003, pp. 537-540.

70
- 85009210477
- M. Vescovi, M. Cettolo, R. Rizzi, A DP algorithm for speaker change detection, in: Proceedings of the 8th European Conference on Speech Communication and Technology, Geneva, Switzerland, September 2003, pp. 2997-3000.

71
- 38949102855
- Q. Jin, K. Laskowski, T. Schultz, A. Waibel, Speaker segmentation and clustering in meetings, in: Proceedings of the NIST Meeting Recognition Workshop, Montreal, Canada, May 2004, pp. 112-117.

72
- 10844275417
- Evaluation of BIC-based algorithms for audio segmentation
- Cettolo M., Vescovi M., and Rizzi R. Evaluation of BIC-based algorithms for audio segmentation. Comput. Speech Language 19 (April 2005) 1004-1013
- (2005) Comput. Speech Language , vol.19 , pp. 1004-1013
- Cettolo, M.¹ Vescovi, M.² Rizzi, R.³

73
- 0001011286
- Robust procedures in multivariate analysis I: robust covariance estimation
- Campbell N.A. Robust procedures in multivariate analysis I: robust covariance estimation. Appl. Statist. 29 3 (1980) 231-237
- (1980) Appl. Statist. , vol.29 , Issue.3 , pp. 231-237
- Campbell, N.A.¹

74
- 85009128756
- S. Cheng, H. Wang, Metric SEQDAC: a hybrid approach for audio segmentation, in: Proceedings of the 8th International Conference on Spoken Language Processing, Jeju, Korea, October 2004, pp. 1617-1620.

75
- 0026400244
- H. Gish, M.H. Siu, R. Rohlicek, Segregation of speakers for speech recognition and speaker identification, in: Proceedings of the 1991 IEEE International Conference on Acoustics, Speech, and Signal Processing, Toronto, Canada, April 1991, pp. 873-876.

76
- 4544339441
- J. Ajmera, G. Lathoud, I. McCowan, Clustering and segmenting speakers and their locations in meetings, in: Proceedings of the 2004 IEEE International Conference on Acoustics, Speech, and Signal Processing, vol. 1, Montreal, Canada, May 2004, pp. 605-608.

77
- 38949206466
- D.P.W. Ellis, J.C. Liu, Speaker turn segmentation based on between-channel differences, in: Proceedings of the NIST Meeting Recognition Workshop, Montreal, Canada, May 2004, pp. 112-117.

78
- 38949203922
- J. Alabiso, R. MacIntyre, D. Graff, 1997 English Broadcast News Transcripts (HUB4), Linguistic Data Consortium, Philadelphia, 1998.

79
- 0003548585
- Linguistic Data Consortium, Philadelphia
- Garofolo J.S. DARPA TIMIT Acoustic-Phonetic Continuous Speech Corpus (1993), Linguistic Data Consortium, Philadelphia
- (1993) DARPA TIMIT Acoustic-Phonetic Continuous Speech Corpus
- Garofolo, J.S.¹

80
- 33749625988
- Linguistic Data Consortium, Philadelphia
- Godfrey J.J., and Holliman E. Switchboard-1 Release 2 (1997), Linguistic Data Consortium, Philadelphia
- (1997) Switchboard-1 Release 2
- Godfrey, J.J.¹ Holliman, E.²

81
- 85021249401
- M. Federico, D. Giordani, P. Coletti, Development and evaluation of an Italian broadcast news corpus, in: Proceedings of the 2nd International Conference on Language Resources and Evaluation, Athens, Greece, May-June 2000, pp. 921-924.

82
- 38949190789
- S. Chen, P. Gopalakrishnan, Speaker, environment and channel change detection and clustering via the Bayesian information criterion, in: Proceedings of the DARPA Broadcast News Transcription and Understanding Workshop, Lansdowne, VA, February 1998, pp. 127-132.

83
- 33745184949
- MATBN: a Mandarin Chinese broadcast news corpus
- Wang H.M., Chen B., Kuo J.W., and Cheng S.S. MATBN: a Mandarin Chinese broadcast news corpus. Comput. Linguistics Chinese Language Process. 10 2 (June 2005) 219-236
- (2005) Comput. Linguistics Chinese Language Process. , vol.10 , Issue.2 , pp. 219-236
- Wang, H.M.¹ Chen, B.² Kuo, J.W.³ Cheng, S.S.⁴

84
- 38949211653
- Linguistic Data Consortium, Philadelphia
- Graff D. TDT3 Mandarin Audio (2001), Linguistic Data Consortium, Philadelphia
- (2001) TDT3 Mandarin Audio
- Graff, D.¹

85
- 0242323752
- Unified fusion rules for multisensor multihypothesis network decision systems
- Zhu Y., and Li X.R. Unified fusion rules for multisensor multihypothesis network decision systems. IEEE Trans. System Man Cybernet. 33 4 (July 2003) 502-513
- (2003) IEEE Trans. System Man Cybernet. , vol.33 , Issue.4 , pp. 502-513
- Zhu, Y.¹ Li, X.R.²

86
- 38949122539
- M. Kotti, E. Benetos, C. Kotropoulos, Computationally efficient and robust BIC-based speaker segmentation, IEEE Trans. Audio Speech Language Process., in revision.

87
- 38949110862
- The Linguistic Data Consortium 〈http://www.ldc.upenn.edu/〉.

88
- 35348882681
- Phonemic segmentation using the generalised Gamma distribution and small sample Bayesian information criterion
- Almpanidis G., and Kotropoulos C. Phonemic segmentation using the generalised Gamma distribution and small sample Bayesian information criterion. Speech Comm. 50 1 (January 2008) 38-55
- (2008) Speech Comm. , vol.50 , Issue.1 , pp. 38-55
- Almpanidis, G.¹ Kotropoulos, C.²

89
- 33745190484
- W.-H. Tsai, H.-M. Wang, Speaker clustering of unknown utterances based on maximum purity estimation, in: Proceedings of the European Conference on Speech Communication and Technology, Lisbon, Portugal, September 2005.

90
- 38949136524
- J.-L. Gauvain, L. Lamel, G. Adda, Partitioning and transcription of broadcast news data, in: Proceedings of the International Conference on Spoken Language Processing, Sydney, Australia, December 1998, pp. 1335-1338.

91
- 0004161991
- Prentice-Hall, Englewood Cliffs, NJ
- Jain A.K., and Dubes R.C. Algorithms for Clustering Data (1988), Prentice-Hall, Englewood Cliffs, NJ
- (1988) Algorithms for Clustering Data
- Jain, A.K.¹ Dubes, R.C.²

92
- 84946742526
- J. Ajmera, C. Wooters, A robust speaker clustering algorithm, in: Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, Virgin Islands, November 2003, pp. 411-416.

93
- 38949104255
- I. Voitovetsky, H. Guterman, A. Cohen, Validity criterion for unsupervised speaker recognition, in: Proceedings of the First Workshop Text, Speech, and Dialogue, Brno, Czech Republic, September 1998, pp. 321-326.

94
- 0031331636
- I. Voitovetsky, H. Guterman, A. Cohen, Unsupervised speaker classification using self-organizing maps, in: Proceedings of the IEEE Workshop Neural Networks for Signal Processing, Amelia Island, USA, September 1997, pp. 578-587.

95
- 84864281086
- I. Lapidot, H. Guterman, Resolution limitation in speakers clustering and segmentation problems, in: Proceedings of the 2001: A Speaker Odyssey, The Speaker Recognition Workshop, Chania, Greece, June 18-22, 2001, pp. 169-173.

96
- 0036650810
- Unsupervised speaker recognition based on competition between self-organizing maps
- Lapidot I., Guterman H., and Cohen A. Unsupervised speaker recognition based on competition between self-organizing maps. IEEE Trans. Neural Networks 13 4 (July 2002) 877-887
- (2002) IEEE Trans. Neural Networks , vol.13 , Issue.4 , pp. 877-887
- Lapidot, I.¹ Guterman, H.² Cohen, A.³

97
- 38949193377
- 1998 HUB4 Broadcast News Evaluation English Test Material, Linguistic Data Consortium, Philadelphia, 2000.

98
- 84863340525
- Linguistic Data Consortium, Philadelphia
- Graff D., and Alabiso J. 1996 English Broadcast News Transcripts (HUB4) (1997), Linguistic Data Consortium, Philadelphia
- (1997) 1996 English Broadcast News Transcripts (HUB4)
- Graff, D.¹ Alabiso, J.²

99
- 38949107492
- M. Przybocki, A. Martin, 2001 NIST Speaker Recognition Evaluation Corpus, Linguistic Data Consortium, Philadelphia, 2002.

100
- 38949099187
- H. Jin, F. Kubala, R. Schwartz, Automatic speaker clustering, in: Proceedings of the Speech Recognition Workshop, Chantilly, Virginia, 1997, pp. 108-111.

101
- 38949098345
- Linguistic Data Consortium, Philadelphia
- Fiscus J., Garofolo J., Przybocki M., Fisher W., and Pallett D. 1997 English Broadcast News Speech (HUB4) (1998), Linguistic Data Consortium, Philadelphia
- (1998) 1997 English Broadcast News Speech (HUB4)
- Fiscus, J.¹ Garofolo, J.² Przybocki, M.³ Fisher, W.⁴ Pallett, D.⁵

102
- 38949156053
- C. Barras, X. Zhu, S. Meignier, J.-L. Gauvain, Improving speaker diarization, in: Proceedings of the Fall Rich Transcription Workshop (RT-04), Palisades, NY, November 2004 [Online]. Available: 〈http://www.limsi.fr/Individu/barras/publis/rt04f_diarization.pdf〉.

103
- 84863340525
- Linguistic Data Consortium, Philadelphia
- Graff D., Alabiso J., Fiscus J., Garofolo J., Fisher W., and Pallett D. 1996 English Broadcast News Dev and Eval (HUB4) (1997), Linguistic Data Consortium, Philadelphia
- (1997) 1996 English Broadcast News Dev and Eval (HUB4)
- Graff, D.¹ Alabiso, J.² Fiscus, J.³ Garofolo, J.⁴ Fisher, W.⁵ Pallett, D.⁶

* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.