SCOPUS Information Search Platform

IEEE Transactions on Audio, Speech and Language Processing

Volume 16, Issue 5, 2008, Pages 920-933

Computationally efficient and robust BIC-based speaker segmentation

Authors (3): Kotti, Margarita (a); Benetos, Emmanouil (a); Kotropoulos, Constantine (a)

(a) ARISTOTLE UNIVERSITY OF THESSALONIKI (Greece)

Author keywords

Automatic speaker segmentation; Bayesian information criterion (BIC); Inverse Gaussian distribution; Simultaneous diagonalization; Speaker utterance duration distribution; Speech analysis

Indexed keywords

AUTOMATIC SPEAKER SEGMENTATION; BAYESIAN INFORMATION CRITERION (BIC); INVERSE GAUSSIAN DISTRIBUTION; SIMULTANEOUS DIAGONALIZATION; SPEAKER UTTERANCE DURATION DISTRIBUTION;

BAYESIAN NETWORKS; COVARIANCE MATRIX; GAUSSIAN DISTRIBUTION; LINEAR PROGRAMMING; MAXIMUM LIKELIHOOD ESTIMATION; SPEECH ANALYSIS;

COMPUTATIONAL EFFICIENCY;

EID: 66149116378 PISSN: 15587916 EISSN: None Source Type: Journal
DOI: 10.1109/TASL.2008.925152 Document Type: Article

Times cited : (32)

References (45)

1
- 4544361760
- "Comparison of MPEG-7 audio spectrum projection features and MFCC applied to speaker recognition, sound classification and audio segmentation,"
- Montreal, QC, Canada, May
- H. G. Kim and T. Sikora, "Comparison of MPEG-7 audio spectrum projection features and MFCC applied to speaker recognition, sound classification and audio segmentation," in Proc. 2004 IEEE Int. Conf. Acoust., Speech, Signal Process., Montreal, QC, Canada, May 2004, vol. 5, pp. 925-928.
- (2004) In Proc. 2004 IEEE Int. Conf. Acoust., Speech, Signal Process. , vol.5 , pp. 925-928
- Kim, H.G.¹ Sikora, T.²

2
- 66149121579
- "The Segmentation Task: Find the Story Boundaries." [Online]. Available: http://www.nist.gov/speech/tests/tdt/tdt99/presentations/NIST-segmentation/index.htm
- "The Segmentation Task: Find the Story Boundaries."

3
- 84924195028
- "NIST Rich Transcription Evaluation," [Online]. Available: http://www.nist.gov/speech/tests/rt/
- "NIST Rich Transcription Evaluation,"

4
- 0141478771
- "UBM-based real-time speaker segmentation for broadcasting news,"
- Hong Kong, Apr.
- T. Wu, L. Lu, K. Chen, and H. Zhang, "UBM-based real-time speaker segmentation for broadcasting news," in Proc. 2003 IEEE Int. Conf. Acoust., Speech, Signal Process., Hong Kong, Apr. 2003, vol. 2, pp. 193-196.
- (2003) In Proc. 2003 IEEE Int. Conf. Acoust., Speech, Signal Process. , vol.2 , pp. 193-196
- Wu, T.¹ Lu, L.² Chen, K.³ Zhang, H.⁴

5
- 27644599375
- "Unsupervised speaker indexing using generic models,"
- Sep.
- S. Kwon and S. Narayanan, "Unsupervised speaker indexing using generic models," IEEE Trans. Speech Audio Process., vol. 13, no. 5, pp. 1004-1013, Sep. 2005.
- (2005) IEEE Trans. Speech Audio Process. , vol.13 , Issue.5 , pp. 1004-1013
- Kwon, S.¹ Narayanan, S.²

6
- 33646769986
- A correlation metric for speaker tracking using anchor models
- DOI 10.1109/ICASSP.2005.1415213, 1415213, 2005 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP '05 - Proceedings - Speech Processing
- M. Collet, D. Charlet, and F. Bimbot, "A correlation metric for speaker tracking using anchor models," in Proc. 2005 IEEE Int. Conf. Acoust., Speech, Signal Process., Philadelphia, PA, Mar. 2005, vol. 1, pp. 713-716. (Pubitemid 43761252)
- (2005) ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings , vol.I
- Collet, M.¹ Charlet, D.² Bimbot, F.³

7
- 78650540904
- "Improved speaker segmentation and segments clustering using the Bayesian information criterion,"
- Sep.
- A. Tritschler and R. Gopinath, "Improved speaker segmentation and segments clustering using the Bayesian information criterion," in Proc. 6th Eur. Conf. Speech Commun. Technol., Budapest, Hungary, Sep. 1999, pp. 679-682.
- (1999) In Proc. 6th Eur. Conf. Speech Commun. Technol., Budapest, Hungary , pp. 679-682
- Tritschler, A.¹ Gopinath, R.²

8
- 33646789869
- Hybrid speaker-based segmentation system using model-level clustering
- DOI 10.1109/ICASSP.2005.1415221, 1415221, 2005 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP '05 - Proceedings - Speech Processing
- H.-G. Kim, D. Ertelt, and T. Sikora, "Hybrid speaker-based segmentation system using model-level clustering," in Proc. 2005 IEEE Int. Conf. Acoust., Speech, Signal Process., Philadelphia, PA, Mar. 2005, vol. I, pp. 745-748. (Pubitemid 43761260)
- (2005) ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings , vol.I
- Kim, H.-G.¹ Ertelt, D.² Sikora, T.³

9
- 29044442235
- "Step-by-step and integrated approaches in broadcast news speaker diarization,"
- Apr.-Jul.
- S. Meignier, D. Moraru, C. Fredouille, J. F. Bonastre, and L. Besacier, "Step-by-step and integrated approaches in broadcast news speaker diarization," Comput. Speech Lang., vol. 20, no. 2-3, pp. 303-330, Apr.-Jul. 2006.
- (2006) Comput. Speech Lang. , vol.20 , Issue.2-3 , pp. 303-330
- Meignier, S.¹ Moraru, D.² Fredouille, C.³ Bonastre, J.F.⁴ Besacier, L.⁵

10
- 84863671030
- "Evaluation of classification techniques for audio indexing,"
- Sep., CD-ROM
- J. A. Arias, J. Pinquier, and R. Andre-Obrecht, "Evaluation of classification techniques for audio indexing," in Proc. 13th Eur. Signal Process. Conf., Antalya, Turkey, Sep. 2005, CD-ROM.
- (2005) In Proc. 13th Eur. Signal Process. Conf., Antalya, Turkey
- Arias, J.A.¹ Pinquier, J.² Andre-Obrecht, R.³

11
- 85009282223
- "Speaker change detection using a new weighted distance measure,"
- Sep.
- S. Kwon and S. Narayanan, "Speaker change detection using a new weighted distance measure," in Proc. Int. Conf. Spoken Lang., Sep. 2002, vol. 4, pp. 2537-2540.
- (2002) In Proc. Int. Conf. Spoken Lang. , vol.4 , pp. 2537-2540
- Kwon, S.¹ Narayanan, S.²

12
- 0034273195
- "DISTBIC: A speaker-based segmentation for audio data indexing,"
- Sep.
- P. Delacourt and C. J. Wellekens, "DISTBIC: A speaker-based segmentation for audio data indexing," Speech Commun., vol. 32, pp. 111-126, Sep. 2000.
- (2000) Speech Commun. , vol.32 , pp. 111-126
- Delacourt, P.¹ Wellekens, C.J.²

13
- 17444365032
- "Unsupervised speaker segmentation and tracking in real-time audio content analysis,"
- Apr.
- L. Lu and H. Zhang, "Unsupervised speaker segmentation and tracking in real-time audio content analysis," Multimedia Syst., vol. 10, no. 4, pp. 332-343, Apr. 2005.
- (2005) Multimedia Syst. , vol.10 , Issue.4 , pp. 332-343
- Lu, L.¹ Zhang, H.²

14
- 33644539859
- "Audio-based description and structuring of videos,"
- Feb.
- H. Harb and L. Chen, "Audio-based description and structuring of videos," Int. J. Digital Libraries, vol. 6, no. 1, pp. 70-81, Feb. 2006.
- (2006) Int. J. Digital Libraries , vol.6 , Issue.1 , pp. 70-81
- Harb, H.¹ Chen, L.²

15
- 22544475615
- "Efficient audio stream segmentation via the combined T2 statistic and the Bayesian information criterion,"
- Jul.
- B. Zhou and J. H. L. Hansen, "Efficient audio stream segmentation via the combined T2 statistic and the Bayesian information criterion," IEEE Trans. Audio, Speech, Lang. Process., vol. 13, no. 4, pp. 467-474, Jul. 2005.
- (2005) IEEE Trans. Audio, Speech, Lang. Process. , vol.13 , Issue.4 , pp. 467-474
- Zhou, B.¹ Hansen, J.H.L.²

16
- 85008020310
- "SpeechFind: Advances in spoken document retrieval for a national gallery of the spoken word,"
- Sep.
- J. H. L. Hansen, R. Huang, B. Zhou, M. Seadle, J. R. Deller, A. R. Gurijala, M. Kurimo, and P. Angkititrakul, "SpeechFind: Advances in spoken document retrieval for a national gallery of the spoken word," IEEE Trans. Speech Audio Process., vol. 13, no. 5, pp. 712-730, Sep. 2005.
- (2005) IEEE Trans. Speech Audio Process. , vol.13 , Issue.5 , pp. 712-730
- Hansen, J.H.L.¹ Huang, R.² Zhou, B.³ Seadle, M.⁴ Deller, J.R.⁵ Gurijala, A.R.⁶ Kurimo, M.⁷ Angkititrakul, P.⁸

17
- 0141519364
- "Efficient audio segmentation algorithms based on the BIC,"
- Apr.
- M. Cettolo and M. Vescovi, "Efficient audio segmentation algorithms based on the BIC," in Proc. 2003 IEEE Int. Conf. Acoust., Speech, Signal Process., Hong Kong, Apr. 2003, vol. 6, pp. 537-540.
- (2003) In Proc. 2003 IEEE Int. Conf. Acoust., Speech, Signal Process., Hong Kong , vol.6 , pp. 537-540
- Cettolo, M.¹ Vescovi, M.²

18
- 3543144948
- "Robust speaker change detection,"
- Aug.
- J. Ajmera, I. McCowan, and H. Bourlard, "Robust speaker change detection," IEEE Signal Process. Lett., vol. 11, no. 8, pp. 649-651, Aug. 2004.
- (2004) IEEE Signal Process. Lett. , vol.11 , Issue.8 , pp. 649-651
- Ajmera, J.¹ McCowan, I.² Bourlard, H.³

19
- 33646380923
- "Approaches and applications of audio diarization,"
- Mar.
- D. A. Reynolds and P. Torres-Carrasquillo, "Approaches and applications of audio diarization," in Proc. IEEE Int. Conf. Acoust., Speech, Signal Process., Philadelphia, PA, Mar. 2005, vol. 5, pp. 953-956.
- (2005) In Proc. IEEE Int. Conf. Acoust., Speech, Signal Process., Philadelphia, PA , vol.5 , pp. 953-956
- Reynolds, D.A.¹ Torres-Carrasquillo, P.²

20
- 33745000055
- "Automatic segmentation and identification of mixed-language speech using delta-BIC and LSA-based GMMs,"
- Jan.
- C. H. Wu, Y. H. Chiu, C. J. Shia, and C. Y. Lin, "Automatic segmentation and identification of mixed-language speech using delta-BIC and LSA-based GMMs," IEEE Trans. Audio, Speech, Lang. Process., vol. 14, no. 1, pp. 266-276, Jan. 2006.
- (2006) IEEE Trans. Audio, Speech, Lang. Process. , vol.14 , Issue.1 , pp. 266-276
- Wu, C.H.¹ Chiu, Y.H.² Shia, C.J.³ Lin, C.Y.⁴

21
- 34247559206
- Automatic speaker segmentation using multiple features and distance measures: A comparison of three approaches
- DOI 10.1109/ICME.2006.262727, 4036796, 2006 IEEE International Conference on Multimedia and Expo, ICME 2006 - Proceedings
- M. Kotti, L. G. P. M. Martins, E. Benetos, J. S. Cardoso, and C. Kotropoulos, "Automatic speaker segmentation using multiple features and distance measures: A comparison of three approaches," in Proc. IEEE Int. Conf. Multimedia Expo, Toronto, ON, Canada, Jul. 2006, pp. 1101-1104. (Pubitemid 46679913)
- (2006) 2006 IEEE International Conference on Multimedia and Expo, ICME 2006 - Proceedings , vol.2006 , pp. 1101-1104
- Kotti, M.¹ Martins, L.G.P.M.² Benetos, E.³ Cardoso, J.S.⁴ Kotropoulos, C.⁵

22
- 33947127409
- Multiple change-point audio segmentation and classification using an MDL-based Gaussian model
- DOI 10.1109/TSA.2005.852988
- C. H. Wu and C. H. Hsieh, "Multiple change-point audio segmentation and classification using an MDL-based Gaussian model," IEEE Trans. Audio, Speech, Lang. Process., vol. 14, no. 2, pp. 647-657, Mar. 2006. (Pubitemid 46405361)
- (2006) IEEE Transactions on Audio, Speech and Language Processing , vol.14 , Issue.2 , pp. 647-657
- Wu, C.-H.¹ Hsieh, C.-H.²

23
- 85009128756
- "Metric SEQDAC: A hybrid approach for audio segmentation,"
- Jeju, Korea, Oct.
- S. Cheng and H. Wang, "Metric SEQDAC: A hybrid approach for audio segmentation," in Proc. 8th Int. Conf. Spoken Lang. Process., Jeju, Korea, Oct. 2004, pp. 1617-1620.
- (2004) In Proc. 8th Int. Conf. Spoken Lang. Process. , pp. 1617-1620
- Cheng, S.¹ Wang, H.²

24
- 84902981532
- London, U.K.: Wiley
- F. Van der Heijden, R. P. W. Duin, D. de Ridder, and D. M. J. Tax, Classification, Parameter Estimation and State Estimation: An Engineering Approach Using MATLAB. London, U.K.: Wiley, 2004.
- (2004) Classification, Parameter Estimation and State Estimation: An Engineering Approach Using MATLAB
- Van Der Heijden, F.¹ Duin, R.P.W.² De Ridder, D.³ Tax, D.M.J.⁴

25
- 0001011286
- "Robust procedures in multivariate analysis I: Robust covariance estimation,"
- N. A. Campbell, "Robust procedures in multivariate analysis I: Robust covariance estimation," Appl. Statist., vol. 29, no. 3, pp. 231-237, 1980.
- (1980) Appl. Statist. , vol.29 , Issue.3 , pp. 231-237
- Campbell, N.A.¹

26
- 0004255301
- New York: Wiley
- G. A. F. Seber, Multivariate Observations. New York: Wiley, 1994.
- (1994) Multivariate Observations
- Seber, G.A.F.¹

27
- 0032633354
- "Covariance estimation with limited training samples,"
- Jul.
- S. Tadjudin and D. A. Landgrebe, "Covariance estimation with limited training samples," IEEE Trans. Geosci. Remote Sen., vol. 37, no. 4, pp. 2113-2118, Jul. 1999.
- (1999) IEEE Trans. Geosci. Remote Sen. , vol.37 , Issue.4 , pp. 2113-2118
- Tadjudin, S.¹ Landgrebe, D.A.²

28
- 3042518464
- "DARPA TIMIT Acoustic-phonetic continuous speech corpus,"
- Philadelphia, PA
- J. S. Garofolo, "DARPA TIMIT Acoustic-phonetic continuous speech corpus," in Linguistic Data Consortium, Philadelphia, PA, 1993.
- (1993) In Linguistic Data Consortium
- Garofolo, J.S.¹

29
- 0003454580
- Basel, Germany: Birkhauser Verlag
- R. D. Reiss and M. Thomas, Statistical Analysis of Extreme Values. Basel, Germany: Birkhauser Verlag, 1997.
- (1997) Statistical Analysis of Extreme Values
- Reiss, R.D.¹ Thomas, M.²

30
- 35648941166
- A neural network approach to audio-assisted movie dialogue detection
- DOI 10.1016/j.neucom.2007.08.006, PII S0925231207002275, Dedicated Hardware Architectures for Intelligent Systems
- M. Kotti, E. Benetos, C. Kotropoulos, and I. Pitas, "A neural network approach to audio-assisted movie dialogue detection," Neurocomput., Special Iss.: Adv. Neural Netw. for Speech Audio Process., vol. 71, no. 1-3, pp. 157-166, Dec. 2007. (Pubitemid 350028667)
- (2007) Neurocomputing , vol.71 , Issue.1-3 , pp. 157-166
- Kotti, M.¹ Benetos, E.² Kotropoulos, C.³ Pitas, I.⁴

31
- 0000995459
- "The inverse Gaussian distribution and its statistical application-A review,"
- J. L. Folks and R. S. Chhikara, "The inverse Gaussian distribution and its statistical application-A review," J. R. Statist. Soc. B, vol. 40, pp. 263-289, 1978.
- (1978) J. R. Statist. Soc. B , vol.40 , pp. 263-289
- Folks, J.L.¹ Chhikara, R.S.²

32
- 0003446320
- New York: Wiley
- N. L. Johnson, S. Kotz, and S. Balakrishnan, Continuous Univariate Distributions. New York: Wiley, 1994, vol. 1.
- (1994) Continuous Univariate Distributions , vol.1
- Johnson, N.L.¹ Kotz, S.² Balakrishnan, S.³

33
- 0036920132
- "Perpetual American processes under Levi processes,"
- Jun.
- S. I. Boyarchenko and S. Z. Levendorskii, "Perpetual American processes under Levi processes," SIAM J. Control Optim., vol. 40, no. 6, pp. 1514-1516, Jun. 2001.
- (2001) SIAM J. Control Optim. , vol.40 , Issue.6 , pp. 1514-1516
- Boyarchenko, S.I.¹ Levendorskii, S.Z.²

34
- 0001302407
- "Statistical properties of inverse Gaussian distributions I,"
- Jun.
- M. C. K. Tweedie, "Statistical properties of inverse Gaussian distributions I," Ann. Math. Statist., vol. 28, no. 2, pp. 362-377, Jun. 1957.
- (1957) Ann. Math. Statist. , vol.28 , Issue.2 , pp. 362-377
- Tweedie, M.C.K.¹

35
- 0004056285
- Upper Saddle River, NJ: Pearson Education-Prentice-Hall
- X. D. Huang, A. Acero, and H.-S. Hon, Spoken Language Processing: A Guide to Theory, Algorithm, and System Development. Upper Saddle River, NJ: Pearson Education-Prentice-Hall, 2001.
- (2001) Spoken Language Processing: A Guide to Theory, Algorithm, and System Development
- Huang, X.D.¹ Acero, A.² Hon, H.-S.³

36
- 0031381525
- Wrappers for feature subset selection
- PII S000437029700043X
- R. Kohavi and G. H. John, "Wrappers for feature subset selection," Artif. Intell., vol. 97, no. 1-2, pp. 273-324, Dec. 1997. (Pubitemid 127401107)
- (1997) Artificial Intelligence , vol.97 , Issue.1-2 , pp. 273-324
- Kohavi, R.¹ John, G.H.²

37
- 35348882681
- Phonemic segmentation using the generalised Gamma distribution and small sample Bayesian information criterion
- DOI 10.1016/j.specom.2007.06.005, PII S0167639307001197
- G. Almpanidis and C. Kotropoulos, "Phonemic segmentation using the generalised Gamma distribution and small sample Bayesian information criterion," Speech Commun., vol. 50, no. 1, pp. 38-55, Jan. 2008. (Pubitemid 47576260)
- (2008) Speech Communication , vol.50 , Issue.1 , pp. 38-55
- Almpanidis, G.¹ Kotropoulos, C.²

38
- 38949122754
- "Speaker segmentation and clustering,"
- May
- M. Kotti, V. Moschou, and C. Kotropoulos, "Speaker segmentation and clustering," Signal Process., vol. 88, no. 5, pp. 1091-1124, May 2008.
- (2008) Signal Process. , vol.88 , Issue.5 , pp. 1091-1124
- Kotti, M.¹ Moschou, V.² Kotropoulos, C.³

39
- 84900334310
- "The Tukey honestly significant difference procedure and its control of the type I error-rate,"
- New Orleans, LA, CD-ROM
- J. J. Barnette and J. E. McLean, "The Tukey honestly significant difference procedure and its control of the type I error-rate," in Proc. Annu. Meeting Mid-South Educ. Res. Assoc., New Orleans, LA, 1998, CD-ROM.
- (1998) In Proc. Annu. Meeting Mid-South Educ. Res. Assoc.
- Barnette, J.J.¹ McLean, J.E.²

40
- 34047266609
- Multistage speaker diarization of broadcast news
- DOI 10.1109/TASL.2006.878261
- C. Barras, X. Zhu, S. Meignier, and J. L. Gauvain, "Multistage speaker diarization of broadcast news," IEEE Trans. Audio Speech, Lang. Process., vol. 14, no. 5, pp. 1505-1512, Sep. 2006. (Pubitemid 46547577)
- (2006) IEEE Transactions on Audio, Speech and Language Processing , vol.14 , Issue.5 , pp. 1505-1512
- Barras, C.¹ Zhu, X.² Meignier, S.³ Gauvain, J.-L.⁴

41
- 66149089715
- "1997 English broadcast news speech (HUB4),"
- Philadelphia, PA
- J. Fiscus, "1997 English broadcast news speech (HUB4)," Linguistic Data Consortium, Philadelphia, PA, 1998.
- (1998) Linguistic Data Consortium
- Fiscus, J.¹

42
- 66149132423
- "1997 HUB4 English evaluation speech and transcripts,"
- Philadelphia, PA
- D. Graff, J. Fiscus, and J. Garofolo, "1997 HUB4 English evaluation speech and transcripts," Linguistic Data Consortium, Philadelphia, PA, 2002.
- (2002) Linguistic Data Consortium
- Graff, D.¹ Fiscus, J.² Garofolo, J.³

43
- 0003438512
- New York: Wiley
- R. R. Korfhage, Information Storage and Retrieval. New York: Wiley, 1997.
- (1997) Information Storage and Retrieval
- Korfhage, R.R.¹

44
- 66149128215
- "RT-03 MDE training data speech,"
- Philadelphia, PA
- S. Strassel, C. Walker, and H. Lee, "RT-03 MDE training data speech," Linguistic Data Consortium, Philadelphia, PA, 2004.
- (2004) Linguistic Data Consortium
- Strassel, S.¹ Walker, C.² Lee, H.³

45
- 0004236492
- 3rd ed. Baltimore, MD: Johns Hopkins Univ. Press
- G. H. Golub and C. F. Van Loan, Matrix Computations, 3rd ed. Baltimore, MD: Johns Hopkins Univ. Press, 1996.
- (1996) Matrix Computations
- Golub, G.H.¹ Van Loan, C.F.²

* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.