Volume 14, Issue 3, 2006, Pages 907-919

Advances in unsupervised audio classification and segmentation for the broadcast news and NGSW corpora

Author keywords

Audio classification; Audio segmentation; Bayesian information criterion; Broadcast news transcription; Feature analysis; Feature processing; Gaussian mixture model (GMM) networks; Noisy environments; Rich transcription; Speaker segmentation

Indexed keywords

AUDIO CLASSIFICATION; AUDIO SEGMENTATION; BAYESIAN INFORMATION CRITERION; BROADCAST NEWS TRANSCRIPTION; FEATURE ANALYSIS; FEATURE PROCESSING; NOISY ENVIRONMENTS; RICH TRANSCRIPTION; SPEAKER SEGMENTATION; WEIGHTED GMM NETWORKS (WGN);

EID: 34047274787     PISSN: 15587916     EISSN: None     Source Type: Journal    
DOI: 10.1109/TSA.2005.858057     Document Type: Article
Times cited: 72

References (41)
  • 1
    • 0036288688
    • A new speaker change detection method for two-speaker segmentation
    • Orlando, FL
    • A. Adami, S. Kajarekar, and H. Hermansky, "A new speaker change detection method for two-speaker segmentation," in Proc. ICASSP, vol. 4, Orlando, FL, 2002, pp. 13-17.
    • (2002) Proc. ICASSP , vol.4 , pp. 13-17
    • Adami, A.1    Kajarekar, S.2    Hermansky, H.3
  • 3
    • 0003648234
    • An Introduction to Multivariate Statistical Analysis
    • T. Anderson, An Introduction to Multivariate Statistical Analysis. New York: Wiley, 1958, pp. 101-125.
    • (1958) New York: Wiley , pp. 101-125
    • Anderson, T.1
  • 4
    • 4544345457
    • Model selection criteria for acoustic segmentation
    • Paris, France, Sep
    • M. Cettolo and M. Federico, "Model selection criteria for acoustic segmentation," in Proc. ISCA ITRW ASR2000 Workshop, Paris, France, Sep. 2000, pp. 221-227.
    • (2000) Proc. ISCA ITRW ASR2000 Workshop , pp. 221-227
    • Cettolo, M.1    Federico, M.2
  • 5
    • 0002595416
    • Speaker, environment and channel change detection and clustering via the Bayesian information criterion
    • Lansdowne, VA
    • S. Chen and P. Gopalakrishnan, "Speaker, environment and channel change detection and clustering via the Bayesian information criterion," in Proc. Broadcast News Transcr. Under. Workshop, Lansdowne, VA, 1998, pp. 127-132.
    • (1998) Proc. Broadcast News Transcr. Under. Workshop , pp. 127-132
    • Chen, S.1    Gopalakrishnan, P.2
  • 7
    • 0034842452
    • MVDR-based feature extraction for robust speech recognition
    • Salt Lake City, UT
    • S. Dharanipragada and B. Rao, "MVDR-based feature extraction for robust speech recognition," in Proc. ICASSP 2001, vol. 1, Salt Lake City, UT, 2001, pp. 7-11.
    • (2001) Proc. ICASSP 2001 , vol.1 , pp. 7-11
    • Dharanipragada, S.1    Rao, B.2
  • 8
    • 0027622731
    • Cepstral parameter compensation for HMM recognition in noise
    • M. J. F. Gales and S. J. Young, "Cepstral parameter compensation for HMM recognition in noise," Speech Commun., vol. 12, pp. 231-239, 1993.
    • (1993) Speech Commun , vol.12 , pp. 231-239
    • Gales, M.J.F.1    Young, S.J.2
  • 9
    • 85128356454
    • Partitioning and transcription of broadcast news data
    • Sydney, Australia, Dec
    • J.-L. Gauvain, L. Lamel, and G. Adda, "Partitioning and transcription of broadcast news data," in Proc. ICSLP, vol. 2, Sydney, Australia, Dec. 1998, pp. 1335-1338.
    • (1998) Proc. ICSLP , vol.2 , pp. 1335-1338
    • Gauvain, J.-L.1    Lamel, L.2    Adda, G.3
  • 10
    • 0036567851
    • The LIMSI broadcast news transcription system
    • J.-L. Gauvain, L. Lamel, and G. Adda, "The LIMSI broadcast news transcription system," Speech Commun., vol. 37, pp. 89-108, 2002.
    • (2002) Speech Commun , vol.37 , pp. 89-108
    • Gauvain, J.-L.1    Lamel, L.2    Adda, G.3
  • 14
    • 1842658044
    • Robust feature estimation and objective quality assessment for noisy speech recognition using the credit card corpus
    • May
    • J. H. L. Hansen and L. Arslan, "Robust feature estimation and objective quality assessment for noisy speech recognition using the credit card corpus," IEEE Trans. Speech Audio Process., vol. 3, no. 3, pp. 169-184, May 1995.
    • (1995) IEEE Trans. Speech Audio Process , vol.3 , Issue.3 , pp. 169-184
    • Hansen, J.H.L.1    Arslan, L.2
  • 15
    • 0006337427
    • A study of broadcast news audio stream segmentation and segment clustering
    • Sep
    • M. Harris, X. Aubert, R. Haeb-Umbach, and P. Beyerlein, "A study of broadcast news audio stream segmentation and segment clustering," EuroSpeech, vol. 3, pp. 1027-1030, Sep 1999.
    • (1999) EuroSpeech , vol.3 , pp. 1027-1030
    • Harris, M.1    Aubert, X.2    Haeb-Umbach, R.3    Beyerlein, P.4
  • 16
    • 0141588436
    • Speaker tracking
    • M.S. thesis, Eng. Dept, Cambridge Univ, U.K
    • S. Johnson, "Speaker tracking," M.S. thesis, Eng. Dept., Cambridge Univ., U.K., 1997.
    • (1997)
    • Johnson, S.1
  • 18
    • 0022806994
    • Spectral analysis and discrimination by zero-crossings
    • Nov
    • B. Kedem, "Spectral analysis and discrimination by zero-crossings," Proc. IEEE, vol. 74, no. 11, pp. 1477-1493, Nov. 1986.
    • (1986) Proc. IEEE , vol.74 , Issue.11 , pp. 1477-1493
    • Kedem, B.1
  • 19
    • 0027210171
    • Some useful properties of Teager's energy operator
    • Minneapolis, MN, Apr
    • J. F. Kaiser, "Some useful properties of Teager's energy operator," in Proc. ICASSP, vol. 3, Minneapolis, MN, Apr. 1993, pp. 149-152.
    • (1993) Proc. ICASSP , vol.3 , pp. 149-152
    • Kaiser, J.F.1
  • 20
    • 0029288633
    • Maximum likelihood linear regression for speaker adaptation of continuous density hidden Markov models
    • C. J. Leggetter and P. C. Woodland, "Maximum likelihood linear regression for speaker adaptation of continuous density hidden Markov models," Comput. Speech Lang., vol. 9, pp. 171-185, 1995.
    • (1995) Comput. Speech Lang , vol.9 , pp. 171-185
    • Leggetter, C.J.1    Woodland, P.C.2
  • 21
    • 0034273520
    • Content-based audio classification and retrieval using the nearest feature line method
    • Sep
    • S. Z. Li, "Content-based audio classification and retrieval using the nearest feature line method," IEEE Trans. Speech Audio Process., vol. 8, no. 5, pp. 619-625, Sep. 2000.
    • (2000) IEEE Trans. Speech Audio Process , vol.8 , Issue.5 , pp. 619-625
    • Li, S.Z.1
  • 22
    • 0036816475
    • Content analysis for audio classification and segmentation
    • Oct
    • L. Lu, H. Zhang, and H. Jiang, "Content analysis for audio classification and segmentation," IEEE Trans. Speech Audio Process., vol. 10, no. 7, pp. 504-516, Oct. 2002.
    • (2002) IEEE Trans. Speech Audio Process , vol.10 , Issue.7 , pp. 504-516
    • Lu, L.1    Zhang, H.2    Jiang, H.3
  • 23
    • 0037700756
    • Speaker change detection and tracking in real-time news broadcasting analysis
    • Juan-les-Pins, France, Dec
    • L. Lu and H. Zhang, "Speaker change detection and tracking in real-time news broadcasting analysis," in Proc. ACM Multimedia, Juan-les-Pins, France, Dec. 2002, pp. 602-610.
    • (2002) Proc. ACM Multimedia , pp. 602-610
    • Lu, L.1    Zhang, H.2
  • 24
    • 85135164500
    • Evaluating feature set performance using the f-ratio and j-measures
    • Rhodes, Greece, Sep
    • S. Nicholson, B. Milner, and S. Cox, "Evaluating feature set performance using the f-ratio and j-measures," in Proc. EuroSpeech, vol. 1, Rhodes, Greece, Sep. 1997, pp. 413-416.
    • (1997) Proc. EuroSpeech , vol.1 , pp. 413-416
    • Nicholson, S.1    Milner, B.2    Cox, S.3
  • 25
    • 34047246071
    • The National Gallery of the Spoken Word (NGSW)
    • The National Gallery of the Spoken Word (NGSW). [Online] Available: http://www.ngsw.org/
  • 27
    • 0029765670
    • Real time discrimination of broadcast speech/music
    • Atlanta, GA, May
    • J. Saunders, "Real time discrimination of broadcast speech/music," in Proc. ICASSP, vol. 2, Atlanta, GA, May 1996, pp. 993-996.
    • (1996) Proc. ICASSP , vol.2 , pp. 993-996
    • Saunders, J.1
  • 28
    • 0030648077
    • Construction and evaluation of a robust multifeature speech/music discriminator
    • Munich, Germany
    • E. Scheirer and M. Slaney, "Construction and evaluation of a robust multifeature speech/music discriminator," in Proc. ICASSP, vol. 2, Munich, Germany, 1997, pp. 1331-1334.
    • (1997) Proc. ICASSP , vol.2 , pp. 1331-1334
    • Scheirer, E.1    Slaney, M.2
  • 29
    • 0002782496
    • Automatic segmentation, classification and clustering of broadcast news audio
    • Chantilly, VA
    • M. Siegler, U. Jain, B. Raj, and R. M. Stern, "Automatic segmentation, classification and clustering of broadcast news audio," in Proc. DARPA Speech Recognition Workshop, Chantilly, VA, 1997, pp. 97-99.
    • (1997) Proc. DARPA Speech Recognition Workshop , pp. 97-99
    • Siegler, M.1    Jain, U.2    Raj, B.3    Stern, R.M.4
  • 30
    • 34047260339
    • Specification of the 1996 Hub4 broadcast news evaluation
    • Chantilly, VA
    • R. M. Stern, "Specification of the 1996 Hub4 broadcast news evaluation," in Proc. DARPA Speech Recognition Workshop, Chantilly, VA, 1997, pp. 3-6.
    • (1997) Proc. DARPA Speech Recognition Workshop , pp. 3-6
    • Stern, R.M.1
  • 31
    • 34047273844
    • SpeechFind, On-Line Search Engine for the National Gallery of the Spoken Word. [Online]. Available: http://SpeechFind.colorado.edu/
  • 32
    • 34047268671
    • TIMIT (recorded at Texas Instruments, transcribed at Mass. Inst. Technol.) Acoustic-Phonetic Continuous Speech Corpus. [Online]. Available: http://www.ldc.upenn.edu/
  • 34
    • 0032657771
    • Progress in broadcast news transcription at Dragon Systems
    • Mar
    • S. Wegmann, P. Zhan, and L. Gillick, "Progress in broadcast news transcription at Dragon Systems," Proc. ICASSP, vol. 1, pp. 33-36, Mar. 1999.
    • (1999) Proc. ICASSP , vol.1 , pp. 33-36
    • Wegmann, S.1    Zhan, P.2    Gillick, L.3
  • 35
    • 0030242072
    • Content-based classification, search and retrieval of audio
    • E. Wold, T. Blum, D. Keislar, and J. Wheaton, "Content-based classification, search and retrieval of audio," IEEE Multimedia Mag., vol. 3, no. 3, pp. 27-36, 1996.
    • (1996) IEEE Multimedia Mag , vol.3 , Issue.3 , pp. 27-36
    • Wold, E.1    Blum, T.2    Keislar, D.3    Wheaton, J.4
  • 36
    • 34047264547
    • Perceptual MVDR-based cepstral coefficients (PMCCs) for noise-robust speech recognition
    • Hong Kong, China, Apr
    • U. Yapanel and S. Dharanipragada, "Perceptual MVDR-based cepstral coefficients (PMCCs) for noise-robust speech recognition," in Proc. ICASSP, vol. 1, Hong Kong, China, Apr. 2003, pp. 6-10.
    • (2003) Proc. ICASSP , vol.1 , pp. 6-10
    • Yapanel, U.1    Dharanipragada, S.2
  • 37
    • 85009164449
    • A new perspective on feature extraction for robust in-vehicle speech recognition
    • Geneva, Switzerland, Sep
    • U. Yapanel and J. H. L. Hansen, "A new perspective on feature extraction for robust in-vehicle speech recognition," in Proc. EuroSpeech, Geneva, Switzerland, Sep. 2003, pp. 1281-1284.
    • (2003) Proc. EuroSpeech , pp. 1281-1284
    • Yapanel, U.1    Hansen, J.H.L.2
  • 38
    • 0035340677
    • Audio content analysis for online audiovisual data segmentation and classification
    • Jul
    • T. Zhang and C.-C. J. Kuo, "Audio content analysis for online audiovisual data segmentation and classification," IEEE Trans. Speech Audio Process., vol. 9, no. 4, pp. 441-457, Jul. 2001.
    • (2001) IEEE Trans. Speech Audio Process , vol.9 , Issue.4 , pp. 441-457
    • Zhang, T.1    Kuo, C.-C.J.2
  • 39
    • 85009089453
    • Unsupervised audio stream segmentation and clustering via the Bayesian information criterion
    • Beijing, China, Oct
    • B. Zhou and J. H. L. Hansen, "Unsupervised audio stream segmentation and clustering via the Bayesian information criterion," in Proc. ICSLP 2000, vol. 1, Beijing, China, Oct. 2000, pp. 714-717.
    • (2000) Proc. ICSLP 2000 , vol.1 , pp. 714-717
    • Zhou, B.1    Hansen, J.H.L.2
  • 40
    • 85009275098
    • SpeechFind: An experimental on-line spoken document retrieval system for historical audio archives
    • Denver, CO, Sep
    • B. Zhou and J. H. L. Hansen, "SpeechFind: An experimental on-line spoken document retrieval system for historical audio archives," in Proc. ICSLP, vol. 3, Denver, CO, Sep. 2002, pp. 1969-1972.
    • (2002) Proc. ICSLP , vol.3 , pp. 1969-1972
    • Zhou, B.1    Hansen, J.H.L.2
  • 41
    • 0035278948
    • Nonlinear feature based classification of speech under stress
    • May
    • G. Zhou, J. H. L. Hansen, and J. F. Kaiser, "Nonlinear feature based classification of speech under stress," IEEE Trans. Speech Audio Process., vol. 9, no. 3, pp. 201-216, May 2001.
    • (2001) IEEE Trans. Speech Audio Process , vol.9 , Issue.3 , pp. 201-216
    • Zhou, G.1    Hansen, J.H.L.2    Kaiser, J.F.3


* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.