SCOPUS Information Retrieval Platform

IEEE Transactions on Audio, Speech and Language Processing

Volume 14, Issue 3, 2006, Pages 907-919

Advances in unsupervised audio classification and segmentation for the broadcast news and NGSW corpora

(2) Huang, Rongqing a,b,c Hansen, John H L a,b,c

a IEEE (United States)

b UNIVERSITY OF COLORADO (United States)

c UNIVERSITY OF TEXAS AT DALLAS (United States)

Author keywords

Audio classification; Audio segmentation; Bayesian information criterion; Broadcast news transcription; Feature analysis; Feature processing; Gaussian mixture model (GMM) networks; Noisy environments; Rich transcription; Speaker segmentation

Indexed keywords

AUDIO CLASSIFICATION; AUDIO SEGMENTATION; BAYESIAN INFORMATION CRITERION; BROADCAST NEWS TRANSCRIPTION; FEATURE ANALYSIS; FEATURE PROCESSING; NOISY ENVIRONMENTS; RICH TRANSCRIPTION; SPEAKER SEGMENTATION; WEIGHTED GMM NETWORKS (WGN);

ALGORITHMS; CLASSIFICATION (OF INFORMATION); FILTER BANKS; GAUSSIAN DISTRIBUTION; SPEECH RECOGNITION;

ACOUSTIC SIGNAL PROCESSING;

EID: 34047274787 PISSN: 15587916 EISSN: None Source Type: Journal
DOI: 10.1109/TSA.2005.858057 Document Type: Article

Times cited : (72)

References (41)

1
- 0036288688
- A new speaker change detection method for two-speaker segmentation
- Orlando, FL
- A. Adami, S. Kajarekar, and H. Hermansky, "A new speaker change detection method for two-speaker segmentation," in Proc. ICASSP, vol. 4, Orlando, FL, 2002, pp. 13-17.
- (2002) Proc. ICASSP , vol.4 , pp. 13-17
- Adami, A.¹ Kajarekar, S.² Hermansky, H.³

2
- 34047256659
- IDIAP, Martigny, Switzerland, Tech. Rep. IDIAP-RR 01-26
- J. Ajmera and I. McCowan, "Speech/music discrimination using entropy and dynamism features in a HMM classification framework," IDIAP, Martigny, Switzerland, Tech. Rep. IDIAP-RR 01-26, 2001.
- (2001) Speech/music discrimination using entropy and dynamism features in a HMM classification framework
- Ajmera, J.¹ McCowan, I.²

3
- 0003648234
- An Introduction to Multivariate Statistical Analysis
- T. Anderson, An Introduction to Multivariate Statistical Analysis. New York: Wiley, 1958, pp. 101-125.
- (1958) New York: Wiley , pp. 101-125
- Anderson, T.¹

4
- 4544345457
- Model selection criteria for acoustic segmentation
- Paris, France, Sep
- M. Cettolo and M. Federico, "Model selection criteria for acoustic segmentation," in Proc. ISCA ITRW ASR2000 Workshop, Paris, France, Sep. 2000, pp. 221-227.
- (2000) Proc. ISCA ITRW ASR2000 Workshop , pp. 221-227
- Cettolo, M.¹ Federico, M.²

5
- 0002595416
- Speaker, environment and channel change detection and clustering via the Bayesian information criterion
- Lansdowne, VA
- S. Chen and P. Gopalakrishnan, "Speaker, environment and channel change detection and clustering via the Bayesian information criterion," in Proc. Broadcast News Transcr. Under. Workshop, Lansdowne, VA, 1998, pp. 127-132.
- (1998) Proc. Broadcast News Transcr. Under. Workshop , pp. 127-132
- Chen, S.¹ Gopalakrishnan, P.²

6
- 0003424145
- 2nd ed. New York: IEEE Press
- J. R. Deller, J. H. L. Hansen, and J. G. Proakis, Discrete-time processing of speech signals, 2nd ed. New York: IEEE Press, 2000, pp. 245-251.
- (2000) Discrete-time processing of speech signals , pp. 245-251
- Deller, J.R.¹ Hansen, J.H.L.² Proakis, J.G.³

7
- 0034842452
- MVDR-based feature extraction for robust speech recognition
- Salt Lake City, UT
- S. Dharanipragada and B. Rao, "MVDR-based feature extraction for robust speech recognition," in Proc. ICASSP 2001, vol. 1, Salt Lake City, UT, 2001, pp. 7-11.
- (2001) Proc. ICASSP 2001 , vol.1 , pp. 7-11
- Dharanipragada, S.¹ Rao, B.²

8
- 0027622731
- Cepstral parameter compensation for HMM recognition in noise
- M. J. F. Gales and S. J. Young, "Cepstral parameter compensation for HMM recognition in noise," Speech Commun., vol. 12, pp. 231-239, 1993.
- (1993) Speech Commun , vol.12 , pp. 231-239
- Gales, M.J.F.¹ Young, S.J.²

9
- 85128356454
- Partitioning and transcription of broadcast news data
- Sydney, Australia, Dec
- J.-L. Gauvain, L. Lamel, and G. Adda, "Partitioning and transcription of broadcast news data," in Proc. ICSLP, vol. 2, Sydney, Australia, Dec. 1998, pp. 1335-1338.
- (1998) Proc. ICSLP , vol.2 , pp. 1335-1338
- Gauvain, J.-L.¹ Lamel, L.² Adda, G.³

10
- 0036567851
- The LIMSI broadcast news transcription system
- J.-L. Gauvain, L. Lamel, and G. Adda, "The LIMSI broadcast news transcription system," Speech Commun., vol. 37, pp. 89-108, 2002.
- (2002) Speech Commun , vol.37 , pp. 89-108
- Gauvain, J.-L.¹ Lamel, L.² Adda, G.³

11
- 0017097474
- Distance measures for speech processing
- Oct
- A. H. Gray Jr. and J. D. Markel, "Distance measures for speech processing," IEEE Trans. Acoust., Speech Signal Process., vol. ASSP-24, no. 5, pp. 380-391, Oct. 1976.
- (1976) IEEE Trans. Acoust., Speech Signal Process , vol.ASSP-24 , Issue.5 , pp. 380-391
- Gray Jr., A.H.¹ Markel, J.D.²

12
- 0002751623
- Segment generation and clustering in the HTK broadcast news transcription system
- Lansdowne, VA
- T. Hain, S. Johnson, A. Tuerk, P. Woodland, and S. Young, "Segment generation and clustering in the HTK broadcast news transcription system," in DARPA Broadcast News Transcr. Under. Workshop, Lansdowne, VA, 1998, pp. 133-137.
- (1998) DARPA Broadcast News Transcr. Under. Workshop , pp. 133-137
- Hain, T.¹ Johnson, S.² Tuerk, A.³ Woodland, P.⁴ Young, S.⁵

13
- 34047274021
- CU-Move: Advanced in-vehicle speech systems for route navigation
- H. Abut, J. H. L. Hansen, and K. Takeda, Eds. New York: Springer-Verlag
- J. H. L. Hansen, X. Zhang, M. Akbacak, U. Yapanel, B. Pellom, W. Ward, and P. Angkititrakul, "CU-Move: Advanced in-vehicle speech systems for route navigation," in DSP for In-Vehicle and Mobile Systems, H. Abut, J. H. L. Hansen, and K. Takeda, Eds. New York: Springer-Verlag, 2004.
- (2004) DSP for In-Vehicle and Mobile Systems
- Hansen, J.H.L.¹ Zhang, X.² Akbacak, M.³ Yapanel, U.⁴ Pellom, B.⁵ Ward, W.⁶ Angkititrakul, P.⁷

14
- 1842658044
- Robust feature estimation and objective quality assessment for noisy speech recognition using the credit card corpus
- May
- J. H. L. Hansen and L. Arslan, "Robust feature estimation and objective quality assessment for noisy speech recognition using the credit card corpus," IEEE Trans. Speech Audio Process., vol. 3, no. 3, pp. 169-184, May 1995.
- (1995) IEEE Trans. Speech Audio Process , vol.3 , Issue.3 , pp. 169-184
- Hansen, J.H.L.¹ Arslan, L.²

15
- 0006337427
- A study of broadcast news audio stream segmentation and segment clustering
- Sep
- M. Harris, X. Aubert, R. Haeb-Umbach, and P. Beyerlein, "A study of broadcast news audio stream segmentation and segment clustering," EuroSpeech, vol. 3, pp. 1027-1030, Sep 1999.
- (1999) EuroSpeech , vol.3 , pp. 1027-1030
- Harris, M.¹ Aubert, X.² Haeb-Umbach, R.³ Beyerlein, P.⁴

16
- 0141588436
- Speaker tracking
- M.S. thesis, Eng. Dept, Cambridge Univ, U.K
- S. Johnson, "Speaker tracking," M.S. thesis, Eng. Dept., Cambridge Univ., U.K., 1997.
- (1997)
- Johnson, S.¹

17
- 0003946510
- New York: Springer-Verlag
- I. T. Jolliffe, Principal Component Analysis. New York: Springer-Verlag, 1986.
- (1986) Principal Component Analysis
- Jolliffe, I.T.¹

18
- 0022806994
- Spectral analysis and discrimination by zero-crossings
- Nov
- B. Kedem, "Spectral analysis and discrimination by zero-crossings," Proc. IEEE, vol. 74, no. 11, pp. 1477-1493, Nov. 1986.
- (1986) Proc. IEEE , vol.74 , Issue.11 , pp. 1477-1493
- Kedem, B.¹

19
- 0027210171
- Some useful properties of Teager's energy operator
- Minneapolis, MN, Apr
- J. F. Kaiser, "Some useful properties of Teager's energy operator," in Proc. ICASSP, vol. 3, Minneapolis, MN, Apr. 1993, pp. 149-152.
- (1993) Proc. ICASSP , vol.3 , pp. 149-152
- Kaiser, J.F.¹

20
- 0029288633
- Maximum likelihood linear regression for speaker adaptation of continuous density hidden Markov models
- C. J. Leggetter and P. C. Woodland, "Maximum likelihood linear regression for speaker adaptation of continuous density hidden Markov models," Comput. Speech Lang., vol. 9, pp. 171-185, 1995.
- (1995) Comput. Speech Lang , vol.9 , pp. 171-185
- Leggetter, C.J.¹ Woodland, P.C.²

21
- 0034273520
- Content-based audio classification and retrieval using the nearest feature line method
- Sep
- S. Z. Li, "Content-based audio classification and retrieval using the nearest feature line method," IEEE Trans. Speech Audio Process., vol. 8, no. 5, pp. 619-625, Sep. 2000.
- (2000) IEEE Trans. Speech Audio Process , vol.8 , Issue.5 , pp. 619-625
- Li, S.Z.¹

22
- 0036816475
- Content analysis for audio classification and segmentation
- Oct
- L. Lu, H. Zhang, and H. Jiang, "Content analysis for audio classification and segmentation," IEEE Trans. Speech Audio Process., vol. 10, no. 7, pp. 504-516, Oct. 2002.
- (2002) IEEE Trans. Speech Audio Process , vol.10 , Issue.7 , pp. 504-516
- Lu, L.¹ Zhang, H.² Jiang, H.³

23
- 0037700756
- Speaker change detection and tracking in real-time news broadcasting analysis
- Juan-les-Pins, France, Dec
- L. Lu and H. Zhang, "Speaker change detection and tracking in real-time news broadcasting analysis," in Proc. ACM Multimedia, Juan-les-Pins, France, Dec. 2002, pp. 602-610.
- (2002) Proc. ACM Multimedia , pp. 602-610
- Lu, L.¹ Zhang, H.²

24
- 85135164500
- Evaluating feature set performance using the f-ratio and j-measures
- Rhodes, Greece, Sep
- S. Nicholson, B. Milner, and S. Cox, "Evaluating feature set performance using the f-ratio and j-measures," in Proc. EuroSpeech, vol. 1, Rhodes, Greece, Sep. 1997, pp. 413-416.
- (1997) Proc. EuroSpeech , vol.1 , pp. 413-416
- Nicholson, S.¹ Milner, B.² Cox, S.³

25
- 34047246071
- The National Gallery of the Spoken Word (NGSW)
- The National Gallery of the Spoken Word (NGSW). [Online]. Available: http://www.ngsw.org/

26
- 0003425258
- Englewood Cliffs, NJ: Prentice-Hall
- L. R. Rabiner and R. W. Schafer, Digital processing of speech signals. Englewood Cliffs, NJ: Prentice-Hall, 1978.
- (1978) Digital processing of speech signals
- Rabiner, L.R.¹ Schafer, R.W.²

27
- 0029765670
- Real time discrimination of broadcast speech/music
- Atlanta, GA, May
- J. Saunders, "Real time discrimination of broadcast speech/music," in Proc. ICASSP, vol. 2, Atlanta, GA, May 1996, pp. 993-996.
- (1996) Proc. ICASSP , vol.2 , pp. 993-996
- Saunders, J.¹

28
- 0030648077
- Construction and evaluation of a robust multifeature speech/music discriminator
- Munich, Germany
- E. Scheirer and M. Slaney, "Construction and evaluation of a robust multifeature speech/music discriminator," in Proc. ICASSP, vol. 2, Munich, Germany, 1997, pp. 1331-1334.
- (1997) Proc. ICASSP , vol.2 , pp. 1331-1334
- Scheirer, E.¹ Slaney, M.²

29
- 0002782496
- Automatic segmentation, classification and clustering of broadcast news audio
- Chantilly, VA
- M. Siegler, U. Jain, B. Raj, and R. M. Stern, "Automatic segmentation, classification and clustering of broadcast news audio," in Proc. DARPA Speech Recognition Workshop, Chantilly, VA, 1997, pp. 97-99.
- (1997) Proc. DARPA Speech Recognition Workshop , pp. 97-99
- Siegler, M.¹ Jain, U.² Raj, B.³ Stern, R.M.⁴

30
- 34047260339
- Specification of the 1996 Hub4 broadcast news evaluation
- Chantilly, VA
- R. M. Stern, "Specification of the 1996 Hub4 broadcast news evaluation," in Proc. DARPA Speech Recognition Workshop, Chantilly, VA, 1997, pp. 3-6.
- (1997) Proc. DARPA Speech Recognition Workshop , pp. 3-6
- Stern, R.M.¹

31
- 34047273844
- SpeechFind, On-Line Search Engine for the National Gallery of the Spoken Word. [Online]. Available: http://SpeechFind.colorado.edu/

32
- 34047268671
- TIMIT (recorded at Texas Instruments, transcribed at Mass. Inst. Technol.) Acoustic-Phonetic Continuous Speech Corpus. [Online]. Available: http://www.ldc.upenn.edu/

33
- 0033281269
- Multifeature audio segmentation for browsing and annotation
- New Paltz, NY, Oct. 17-20
- G. Tzanetakis and P. Cook, "Multifeature audio segmentation for browsing and annotation," in Proc. IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, New Paltz, NY, Oct. 17-20, 1999, pp. w99-1-w99-4.
- (1999) Proc. IEEE Workshop on Applications of Signal Processing to Audio and Acoustics
- Tzanetakis, G.¹ Cook, P.²

34
- 0032657771
- Progress in broadcast news transcription at Dragon Systems
- Mar
- S. Wegmann, P. Zhan, and L. Gillick, "Progress in broadcast news transcription at Dragon Systems," Proc. ICASSP, vol. 1, pp. 33-36, Mar. 1999.
- (1999) Proc. ICASSP , vol.1 , pp. 33-36
- Wegmann, S.¹ Zhan, P.² Gillick, L.³

35
- 0030242072
- Content-based classification, search and retrieval of audio
- E. Wold, T. Blum, D. Keislar, and J. Wheaton, "Content-based classification, search and retrieval of audio," IEEE Multimedia Mag., vol. 3, no. 3, pp. 27-36, 1996.
- (1996) IEEE Multimedia Mag , vol.3 , Issue.3 , pp. 27-36
- Wold, E.¹ Blum, T.² Keislar, D.³ Wheaton, J.⁴

36
- 34047264547
- Perceptual MVDR-based cepstral coefficients (PMCCs) for noise-robust speech recognition
- Hong Kong, China, Apr
- U. Yapanel and S. Dharanipragada, "Perceptual MVDR-based cepstral coefficients (PMCCs) for noise-robust speech recognition," in Proc. ICASSP, vol. 1, Hong Kong, China, Apr. 2003, pp. 6-10.
- (2003) Proc. ICASSP , vol.1 , pp. 6-10
- Yapanel, U.¹ Dharanipragada, S.²

37
- 85009164449
- A new perspective on feature extraction for robust in-vehicle speech recognition
- Geneva, Switzerland, Sep
- U. Yapanel and J. H. L. Hansen, "A new perspective on feature extraction for robust in-vehicle speech recognition," in Proc. EuroSpeech, Geneva, Switzerland, Sep. 2003, pp. 1281-1284.
- (2003) Proc. EuroSpeech , pp. 1281-1284
- Yapanel, U.¹ Hansen, J.H.L.²

38
- 0035340677
- Audio content analysis for online audiovisual data segmentation and classification
- Jul
- T. Zhang and C.-C. J. Kuo, "Audio content analysis for online audiovisual data segmentation and classification," IEEE Trans. Speech Audio Process., vol. 9, no. 4, pp. 441-457, Jul. 2001.
- (2001) IEEE Trans. Speech Audio Process , vol.9 , Issue.4 , pp. 441-457
- Zhang, T.¹ Kuo, C.-C.J.²

39
- 85009089453
- Unsupervised audio stream segmentation and clustering via the Bayesian information criterion
- Beijing, China, Oct
- B. Zhou and J. H. L. Hansen, "Unsupervised audio stream segmentation and clustering via the Bayesian information criterion," in Proc. ICSLP 2000, vol. 1, Beijing, China, Oct. 2000, pp. 714-717.
- (2000) Proc. ICSLP 2000 , vol.1 , pp. 714-717
- Zhou, B.¹ Hansen, J.H.L.²

40
- 85009275098
- SpeechFind: An experimental on-line spoken document retrieval system for historical audio archives
- Denver, CO, Sep
- B. Zhou and J. H. L. Hansen, "SpeechFind: An experimental on-line spoken document retrieval system for historical audio archives," in Proc. ICSLP, vol. 3, Denver, CO, Sep. 2002, pp. 1969-1972.
- (2002) Proc. ICSLP , vol.3 , pp. 1969-1972
- Zhou, B.¹ Hansen, J.H.L.²

41
- 0035278948
- Nonlinear feature based classification of speech under stress
- May
- G. Zhou, J. H. L. Hansen, and J. F. Kaiser, "Nonlinear feature based classification of speech under stress," IEEE Trans. Speech Audio Process., vol. 9, no. 3, pp. 201-216, May 2001.
- (2001) IEEE Trans. Speech Audio Process , vol.9 , Issue.3 , pp. 201-216
- Zhou, G.¹ Hansen, J.H.L.² Kaiser, J.F.³

* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.