SCOPUS Information Retrieval Platform

Semantic Multimedia and Ontologies: Theory and Applications

Volume , Issue , 2008, Pages 123-162

Audio content analysis

(5) Burred, Juan José (a); Haller, Martin (a); Jin, Shan (a); Samour, Amjad (a); Sikora, Thomas (a)

(a) Technische Universität Berlin (Germany)

Author keywords

[No Author keywords available]

Indexed keywords

EID: 84892229349 PISSN: None EISSN: None Source Type: Book
DOI: 10.1007/978-1-84800-076-6_5 Document Type: Chapter

Times cited : (4)

References (106)

1
- 84866482243
- An ontology-based approach to information management for music analysis systems
- Abdallah, S., Raimond, Y. and Sandler, M. (2006), An ontology-based approach to information management for music analysis systems, in 'Proceedings of the 120th Convention of the Audio Engineering Society'.
- (2006) Proceedings of the 120th Convention of the Audio Engineering Society
- Abdallah, S.¹ Raimond, Y.² Sandler, M.³

2
- 0036288688
- A new speaker change detection method for two-speaker segmentation
- Adami, A. G., Kajarekar, S. S. and Hermansky, H. (2002), A new speaker change detection method for two-speaker segmentation, in 'Proceedings of the ICASSP', Vol. 4, pp. 3908-3911.
- (2002) Proceedings of the ICASSP , vol.4 , pp. 3908-3911
- Adami, A.G.¹ Kajarekar, S.S.² Hermansky, H.³

3
- 84863671030
- Evaluation of classification techniques for audio indexing
- Arias, A., Pinquier, J. and André-Obrecht, R. (2005), Evaluation of classification techniques for audio indexing, in 'Proceedings of the EUSIPCO'.
- (2005) Proceedings of the EUSIPCO
- Arias, A.¹ Pinquier, J.² André-Obrecht, R.³

4
- 70349442041
- An accurate timbre model for musical instruments and its application to classification
- Burred, J. J., Röbel, A. and Rodet, X. (2006), An accurate timbre model for musical instruments and its application to classification, in 'Proceedings of the First Workshop on Learning the Semantics of Audio Signals (LSAS)', pp. 22-32.
- (2006) Proceedings of the First Workshop on Learning the Semantics of Audio Signals (LSAS) , pp. 22-32
- Burred, J.J.¹ Röbel, A.² Rodet, X.³

5
- 0031233424
- Speaker recognition: A tutorial
- Campbell, J. P. (1997), 'Speaker recognition: A tutorial', Proc. IEEE 85(9), 1437-1462.
- (1997) Proc. IEEE , vol.85 , Issue.9 , pp. 1437-1462
- Campbell, J.P.¹

6
- 0032638667
- A comparison of features for speech, music discrimination
- Carey, M. J., Parris, E. S. and Lloyd-Thomas, H. (1999), A comparison of features for speech, music discrimination, in 'Proceedings of the ICASSP', Vol. 1, pp. 149-152.
- (1999) Proceedings of the ICASSP , vol.1 , pp. 149-152
- Carey, M.J.¹ Parris, E.S.² Lloyd-Thomas, H.³

7
- 0035364397
- MPEG-7 sound-recognition tools
- Casey, M. (2001), 'MPEG-7 sound-recognition tools', IEEE Trans. Circ. Syst. Video Tech. 11(6), 737-747.
- (2001) IEEE Trans. Circ. Syst. Video Tech. , vol.11 , Issue.6 , pp. 737-747
- Casey, M.¹

8
- 33845439374
- Foafing the music: Bridging the semantic gap in music recommendation
- Vol. 4273 of LNCS
- Celma, O. (2006), Foafing the music: Bridging the semantic gap in music recommendation, in 'Proceedings of the 5th International Semantic Web Conference', Vol. 4273 of LNCS, pp. 927-934.
- (2006) Proceedings of the 5th International Semantic Web Conference , vol.4273 , pp. 927-934
- Celma, O.¹

9
- 77949508743
- Bridging the music semantic gap
- Celma, O., Herrera, P. and Serra, X. (2006), Bridging the music semantic gap, in 'Proceedings of the ESWC 2006 Workshop on Mastering the Gap: From Information Extraction to Semantic Representation'.
- (2006) Proceedings of the ESWC 2006 Workshop on Mastering the Gap: From Information Extraction to Semantic Representation
- Celma, O.¹ Herrera, P.² Serra, X.³

10
- 84941264849
- Structural analysis of musical signals for indexing and thumbnailing
- Chai, W. and Vercoe, B. (2003), Structural analysis of musical signals for indexing and thumbnailing, in 'Proceedings of the 3rd ACM/IEEE-CS Joint Conference on Digital Libraries', pp. 27-34.
- (2003) Proceedings of the 3rd ACM/IEEE-CS Joint Conference on Digital Libraries , pp. 27-34
- Chai, W.¹ Vercoe, B.²

11
- 0002595416
- Speaker, environment and channel change detection and clustering via the Bayesian information criterion
- Chen, S. S. and Gopalakrishnan, P. S. (1998), Speaker, environment and channel change detection and clustering via the Bayesian information criterion, in 'Proceedings of the DARPA Speech Recognition Workshop'.
- (1998) Proceedings of the DARPA Speech Recognition Workshop
- Chen, S.S.¹ Gopalakrishnan, P.S.²

12
- 85009212151
- A sequential metric-based audio segmentation method via the Bayesian information criterion
- Cheng, S.-S. and Wang, H.-M. (2003), A sequential metric-based audio segmentation method via the Bayesian information criterion, in 'Proceedings of the EUROSPEECH', pp. 945-948.
- (2003) Proceedings of the EUROSPEECH , pp. 945-948
- Cheng, S.-S.¹ Wang, H.-M.²

13
- 84892290921
- Vol. 2273 of LNCS, Springer, New York
- Coden, A., Brown, E. W. and Srinivasan, S., eds (2001), Proceedings of the ACM SIGIR 2001 Workshop on Information Retrieval Techniques for Speech Applications, Vol. 2273 of LNCS, Springer, New York.
- (2001) Proceedings of the ACM SIGIR 2001 Workshop on Information Retrieval Techniques for Speech Applications , vol.2273
- Coden, A.¹ Brown, E.W.² Srinivasan, S.³

14
- 0030381663
- Unsupervised speaker segmentation in telephone conversations.
- Cohen, A. and Lapidus, V. (1996), Unsupervised speaker segmentation in telephone conversations., in 'Proceedings of the Nineteenth Convention of Electrical and Electronics Engineers', pp. 102-105.
- (1996) Proceedings of the Nineteenth Convention of Electrical and Electronics Engineers , pp. 102-105
- Cohen, A.¹ Lapidus, V.²

15
- 0003603515
- Cambridge University Press, Cambridge
- Cole, R. A., Mariani, J., Uszkoreit, H., Zaenen, A. and Zue, V., eds (1998), Survey of the state of the art in Human Language Technology, Cambridge University Press, Cambridge.
- (1998) Survey of the State of the Art in Human Language Technology
- Cole, R.A.¹ Mariani, J.² Uszkoreit, H.³ Zaenen, A.⁴ Zue, V.⁵

16
- 0038368765
- Combination of similarity measures for effective spoken document retrieval
- Crestani, F. (2003), 'Combination of similarity measures for effective spoken document retrieval', J. Inform. Sci. 29(2), 87-96.
- (2003) J. Inform. Sci. , vol.29 , Issue.2 , pp. 87-96
- Crestani, F.¹

17
- 34250285795
- U-statistic hierarchical clustering
- D'Andrade, R. (1978), U-statistic hierarchical clustering, in 'Psychometrika', Vol. 43, pp. 59-68.
- (1978) Psychometrika , vol.43 , pp. 59-68
- D'Andrade, R.¹

18
- 0019053271
- Comparison of parametric representations for monosyllabic word recognition in continuously spoken sentences
- Davis, S. B. and Mermelstein, P. (1980), 'Comparison of parametric representations for monosyllabic word recognition in continuously spoken sentences', IEEE Trans. Acoust., Speech, Signal Process. 28(4), 357-366.
- (1980) IEEE Trans. Acoust., Speech, Signal Process. , vol.28 , Issue.4 , pp. 357-366
- Davis, S.B.¹ Mermelstein, P.²

19
- 0034273195
- DISTBIC: A speaker-based segmentation for audio data indexing
- Delacourt, P. and Wellekens, C. J. (2000), 'DISTBIC: A speaker-based segmentation for audio data indexing', Speech Comm. 32(1), 111-126.
- (2000) Speech Comm. , vol.32 , Issue.1 , pp. 111-126
- Delacourt, P.¹ Wellekens, C.J.²

20
- 0037237084
- Music information retrieval
- Downie, J. S. (2003), 'Music information retrieval', Annu. Rev. Inform. Sci. Tech. 37, 295-342.
- (2003) Annu. Rev. Inform. Sci. Tech. , vol.37 , pp. 295-342
- Downie, J.S.¹

21
- 0003922190
- Wiley Interscience, New York
- Duda, R. O., Hart, P. E. and Stork, D. G. (2000), Pattern Classification, Wiley Interscience, New York.
- (2000) Pattern Classification
- Duda, R.O.¹ Hart, P.E.² Stork, D.G.³

22
- 80155144871
- Phoneme-level indexing for fast and vocabulary-independent voice/voice retrieval
- Ferrieux, A. and Peillon, S. (1999), Phoneme-level indexing for fast and vocabulary-independent voice/voice retrieval, in 'Proceedings of the ESCA ITRW on Accessing Information in Spoken Audio', pp. 60-63.
- (1999) Proceedings of the ESCA ITRW on Accessing Information in Spoken Audio , pp. 60-63
- Ferrieux, A.¹ Peillon, S.²

23
- 84908266591
- The beat spectrum: A new approach to rhythm analysis
- Foote, J. (2001), The beat spectrum: A new approach to rhythm analysis, in 'Proceedings of the ICME', pp. 881-884.
- (2001) Proceedings of the ICME , pp. 881-884
- Foote, J.¹

24
- 57649180845
- Content-based retrieval of music and audio
- C.-C. Jay Kuo et al., ed.
- Foote, J. T. (1997), Content-based retrieval of music and audio, in C.-C. Jay Kuo et al., ed., 'Proceedings of the Electronic Imaging', Vol. 3229, pp. 138-147.
- (1997) Proceedings of the Electronic Imaging , vol.3229 , pp. 138-147
- Foote, J.T.¹

25
- 85128356454
- Partitioning and transcription of broadcast news data
- Gauvain, J.-L., Lamel, L. and Adda, G. (1998), Partitioning and transcription of broadcast news data, in 'Proceedings of the ICSLP', Vol. 5, pp. 1335-1338.
- (1998) Proceedings of the ICSLP , vol.5 , pp. 1335-1338
- Gauvain, J.-L.¹ Lamel, L.² Adda, G.³

26
- 0028516097
- Text-independent speaker identification
- Gish, H. and Schmidt, M. (1994), 'Text-independent speaker identification', IEEE Signal Process Mag. 11(4), 18-32.
- (1994) IEEE Signal Process Mag. , vol.11 , Issue.4 , pp. 18-32
- Gish, H.¹ Schmidt, M.²

27
- 0026400244
- Segregation of speakers for speech recognition and speaker identification.
- Gish, H., Siu, M.-H. and Rohlicek, R. (1991), Segregation of speakers for speech recognition and speaker identification., in 'Proceedings of the ICASSP', pp. 873-876.
- (1991) Proceedings of the ICASSP , pp. 873-876
- Gish, H.¹ Siu, M.-H.² Rohlicek, R.³

28
- 0030372637
- A probabilistic framework for feature-based speech recognition
- Glass, J., Chang, J. and McCandless, M. (1996), A probabilistic framework for feature-based speech recognition, in 'Proceedings of the ICSLP', Vol. 4, pp. 2277-2280.
- (1996) Proceedings of the ICSLP , vol.4 , pp. 2277-2280
- Glass, J.¹ Chang, J.² McCandless, M.³

29
- 84892200966
- A first approach to speech retrieval
- Technical Report 238, ETH Zürich
- Glavitsch, U. (1995), A first approach to speech retrieval, Technical Report 238, ETH Zürich, Institute of Information Systems.
- (1995) Institute of Information Systems
- Glavitsch, U.¹

30
- 0027151606
- Recognition of environmental sounds
- Goldhor, R. S. (1993), Recognition of environmental sounds, in 'Proceedings of the ICASSP', Vol. 1, pp. 149-152.
- (1993) Proceedings of the ICASSP , vol.1 , pp. 149-152
- Goldhor, R.S.¹

31
- 36549057588
- PhD thesis, Universitat Pompeu Fabra, Barcelona, Spain
- Gómez, E. (2006), Tonal Description of Music Audio Signals, PhD thesis, Universitat Pompeu Fabra, Barcelona, Spain.
- (2006) Tonal Description of Music Audio Signals
- Gómez, E.¹

32
- 33847144778
- Boosting for content-based audio classification and retrieval: An evaluation
- Guo, G., Zhang, H.-J. and Li, S. Z. (2001), Boosting for content-based audio classification and retrieval: An evaluation, in 'Proceedings of the ICME', pp. 1200-1203.
- (2001) Proceedings of the ICME , pp. 1200-1203
- Guo, G.¹ Zhang, H.-J.² Li, S.Z.³

33
- 78650812377
- Using knowledge-based scores for identifying best speech recognition hypothesis
- Gurevych, I. and Porzel, R. (2003), Using knowledge-based scores for identifying best speech recognition hypothesis, in 'Proceedings of the ISCA ITRW on Error Handling in Spoken Dialog Systems', pp. 77-81.
- (2003) Proceedings of the ISCA ITRW on Error Handling in Spoken Dialog Systems , pp. 77-81
- Gurevych, I.¹ Porzel, R.²

34
- 0033639509
- Overview of the sixth text retrieval conference (TREC-6)
- Harman, D. (2000), 'Overview of the sixth text retrieval conference (TREC-6)', Inform. Process. Manag. 36(1), 3-35.
- (2000) Inform. Process. Manag. , vol.36 , Issue.1 , pp. 3-35
- Harman, D.¹

35
- 0025041264
- Perceptual linear predictive (PLP) analysis of speech
- Hermansky, H. (1990), 'Perceptual linear predictive (PLP) analysis of speech', J. Acoust. Soc. Am. 87(4), 1738-1752.
- (1990) J. Acoust. Soc. Am. , vol.87 , Issue.4 , pp. 1738-1752
- Hermansky, H.¹

36
- 17444404830
- Automatic classification of musical instrument sounds
- Herrera, P., Peeters, G. and Dubnov, S. (2003), 'Automatic classification of musical instrument sounds', J. New. Music Res. 32(1), 3-21.
- (2003) J. New. Music Res. , vol.32 , Issue.1 , pp. 3-21
- Herrera, P.¹ Peeters, G.² Dubnov, S.³

37
- 85009070699
- Automatic metric-based speech segmentation for broadcast news via principal component analysis
- Hung, J.-W., Wang, H.-M. and Lee, L.-S. (2000), Automatic metric-based speech segmentation for broadcast news via principal component analysis, in 'Proceedings of the ICSLP', Vol. 4, pp. 121-124.
- (2000) Proceedings of the ICSLP , vol.4 , pp. 121-124
- Hung, J.-W.¹ Wang, H.-M.² Lee, L.-S.³

38
- 84868108409
- ISO/IEC
- ISO/IEC (2002), '15938-4:2002 - Information technology - Multimedia content description interface - Part 4: Audio'.
- (2002) 15938-4:2002 - Information Technology - Multimedia Content Description Interface - Part 4: Audio

39
- 84892265451
- ISO/IEC
- ISO/IEC (2004), '15938-4:2002/amd 1:2004 - Information technology - Multimedia content description interface - Part 4: Audio, Amendment 1: Audio extensions'.
- (2004) 15938-4:2002/amd 1:2004 - Information Technology - Multimedia Content Description Interface - Part 4: Audio, Amendment 1: Audio extensions

40
- 84892238225
- ISO/IEC
- ISO/IEC (2006), '15938-4:2002/amd 2:2006 - Information technology - Multimedia content description interface - Part 4: Audio, Amendment 2: High-level descriptors'.
- (2006) 15938-4:2002/amd 2:2006 - Information Technology - Multimedia Content Description Interface - Part 4: Audio, Amendment 2: High-level Descriptors

41
- 0004671920
- PhD thesis, University of Cambridge, UK
- James, D. (1995), The application of classical information retrieval techniques to spoken documents, PhD thesis, University of Cambridge, UK.
- (1995) The Application of Classical Information Retrieval Techniques to Spoken Documents
- James, D.¹

42
- 0014129195
- Hierarchical clustering schemes
- Johnson, S. C. (1967), 'Hierarchical clustering schemes', Psychometrika 32(3), 241-254.
- (1967) Psychometrika , vol.32 , Issue.3 , pp. 241-254
- Johnson, S.C.¹

43
- 84868891735
- Unsupervised speaker change detection for broadcast news segmentation
- Jørgensen, K. W., Mølgaard, L. L. and Hansen, L. K. (2006), Unsupervised speaker change detection for broadcast news segmentation, in 'Proceedings of the EUSIPCO'.
- (2006) Proceedings of the EUSIPCO
- Jørgensen, K.W.¹ Mølgaard, L.L.² Hansen, L.K.³

44
- 41149104768
- Speaker change detection using support vector machines
- Kartik, V., Satish, D. S. and Sekhar, C. C. (2005), Speaker change detection using support vector machines, in 'Proceedings of the ISCA ITRW on Non-linear Speech Processing', pp. 130-136.
- (2005) Proceedings of the ISCA ITRW on Non-linear Speech Processing , pp. 130-136
- Kartik, V.¹ Satish, D.S.² Sekhar, C.C.³

45
- 0033692969
- Strategies for automatic segmentation of audio data
- Kemp, T., Schmidt, M., Westphal, M. and Waibel, A. (2000), Strategies for automatic segmentation of audio data, in 'Proceedings ICASSP', Vol. 3, pp. 1423-1426.
- (2000) Proceedings ICASSP , vol.3 , pp. 1423-1426
- Kemp, T.¹ Schmidt, M.² Westphal, M.³ Waibel, A.⁴

46
- 33646908801
- The 1995 ABBOT hybrid connectionist-HMM large-vocabulary recognition system
- Kershaw, D., Robinson, A. and Renals, S. (1996), The 1995 ABBOT hybrid connectionist-HMM large-vocabulary recognition system, in 'Proceedings of the ARPA Speech Recognition Workshop', pp. 93-98.
- (1996) Proceedings of the ARPA Speech Recognition Workshop , pp. 93-98
- Kershaw, D.¹ Robinson, A.² Renals, S.³

47
- 84890460936
- How efficient is MPEG-7 for general sound recognition?
- Kim, H.-G., Burred, J. J. and Sikora, T. (2004), How efficient is MPEG-7 for general sound recognition?, in 'Proceedings AES 25th International Conference'.
- (2004) Proceedings AES 25th International Conference
- Kim, H.-G.¹ Burred, J.J.² Sikora, T.³

48
- 33646789869
- Hybrid speaker-based segmentation system using model-level clustering
- Kim, H.-G., Ertelt, D. and Sikora, T. (2005), Hybrid speaker-based segmentation system using model-level clustering, in 'Proceedings of the ICASSP', Vol. 1, pp. 745-748.
- (2005) Proceedings of the ICASSP , vol.1 , pp. 745-748
- Kim, H.-G.¹ Ertelt, D.² Sikora, T.³

49
- 84889435599
- John Wiley & Sons, New York
- Kim, H.-G., Moreau, N. and Sikora, T. (2005), MPEG-7 Audio and Beyond: Audio Content Indexing and Retrieval, John Wiley & Sons, New York.
- (2005) MPEG-7 Audio and Beyond: Audio Content Indexing and Retrieval
- Kim, H.-G.¹ Moreau, N.² Sikora, T.³

50
- 33748519104
- Signal processing methods for the transcription of music
- PhD thesis, Finland
- Klapuri, A. (2004), Signal Processing Methods for the Transcription of Music, PhD thesis, Tampere University of Technology, Tampere, Finland.
- (2004) Tampere University of Technology, Tampere
- Klapuri, A.¹

51
- 33745217037
- Using syllable-based indexing features and language models to improve German spoken document retrieval
- Larson, M. and Eickeler, S. (2003), Using syllable-based indexing features and language models to improve German spoken document retrieval, in 'Proceedings of the EUROSPEECH', pp. 1217-1220.
- (2003) Proceedings of the EUROSPEECH , pp. 1217-1220
- Larson, M.¹ Eickeler, S.²

52
- 84892243397
- Kluwer Academic Publishers, chapter Appendix I.2
- Lee, K.-F. (1989), Automatic Speech Recognition, Kluwer Academic Publishers, chapter Appendix I.2, p. 147.
- (1989) Automatic Speech Recognition , p. 147
- Lee, K.-F.¹

53
- 85119434191
- Fast speaker change detection for broadcast news transcription and indexing
- Liu, D. and Kubala, F. (1999), Fast speaker change detection for broadcast news transcription and indexing, in 'Proceedings of the EUROSPEECH', Vol. 3, pp. 1031-1034.
- (1999) Proceedings of the EUROSPEECH , vol.3 , pp. 1031-1034
- Liu, D.¹ Kubala, F.²

54
- 7744243089
- Automatic mood detection from acoustic music data
- Liu, D., Lu, L. and Zhang, H.-J. (2003), Automatic mood detection from acoustic music data, in 'Proceedings of the ISMIR'.
- (2003) Proceedings of the ISMIR
- Liu, D.¹ Lu, L.² Zhang, H.-J.³

55
- 0032181880
- Audio feature extraction and analysis for scene segmentation and classification
- Liu, Z., Wang, Y. and Chen, T. (1998), 'Audio feature extraction and analysis for scene segmentation and classification', J. VLSI Signal Process. 20(1/2), 61-79.
- (1998) J. VLSI Signal Process. , vol.20 , Issue.1-2 , pp. 61-79
- Liu, Z.¹ Wang, Y.² Chen, T.³

56
- 11244350380
- Fusion of semantic and acoustic approaches for spoken document retrieval
- Logan, B., Prasangsit, P. and Moreno, P. (2003), Fusion of semantic and acoustic approaches for spoken document retrieval, in 'Proceedings of the ISCA Workshop on Multilingual Spoken Document Retrieval', pp. 1-6.
- (2003) Proceedings of the ISCA Workshop on Multilingual Spoken Document Retrieval , pp. 1-6
- Logan, B.¹ Prasangsit, P.² Moreno, P.³

57
- 33645326073
- Real-time unsupervised speaker change detection
- Lu, L. and Zhang, H. J. (2002a), Real-time unsupervised speaker change detection, in 'Proceedings of the ICPR', Vol. 2, pp. 358-361.
- (2002) Proceedings of the ICPR , vol.2 , pp. 358-361
- Lu, L.¹ Zhang, H.J.²

58
- 0037700756
- Speaker change detection and tracking in real-time news broadcasting analysis.
- Lu, L. and Zhang, H. J. (2002b), Speaker change detection and tracking in real-time news broadcasting analysis., in 'Proceedings of the ACM International Conference on Multimedia', pp. 602-610.
- (2002) Proceedings of the ACM International Conference on Multimedia , pp. 602-610
- Lu, L.¹ Zhang, H.J.²

59
- 84873533162
- An investigation of feature models for music genre classification using the support vector classifier
- Meng, A. and Shawe-Taylor, J. (2005), An investigation of feature models for music genre classification using the support vector classifier, in 'Proceedings of the ISMIR', pp. 604-609.
- (2005) Proceedings of the ISMIR , pp. 604-609
- Meng, A.¹ Shawe-Taylor, J.²

60
- 84873444313
- MIREX, last checked February 2007
- MIREX (2006), 'Music information retrieval evaluation exchange'. http://www.music-ir.org/mirex2006/ (last checked February 2007).
- (2006) Music Information Retrieval Evaluation Exchange

61
- 33745218075
- Comparison of different phone-based spoken document retrieval methods with text and spoken queries
- Moreau, N., Jin, S. and Sikora, T. (2005), Comparison of different phone-based spoken document retrieval methods with text and spoken queries, in 'Proceedings of the EUROSPEECH', pp. 641-644.
- (2005) Proceedings of the EUROSPEECH , pp. 641-644
- Moreau, N.¹ Jin, S.² Sikora, T.³

62
- 0034857759
- Speaker change detection and speaker clustering using VQ distortion for broadcast news speech recognition
- Mori, K. and Nakagawa, S. (2001), Speaker change detection and speaker clustering using VQ distortion for broadcast news speech recognition, in 'Proceedings of the ICASSP', Vol. 1, pp. 413-416.
- (2001) Proceedings of the ICASSP , vol.1 , pp. 413-416
- Mori, K.¹ Nakagawa, S.²

63
- 0027311597
- A new speech recognition method based on VQ-distortion and HMM
- Nakagawa, S. and Suzuki, H. (1993), A new speech recognition method based on VQ-distortion and HMM, in 'Proceedings of the ICASSP', Vol. 2, pp. 676-679.
- (1993) Proceedings of the ICASSP , vol.2 , pp. 676-679
- Nakagawa, S.¹ Suzuki, H.²

64
- 0031632818
- Sound ontology for computational auditory scene analysis
- Nakatani, T. and Okuno, H. (1998), Sound ontology for computational auditory scene analysis, in 'Proceedings of the National Conference on Artificial Intelligence (AAAI)', pp. 1004-1010.
- (1998) Proceedings of the National Conference on Artificial Intelligence (AAAI) , pp. 1004-1010
- Nakatani, T.¹ Okuno, H.²

65
- 0033692609
- Information fusion for spoken document retrieval
- Ng, K. (2000), Information fusion for spoken document retrieval, in 'Proceedings ICASSP', Vol. 6, pp. 2405-2408.
- (2000) Proceedings ICASSP , vol.6 , pp. 2405-2408
- Ng, K.¹

66
- 0031636298
- Phonetic recognition for spoken document retrieval
- Ng, K. and Zue, V. W. (1998), Phonetic recognition for spoken document retrieval, in 'Proceedings ICASSP', Vol. 1, pp. 325-328.
- (1998) Proceedings ICASSP , vol.1 , pp. 325-328
- Ng, K.¹ Zue, V.W.²

67
- 25444508285
- Survey of sparse and non-sparse methods in source separation
- O'Grady, P. D., Pearlmutter, B. A. and Rickard, S. T. (2005), 'Survey of sparse and non-sparse methods in source separation', Int. J. Imag. Syst. Tech. 15(1), 18-33.
- (2005) Int. J. Imag. Syst. Tech. , vol.15 , Issue.1 , pp. 18-33
- O'Grady, P.D.¹ Pearlmutter, B.A.² Rickard, S.T.³

68
- 38549108667
- Musical metadata and knowledge management
- D. Schwartz, ed., Idea Group
- Pachet, F. (2005), Musical metadata and knowledge management, in D. Schwartz, ed., 'Encyclopedia of Knowledge Management', Idea Group, pp. 672-677.
- (2005) Encyclopedia of Knowledge Management , pp. 672-677
- Pachet, F.¹

69
- 33846199932
- PhD thesis, Technische Universität Wien
- Pampalk, E. (2006), Computational Models of Music Similarity and their Application in Music Information Retrieval, PhD thesis, Technische Universität Wien.
- (2006) Computational Models of Music Similarity and Their Application in Music Information Retrieval
- Pampalk, E.¹

70
- 0030396150
- Automatic audio content analysis
- Pfeiffer, S., Fischer, S. and Effelsberg, W. (1996), Automatic audio content analysis, in 'Proceedings 4th ACM International Multimedia Conference', pp. 21-30.
- (1996) Proceedings 4th ACM International Multimedia Conference , pp. 21-30
- Pfeiffer, S.¹ Fischer, S.² Effelsberg, W.³

71
- 34547541889
- Applied clustering for automatic speaker-based segmentation of audio material
- Pietquin, O., Couvreur, L. and Couvreur, P. (2001), Applied clustering for automatic speaker-based segmentation of audio material, in 'JORBEL', Vol. 41, pp. 69-81.
- (2001) JORBEL , vol.41 , pp. 69-81
- Pietquin, O.¹ Couvreur, L.² Couvreur, P.³

72
- 0034444712
- Integrating visual, audio and text analysis for news video
- Qi, W., Gu, L., Jiang, H., Chen, X. and Zhang, H. (2000), Integrating visual, audio and text analysis for news video, in 'Proceedings of the ICIP', Vol. 3, pp. 520-523.
- (2000) Proceedings of the ICIP , vol.3 , pp. 520-523
- Qi, W.¹ Gu, L.² Jiang, H.³ Chen, X.⁴ Zhang, H.⁵

73
- 0004244302
- Prentice Hall
- Rabiner, L. and Juang, B.-H. (1993), Fundamentals of Speech Recognition, Prentice Hall.
- (1993) Fundamentals of Speech Recognition
- Rabiner, L.¹ Juang, B.-H.²

74
- 0032641862
- The THISL spoken document retrieval project
- Renals, S. (1999), The THISL spoken document retrieval project, in 'Proceedings IEEE International Conference on Multimedia Computing and Systems (MCS)', Vol. 2, pp. 1049-1051.
- (1999) Proceedings IEEE International Conference on Multimedia Computing and Systems (MCS) , vol.2 , pp. 1049-1051
- Renals, S.¹

75
- 0029386354
- Keyword detection in conversational speech utterances using hidden Markov model based continuous speech recognition
- Rose, R. (1995), 'Keyword detection in conversational speech utterances using hidden Markov model based continuous speech recognition', Comput. Speech Lang. 9(4), 309-333.
- (1995) Comput. Speech Lang. , vol.9 , Issue.4 , pp. 309-333
- Rose, R.¹

76
- 34548201182
- Video to the rescue of audio: Shot boundary assisted speaker change detection
- Samour, A., Karaman, M., Goldmann, L. and Sikora, T. (2007), Video to the rescue of audio: Shot boundary assisted speaker change detection, in 'Proceedings of the Electronic Imaging', Vol. 6506.
- (2007) Proceedings of the Electronic Imaging , vol.6506
- Samour, A.¹ Karaman, M.² Goldmann, L.³ Sikora, T.⁴

77
- 4544228318
- Identity verification using speech and face information
- Sanderson, C. and Paliwal, K. K. (2004), 'Identity verification using speech and face information', Digit. Signal Process. 14(5), 449-480.
- (2004) Digit. Signal Process. , vol.14 , Issue.5 , pp. 449-480
- Sanderson, C.¹ Paliwal, K.K.²

78
- 0029765670
- Real-time discrimination of broadcast speech/music
- Saunders, J. (1996), Real-time discrimination of broadcast speech/music, in 'Proceedings of the ICASSP', Vol. 2, pp. 993-996.
- (1996) Proceedings of the ICASSP , vol.2 , pp. 993-996
- Saunders, J.¹

79
- 2642521115
- Assessing the retrieval effectiveness of a speech retrieval system by simulating recognition errors
- Schaeuble, P. and Glavitsch, U. (1994), Assessing the retrieval effectiveness of a speech retrieval system by simulating recognition errors, in 'Proceedings Workshop on Human Language Technology', pp. 370-372.
- (1994) Proceedings Workshop on Human Language Technology , pp. 370-372
- Schaeuble, P.¹ Glavitsch, U.²

80
- 0030648077
- Construction and evaluation of a robust multifeature speech/music discriminator
- Scheirer, E. and Slaney, M. (1997), Construction and evaluation of a robust multifeature speech/music discriminator, in 'Proceedings of the ICASSP', Vol. 2, pp. 1331-1334.
- (1997) Proceedings of the ICASSP , vol.2 , pp. 1331-1334
- Scheirer, E.¹ Slaney, M.²

81
- 0000120766
- Estimating the dimension of a model
- Schwarz, G. (1978), Estimating the dimension of a model, in 'Annals of Statistics', Vol. 6, pp. 461-464.
- (1978) Annals of Statistics , vol.6 , pp. 461-464
- Schwarz, G.¹

82
- 85069154781
- Musical sound modeling with sinusoids plus noise
- C. Roads, S. T. Pope, A. Piccialli and G. D. Poli, eds, Swets & Zeitlinger The Netherlands
- Serra, X. (1997), Musical sound modeling with sinusoids plus noise, in C. Roads, S. T. Pope, A. Piccialli and G. D. Poli, eds, 'Musical Signal Processing', Swets & Zeitlinger The Netherlands, pp. 91-122.
- (1997) Musical Signal Processing , pp. 91-122
- Serra, X.¹

83
- 0002782496
- Automatic segmentation, classification and clustering of broadcast news audio
- Siegler, M. A., Jain, U., Raj, B. and Stern, R. M. (1997), Automatic segmentation, classification and clustering of broadcast news audio, in 'Proceedings of the DARPA Speech Recognition Workshop', pp. 97-99.
- (1997) Proceedings of the DARPA Speech Recognition Workshop , pp. 97-99
- Siegler, M.A.¹ Jain, U.² Raj, B.³ Stern, R.M.⁴

84
- 84945116938
- Non-negative matrix factorization for polyphonic music transcription
- Smaragdis, P. and Brown, J. C. (2003), Non-negative matrix factorization for polyphonic music transcription, in 'Proceedings of the WASPAA', pp. 177-180.
- (2003) Proceedings of the WASPAA , pp. 177-180
- Smaragdis, P.¹ Brown, J.C.²

85
- 85031608427
- Speaker tracking and detection with multiple speakers
- Sönmez, K., Heck, L. and Weintraub, M. (1999), Speaker tracking and detection with multiple speakers, in 'Proceedings of the EUROSPEECH', Vol. 5, pp. 2219-2222.
- (1999) Proceedings of the EUROSPEECH , vol.5 , pp. 2219-2222
- Sönmez, K.¹ Heck, L.² Weintraub, M.³

86
- 0027252184
- Speech segmentation and clustering based on speaker features
- Sugiyama, M., Murakami, J. and Watanabe, H. (1993), Speech segmentation and clustering based on speaker features, in 'Proceedings of the ICASSP', Vol. 2, pp. 395-398.
- (1993) Proceedings of the ICASSP , vol.2 , pp. 395-398
- Sugiyama, M.¹ Murakami, J.² Watanabe, H.³

87
- 20444498720
- Detection of unique people in news programs using multimodal shot clustering
- Taskiran, C., Albiol, A., Torres, L. and Delp, E. (2004), Detection of unique people in news programs using multimodal shot clustering, in 'Proceedings of the ICIP', Vol. 1, pp. 697-700.
- (2004) Proceedings of the ICIP , vol.1 , pp. 697-700
- Taskiran, C.¹ Albiol, A.² Torres, L.³ Delp, E.⁴

88
- 85013694715
- Elsevier, The Netherlands
- Theodoridis, S. and Koutroumbas, K. (2006), Pattern Recognition, Elsevier, The Netherlands.
- (2006) Pattern Recognition
- Theodoridis, S.¹ Koutroumbas, K.²

89
- 78650540904
- Improved speaker segmentation and segments clustering using the Bayesian information criterion.
- Tritschler, A. and Gopinath, R. (1999), Improved speaker segmentation and segments clustering using the Bayesian information criterion., in 'Proceedings of the EUROSPEECH', pp. 679-682.
- (1999) Proceedings of the EUROSPEECH , pp. 679-682
- Tritschler, A.¹ Gopinath, R.²

90
- 0036648502
- Musical genre classification of audio signals
- Tzanetakis, G. and Cook, P. (2002), 'Musical genre classification of audio signals', IEEE Trans. Speech Audio Process. 10(5), 293-302.
- (2002) IEEE Trans. Speech Audio Process. , vol.10 , Issue.5 , pp. 293-302
- Tzanetakis, G.¹ Cook, P.²

91
- 85032751074
- Identification of sound recordings
- Venkatachalam, V., Cazzanti, L., Dhillon, N. and Wells, M. (2004), 'Identification of sound recordings', IEEE Signal Process. Mag. 21(2), 92-99.
- (2004) IEEE Signal Process. Mag. , vol.21 , Issue.2 , pp. 92-99
- Venkatachalam, V.¹ Cazzanti, L.² Dhillon, N.³ Wells, M.⁴

92
- 85159475599
- Transient modeling synthesis: A flexible analysis/synthesis tool for transient signals
- Verma, T., Levine, S. and Meng, T. (1997), Transient modeling synthesis: A flexible analysis/synthesis tool for transient signals, in 'Proceedings of the International Computer Music Conference (ICMC)', pp. 164-167.
- (1997) Proceedings of the International Computer Music Conference (ICMC) , pp. 164-167
- Verma, T.¹ Levine, S.² Meng, T.³

93
- 84892281943
- Voorhees, E. and Harman, D., eds (2001), NIST Special Publication 500-250: 10th Text Retrieval Conference (TREC), chapter Common Evaluation Measures, pp. A14-A23.
- (2001) NIST Special Publication 500-250: 10th Text Retrieval Conference (TREC), Chapter Common Evaluation Measures
- Voorhees, E.¹ Harman, D.²

94
- 85032751556
- Multimedia content analysis using both audio and visual clues
- Wang, Y., Liu, Z. and Huang, J.-C. (2000), 'Multimedia content analysis using both audio and visual clues', IEEE Signal Process. Mag. 17(6), 12-36.
- (2000) IEEE Signal Process. Mag. , vol.17 , Issue.6 , pp. 12-36
- Wang, Y.¹ Liu, Z.² Huang, J.-C.³

95
- 79952385877
- Segmentation of speech using speaker identification
- Wilcox, L., Chen, F., Kimber, D. and Balasubramanian, V. (1994), Segmentation of speech using speaker identification, in 'Proceedings of the ICASSP', pp. 161-164.
- (1994) Proceedings of the ICASSP , pp. 161-164
- Wilcox, L.¹ Chen, F.² Kimber, D.³ Balasubramanian, V.⁴

96
- 0025517070
- Automatic recognition of keywords in unconstrained speech using hidden Markov models
- Wilpon, J., Rabiner, L. and Lee, C.-H. (1990), 'Automatic recognition of keywords in unconstrained speech using hidden Markov models', IEEE Trans. Acoust., Speech, Signal Process. 38, 1870-1878.
- (1990) IEEE Trans. Acoust., Speech, Signal Process. , vol.38 , pp. 1870-1878
- Wilpon, J.¹ Rabiner, L.² Lee, C.-H.³

97
- 0030242072
- Content-based classification, search, and retrieval of audio
- Wold, E., Blum, T., Keislar, D. and Wheaton, J. (1996), 'Content-based classification, search, and retrieval of audio', IEEE Multimedia 3(3), 27-36.
- (1996) IEEE Multimedia , vol.3 , Issue.3 , pp. 27-36
- Wold, E.¹ Blum, T.² Keislar, D.³ Wheaton, J.⁴

98
- 0002452931
- The HTK large vocabulary recognition system for the 1995 ARPA H3 task
- Woodland, P., Gales, M., Pye, D. and Valtchev, V. (1996), The HTK large vocabulary recognition system for the 1995 ARPA H3 task, in 'Proceedings of the ARPA Speech Recognition Workshop', pp. 99-104.
- (1996) Proceedings of the ARPA Speech Recognition Workshop , pp. 99-104
- Woodland, P.¹ Gales, M.² Pye, D.³ Valtchev, V.⁴

99
- 0141478771
- UBM-based real-time speaker segmentation for broadcasting news
- Wu, T., Lu, L. and Zhang, H.-J. (2003), UBM-based real-time speaker segmentation for broadcasting news, in 'Proceedings of the ICASSP', Vol. 2, pp. 193-196.
- (2003) Proceedings of the ICASSP , vol.2 , pp. 193-196
- Wu, T.¹ Lu, L.² Zhang, H.-J.³

100
- 0141855132
- Comparing MFCC and MPEG-7 audio features for feature extraction, maximum likelihood HMM and entropic prior HMM for sports audio classification
- Xiong, Z., Radhakrishnan, R., Divakaran, A. and Huang, T. S. (2003), Comparing MFCC and MPEG-7 audio features for feature extraction, maximum likelihood HMM and entropic prior HMM for sports audio classification, in 'Proceedings of the ICASSP', Vol. 5, pp. 628-631.
- (2003) Proceedings of the ICASSP , vol.5 , pp. 628-631
- Xiong, Z.¹ Radhakrishnan, R.² Divakaran, A.³ Huang, T.S.⁴

101
- 85009160774
- An improved model-based speaker segmentation system
- Yu, P., Seide, F., Ma, C. and Chang, E. (2003), An improved model-based speaker segmentation system, in 'Proceedings of the EUROSPEECH', pp. 2025-2028.
- (2003) Proceedings of the EUROSPEECH , pp. 2025-2028
- Yu, P.¹ Seide, F.² Ma, C.³ Chang, E.⁴

102
- 0032629748
- Hierarchical classification of audio data for archiving and retrieving
- Zhang, T. and Kuo, C.-C. (1999a), Hierarchical classification of audio data for archiving and retrieving, in 'Proceedings of the ICASSP', Vol. 6, pp. 3001-3004.
- (1999) Proceedings of the ICASSP , vol.6 , pp. 3001-3004
- Zhang, T.¹ Kuo, C.-C.²

103
- 0033331933
- Classification and retrieval of sound effects in audiovisual data management
- Zhang, T. and Kuo, C.-C. J. (1999b), Classification and retrieval of sound effects in audiovisual data management, in 'Proceedings of the Asilomar Conference on Signals, Systems, and Computers', Vol. 1, pp. 730-734.
- (1999) Proceedings of the Asilomar Conference on Signals, Systems, and Computers , vol.1 , pp. 730-734
- Zhang, T.¹ Kuo, C.-C.J.²

104
- 85009275098
- SpeechFind: An experimental on-line spoken document retrieval system for historical audio archives
- Zhou, B. and Hansen, J. (2002), SpeechFind: An experimental on-line spoken document retrieval system for historical audio archives, in 'Proceedings of the ICSLP', Vol. 3, pp. 1969-1972.
- (2002) Proceedings of the ICSLP , vol.3 , pp. 1969-1972
- Zhou, B.¹ Hansen, J.²

105
- 85009089453
- Unsupervised audio stream segmentation and clustering via the Bayesian information criterion
- Zhou, B. and Hansen, J. H. L. (2000), Unsupervised audio stream segmentation and clustering via the Bayesian information criterion, in 'Proceedings of the ICSLP', Vol. 3, pp. 714-717.
- (2000) Proceedings of the ICSLP , vol.3 , pp. 714-717
- Zhou, B.¹ Hansen, J.H.L.²

106
- 0025587109
- The SUMMIT speech recognition system: Phonological modeling and lexical access
- Zue, V., Glass, J., Goodine, D., Phillips, M. and Seneff, S. (1990), The SUMMIT speech recognition system: Phonological modeling and lexical access, in 'Proceedings of the ICASSP', Vol. 1, pp. 49-52.
- (1990) Proceedings of the ICASSP , vol.1 , pp. 49-52
- Zue, V.¹ Glass, J.² Goodine, D.³ Phillips, M.⁴ Seneff, S.⁵

* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.