SCOPUS Information Retrieval Platform

Advances in Computers

Volume 78, 2010, Pages 71-150

Features for Content-Based Audio Retrieval

(3) Mitrović, Dalibor ᵃ; Zeppelzauer, Matthias ᵃ; Breiteneder, Christian ᵃ

ᵃ Vienna University of Technology (Austria)

Author keywords

Audio Retrieval; Content Based Audio Features; Content Based Retrieval; Feature Extraction; Taxonomy

EID: 85027695706 PISSN: 00652458 EISSN: None Source Type: Book Series
DOI: 10.1016/S0065-2458(10)78003-7 Document Type: Chapter

Times cited : (179)

References (204)

1
- 0004244302
- Fundamentals of Speech Recognition
- Prentice-Hall Upper Saddle River, NJ
- Rabiner, L., Juang, B., Fundamentals of Speech Recognition. 1993, Prentice-Hall, Upper Saddle River, NJ.
- (1993)
- Rabiner, L.¹ Juang, B.²

2
- 0037237084
- Music information retrieval
- (Chapter 7)
- Downie, J.S., Music information retrieval. Annu. Rev. Inform. Sci. Technol. 37 (2003), 295–340 (Chapter 7).
- (2003) Annu. Rev. Inform. Sci. Technol. , vol.37 , pp. 295-340
- Downie, J.S.¹

3
- 84892200847
- Signal Processing Methods for Music Transcription
- Springer New York, NY
- Klapuri, A., Davy, M., Signal Processing Methods for Music Transcription. 2006, Springer, New York, NY.
- (2006)
- Klapuri, A.¹ Davy, M.²

4
- 0042830801
- Comparison of techniques for environmental sound recognition
- Cowling, M., Sitte, R., Comparison of techniques for environmental sound recognition. Pattern Recogn. Lett. 24:15 (November 2003), 2895–2907.
- (2003) Pattern Recogn. Lett. , vol.24 , Issue.15 , pp. 2895-2907
- Cowling, M.¹ Sitte, R.²

5
- 0016572913
- A vector space model for automatic indexing
- Salton, G., Wong, A., Yang, C.S., A vector space model for automatic indexing. Commun. ACM 18:11 (1975), 613–620.
- (1975) Commun. ACM , vol.18 , Issue.11 , pp. 613-620
- Salton, G.¹ Wong, A.² Yang, C.S.³

6
- 0003947444
- Principles of Visual Information Retrieval
- Springer London
- Lew, M.S., Principles of Visual Information Retrieval. January 2001, Springer, London.
- (2001)
- Lew, M.S.¹

7
- 85097913301
- International Conference on Music Information Retrieval last visited: September 2009
- ISMIR., International Conference on Music Information Retrieval, 2004 http://ismir2004.ismir.net last visited: September 2009.
- (2004)

8
- 84872166874
- Music Information Retrieval Evaluation Exchange
- last visited: September 2009
- MIREX. Music Information Retrieval Evaluation Exchange. 2007 http://www.music-ir.org/mirexwiki last visited: September 2009.
- (2007)

9
- 85097889292
- Bioacoustical Terminology, ANSI S3.20-1995 (R2003)
- American National Standards Institute New York, NY
- ANSI. Bioacoustical Terminology, ANSI S3.20-1995 (R2003). 1995, American National Standards Institute, New York, NY.
- (1995)

10
- 84952660190
- Zur Tonhöhenwahrnehmung von Klängen. I. Psychoakustische Grundlagen
- Terhardt, E., Zur Tonhöhenwahrnehmung von Klängen. I. Psychoakustische Grundlagen. Acustica 26 (1972), 173–186.
- (1972) Acustica , vol.26 , pp. 173-186
- Terhardt, E.¹

11
- 0004236521
- Psychoacoustics: Facts and Models
- second ed. Springer Berlin
- Zwicker, E., Fastl, H., Psychoacoustics: Facts and Models. second ed., 1999, Springer, Berlin.
- (1999)
- Zwicker, E.¹ Fastl, H.²

12
- 0035790891
- Musical instrument timbres classification with spectral features
- Proceedings of the IEEE Workshop on Multimedia Signal Processing, Cannes, France IEEE Piscataway, NJ
- Agostini, G., Longari, M., Pollastri, E., Musical instrument timbres classification with spectral features., Proceedings of the IEEE Workshop on Multimedia Signal Processing, Cannes, France, October 2001, IEEE, Piscataway, NJ, 97–102.
- (2001) , pp. 97-102
- Agostini, G.¹ Longari, M.² Pollastri, E.³

13
- 34547257372
- Perceptual distance in timbre space
- Proceedings of 11th Meeting of the International Conference on Auditory Display, Limerick, Ireland
- Terasawa, H., Slaney, M., Berger, J., Perceptual distance in timbre space., Proceedings of 11th Meeting of the International Conference on Auditory Display, Limerick, Ireland, July 2005, 61–68.
- (2005) , pp. 61-68
- Terasawa, H.¹ Slaney, M.² Berger, J.³

14
- 0001019347
- The relation of pitch to intensity
- Stevens, S.S., The relation of pitch to intensity. J. Acoust. Soc. Am. 6:3 (1935), 150–154.
- (1935) J. Acoust. Soc. Am. , vol.6 , Issue.3 , pp. 150-154
- Stevens, S.S.¹

15
- 0036497405
- Problems of music information retrieval in the real world
- Byrd, D., Crawford, T., Problems of music information retrieval in the real world. Inform. Process. Manage. 38:2 (March 2002), 249–272.
- (2002) Inform. Process. Manage. , vol.38 , Issue.2 , pp. 249-272
- Byrd, D.¹ Crawford, T.²

16
- 0038136880
- Survey of compressed-domain features used in audio–visual indexing and analysis
- Wang, A., Divakaran, A., Vetro, A., Chang, S.F., Sun, H., Survey of compressed-domain features used in audio–visual indexing and analysis. J. Vis. Commun. Image Represent. 14:2 (June 2003), 150–183.
- (2003) J. Vis. Commun. Image Represent. , vol.14 , Issue.2 , pp. 150-183
- Wang, A.¹ Divakaran, A.² Vetro, A.³ Chang, S.F.⁴ Sun, H.⁵

17
- 85032751556
- Multimedia content analysis using both audio and visual clues
- Wang, Y., Liu, Z., Huang, J.C., Multimedia content analysis using both audio and visual clues. IEEE Signal Process. Mag. 17:6 (November 2000), 12–36.
- (2000) IEEE Signal Process. Mag. , vol.17 , Issue.6 , pp. 12-36
- Wang, Y.¹ Liu, Z.² Huang, J.C.³

18
- 2942707401
- Manipulation, Analysis and Retrieval Systems for Audio Signals
- Ph.D. Thesis. Computer Science Department, Princeton University
- Tzanetakis, G., Manipulation, Analysis and Retrieval Systems for Audio Signals. 2002 Ph.D. Thesis. Computer Science Department, Princeton University.
- (2002)
- Tzanetakis, G.¹

19
- 0035786658
- Feature selection for automatic classification of musical instrument sounds
- JCDL'01: Proceedings of the 1st ACM/IEEE-CS Joint Conference on Digital Libraries ACM Press New York, NY
- Liu, M., Wan, C., Feature selection for automatic classification of musical instrument sounds., JCDL'01: Proceedings of the 1st ACM/IEEE-CS Joint Conference on Digital Libraries, 2001, ACM Press, New York, NY, 247–248.
- (2001) , pp. 247-248
- Liu, M.¹ Wan, C.²

20
- 0003801149
- Content-Based Audio Classification and Retrieval for Audiovisual Data Parsing
- Kluwer Academic Publishers Boston, MA
- Zhang, T., Kuo, C.C.J., Content-Based Audio Classification and Retrieval for Audiovisual Data Parsing. 2001, Kluwer Academic Publishers, Boston, MA.
- (2001)
- Zhang, T.¹ Kuo, C.C.J.²

21
- 27844503926
- A physiologically inspired method for audio classification
- Ravindran, S., Schlemmer, K., Anderson, D., A physiologically inspired method for audio classification. EURASIP J. Appl. Signal Process. 2005:9 (2005), 1374–1381.
- (2005) EURASIP J. Appl. Signal Process. , vol.2005 , Issue.9 , pp. 1374-1381
- Ravindran, S.¹ Schlemmer, K.² Anderson, D.³

22
- 0141703354
- Robust speech recognition using features based on zero crossings with peak amplitudes
- Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing, vol. 1, Hong Kong, China IEEE Piscataway, NJ
- Gajic, B., Paliwal, K.K., Robust speech recognition using features based on zero crossings with peak amplitudes., Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing, vol. 1, Hong Kong, China, April 2003, IEEE, Piscataway, NJ, 64–67.
- (2003) , pp. 64-67
- Gajic, B.¹ Paliwal, K.K.²

23
- 0028517164
- Rasta processing of speech
- Hermansky, H., Morgan, N., Rasta processing of speech. IEEE Trans. Speech Audio Process. 2 (1994), 578–589.
- (1994) IEEE Trans. Speech Audio Process. , vol.2 , pp. 578-589
- Hermansky, H.¹ Morgan, N.²

24
- 4744344335
- Modulation-scale analysis for content identification
- Sukittanon, S., Atlas, L.E., Pitton, W.J., Modulation-scale analysis for content identification. IEEE Trans. Signal Process. 52:10 (2004), 3023–3035.
- (2004) IEEE Trans. Signal Process. , vol.52 , Issue.10 , pp. 3023-3035
- Sukittanon, S.¹ Atlas, L.E.² Pitton, W.J.³

25
- 15544385732
- Automatic feature extraction for classifying audio data
- Mierswa, I., Morik, K., Automatic feature extraction for classifying audio data. Mach. Learn. J. 58:2–3 (February 2005), 127–149.
- (2005) Mach. Learn. J. , vol.58 , Issue.2-3 , pp. 127-149
- Mierswa, I.¹ Morik, K.²

26
- 0002161311
- The quefrency alanysis of time series for echoes: cepstrum, pseudo-autocovariance, cross-cepstrum, and saphe-cracking
- M. Rosenblatt Proceedings of the Symposium on Time Series Analysis Wiley New York, NY
- Bogert, B., Healy, M., Tukey, J., The quefrency alanysis of time series for echoes: cepstrum, pseudo-autocovariance, cross-cepstrum, and saphe-cracking. Rosenblatt, M., (eds.), Proceedings of the Symposium on Time Series Analysis, 1963, Wiley, New York, NY, 209–243.
- (1963) , pp. 209-243
- Bogert, B.¹ Healy, M.² Tukey, J.³

27
- 0030711174
- The modulation spectrogram: in pursuit of an invariant representation of speech
- Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing IEEE Piscataway, NJ vol. 3
- Greenberg, S., Kingsbury, B.E.D., The modulation spectrogram: in pursuit of an invariant representation of speech., Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing, April 1997, IEEE, Piscataway, NJ, 1647–1650 vol. 3.
- (1997) , pp. 1647-1650
- Greenberg, S.¹ Kingsbury, B.E.D.²

28
- 0038376759
- Content-based organization and visualization of music archives
- Proceedings of the 10th ACM International Conference on Multimedia ACM Press New York, NY
- Pampalk, E., Rauber, A., Merkl, D., Content-based organization and visualization of music archives., Proceedings of the 10th ACM International Conference on Multimedia, 2002, ACM Press, New York, NY, 570–579.
- (2002) , pp. 570-579
- Pampalk, E.¹ Rauber, A.² Merkl, D.³

29
- 0032136330
- Robust speech recognition using the modulation spectrogram
- Kingsbury, B., Morgan, N., Greenberg, S., Robust speech recognition using the modulation spectrogram. Speech Commun. 25 (1998), 117–132.
- (1998) Speech Commun. , vol.25 , pp. 117-132
- Kingsbury, B.¹ Morgan, N.² Greenberg, S.³

30
- 0003782493
- Analysis of Observed Chaotic Data
- Springer New York, NY
- Abarbanel, H., Analysis of Observed Chaotic Data. 1996, Springer, New York, NY.
- (1996)
- Abarbanel, H.¹

31
- 0141591552
- Speech recognition using reconstructed phase space features
- Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing, vol. 1, Hong Kong, China IEEE Piscataway, NJ
- Lindgren, A.C., Johnson, M.T., Povinelli, R.J., Speech recognition using reconstructed phase space features., Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing, vol. 1, Hong Kong, China, April 2003, IEEE, Piscataway, NJ, 60–63.
- (2003) , pp. 60-63
- Lindgren, A.C.¹ Johnson, M.T.² Povinelli, R.J.³

32
- 84953656445
- Subdivision of the audible frequency range into critical bands (Frequenzgruppen)
- Zwicker, E., Subdivision of the audible frequency range into critical bands (Frequenzgruppen). J. Acoust. Soc. Am. 33 (1961), 248.
- (1961) J. Acoust. Soc. Am. , vol.33 , pp. 248
- Zwicker, E.¹

33
- 0025294553
- Auditory filter shapes at low center frequencies
- Moore, B.C.J., Peters, R.W., Glasberg, B.R., Auditory filter shapes at low center frequencies. J. Acoust. Soc. Am. 88:1 (1990), 132–140.
- (1990) J. Acoust. Soc. Am. , vol.88 , Issue.1 , pp. 132-140
- Moore, B.C.J.¹ Peters, R.W.² Glasberg, B.R.³

34
- 84955035459
- A scale for the measurement of the psychological magnitude pitch
- Stevens, S.S., Volkmann, J., Newman, E.B., A scale for the measurement of the psychological magnitude pitch. J. Acoust. Soc. Am. 8:3 (January 1937), 185–190.
- (1937) J. Acoust. Soc. Am. , vol.8 , Issue.3 , pp. 185-190
- Stevens, S.S.¹ Volkmann, J.² Newman, E.B.³

35
- 0003789815
- An Introduction to the Psychology of Hearing
- fifth ed. Academic Press Amsterdam
- Moore, B.C.J., An Introduction to the Psychology of Hearing. fifth ed., 2004, Academic Press, Amsterdam.
- (2004)
- Moore, B.C.J.¹

36
- 0020816083
- Suggested formulae for calculating auditory-filter bandwidths and excitation patterns
- Moore, B.C.J., Glasberg, B.R., Suggested formulae for calculating auditory-filter bandwidths and excitation patterns. J. Acoust. Soc. Am. 74:3 (September 1983), 750–753.
- (1983) J. Acoust. Soc. Am. , vol.74 , Issue.3 , pp. 750-753
- Moore, B.C.J.¹ Glasberg, B.R.²

37
- 0000614795
- The auditory masking of one pure tone by another and its probable relation to the dynamics of the inner ear
- Wegel, R.L., Lane, C.E., The auditory masking of one pure tone by another and its probable relation to the dynamics of the inner ear. Phys. Rev. 23 (February 1924), 266–285.
- (1924) Phys. Rev. , vol.23 , pp. 266-285
- Wegel, R.L.¹ Lane, C.E.²

38
- 84955013022
- Loudness, its definition, measurement and calculation
- Fletcher, H., Munson, W.A., Loudness, its definition, measurement and calculation. J. Acoust. Soc. Am. 5:2 (October 1933), 82–108.
- (1933) J. Acoust. Soc. Am. , vol.5 , Issue.2 , pp. 82-108
- Fletcher, H.¹ Munson, W.A.²

39
- 85097862309
- International Standard 226, Acoustics—Normal Equal-Loudness Level Contours
- International Organization for Standardization (ISO). International Standard 226, Acoustics—Normal Equal-Loudness Level Contours. 1987.
- (1987)

40
- 0032657125
- The importance of perceptive adaptation of sound features for audio content processing
- Proceedings SPIE Conferences, Electronic Imaging 1999, Storage and Retrieval for Image and Video Databases VII, San Jose, CA
- Pfeiffer, S., The importance of perceptive adaptation of sound features for audio content processing., Proceedings SPIE Conferences, Electronic Imaging 1999, Storage and Retrieval for Image and Video Databases VII, San Jose, CA, January 1999, 328–337.
- (1999) , pp. 328-337
- Pfeiffer, S.¹

41
- 34447546202
- On the psychophysical law
- Stevens, S.S., On the psychophysical law. Psychol. Rev. 64:3 (May 1957), 153–181.
- (1957) Psychol. Rev. , vol.64 , Issue.3 , pp. 153-181
- Stevens, S.S.¹

42
- 4243152700
- Content-based identification of audio material using mpeg-7 low level description
- Proceedings of the International Symposium of Music Information Retrieval
- Allamanche, E., Herre, J., Helmuth, O., Fröba, B., Kasten, T., Cremer, M., Content-based identification of audio material using mpeg-7 low level description., Proceedings of the International Symposium of Music Information Retrieval, 2001.
- (2001)
- Allamanche, E.¹ Herre, J.² Helmuth, O.³ Fröba, B.⁴ Kasten, T.⁵ Cremer, M.⁶

43
- 0038444621
- Distortion discriminant analysis for audio fingerprinting
- Burges, C.J.C., Platt, J.C., Jana, S., Distortion discriminant analysis for audio fingerprinting. IEEE Trans. Speech Audio Process. 11:3 (May 2003), 165–174.
- (2003) IEEE Trans. Speech Audio Process. , vol.11 , Issue.3 , pp. 165-174
- Burges, C.J.C.¹ Platt, J.C.² Jana, S.³

44
- 85009108066
- MFCC computation from magnitude spectrum of higher lag autocorrelation coefficients for robust speech recognition
- Proceedings of the International Conference on Spoken Language Processing
- Shannon, B.J., Paliwal, K.K., MFCC computation from magnitude spectrum of higher lag autocorrelation coefficients for robust speech recognition., Proceedings of the International Conference on Spoken Language Processing, October 2004, 129–132.
- (2004) , pp. 129-132
- Shannon, B.J.¹ Paliwal, K.K.²

45
- 22544476848
- Combination of autocorrelation-based features and projection measure technique for speaker identification
- Yuo, K.H., Hwang, T.H., Wang, H.C., Combination of autocorrelation-based features and projection measure technique for speaker identification. IEEE Trans. Speech Audio Process. 13:4 (July 2005), 565–574.
- (2005) IEEE Trans. Speech Audio Process. , vol.13 , Issue.4 , pp. 565-574
- Yuo, K.H.¹ Hwang, T.H.² Wang, H.C.³

46
- 84873540864
- Audio matching via chroma-based statistical features
- Proceedings of the 6th International Conference on Music Information Retrieval, London
- Müller, M., Kurth, F., Clausen, M., Audio matching via chroma-based statistical features., Proceedings of the 6th International Conference on Music Information Retrieval, London, September 2005, 288–295.
- (2005) , pp. 288-295
- Müller, M.¹ Kurth, F.² Clausen, M.³

47
- 1542439119
- A comparative study on content-based music genre classification
- SIGIR'03: Proceedings of the 26th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, Toronto, ON, Canada ACM Press New York, NY
- Li, T., Ogihara, M., Li, Q., A comparative study on content-based music genre classification., SIGIR'03: Proceedings of the 26th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, Toronto, ON, Canada, 2003, ACM Press, New York, NY, 282–289.
- (2003) , pp. 282-289
- Li, T.¹ Ogihara, M.² Li, Q.³

48
- 33644626634
- A large set of audio features for sound description (similarity and classification) in the CUIDADO project
- Peeters, G., A large set of audio features for sound description (similarity and classification) in the CUIDADO project. Technical Report, 2004.
- (2004) Technical Report
- Peeters, G.¹

49
- 0037708486
- Content-based audio classification and segmentation by using support vector machines
- Lu, L., Zhang, H.J., Li, S.Z., Content-based audio classification and segmentation by using support vector machines. Multimedia Syst. 8:6 (April 2003), 482–492.
- (2003) Multimedia Syst. , vol.8 , Issue.6 , pp. 482-492
- Lu, L.¹ Zhang, H.J.² Li, S.Z.³

50
- 84873543378
- Inferring efficient hierarchical taxonomies for MIR tasks, application to musical instruments
- Proceedings of the International Conference on Music Information Retrieval
- Essid, S., Richard, G., David, B., Inferring efficient hierarchical taxonomies for MIR tasks, application to musical instruments., Proceedings of the International Conference on Music Information Retrieval, September 2005.
- (2005)
- Essid, S.¹ Richard, G.² David, B.³

51
- 0347387977
- An experimental automatic word recognition system
- Joint Speech Research Unit Ruislip, England
- Bridle, J.S., Brown, M.D., An experimental automatic word recognition system. JSRU Report No. 1003, 1974, Joint Speech Research Unit, Ruislip, England.
- (1974) JSRU Report No. 1003
- Bridle, J.S.¹ Brown, M.D.²

52
- 0033705976
- Speech/music discrimination for multimedia applications
- Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing, Istanbul, Turkey IEEE Piscataway, NJ vol. 6
- El-Maleh, K., Klein, M., Petrucci, G., Kabal, P., Speech/music discrimination for multimedia applications., Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing, Istanbul, Turkey, June 2000, IEEE, Piscataway, NJ, 2445–2448 vol. 6.
- (2000) , pp. 2445-2448
- El-Maleh, K.¹ Klein, M.² Petrucci, G.³ Kabal, P.⁴

53
- 0022806994
- Spectral analysis and discrimination by zero-crossings
- Kedem, B., Spectral analysis and discrimination by zero-crossings. IEEE Proc. 74 (1986), 1477–1493.
- (1986) IEEE Proc. , vol.74 , pp. 1477-1493
- Kedem, B.¹

54
- 13144306118
- A speech/music discriminator based on RMS and zero-crossings
- Panagiotakis, C., Tziritas, G., A speech/music discriminator based on RMS and zero-crossings. IEEE Trans. Multimedia 7:1 (February 2005), 155–166.
- (2005) IEEE Trans. Multimedia , vol.7 , Issue.1 , pp. 155-166
- Panagiotakis, C.¹ Tziritas, G.²

55
- 0029765670
- Real-time discrimination of broadcast speech/music
- Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing, vol. 2, Atlanta, GA IEEE Piscataway, NJ
- Saunders, J., Real-time discrimination of broadcast speech/music., Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing, vol. 2, Atlanta, GA, May 1996, IEEE, Piscataway, NJ, 993–996.
- (1996) , pp. 993-996
- Saunders, J.¹

56
- 33646759744
- Features for audio and music classification
- Proceedings of the International Conference on Music Information Retrieval
- McKinney, M.F., Breebaart, J., Features for audio and music classification., Proceedings of the International Conference on Music Information Retrieval, October 2003.
- (2003)
- McKinney, M.F.¹ Breebaart, J.²

57
- 33646717803
- Fusion of audio and motion information on hmm-based highlight extraction for baseball games
- Cheng, C.C., Hsu, C.T., Fusion of audio and motion information on hmm-based highlight extraction for baseball games. IEEE Trans. Multimedia 8:3 (June 2006), 585–599.
- (2006) IEEE Trans. Multimedia , vol.8 , Issue.3 , pp. 585-599
- Cheng, C.C.¹ Hsu, C.T.²

58
- 11244258301
- Emotion recognition using acoustic features and textual content
- Proceedings of the IEEE International Conference on Multimedia and Expo, vol. 1, Taipei, Taiwan IEEE Piscataway, NJ
- Chuang, Z.J., Wu, C.H., Emotion recognition using acoustic features and textual content., Proceedings of the IEEE International Conference on Multimedia and Expo, vol. 1, Taipei, Taiwan, June 2004, IEEE, Piscataway, NJ, 53–56.
- (2004) , pp. 53-56
- Chuang, Z.J.¹ Wu, C.H.²

59
- 67650108322
- Automatic singer identification
- Proceedings of the IEEE International Conference on Multimedia and Expo IEEE Piscataway, NJ vol. 1
- Zhang, T., Automatic singer identification., Proceedings of the IEEE International Conference on Multimedia and Expo, July 2003, IEEE, Piscataway, NJ, 33–36 vol. 1.
- (2003) , pp. 33-36
- Zhang, T.¹

60
- 33846110275
- A flexible framework for key audio effects detection and auditory context inference
- Cai, R., Lu, L., Hanjalic, A., Zhang, H.J., Cai, L.H., A flexible framework for key audio effects detection and auditory context inference. IEEE Trans. Speech Audio Process. 14 (May 2006), 1026–1039.
- (2006) IEEE Trans. Speech Audio Process. , vol.14 , pp. 1026-1039
- Cai, R.¹ Lu, L.² Hanjalic, A.³ Zhang, H.J.⁴ Cai, L.H.⁵

61
- 0029765806
- Feature extraction based on zero-crossings with peak amplitudes for robust speech recognition in noisy environments
- Proceedings of the International Conference on Acoustics, Speech, and Signal Processing IEEE Piscataway, NJ vol. 1
- Kim, D.-S., Jeong, J.-H., Kim, J.-W., Lee, S.-Y., Feature extraction based on zero-crossings with peak amplitudes for robust speech recognition in noisy environments., Proceedings of the International Conference on Acoustics, Speech, and Signal Processing, October 1996, IEEE, Piscataway, NJ, 61–64 vol. 1.
- (1996) , pp. 61-64
- Kim, D.-S.¹ Jeong, J.-H.² Kim, J.-W.³ Lee, S.-Y.⁴

62
- 0032785783
- Auditory processing of speech signals for robust speech recognition in real-world noisy environments
- Kim, D.S., Lee, S.Y., Kil, R.M., Auditory processing of speech signals for robust speech recognition in real-world noisy environments. IEEE Trans. Speech Audio Process. 7:1 (January 1999), 55–69.
- (1999) IEEE Trans. Speech Audio Process. , vol.7 , Issue.1 , pp. 55-69
- Kim, D.S.¹ Lee, S.Y.² Kil, R.M.³

63
- 85009118262
- A noise-robust feature extraction method based on pitch-synchronous ZCPA for ASR
- Proceedings of the International Conference on Spoken Language Processing, Jeju Island, Korea
- Ghulam, M., Fukuda, T., Horikawa, J., Nitta, T., A noise-robust feature extraction method based on pitch-synchronous ZCPA for ASR., Proceedings of the International Conference on Spoken Language Processing, Jeju Island, Korea, October 2004, 133–136.
- (2004) , pp. 133-136
- Ghulam, M.¹ Fukuda, T.² Horikawa, J.³ Nitta, T.⁴

64
- 33646758174
- Pitch-synchronous ZCPA (PS-ZCPA)-based feature extraction with auditory masking
- Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing, vol. 1, Philadelphia, PA IEEE Piscataway, NJ
- Ghulam, M., Fukuda, T., Horikawa, J., Nitta, T., Pitch-synchronous ZCPA (PS-ZCPA)-based feature extraction with auditory masking., Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing, vol. 1, Philadelphia, PA, March 2005, IEEE, Piscataway, NJ, 517–520.
- (2005) , pp. 517-520
- Ghulam, M.¹ Fukuda, T.² Horikawa, J.³ Nitta, T.⁴

65
- 85097901157
- Information Technology—Multimedia Content Description Interface—Part 4: Audio (Number 15938), ISO/IEC, Moving Pictures Expert Group
- first ed.
- ISO-IEC. Information Technology—Multimedia Content Description Interface—Part 4: Audio (Number 15938), ISO/IEC, Moving Pictures Expert Group. first ed., 2002.
- (2002)

66
- 34047265156
- Discrimination and retrieval of animal sounds
- Proceedings of IEEE Multimedia Modelling Conference, Beijing, China IEEE Piscataway, NJ
- Mitrovic, D., Zeppelzauer, M., Breiteneder, C., Discrimination and retrieval of animal sounds., Proceedings of IEEE Multimedia Modelling Conference, Beijing, China, January 2006, IEEE, Piscataway, NJ, 339–343.
- (2006) , pp. 339-343
- Mitrovic, D.¹ Zeppelzauer, M.² Breiteneder, C.³

67
- 33845633986
- Toward semantic indexing and retrieval using hierarchical audio models
- Chu, W.T., Cheng, W.H., Hsu, J.Y.J., Wu, J.L., Toward semantic indexing and retrieval using hierarchical audio models. Multimedia Syst. 10:6 (May 2005), 570–583.
- (2005) Multimedia Syst. , vol.10 , Issue.6 , pp. 570-583
- Chu, W.T.¹ Cheng, W.H.² Hsu, J.Y.J.³ Wu, J.L.⁴

68
- 33847295200
- SVM-based audio scene classification
- Proceedings of the IEEE International Conference on Natural Language Processing and Knowledge Engineering, Wuhan, China IEEE Piscataway, NJ
- Jiang, H., Bai, J., Zhang, S., Xu, B., SVM-based audio scene classification., Proceedings of the IEEE International Conference on Natural Language Processing and Knowledge Engineering, Wuhan, China, October 2005, IEEE, Piscataway, NJ, 131–136.
- (2005) , pp. 131-136
- Jiang, H.¹ Bai, J.² Zhang, S.³ Xu, B.⁴

69
- 0030242072
- Content-based classification, search, and retrieval of audio
- Wold, T., Blum, D., Wheaton, J., Content-based classification, search, and retrieval of audio. IEEE Multimedia 3:3 (1996), 27–36.
- (1996) IEEE Multimedia , vol.3 , Issue.3 , pp. 27-36
- Wold, T.¹ Blum, D.² Wheaton, J.³

70
- 0032181880
- Audio feature extraction and analysis for scene segmentation and classification
- Liu, Z., Wang, Y., Chen, T., Audio feature extraction and analysis for scene segmentation and classification. J. VLSI Signal Process. 20:1–2 (October 1998), 61–79.
- (1998) J. VLSI Signal Process. , vol.20 , Issue.1-2 , pp. 61-79
- Liu, Z.¹ Wang, Y.² Chen, T.³

71
- 0003425258
- Digital Processing of Speech Signals
- Prentice-Hall Englewood Cliffs, NJ
- Rabiner, L., Schafer, R., Digital Processing of Speech Signals. 1978, Prentice-Hall, Englewood Cliffs, NJ.
- (1978)
- Rabiner, L.¹ Schafer, R.²

72
- 0002884330
- The government standard linear predictive coding algorithm: LPC-10
- Tremain, T., The government standard linear predictive coding algorithm: LPC-10. Speech Technol. Mag. 1 (April 1982), 40–49.
- (1982) Speech Technol. Mag. , vol.1 , pp. 40-49
- Tremain, T.¹

73
- 20444491279
- Automatic classification of speech and music using neural networks
- MMDB'04: Proceedings of the 2nd ACM International Workshop on Multimedia Databases ACM Press New York, NY
- Khan, M.K.S., Al-Khatib, W.G., Moinuddin, M., Automatic classification of speech and music using neural networks., MMDB'04: Proceedings of the 2nd ACM International Workshop on Multimedia Databases, 2004, ACM Press, New York, NY, 94–99.
- (2004) , pp. 94-99
- Khan, M.K.S.¹ Al-Khatib, W.G.² Moinuddin, M.³

74
- 33746879922
- Machine-learning based classification of speech and music
- Khan, M.K.S., Al-Khatib, W.G., Machine-learning based classification of speech and music. Multimedia Syst. 12:1 (August 2006), 55–67.
- (2006) Multimedia Syst. , vol.12 , Issue.1 , pp. 55-67
- Khan, M.K.S.¹ Al-Khatib, W.G.²

75
- 0034867981
- A study on content-based classification and retrieval of audio database
- Proceedings of the International Symposium on Database Engineering and Applications, Grenoble, France IEEE Computer Society Washington, DC
- Liu, M., Wan, C., A study on content-based classification and retrieval of audio database., Proceedings of the International Symposium on Database Engineering and Applications, Grenoble, France, July 2001, IEEE Computer Society, Washington, DC, 339–345.
- (2001) , pp. 339-345
- Liu, M.¹ Wan, C.²

76
- 18744375187
- Automatic music classification and summarization
- Xu, C., Maddage, N.C., Shao, X., Automatic music classification and summarization. IEEE Trans. Speech Audio Process. 13:3 (May 2005), 441–450.
- (2005) IEEE Trans. Speech Audio Process. , vol.13 , Issue.3 , pp. 441-450
- Xu, C.¹ Maddage, N.C.² Shao, X.³

77
- 0031233424
- Speaker recognition: A tutorial
- Campbell, J.P., Speaker recognition: A tutorial. Proc. IEEE 85:9 (September 1997), 1437–1462.
- (1997) Proc. IEEE , vol.85 , Issue.9 , pp. 1437-1462
- Campbell, J.P.¹

78
- 4544247190
- Music instrument recognition: from isolated notes to solo phrases
- Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing, vol. 4, Montreal, QC, Canada IEEE Piscataway, NJ
- Krishna, A.G., Sreenivas, T.V., Music instrument recognition: from isolated notes to solo phrases., Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing, vol. 4, Montreal, QC, Canada, May 2004, IEEE, Piscataway, NJ, 265–268.
- (2004) , pp. 265-268
- Krishna, A.G.¹ Sreenivas, T.V.²

79
- 0032023735
- Statistical properties of line spectrum pairs
- Tourneret, J.Y., Statistical properties of line spectrum pairs. Signal Process. 65:2 (March 1998), 239–255.
- (1998) Signal Process. , vol.65 , Issue.2 , pp. 239-255
- Tourneret, J.Y.¹

80
- 17444365032
- Unsupervised speaker segmentation and tracking in real-time audio content analysis
- Lu, L., Zhang, H.J., Unsupervised speaker segmentation and tracking in real-time audio content analysis. Multimedia Syst. 10:4 (April 2005), 332–343.
- (2005) Multimedia Syst. , vol.10 , Issue.4 , pp. 332-343
- Lu, L.¹ Zhang, H.J.²

81
- 13444256090
- Music artist style identification by semi-supervised learning from both lyrics and content
- Proceedings of the 12th Annual ACM International Conference on Multimedia ACM Press New York, NY
- Li, T., Ogihara, M., Music artist style identification by semi-supervised learning from both lyrics and content., Proceedings of the 12th Annual ACM International Conference on Multimedia, 2004, ACM Press, New York, NY, 364–367.
- (2004) , pp. 364-367
- Li, T.¹ Ogihara, M.²

82
- 33646739998
- Toward intelligent music information retrieval
- Li, T., Ogihara, M., Toward intelligent music information retrieval. IEEE Trans. Multimedia 8:3 (June 2006), 564–574.
- (2006) IEEE Trans. Multimedia , vol.8 , Issue.3 , pp. 564-574
- Li, T.¹ Ogihara, M.²

83
- 16244420091
- Multigroup classification of audio signals using time-frequency parameters
- Umapathy, K., Krishnan, S., Jimaa, S., Multigroup classification of audio signals using time-frequency parameters. IEEE Trans. Multimedia 7:2 (April 2005), 308–315.
- (2005) IEEE Trans. Multimedia , vol.7 , Issue.2 , pp. 308-315
- Umapathy, K.¹ Krishnan, S.² Jimaa, S.³

84
- 0033279679
- Towards robust features for classifying audio in the CueVideo system
- Proceedings of the 7th ACM International Conference on Multimedia (Part 1) ACM Press New York, NY
- Srinivasan, S., Petkovic, D., Ponceleon, D., Towards robust features for classifying audio in the CueVideo system., Proceedings of the 7th ACM International Conference on Multimedia (Part 1), 1999, ACM Press, New York, NY, 393–400.
- (1999) , pp. 393-400
- Srinivasan, S.¹ Petkovic, D.² Ponceleon, D.³

85
- 33744973515
- Modeling timbre distance with temporal statistics from polyphonic music
- Mörchen, F., Ultsch, A., Thies, M., Löhken, I., Modeling timbre distance with temporal statistics from polyphonic music. IEEE Trans. Audio Speech Lang. Process. 14:1 (January 2006), 81–90.
- (2006) IEEE Trans. Audio Speech Lang. Process. , vol.14 , Issue.1 , pp. 81-90
- Mörchen, F.¹ Ultsch, A.² Thies, M.³ Löhken, I.⁴

86
- 0030648077
- Construction and evaluation of a robust multi-feature speech/music discriminator
- Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing, vol. 2, Munich, Germany
- Scheirer, E., Slaney, M., Construction and evaluation of a robust multi-feature speech/music discriminator., Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing, vol. 2, Munich, Germany, April 1997, 1331–1334.
- (1997) , pp. 1331-1334
- Scheirer, E.¹ Slaney, M.²

87
- 0034792569
- A robust audio classification and segmentation method
- Proceedings of the 9th ACM International Conference on Multimedia, Ottawa, ON, Canada ACM Press New York, NY
- Lu, L., Jiang, H., Zhang, H.J., A robust audio classification and segmentation method., Proceedings of the 9th ACM International Conference on Multimedia, Ottawa, ON, Canada, 2001, ACM Press, New York, NY, 203–211.
- (2001) , pp. 203-211
- Lu, L.¹ Jiang, H.² Zhang, H.J.³

88
- 0036648502
- Musical genre classification of audio signals
- Tzanetakis, G., Cook, P., Musical genre classification of audio signals. IEEE Trans. Speech Audio Process. 10:5 (July 2002), 293–302.
- (2002) IEEE Trans. Speech Audio Process. , vol.10 , Issue.5 , pp. 293-302
- Tzanetakis, G.¹ Cook, P.²

89
- 33746837319
- Audio-based gender identification using bootstrapping
- Proceedings of the IEEE Pacific Rim Conference on Communications, Computers and Signal Processing, Victoria, BC, Canada IEEE Piscataway, NJ
- Tzanetakis, G., Audio-based gender identification using bootstrapping., Proceedings of the IEEE Pacific Rim Conference on Communications, Computers and Signal Processing, Victoria, BC, Canada, August 2005, IEEE, Piscataway, NJ, 432–433.
- (2005) , pp. 432-433
- Tzanetakis, G.¹

90
- 11144341364
- An industrial strength audio search algorithm
- Proceedings of the International Conference on Music Information Retrieval, Baltimore, MD
- Wang, A., An industrial strength audio search algorithm., Proceedings of the International Conference on Music Information Retrieval, Baltimore, MD, October 2003, 7–13.
- (2003) , pp. 7-13
- Wang, A.¹

91
- 33747199309
- The Shazam music recognition service
- Wang, A., The Shazam music recognition service. Commun. ACM 49:8 (August 2006), 44–48.
- (2006) Commun. ACM , vol.49 , Issue.8 , pp. 44-48
- Wang, A.¹

92
- 0026923568
- Significance of group delay functions in spectrum estimation
- Yegnanarayana, B., Murthy, H.A., Significance of group delay functions in spectrum estimation. IEEE Trans. Signal Process. 40:9 (September 1992), 2281–2289.
- (1992) IEEE Trans. Signal Process. , vol.40 , Issue.9 , pp. 2281-2289
- Yegnanarayana, B.¹ Murthy, H.A.²

93
- 0029375490
- Determination of instants of significant excitation in speech using group delay function
- Smits, R., Yegnanarayana, B., Determination of instants of significant excitation in speech using group delay function. IEEE Trans. Speech Audio Process. 3:5 (September 1995), 325–333.
- (1995) IEEE Trans. Speech Audio Process. , vol.3 , Issue.5 , pp. 325-333
- Smits, R.¹ Yegnanarayana, B.²

94
- 14644411724
- Beat tracking of musical performances using low-level audio features
- Sethares, W.A., Morris, R.D., Sethares, J.C., Beat tracking of musical performances using low-level audio features. IEEE Trans. Speech Audio Process. 13:2 (March 2005), 275–285.
- (2005) IEEE Trans. Speech Audio Process. , vol.13 , Issue.2 , pp. 275-285
- Sethares, W.A.¹ Morris, R.D.² Sethares, J.C.³

95
- 33847165718
- Evaluation of the modified group delay feature for isolated word recognition
- Proceedings of the International Symposium on Signal Processing and Its Applications, vol. 2, Sydney, Australia IEEE Piscataway, NJ
- Alsteris, L.D., Paliwal, K.K., Evaluation of the modified group delay feature for isolated word recognition., Proceedings of the International Symposium on Signal Processing and Its Applications, vol. 2, Sydney, Australia, August 2005, IEEE, Piscataway, NJ, 715–718.
- (2005) , pp. 715-718
- Alsteris, L.D.¹ Paliwal, K.K.²

96
- 33845951461
- Significance of joint features derived from the modified group delay function in speech processing
- 10.1155/2007/79032
- Hegde, R.M., Murthy, H.A., Gadde, V.R., Significance of joint features derived from the modified group delay function in speech processing. EURASIP J. Appl. Signal Process. 15:1 (January 2007), 190–202, doi:10.1155/2007/79032.
- (2007) EURASIP J. Appl. Signal Process. , vol.15 , Issue.1 , pp. 190-202
- Hegde, R.M.¹ Murthy, H.A.² Gadde, V.R.³

97
- 4544293687
- Application of the modified group delay function to speaker identification and discrimination
- Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing, vol. 1, Montreal, QC, Canada IEEE Piscataway, NJ
- Hegde, R.M., Murthy, H.A., Rao, G.V.R., Application of the modified group delay function to speaker identification and discrimination., Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing, vol. 1, Montreal, QC, Canada, May 2004, IEEE, Piscataway, NJ, 517–520.
- (2004) , pp. 517-520
- Hegde, R.M.¹ Murthy, H.A.² Rao, G.V.R.³

98
- 0141480080
- The modified group delay function and its application to phoneme recognition
- Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing, vol. 1, Hong Kong, China IEEE Piscataway, NJ
- Murthy, H.A., Gadde, V., The modified group delay function and its application to phoneme recognition., Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing, vol. 1, Hong Kong, China, April 2003, IEEE, Piscataway, NJ, 68–71.
- (2003) , pp. 68-71
- Murthy, H.A.¹ Gadde, V.²

99
- 14944343016
- Subband-based group delay segmentation of spontaneous speech into syllable-like units
- Nagarajan, T., Murthy, H.A., Subband-based group delay segmentation of spontaneous speech into syllable-like units. EURASIP J. Appl. Signal Process. 2004:17 (2004), 2614–2625.
- (2004) EURASIP J. Appl. Signal Process. , vol.2004 , Issue.17 , pp. 2614-2625
- Nagarajan, T.¹ Murthy, H.A.²

100
- 0024879901
- Formant extraction from Fourier transform phase
- International Conference on Acoustics, Speech, and Signal Processing, vol. 1
- Murthy, H.A., Murthy, K.V.M., Yegnanarayana, B., Formant extraction from Fourier transform phase., International Conference on Acoustics, Speech, and Signal Processing, vol. 1, May 1989, 484–487.
- (1989) , pp. 484-487
- Murthy, H.A.¹ Murthy, K.V.M.² Yegnanarayana, B.³

101
- 62649114038
- Factors in automatic musical genre classification of audio signals
- Proceedings of the IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, New Paltz, NY IEEE Piscataway, NJ
- Li, T., Tzanetakis, G., Factors in automatic musical genre classification of audio signals., Proceedings of the IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, New Paltz, NY, October 2003, IEEE, Piscataway, NJ, 143–146.
- (2003) , pp. 143-146
- Li, T.¹ Tzanetakis, G.²

102
- 4544304284
- Harmonicity and dynamics-based features for audio
- Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing, vol. 4, Montreal, QC, Canada IEEE Piscataway, NJ
- Srinivasan, H., Kankanhalli, M., Harmonicity and dynamics-based features for audio., Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing, vol. 4, Montreal, QC, Canada, May 2004, IEEE, Piscataway, NJ, 321–324.
- (2004) , pp. 321-324
- Srinivasan, H.¹ Kankanhalli, M.²

103
- 33750566007
- Gaussian mixture modeling using short time Fourier transform features for audio fingerprinting
- Proceedings of the IEEE International Conference on Multimedia and Expo, Amsterdam, The Netherlands IEEE Piscataway, NJ
- Ramalingam, A., Krishnan, S., Gaussian mixture modeling using short time Fourier transform features for audio fingerprinting., Proceedings of the IEEE International Conference on Multimedia and Expo, Amsterdam, The Netherlands, July 2005, IEEE, Piscataway, NJ, 1146–1149.
- (2005) , pp. 1146-1149
- Ramalingam, A.¹ Krishnan, S.²

104
- 54249104868
- MPEG-7 Audio and Beyond
- Wiley West Sussex, England
- Kim, H., Moreau, N., Sikora, T., MPEG-7 Audio and Beyond. 2005, Wiley, West Sussex, England.
- (2005)
- Kim, H.¹ Moreau, N.² Sikora, T.³

105
- 36549038450
- How similar do songs sound? Towards modeling human perception of musical similarity
- Proceedings of the IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, New Paltz, NY IEEE Piscataway, NJ
- Herre, J., Allamanche, E., Ertel, C., How similar do songs sound? Towards modeling human perception of musical similarity., Proceedings of the IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, New Paltz, NY, October 2003, IEEE, Piscataway, NJ, 83–86.
- (2003) , pp. 83-86
- Herre, J.¹ Allamanche, E.² Ertel, C.³

106
- 85013917537
- Audio feature extraction and analysis for scene classification
- Proceedings of the IEEE Workshop on Multimedia Signal Processing, Princeton, NJ IEEE Piscataway, NJ
- Liu, Z., Huang, J., Wang, Y., Chen, T., Audio feature extraction and analysis for scene classification., Proceedings of the IEEE Workshop on Multimedia Signal Processing, Princeton, NJ, June 1997, IEEE, Piscataway, NJ, 343–348.
- (1997) , pp. 343-348
- Liu, Z.¹ Huang, J.² Wang, Y.³ Chen, T.⁴

107
- 17444399233
- Musical instrument timbres classification with spectral features
- Agostini, G., Longari, M., Pollastri, E., Musical instrument timbres classification with spectral features. EURASIP J. Appl. Signal Process. 2003:1 (2003), 5–14.
- (2003) EURASIP J. Appl. Signal Process. , vol.2003 , Issue.1 , pp. 5-14
- Agostini, G.¹ Longari, M.² Pollastri, E.³

108
- 33646532381
- Music genre classification with taxonomy
- Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing, vol. 5 IEEE Piscataway, NJ
- Li, T., Ogihara, M., Music genre classification with taxonomy., Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing, vol. 5, March 2005, IEEE, Piscataway, NJ, 197–200.
- (2005) , pp. 197-200
- Li, T.¹ Ogihara, M.²

109
- 0003579084
- Digital Coding of Waveforms: Principles and Applications to Speech and Video
- Prentice-Hall Englewood Cliffs, NJ
- Jayant, N.S., Noll, P., Digital Coding of Waveforms: Principles and Applications to Speech and Video. Prentice-Hall Signal Processing Series, 1984, Prentice-Hall, Englewood Cliffs, NJ.
- (1984) Prentice-Hall Signal Processing Series
- Jayant, N.S.¹ Noll, P.²

110
- 84948124247
- Visualization of metre and other rhythm features
- Proceedings of the IEEE International Symposium on Signal Processing and Information Technology, Darmstadt, Germany IEEE Piscataway, NJ
- Guaus, E., Batlle, E., Visualization of metre and other rhythm features., Proceedings of the IEEE International Symposium on Signal Processing and Information Technology, Darmstadt, Germany, December 2003, IEEE, Piscataway, NJ, 282–285.
- (2003) , pp. 282-285
- Guaus, E.¹ Batlle, E.²

111
- 11244341096
- Audio content identification by using perceptual hashing
- Proceedings of the IEEE International Conference on Multimedia and Expo, Taipei, Taiwan IEEE Piscataway, NJ
- Lancini, R., Mapelli, F., Pezzano, R., Audio content identification by using perceptual hashing., Proceedings of the IEEE International Conference on Multimedia and Expo, Taipei, Taiwan, June 2004, IEEE, Piscataway, NJ, 739–742.
- (2004) , pp. 739-742
- Lancini, R.¹ Mapelli, F.² Pezzano, R.³

112
- 0035688755
- Robust matching of audio signals using spectral flatness features
- Proceedings of the IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, New Paltz, NY IEEE Piscataway, NJ
- Herre, J., Allamanche, E., Hellmuth, O., Robust matching of audio signals using spectral flatness features., Proceedings of the IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, New Paltz, NY, October 2001, IEEE, Piscataway, NJ, 127–130.
- (2001) , pp. 127-130
- Herre, J.¹ Allamanche, E.² Hellmuth, O.³

113
- 4544250678
- Spectral entropy based feature for robust ASR
- Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing, vol. 1, Montreal, QC, Canada IEEE Piscataway, NJ
- Misra, H., Ikbal, S., Bourlard, H., Hermansky, H., Spectral entropy based feature for robust ASR., Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing, vol. 1, Montreal, QC, Canada, May 2004, IEEE, Piscataway, NJ, 193–196.
- (2004) , pp. 193-196
- Misra, H.¹ Ikbal, S.² Bourlard, H.³ Hermansky, H.⁴

114
- 33646801180
- Multi-resolution spectral entropy feature for robust ASR
- Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing, vol. 1, Philadelphia, PA IEEE Piscataway, NJ
- Misra, H., Ikbal, S., Sivadas, S., Bourlard, H., Multi-resolution spectral entropy feature for robust ASR., Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing, vol. 1, Philadelphia, PA, March 2005, IEEE, Piscataway, NJ, 253–256.
- (2005) , pp. 253-256
- Misra, H.¹ Ikbal, S.² Sivadas, S.³ Bourlard, H.⁴

115
- 33744998131
- Musicminer: Visualizing timbre distances of music as topographical maps
- Mörchen, F., Ultsch, A., Thies, M., Löhken, I., Nöcker, M., Stamm, C., Efthymiou, N., Kümmerer, M., Musicminer: Visualizing timbre distances of music as topographical maps. Technical Report, 2005.
- (2005) Technical Report
- Mörchen, F.¹ Ultsch, A.² Thies, M.³ Löhken, I.⁴ Nöcker, M.⁵ Stamm, C.⁶ Efthymiou, N.⁷ Kümmerer, M.⁸

116
- 0035442477
- Scene determination based on video and audio features
- Pfeiffer, S., Lienhart, R., Effelsberg, W., Scene determination based on video and audio features. Multimedia Tools Appl. 15:1 (September 2001), 59–81.
- (2001) Multimedia Tools Appl. , vol.15 , Issue.1 , pp. 59-81
- Pfeiffer, S.¹ Lienhart, R.² Effelsberg, W.³

117
- 0003391579
- Pitch Determination of Speech Signals: Algorithms and Devices
- Springer Berlin
- Hess, W., Pitch Determination of Speech Signals: Algorithms and Devices. 1983, Springer, Berlin.
- (1983)
- Hess, W.¹

118
- 84892166605
- A spectrally mixed excitation (SMX) vocoder with robust parameter determination
- Proceedings of the International Conference on Acoustics, Speech and Signal Processing, vol. 2
- Cho, Y.D., Kim, M.Y., Kim, S.R., A spectrally mixed excitation (SMX) vocoder with robust parameter determination., Proceedings of the International Conference on Acoustics, Speech and Signal Processing, vol. 2, May 1998, 601–604.
- (1998) , pp. 601-604
- Cho, Y.D.¹ Kim, M.Y.² Kim, S.R.³

119
- 0030846123
- A unitary model of pitch perception
- Meddis, R., O'Mard, L., A unitary model of pitch perception. J. Acoust. Soc. Am. 102:3 (September 1997), 1811–1820.
- (1997) J. Acoust. Soc. Am. , vol.102 , Issue.3 , pp. 1811-1820
- Meddis, R.¹ O'Mard, L.²

120
- 84953652991
- Circularity in judgements of relative pitch
- Shepard, R.N., Circularity in judgements of relative pitch. J. Acoust. Soc. Am. 36 (1964), 2346–2353.
- (1964) J. Acoust. Soc. Am. , vol.36 , pp. 2346-2353
- Shepard, R.N.¹

121
- 13144282752
- Audio thumbnailing of popular music using chroma-based representations
- Bartsch, M.A., Wakefield, G.H., Audio thumbnailing of popular music using chroma-based representations. IEEE Trans. Multimedia 7:1 (February 2005), 96–104.
- (2005) IEEE Trans. Multimedia , vol.7 , Issue.1 , pp. 96-104
- Bartsch, M.A.¹ Wakefield, G.H.²

122
- 0141520565
- A chorus-section detecting method for musical audio signals
- Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing, vol. 5, Hong Kong, China IEEE Piscataway, NJ
- Goto, M., A chorus-section detecting method for musical audio signals., Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing, vol. 5, Hong Kong, China, April 2003, IEEE, Piscataway, NJ, 437–440.
- (2003) , pp. 437-440
- Goto, M.¹

123
- 84892355208
- Information Retrieval for Music and Motion
- Springer Berlin
- Müller, M., Information Retrieval for Music and Motion. 2007, Springer, Berlin.
- (2007)
- Müller, M.¹

124
- 33646741047
- Precise pitch profile feature extraction from musical audio for key detection
- Zhu, Y., Kankanhalli, M.S., Precise pitch profile feature extraction from musical audio for key detection. IEEE Trans. Multimedia 8:3 (June 2006), 575–584.
- (2006) IEEE Trans. Multimedia , vol.8 , Issue.3 , pp. 575-584
- Zhu, Y.¹ Kankanhalli, M.S.²

125
- 0034853025
- Robust singing detection in speech/music discriminator design
- Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing, Salt Lake City, UT IEEE Piscataway, NJ
- Chou, W., Gu, L., Robust singing detection in speech/music discriminator design., Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing, Salt Lake City, UT, May 2001, IEEE, Piscataway, NJ, 865–868.
- (2001) , pp. 865-868
- Chou, W.¹ Gu, L.²

126
- 84889344642
- Instrument description in the context of MPEG-7
- Proceedings of International Computer Music Conference, Berlin, Germany
- Peeters, G., McAdams, S., Herrera, P., Instrument description in the context of MPEG-7., Proceedings of International Computer Music Conference, Berlin, Germany, August 2000.
- (2000)
- Peeters, G.¹ McAdams, S.² Herrera, P.³

127
- 0019053271
- Comparison of parametric representations for monosyllabic word recognition in continuously spoken sentences
- Davis, S., Mermelstein, P., Comparison of parametric representations for monosyllabic word recognition in continuously spoken sentences. IEEE Trans. Acoust. Speech Signal Process. 28:4 (August 1980), 357–366.
- (1980) IEEE Trans. Acoust. Speech Signal Process. , vol.28 , Issue.4 , pp. 357-366
- Davis, S.¹ Mermelstein, P.²

128
- 84953653667
- Short-time spectrum and “cepstrum” techniques for vocal-pitch detection
- Noll, A.M., Short-time spectrum and “cepstrum” techniques for vocal-pitch detection. J. Acoust. Soc. Am. 36:2 (1964), 296–302.
- (1964) J. Acoust. Soc. Am. , vol.36 , Issue.2 , pp. 296-302
- Noll, A.M.¹

129
- 13444270431
- Audio keyword generation for sports video analysis
- Proceedings of the ACM International Conference on Multimedia
- Xu, M., Duan, L., Chia, L., Xu, C., Audio keyword generation for sports video analysis., Proceedings of the ACM International Conference on Multimedia, 2004, 758–759.
- (2004) , pp. 758-759
- Xu, M.¹ Duan, L.² Chia, L.³ Xu, C.⁴

130
- 85097903590
- Noise robust Chinese speech recognition using feature vector normalization and higher-order cepstral coefficients
- Proceedings of the 5th International Conference on Signal Processing
- Wang, X., Dong, Y., Hakkinen, J., Viikki, O., Noise robust Chinese speech recognition using feature vector normalization and higher-order cepstral coefficients., Proceedings of the 5th International Conference on Signal Processing, August 2000, 738–741.
- (2000) , pp. 738-741
- Wang, X.¹ Dong, Y.² Hakkinen, J.³ Viikki, O.⁴

131
- 13444292995
- Content-based music structure analysis with applications to music semantics understanding
- Proceedings of the ACM International Conference on Multimedia ACM Press New York, NY
- Maddage, N., Xu, C., Kankanhalli, M., Shao, X., Content-based music structure analysis with applications to music semantics understanding., Proceedings of the ACM International Conference on Multimedia, 2004, ACM Press, New York, NY, 112–119.
- (2004) , pp. 112-119
- Maddage, N.¹ Xu, C.² Kankanhalli, M.³ Shao, X.⁴

132
- 84868695748
- On compensating the Mel-frequency cepstral coefficients for noisy speech recognition
- Proceedings of the Australasian Computer Science Conference, Hobart, Australia Australian Computer Society Darlinghurst, NSW
- Choi, E.H.C., On compensating the Mel-frequency cepstral coefficients for noisy speech recognition., Proceedings of the Australasian Computer Science Conference, Hobart, Australia, 2006, Australian Computer Society, Darlinghurst, NSW, 49–54.
- (2006) , pp. 49-54
- Choi, E.H.C.¹

133
- 85009115888
- An auditory system-based feature for robust speech recognition
- Proceedings of the European Conference on Speech Communication and Technology, Aalborg, Denmark International Speech Communication Association Geneva
- Li, Q., Soong, F.K., Siohan, O., An auditory system-based feature for robust speech recognition., Proceedings of the European Conference on Speech Communication and Technology, Aalborg, Denmark, September 2001, International Speech Communication Association, Geneva, 619–622.
- (2001) , pp. 619-622
- Li, Q.¹ Soong, F.K.² Siohan, O.³

134
- 0026626445
- Auditory representations of acoustic signals
- Yang, X., Wang, K., Shamma, S., Auditory representations of acoustic signals. IEEE Trans. Inform. Theory 38:2 (March 1992), 824–839.
- (1992) IEEE Trans. Inform. Theory , vol.38 , Issue.2 , pp. 824-839
- Yang, X.¹ Wang, K.² Shamma, S.³

135
- 0025041264
- Perceptual linear predictive (PLP) analysis of speech
- Hermansky, H., Perceptual linear predictive (PLP) analysis of speech. J. Acoust. Soc. Am. 87:4 (April 1990), 1738–1752.
- (1990) J. Acoust. Soc. Am. , vol.87 , Issue.4 , pp. 1738-1752
- Hermansky, H.¹

136
- 0016067897
- Effectiveness of linear prediction characteristics of the speech wave for automatic speaker identification and verification
- Atal, B.S., Effectiveness of linear prediction characteristics of the speech wave for automatic speaker identification and verification. J. Acoust. Soc. Am. 55:6 (June 1974), 1304–1312.
- (1974) J. Acoust. Soc. Am. , vol.55 , Issue.6 , pp. 1304-1312
- Atal, B.S.¹

137
- 0035480380
- A speaker identification system using a model of artificial neural networks for an elevator application
- Adami, A., Barone, D., A speaker identification system using a model of artificial neural networks for an elevator application. Inform. Sci. 138:1–4 (October 2001), 1–5.
- (2001) Inform. Sci. , vol.138 , Issue.1-4 , pp. 1-5
- Adami, A.¹ Barone, D.²

138
- 0019939342
- Fluctuation strength and temporal masking patterns of amplitude-modulated broadband noise
- Fastl, H., Fluctuation strength and temporal masking patterns of amplitude-modulated broadband noise. Hear. Res. 8:1 (September 1982), 59–69.
- (1982) Hear. Res. , vol.8 , Issue.1 , pp. 59-69
- Fastl, H.¹

139
- 84873312246
- A review of the MTF concept in room acoustics and its use for estimating speech intelligibility in auditoria
- Houtgast, T., Steeneken, H.J., A review of the MTF concept in room acoustics and its use for estimating speech intelligibility in auditoria. J. Acoust. Soc. Am. 77:3 (March 1985), 1069–1077.
- (1985) J. Acoust. Soc. Am. , vol.77 , Issue.3 , pp. 1069-1077
- Houtgast, T.¹ Steeneken, H.J.²

140
- 85143189691
- Modulation frequency features for audio fingerprinting
- Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing, vol. 2, Orlando, FL IEEE Piscataway, NJ
- Sukittanon, S., Atlas, L.E., Modulation frequency features for audio fingerprinting., Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing, vol. 2, Orlando, FL, May 2002, IEEE, Piscataway, NJ, 1773–1776.
- (2002) , pp. 1773-1776
- Sukittanon, S.¹ Atlas, L.E.²

141
- 0034515662
- Automatic audio segmentation using a measure of audio novelty
- Proceedings of the IEEE International Conference on Multimedia and Expo, vol. 1, New York, NY IEEE Piscataway, NJ
- Foote, J., Automatic audio segmentation using a measure of audio novelty., Proceedings of the IEEE International Conference on Multimedia and Expo, vol. 1, New York, NY, August 2000, IEEE, Piscataway, NJ, 452–455.
- (2000) , pp. 452-455
- Foote, J.¹

142
- 84908266591
- The beat spectrum: a new approach to rhythm analysis
- Proceedings of the IEEE International Conference on Multimedia and Expo IEEE Piscataway, NJ
- Foote, J., Uchihashi, S., The beat spectrum: a new approach to rhythm analysis., Proceedings of the IEEE International Conference on Multimedia and Expo, 2001, IEEE, Piscataway, NJ, 881–884.
- (2001) , pp. 881-884
- Foote, J.¹ Uchihashi, S.²

143
- 36549014432
- The cyclic beat spectrum: tempo-related audio features for time-scale invariant audio identification
- Proceedings of the 7th International Conference on Music Information Retrieval, Victoria, BC, Canada
- Kurth, F., Gehrmann, T., Müller, M., The cyclic beat spectrum: tempo-related audio features for time-scale invariant audio identification., Proceedings of the 7th International Conference on Music Information Retrieval, Victoria, BC, Canada, October 2006, 35–40.
- (2006) , pp. 35-40
- Kurth, F.¹ Gehrmann, T.² Müller, M.³

144
- 0031972902
- Tempo and beat analysis of acoustic musical signals
- Scheirer, E., Tempo and beat analysis of acoustic musical signals. J. Acoust. Soc. Am. 103:1 (January 1998), 588–601.
- (1998) J. Acoust. Soc. Am. , vol.103 , Issue.1 , pp. 588-601
- Scheirer, E.¹

145
- 0003616413
- Music-Listening Systems
- Program in Media Arts and Sciences, Ph.D. Thesis, MIT, Cambridge, MA, 2000
- Scheirer, E., Music-Listening Systems. Ph.D. Thesis, Program in Media Arts and Sciences, MIT, Cambridge, MA, 2000.
- (2000)
- Scheirer, E.¹

146
- 0010051198
- Audio analysis using the discrete wavelet transform
- Proceedings of the International Conference on Acoustics and Music: Theory and Applications, Malta
- Tzanetakis, G., Essl, G., Cook, P., Audio analysis using the discrete wavelet transform., Proceedings of the International Conference on Acoustics and Music: Theory and Applications, Malta, September 2001.
- (2001)
- Tzanetakis, G.¹ Essl, G.² Cook, P.³

147
- 84890516600
- Human perception and computer extraction of musical beat strength
- Proceedings of the International Conference on Digital Audio Effects, Hamburg, Germany
- Tzanetakis, G., Essl, G., Cook, P., Human perception and computer extraction of musical beat strength., Proceedings of the International Conference on Digital Audio Effects, Hamburg, Germany, September 2002, 257–261.
- (2002) , pp. 257-261
- Tzanetakis, G.¹ Essl, G.² Cook, P.³

148
- 33744931123
- A wavelet packet representation of audio signals for music genre classification using different ensemble and feature selection techniques
- Proceedings of the ACM SIGMM International Workshop on Multimedia Information Retrieval, Berkeley, CA ACM Press New York, NY
- Grimaldi, M., Cunningham, P., Kokaram, A., A wavelet packet representation of audio signals for music genre classification using different ensemble and feature selection techniques., Proceedings of the ACM SIGMM International Workshop on Multimedia Information Retrieval, Berkeley, CA, 2003, ACM Press, New York, NY, 102–108.
- (2003) , pp. 102-108
- Grimaldi, M.¹ Cunningham, P.² Kokaram, A.³

149
- 0003456805
- A Wavelet Tour of Signal Processing
- Academic Press San Diego, CA
- Mallat, S., A Wavelet Tour of Signal Processing. 1999, Academic Press, San Diego, CA.
- (1999)
- Mallat, S.¹

150
- 0038535978
- Using psycho-acoustic models and self-organizing maps to create a hierarchical structuring of music by sound similarity
- Proceedings of the International Conference on Music Information Retrieval, Paris, France IRCAM-Centre Pompidou Paris
- Rauber, A., Pampalk, E., Merkl, D., Using psycho-acoustic models and self-organizing maps to create a hierarchical structuring of music by sound similarity., Proceedings of the International Conference on Music Information Retrieval, Paris, France, October 2002, IRCAM-Centre Pompidou, Paris.
- (2002)
- Rauber, A.¹ Pampalk, E.² Merkl, D.³

151
- 2542463254
- Audio classification based on MPEG-7 spectral basis representations
- Kim, H., Moreau, N., Sikora, T., Audio classification based on MPEG-7 spectral basis representations. IEEE Trans. Circuits Syst. Video Technol. 14 (2004), 716–725.
- (2004) IEEE Trans. Circuits Syst. Video Technol. , vol.14 , pp. 716-725
- Kim, H.¹ Moreau, N.² Sikora, T.³

152
- 17444446371
- Extracting noise-robust features from audio data
- Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing, Orlando, FL IEEE Piscataway, NJ
- Burges, C.J.C., Platt, J.C., Jana, S., Extracting noise-robust features from audio data., Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing, Orlando, FL, May 2002, IEEE, Piscataway, NJ, 1021–1024.
- (2002) , pp. 1021-1024
- Burges, C.J.C.¹ Platt, J.C.² Jana, S.³

153
- 0032627304
- A modulated complex lapped transform and its applications to audio processing
- Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing, Phoenix, AZ IEEE Piscataway, NJ
- Malvar, H., A modulated complex lapped transform and its applications to audio processing., Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing, Phoenix, AZ, March 1999, IEEE, Piscataway, NJ, 1421–1424.
- (1999) , pp. 1421-1424
- Malvar, H.¹

154
- 27744493655
- Nonlinear speech analysis using models for chaotic systems
- Kokkinos, I., Maragos, P., Nonlinear speech analysis using models for chaotic systems. IEEE Trans. Speech Audio Process. 13:6 (November 2005), 1098–1109.
- (2005) IEEE Trans. Speech Audio Process. , vol.13 , Issue.6 , pp. 1098-1109
- Kokkinos, I.¹ Maragos, P.²

155
- 0036289924
- Speech analysis and feature extraction using chaotic models
- Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing, Orlando, FL IEEE Piscataway, NJ
- Pitsikalis, V., Maragos, P., Speech analysis and feature extraction using chaotic models., Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing, Orlando, FL, May 2002, IEEE, Piscataway, NJ, 533–536.
- (2002) , pp. 533-536
- Pitsikalis, V.¹ Maragos, P.²

156
- 27944451785
- Feature analysis and extraction for audio automatic classification
- Proceedings of the IEEE International Conference on Systems, Man and Cybernetics, Big Island, HI IEEE Piscataway, NJ
- Bai, L., Hu, Y., Lao, S., Chen, J., Wu, L., Feature analysis and extraction for audio automatic classification., Proceedings of the IEEE International Conference on Systems, Man and Cybernetics, Big Island, HI, October 2005, IEEE, Piscataway, NJ, 767–772.
- (2005) , pp. 767-772
- Bai, L.¹ Hu, Y.² Lao, S.³ Chen, J.⁴ Wu, L.⁵

157
- 33746817948
- A silence detection and suppression technique design for voice over IP systems
- Proceedings of the IEEE Pacific Rim Conference on Communications, Computers, and Signal Processing, Victoria, BC, Canada IEEE Piscataway, NJ
- Becker, R., Corsetti, G., Guedes Silveira, J., Balbinot, R., Castello, F., A silence detection and suppression technique design for voice over IP systems., Proceedings of the IEEE Pacific Rim Conference on Communications, Computers, and Signal Processing, Victoria, BC, Canada, August 2005, IEEE, Piscataway, NJ, 173–176.
- (2005) , pp. 173-176
- Becker, R.¹ Corsetti, G.² Guedes Silveira, J.³ Balbinot, R.⁴ Castello, F.⁵

158
- 0034796139
- Pause concepts for audio segmentation at different semantic levels
- Proceedings of the ACM International Conference on Multimedia, Ottawa, ON, Canada ACM Press New York, NY
- Pfeiffer, S., Pause concepts for audio segmentation at different semantic levels., Proceedings of the ACM International Conference on Multimedia, Ottawa, ON, Canada, 2001, ACM Press, New York, NY, 187–193.
- (2001) , pp. 187-193
- Pfeiffer, S.¹

159
- 85009090165
- High-level feature weighted GMM network for audio stream classification
- Proceedings of the International Conference on Spoken Language Processing, Jeju Island, Korea
- Huang, R., Hansen, J.H.L., High-level feature weighted GMM network for audio stream classification., Proceedings of the International Conference on Spoken Language Processing, Jeju Island, Korea, October 2004, 1061–1064.
- (2004) , pp. 1061-1064
- Huang, R.¹ Hansen, J.H.L.²

160
- 8344242026
- Local fuzzy PCA based GMM with dimension reduction on speaker identification
- Lee, K.Y., Local fuzzy PCA based GMM with dimension reduction on speaker identification. Pattern Recogn. Lett. 25:16 (2004), 1811–1817.
- (2004) Pattern Recogn. Lett. , vol.25 , Issue.16 , pp. 1811-1817
- Lee, K.Y.¹

161
- 0036293699
- Merging segmental and rhythmic features for automatic language identification
- Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing, Orlando, FL IEEE Piscataway, NJ
- Farinas, J., Pellegrino, F.C., Rouas, J.-L., Andre-Obrecht, F., Merging segmental and rhythmic features for automatic language identification., Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing, Orlando, FL, May 2002, IEEE, Piscataway, NJ, 753–756.
- (2002) , pp. 753-756
- Farinas, J.¹ Pellegrino, F.C.² Rouas, J.-L.³ Andre-Obrecht, F.⁴

162
- 0141591602
- Speaker and text independent language identification using predictive error histogram vectors
- Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing, Hong Kong, China IEEE Piscataway, NJ
- Gu, Q.R., Shibata, T., Speaker and text independent language identification using predictive error histogram vectors., Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing, Hong Kong, China, April 2003, IEEE, Piscataway, NJ, 36–39.
- (2003) , pp. 36-39
- Gu, Q.R.¹ Shibata, T.²

163
- 0035441593
- Spoken language recognition—a step toward multilinguality in speech processing
- Navratil, J., Spoken language recognition—a step toward multilinguality in speech processing. IEEE Trans. Speech Audio Process. 9:6 (September 2001), 678–685.
- (2001) IEEE Trans. Speech Audio Process. , vol.9 , Issue.6 , pp. 678-685
- Navratil, J.¹

164
- 85009275225
- Approaches to language identification using Gaussian mixture models and shifted delta cepstral features
- Proceedings of the International Conference on Spoken Language Processing, Denver, CO
- Torres-Carrasquillo, P.A., Singer, E., Kohler, M.A., Greene, R.J., Reynolds, D.A., Deller, J.R. Jr., Approaches to language identification using Gaussian mixture models and shifted delta cepstral features., Proceedings of the International Conference on Spoken Language Processing, Denver, CO, September 2002, 89–92.
- (2002) , pp. 89-92
- Torres-Carrasquillo, P.A.¹ Singer, E.² Kohler, M.A.³ Greene, R.J.⁴ Reynolds, D.A.⁵ Deller, J.R.⁶

165
- 33644609617
- Emotive alert: HMM-based emotion detection in voicemail messages
- Proceedings of the International Conference on Intelligent User Interfaces, San Diego, CA ACM Press New York, NY
- Inanoglu, Z., Caneel, R., Emotive alert: HMM-based emotion detection in voicemail messages., Proceedings of the International Conference on Intelligent User Interfaces, San Diego, CA, 2005, ACM Press, New York, NY, 251–253.
- (2005) , pp. 251-253
- Inanoglu, Z.¹ Caneel, R.²

166
- 0141702124
- Classification of stress in speech using linear and nonlinear features
- Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing, Hong Kong, China IEEE Piscataway, NJ
- Nwe, T.L., Foo, S.W., De Silva, L.C., Classification of stress in speech using linear and nonlinear features., Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing, Hong Kong, China, April 2003, IEEE, Piscataway, NJ, 9–12.
- (2003) , pp. 9-12
- Nwe, T.L.¹ Foo, S.W.² De Silva, L.C.³

167
- 77956269951
- Towards automatic recognition of emotion in speech
- Proceedings of the IEEE International Symposium on Signal Processing and Information Technology, Darmstadt, Germany IEEE Piscataway, NJ
- Razak, A.A., Yusof, M.H.M., Komiya, R., Towards automatic recognition of emotion in speech., Proceedings of the IEEE International Symposium on Signal Processing and Information Technology, Darmstadt, Germany, December 2003, IEEE, Piscataway, NJ, 548–551.
- (2003) , pp. 548-551
- Razak, A.A.¹ Yusof, M.H.M.² Komiya, R.³

168
- 0036299156
- Automatic estimation of one's age with his/her speech based upon acoustic modeling techniques of speakers
- Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing, vol. 1, Orlando, FL IEEE Piscataway, NJ
- Minematsu, N., Sekiguchi, M., Hirose, K., Automatic estimation of one's age with his/her speech based upon acoustic modeling techniques of speakers., Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing, vol. 1, Orlando, FL, May 2002, IEEE, Piscataway, NJ, 137–140.
- (2002) , pp. 137-140
- Minematsu, N.¹ Sekiguchi, M.² Hirose, K.³

169
- 33846961105
- Comparison of neural networks and support vector machines applied to optimized features extracted from patients’ speech signal for classification of vocal fold inflammation
- Proceedings of the IEEE International Symposium on Signal Processing and Information Technology, Athens, Greece IEEE Piscataway, NJ
- Behroozmand, R., Almasganj, F., Comparison of neural networks and support vector machines applied to optimized features extracted from patients’ speech signal for classification of vocal fold inflammation., Proceedings of the IEEE International Symposium on Signal Processing and Information Technology, Athens, Greece, December 2005, IEEE, Piscataway, NJ, 844–849.
- (2005) , pp. 844-849
- Behroozmand, R.¹ Almasganj, F.²

170
- 84948186412
- Non-negative component parts of sound for classification
- Proceedings of the IEEE International Symposium on Signal Processing and Information Technology, Darmstadt, Germany IEEE Piscataway, NJ
- Cho, Y.C., Choi, S., Bang, S.Y., Non-negative component parts of sound for classification., Proceedings of the IEEE International Symposium on Signal Processing and Information Technology, Darmstadt, Germany, December 2003, IEEE, Piscataway, NJ, 633–636.
- (2003) , pp. 633-636
- Cho, Y.C.¹ Choi, S.² Bang, S.Y.³

171
- 33646787141
- Use of modulation spectra for representation and classification of acoustic transients from sniper fire
- Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing, vol. 4, Philadelphia, PA IEEE Piscataway, NJ
- Owsley, L., Atlas, L., Heinemann, C., Use of modulation spectra for representation and classification of acoustic transients from sniper fire., Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing, vol. 4, Philadelphia, PA, March 2005, IEEE, Piscataway, NJ, 1129–1132.
- (2005) , pp. 1129-1132
- Owsley, L.¹ Atlas, L.² Heinemann, C.³

172
- 33749069115
- Audio analysis for surveillance applications
- Proceedings of the IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, New Paltz, NY IEEE Piscataway, NJ
- Radhakrishnan, R., Divakaran, A., Smaragdis, P., Audio analysis for surveillance applications., Proceedings of the IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, New Paltz, NY, October 2003, IEEE, Piscataway, NJ, 158–161.
- (2003) , pp. 158-161
- Radhakrishnan, R.¹ Divakaran, A.² Smaragdis, P.³

173
- 0030396150
- Automatic audio content analysis
- Proceedings of the ACM International Conference on Multimedia, Boston, MA ACM Press New York, NY
- Pfeiffer, S., Fischer, S., Effelsberg, E., Automatic audio content analysis., Proceedings of the ACM International Conference on Multimedia, Boston, MA, 1996, ACM Press, New York, NY, 21–30.
- (1996) , pp. 21-30
- Pfeiffer, S.¹ Fischer, S.² Effelsberg, E.³

174
- 11244271500
- Robust soccer highlight generation with a novel dominant-speech feature extractor
- Proceedings of the IEEE International Conference on Multimedia and Expo, Taipei, Taiwan, vol. 1 IEEE Piscataway, NJ
- Wang, K., Xu, C., Robust soccer highlight generation with a novel dominant-speech feature extractor., Proceedings of the IEEE International Conference on Multimedia and Expo, Taipei, Taiwan, vol. 1, June 2004, IEEE, Piscataway, NJ, 591–594.
- (2004) , pp. 591-594
- Wang, K.¹ Xu, C.²

175
- 33745827806
- Affect-based indexing and retrieval of films
- Proceedings of the Annual ACM International Conference on Multimedia, Singapore ACM Press Berkeley
- Chan, C.G., Jones, G.J.F., Affect-based indexing and retrieval of films., Proceedings of the Annual ACM International Conference on Multimedia, Singapore, 2005, ACM Press, Berkeley, 427–430.
- (2005) , pp. 427-430
- Chan, C.G.¹ Jones, G.J.F.²

176
- 4544273366
- Content based audio classification and retrieval using joint time-frequency analysis
- Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing, Montreal, QC, Canada IEEE Piscataway, NJ
- Esmaili, S., Krishnan, S., Raahemifar, K., Content based audio classification and retrieval using joint time-frequency analysis., Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing, Montreal, QC, Canada, May 2004, IEEE, Piscataway, NJ, 665–668.
- (2004) , pp. 665-668
- Esmaili, S.¹ Krishnan, S.² Raahemifar, K.³

177
- 21544467298
- Content-based recognition of musical instruments
- Proceedings of the IEEE International Symposium on Signal Processing and Information Technology, Rome, Italy IEEE Piscataway, NJ
- Fanelli, A.M., Caponetti, L., Castellano, G., Buscicchio, C.A., Content-based recognition of musical instruments., Proceedings of the IEEE International Symposium on Signal Processing and Information Technology, Rome, Italy, December 2004, IEEE, Piscataway, NJ, 361–364.
- (2004) , pp. 361-364
- Fanelli, A.M.¹ Caponetti, L.² Castellano, G.³ Buscicchio, C.A.⁴

178
- 33744926889
- Discrete wavelet packet transform and ensembles of lazy and eager learners for music genre classification
- Grimaldi, M., Cunningham, P., Kokaram, A., Discrete wavelet packet transform and ensembles of lazy and eager learners for music genre classification. Multimedia Syst. 11:5 (April 2006), 422–437.
- (2006) Multimedia Syst. , vol.11 , Issue.5 , pp. 422-437
- Grimaldi, M.¹ Cunningham, P.² Kokaram, A.³

179
- 46149102188
- Singing voice features by time-frequency representations
- Proceedings of the International Symposium on Image and Signal Processing and Analysis, vol. 1, Rome, Italy IEEE Piscataway, NJ
- Mesaros, A., Lupu, E., Rusu, C., Singing voice features by time-frequency representations., Proceedings of the International Symposium on Image and Signal Processing and Analysis, vol. 1, Rome, Italy, September 2003, IEEE, Piscataway, NJ, 471–475.
- (2003) , pp. 471-475
- Mesaros, A.¹ Lupu, E.² Rusu, C.³

180
- 29044450290
- The way it sounds: timbre models for analysis and retrieval of music signals
- Aucouturier, J.-J., Pachet, F., Sandler, M., The way it sounds: timbre models for analysis and retrieval of music signals. IEEE Trans. Multimedia 7:6 (December 2005), 1028–1035.
- (2005) IEEE Trans. Multimedia , vol.7 , Issue.6 , pp. 1028-1035
- Aucouturier, J.-J.¹ Pachet, F.² Sandler, M.³

181
- 33749077780
- Hierarchical multi-class self similarities
- Proceedings of the IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, New Paltz, NY IEEE Piscataway, NJ
- Jehan, T., Hierarchical multi-class self similarities., Proceedings of the IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, New Paltz, NY, October 2005, IEEE, Piscataway, NJ, 311–314.
- (2005) , pp. 311-314
- Jehan, T.¹

182
- 4544274781
- Content-based music similarity search and emotion detection
- Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing, vol. 5, Montreal, QC, Canada IEEE Piscataway, NJ
- Li, T., Ogihara, M., Content-based music similarity search and emotion detection., Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing, vol. 5, Montreal, QC, Canada, May 2004, IEEE, Piscataway, NJ, 705–708.
- (2004) , pp. 705-708
- Li, T.¹ Ogihara, M.²

183
- 4744357961
- A unified approach to content-based and fault-tolerant music recognition
- Clausen, M., Kurth, F., A unified approach to content-based and fault-tolerant music recognition. IEEE Trans. Multimedia 6:5 (October 2004), 717–731.
- (2004) IEEE Trans. Multimedia , vol.6 , Issue.5 , pp. 717-731
- Clausen, M.¹ Kurth, F.²

184
- 15344342335
- Repeating pattern discovery and structure analysis from acoustic music data
- Proceedings of the ACM SIGMM International Workshop on Multimedia Information Retrieval, New York, NY ACM Press New York, NY
- Lu, L., Wang, M., Zhang, H.J., Repeating pattern discovery and structure analysis from acoustic music data., Proceedings of the ACM SIGMM International Workshop on Multimedia Information Retrieval, New York, NY, 2004, ACM Press, New York, NY, 275–282.
- (2004) , pp. 275-282
- Lu, L.¹ Wang, M.² Zhang, H.J.³

185
- 3042520493
- Recognition of piano notes with the aid of FRM filters
- Proceedings of the International Symposium on Control, Communications and Signal Processing, Hammamet, Tunisia IEEE Piscataway, NJ
- Foo, S.W., Leem, W.T., Recognition of piano notes with the aid of FRM filters., Proceedings of the International Symposium on Control, Communications and Signal Processing, Hammamet, Tunisia, March 2004, IEEE, Piscataway, NJ, 409–413.
- (2004) , pp. 409-413
- Foo, S.W.¹ Leem, W.T.²

186
- 84945133945
- Summarizing popular music via structural similarity analysis
- Proceedings of the IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, New Paltz, NY IEEE Piscataway, NJ
- Cooper, M., Foote, J., Summarizing popular music via structural similarity analysis., Proceedings of the IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, New Paltz, NY, October 2003, IEEE, Piscataway, NJ, 127–130.
- (2003) , pp. 127-130
- Cooper, M.¹ Foote, J.²

187
- 85097907867
- Cubyhum: a fully operational query by humming system
- Proceedings of the International Conference on Music Information Retrieval, Paris, France IRCAM-Centre Pompidou Paris
- Pauws, S., Cubyhum: a fully operational query by humming system., Proceedings of the International Conference on Music Information Retrieval, Paris, France, October 2002, IRCAM-Centre Pompidou, Paris.
- (2002)
- Pauws, S.¹

188
- 0037622306
- Enhancing sonic browsing using audio information retrieval
- Proceedings of the International Conference on Auditory Display, Kyoto, Japan
- Brazil, E., Fernström, M., Tzanetakis, G., Cook, P., Enhancing sonic browsing using audio information retrieval., Proceedings of the International Conference on Auditory Display, Kyoto, Japan, July 2002.
- (2002)
- Brazil, E.¹ Fernström, M.² Tzanetakis, G.³ Cook, P.⁴

189
- 33646767819
- Improving music genre classification by short time feature integration
- Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing, Philadelphia, PA IEEE Piscataway, NJ
- Meng, A., Ahrendt, P., Larsen, J., Improving music genre classification by short time feature integration., Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing, Philadelphia, PA, March 2005, IEEE, Piscataway, NJ, 497–500.
- (2005) , pp. 497-500
- Meng, A.¹ Ahrendt, P.² Larsen, J.³

190
- 0141743614
- Musical genre classification using support vector machines
- Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing, Hong Kong, China IEEE Piscataway, NJ
- Xu, C., Maddage, N.C., Shao, X., Cao, F., Tian, Q., Musical genre classification using support vector machines., Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing, Hong Kong, China, April 2003, IEEE, Piscataway, NJ, 429–432.
- (2003) , pp. 429-432
- Xu, C.¹ Maddage, N.C.² Shao, X.³ Cao, F.⁴ Tian, Q.⁵

191
- 0035685514
- To catch a chorus: using chroma-based representations for audio thumbnailing
- Proceedings of the IEEE Workshop on the Applications of Signal Processing to Audio and Acoustics, New Paltz, NY IEEE Piscataway, NJ
- Bartsch, M.A., Wakefield, G.H., To catch a chorus: using chroma-based representations for audio thumbnailing., Proceedings of the IEEE Workshop on the Applications of Signal Processing to Audio and Acoustics, New Paltz, NY, October 2001, IEEE, Piscataway, NJ, 15–18.
- (2001) , pp. 15-18
- Bartsch, M.A.¹ Wakefield, G.H.²

192
- 85097899262
- Comparison of several acoustic modeling techniques and decoding algorithms for embedded speech recognition systems
- Proceedings of the Workshop on DSP in Mobile and Vehicular Systems, Nagoya, Japan
- Lévy, C., Linarès, G., Nocera, P., Comparison of several acoustic modeling techniques and decoding algorithms for embedded speech recognition systems., Proceedings of the Workshop on DSP in Mobile and Vehicular Systems, Nagoya, Japan, April 2003.
- (2003)
- Lévy, C.¹ Linarès, G.² Nocera, P.³

193
- 30344446676
- Pseudo complex cepstrum using discrete cosine transform
- Muralishankar, R., Ramakrishnan, A.G., Pseudo complex cepstrum using discrete cosine transform. Int. J. Speech Technol. 8:2 (June 2005), 181–191.
- (2005) Int. J. Speech Technol. , vol.8 , Issue.2 , pp. 181-191
- Muralishankar, R.¹ Ramakrishnan, A.G.²

194
- 85097870299
- Speaker adaptive speech recognition using phone pair model
- Proceedings of the 5th International Conference on Signal Processing, Beijing, China
- Baojie, L., Hirose, K., Speaker adaptive speech recognition using phone pair model., Proceedings of the 5th International Conference on Signal Processing, Beijing, China, August 2000, 714–717.
- (2000) , pp. 714-717
- Baojie, L.¹ Hirose, K.²

195
- 85097903590
- Noise robust Chinese speech recognition using feature vector normalization and higher-order cepstral coefficients
- Proceedings of the 5th International Conference on Signal Processing, Beijing, China
- Wang, X., Dong, Y., Häkkinen, J., Viikki, O., Noise robust Chinese speech recognition using feature vector normalization and higher-order cepstral coefficients., Proceedings of the 5th International Conference on Signal Processing, Beijing, China, August 2000, 738–741.
- (2000) , pp. 738-741
- Wang, X.¹ Dong, Y.² Häkkinen, J.³ Viikki, O.⁴

196
- 0036298770
- Modulation features for speech recognition
- Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing, Orlando, FL IEEE Piscataway, NJ
- Dimitriadis, D., Maragos, P., Potamianos, A., Modulation features for speech recognition., Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing, Orlando, FL, May 2002, IEEE, Piscataway, NJ, 377–380.
- (2002) , pp. 377-380
- Dimitriadis, D.¹ Maragos, P.² Potamianos, A.³

197
- 33947639038
- Joint acoustic-modulation frequency for speaker recognition
- Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing, Toulouse, France IEEE Piscataway, NJ
- Kinnunen, T., Joint acoustic-modulation frequency for speaker recognition., Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing, Toulouse, France, May 2006, IEEE, Piscataway, NJ, 665–668.
- (2006) , pp. 665-668
- Kinnunen, T.¹

198
- 33644628732
- Evaluation of frequently used audio features for classification of music into perceptual categories
- Proceedings of the 4th International Workshop Content-Based Multimedia Indexing, Riga, Latvia
- Pohle, T., Pampalk, E., Widmer, G., Evaluation of frequently used audio features for classification of music into perceptual categories., Proceedings of the 4th International Workshop Content-Based Multimedia Indexing, Riga, Latvia, 2005.
- (2005)
- Pohle, T.¹ Pampalk, E.² Widmer, G.³

199
- 85009188340
- Nonlinear analysis of speech signals: generalized dimensions and Lyapunov exponents
- Proceedings of the European Conference on Speech Communication and Technology, Geneva, Switzerland
- Pitsikalis, V., Kokkinos, I., Maragos, P., Nonlinear analysis of speech signals: generalized dimensions and Lyapunov exponents., Proceedings of the European Conference on Speech Communication and Technology, Geneva, Switzerland, September 2003, 817–820.
- (2003) , pp. 817-820
- Pitsikalis, V.¹ Kokkinos, I.² Maragos, P.³

200
- 33750574645
- Pitch histograms in audio and symbolic music information retrieval
- Tzanetakis, G., Ermolinskyi, A., Cook, P., Pitch histograms in audio and symbolic music information retrieval. J. New Music Res. 32:2 (June 2003), 143–152.
- (2003) J. New Music Res. , vol.32 , Issue.2 , pp. 143-152
- Tzanetakis, G.¹ Ermolinskyi, A.² Cook, P.³

201
- 11244339730
- An audio recommendation system based on audio signature description scheme in MPEG-7 audio
- Proceedings of the IEEE International Conference on Multimedia and Expo, Taipei, Taiwan, vol. 1 IEEE Piscataway, NJ
- Huang, Y.C., Jenor, S.K., An audio recommendation system based on audio signature description scheme in MPEG-7 audio., Proceedings of the IEEE International Conference on Multimedia and Expo, Taipei, Taiwan, vol. 1, June 2004, IEEE, Piscataway, NJ, 639–642.
- (2004) , pp. 639-642
- Huang, Y.C.¹ Jenor, S.K.²

202
- 0035576554
- Indexing and retrieval of audio: a survey
- Lu, G., Indexing and retrieval of audio: a survey. Multimedia Tools Appl. 15:3 (December 2001), 269–290.
- (2001) Multimedia Tools Appl. , vol.15 , Issue.3 , pp. 269-290
- Lu, G.¹

203
- 3042712303
- Audio information retrieval: a bibliographical study
- Davy, M., Godsill, S.J., Audio information retrieval: a bibliographical study. Technical Report, February 2002.
- (2002) Technical Report
- Davy, M.¹ Godsill, S.J.²

204
- 84942244978
- A review of algorithms for audio fingerprinting
- Proceedings of the IEEE Workshop on Multimedia Signal Processing, St. Thomas, VI IEEE Piscataway, NJ
- Cano, P., Batlle, E., Kalker, T., Haitsma, J., A review of algorithms for audio fingerprinting., Proceedings of the IEEE Workshop on Multimedia Signal Processing, St. Thomas, VI, December 2002, IEEE, Piscataway, NJ, 169–173.
- (2002) , pp. 169-173
- Cano, P.¹ Batlle, E.² Kalker, T.³ Haitsma, J.⁴

* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.