



Volume 50, Issue 7, 2008, Pages 547-563

Detection of speech and music based on spectral tracking

Author keywords

Sinusoidal model; Sinusoidal trajectory; Spectral tracking; Speech and music discrimination; Speech detection

Indexed keywords

AUDIO ACOUSTICS; CLASSIFICATION (OF INFORMATION); ELECTRONIC MUSICAL INSTRUMENTS; MICROFLUIDICS; MUSICAL INSTRUMENTS; SPEECH PROCESSING; TRAJECTORIES;

EID: 45849121392     PISSN: 01676393     EISSN: None     Source Type: Journal    
DOI: 10.1016/j.specom.2008.03.007     Document Type: Article
Times cited : (19)

References (33)
  • 1
    • 34047193159
    • Sinusoidal model based on instantaneous frequency attractors
    • Abe T., and Honda M. Sinusoidal model based on instantaneous frequency attractors. IEEE Trans. Audio, Speech Language Process. 14 4 (2006) 1292-1300
    • (2006) IEEE Trans. Audio, Speech Language Process. , vol.14 , Issue.4 , pp. 1292-1300
    • Abe, T.1    Honda, M.2
  • 2
    • 0030371135
    • Abe, T., Kobayashi, T., Imai, S., 1996. Robust pitch estimation with harmonics enhancement in noisy environments based on instantaneous frequency. In: Proc. ICSLP 96, Vol. 2, pp. 1277-1280.
  • 4
    • 85143191655
    • Chou, W., Gu, L., May 2001. Robust singing detection in speech/music discriminator design. In: Proc. ICASSP 2001, Vol. II, pp. 865-868.
  • 5
    • 0002629270
    • Maximum likelihood from incomplete data via the EM algorithm
    • Dempster A., Laird N., and Rubin D. Maximum likelihood from incomplete data via the EM algorithm. J. Roy. Statist. Soc. Ser. B 39 1 (1977) 1-38
    • (1977) J. Roy. Statist. Soc. Ser. B , vol.39 , Issue.1 , pp. 1-38
    • Dempster, A.1    Laird, N.2    Rubin, D.3
  • 6
    • 0027154415
    • Depalle, P., Garcia, G., Rodet, X., 1993. Tracking of partials for additive sound synthesis using hidden Markov models. In: ICASSP-93, Vol. 1, pp. 225-228.
  • 7
    • 0028831004
    • Temporal envelope and fine structure cues for speech intelligibility
    • Drullman R. Temporal envelope and fine structure cues for speech intelligibility. J. Acoust. Soc. Amer. 97 (1995) 585-592
    • (1995) J. Acoust. Soc. Amer. , vol.97 , pp. 585-592
    • Drullman, R.1
  • 8
    • 45849089726
    • Goto, M., Hashiguchi, H., Nishimura, T., Oka, R., 2002. RWC music database: popular, classical, and jazz music databases. In: Proc. 3rd Internat. Conf. on Music Information Retrieval (ISMIR 2002), pp. 287-288.
  • 9
    • 45849095873
    • Goto, M., Hashiguchi, H., Nishimura, T., Oka, R., 2003. RWC music database: music genre database and musical instrument sound database. In: Proc. 4th Internat. Conf. on Music Information Retrieval (ISMIR 2003), pp. 229-230.
  • 13
    • 0037347128
    • Signal representation including waveform envelope by clustered line-spectrum modeling
    • Kazama M., Yoshida K., and Tohyama M. Signal representation including waveform envelope by clustered line-spectrum modeling. J. Audio Eng. Soc. 51 3 (2003) 123-137
    • (2003) J. Audio Eng. Soc. , vol.51 , Issue.3 , pp. 123-137
    • Kazama, M.1    Yoshida, K.2    Tohyama, M.3
  • 14
    • 45849084436
    • Kim, H., Burred, J., Sikora, T., 2004. How efficient is MPEG-7 for general sound recognition? In: 25th Internat. Audio Engineering Society Conference Metadata For Audio.
  • 15
    • 85037142779
    • Maekawa, K., Koiso, H., Furui, S., Isahara, H., 2000. Spontaneous speech corpus of Japanese. In: Proc. 2nd Internat. Conf. Language Resources and Evaluation (LREC2000), pp. 947-952.
  • 16
    • 29844444090
    • Marks, S., Gonzalez, R., 2005. Techniques for improving the accuracy of sinusoidal tracking. In: Proc. Internet and Multimedia Systems and Applications 2005, pp. 299-304.
  • 17
    • 84863772450
    • Speech analysis/synthesis based on a sinusoidal representation
    • McAulay R., and Quatieri T. Speech analysis/synthesis based on a sinusoidal representation. IEEE Trans. ASSP ASSP-34 4 (1986) 744-754
    • (1986) IEEE Trans. ASSP , vol.ASSP-34 , Issue.4 , pp. 744-754
    • McAulay, R.1    Quatieri, T.2
  • 18
    • 84904439718
    • Melih, K., Gonzalez, R., 1999. Audio source type segmentation using a perceptually based representation. In: Proc. 5th Internat. Symposium on Signal Processing and Its Applications, 1999, ISSPA'99, Vol. 1, pp. 51-54.
  • 19
    • 0034502302
    • Melih, K., Gonzalez, R., 2000. Source segmentation for structured audio. In: IEEE Internat. Conf. on Multimedia and Expo, ICME 2000, Vol. 2, pp. 811-814.
  • 21
    • 45849098098
    • Nawab, S.H., Espy-Wilson, C.Y., Mani, R., Bitar, N.N., 1998. Knowledge-based analysis of speech mixed with sporadic environmental sounds. In: Computational Auditory Scene Analysis. Lawrence Erlbaum Associates, pp. 177-194 (Chapter 12).
  • 22
    • 45849084435
    • Plante, F., Meyer, G., Ainsworth, W.A., 1995. A pitch extraction reference database. In: EUROSPEECH'95, pp. 837-840.
  • 24
    • 45849148672
    • Sakakibara, K.-I., Osaka, N., 1998. On concatenation of musical sounds using a sinusoidal model. In: Technical Report of IEICE, Vol. SP97-108, pp. 1-6 (in Japanese).
  • 25
    • 0029765670
    • Saunders, J., 1996. Real-time discrimination of broadcast speech/music. In: Proc. ICASSP'96, Vol. 2, pp. 993-996.
  • 26
    • 0030648077
    • Scheirer, E., Slaney, M., 1997. Construction and evaluation of a robust multifeature speech/music discriminator. In: Proc. ICASSP'97, Vol. II, pp. 1331-1334.
  • 27
    • 45849137620
    • Takeuchi, S., Yamashita, M., Uchida, T., Sugiyama, M., 2001. Optimization of voice/music detection in sound data. In: Consistent and Reliable Acoustic Cues for sound analysis (CRAC Workshop).
  • 28
    • 33745184907
    • Taniguchi, T., Adachi, A., Okawa, S., Honda, M., Shirai, K., 2005. Discrimination of speech, musical instruments and singing voices using the temporal patterns of sinusoidal segments in audio signals. In: Proc. Interspeech2005, pp. 589-592.
  • 29
    • 44449143006
    • Taniguchi, T., Tohyama, M., Shirai, K., 2006. Spectral frequency tracking for classifying audio signals. In: IEEE Internat. Symposium on Signal Processing and Information Technology, 2006, pp. 300-303.
  • 30
    • 45849099961
    • Torkkola, K., 1999. Blind separation for audio signals - are we there yet? In: Proc. Internat. Workshop on Independent Component Analysis and Signal Separation (ICA'99).
  • 31
    • 45849129969
    • Virtanen, T., 2003. Sound source separation using sparse coding with temporal continuity objective. In: Proc. ICMC, pp. 231-234.
  • 32
    • 0033707902
    • Virtanen, T., Klapuri, A., 2000. Separation of harmonic sound sources using sinusoidal modeling. In: Proc. IEEE Internat. Conf. on Acoust. Speech Signal Process, ICASSP'00, Vol. 2, pp. 765-768.
  • 33
    • 45849143454
    • Xiong, Z., Radhakrishnan, R., Divakaran, A., Huang, T., 2003. Comparing MFCC and MPEG-7 audio features for feature extraction, maximum likelihood HMM and entropic prior HMM for sports audio classification. In: Proc. Internat. Conf. on Multimedia and Expo, 2003, ICME'03, Vol. 3, pp. 397-400.


* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS DB.