SCOPUS Information Search Platform

Speech Communication

Volume 50, Issue 7, 2008, Pages 547-563

Detection of speech and music based on spectral tracking

(3) Taniguchi, Toru (a); Tohyama, Mikio (a); Shirai, Katsuhiko (a)

(a) Waseda University (Japan)

Author keywords

Sinusoidal model; Sinusoidal trajectory; Spectral tracking; Speech and music discrimination; Speech detection

Indexed keywords

AUDIO ACOUSTICS; CLASSIFICATION (OF INFORMATION); ELECTRONIC MUSICAL INSTRUMENTS; MICROFLUIDICS; MUSICAL INSTRUMENTS; SPEECH PROCESSING; TRAJECTORIES;

AUDIO INFORMATION; BACKGROUND MUSIC (BGM); CLASSIFICATION METHODS; COMPLEX (I/Q) SIGNALS; DETECTION METHODS; REAL WORLD; SINUSOIDAL TRAJECTORIES; STATISTICAL CLASSIFIERS; TEMPORAL CHARACTERISTICS; TEMPORAL FEATURES;

SPEECH;

EID: 45849121392 PISSN: 01676393 EISSN: None Source Type: Journal
DOI: 10.1016/j.specom.2008.03.007 Document Type: Article

Times cited: 19

References (33)

1
- 34047193159
- Sinusoidal model based on instantaneous frequency attractors
- Abe T., and Honda M. Sinusoidal model based on instantaneous frequency attractors. IEEE Trans. Audio, Speech Language Process. 14 4 (2006) 1292-1300
- (2006) IEEE Trans. Audio, Speech Language Process. , vol.14 , Issue.4 , pp. 1292-1300
- Abe, T.¹ Honda, M.²

2
- 0030371135
- Abe, T., Kobayashi, T., Imai, S., 1996. Robust pitch estimation with harmonics enhancement in noisy environments based on instantaneous frequency. In: Proc. ICSLP 96, Vol. 2, pp. 1277-1280.

3
- 0003684441
- MIT Press
- Bregman A.S. Auditory Scene Analysis (1990), MIT Press
- (1990) Auditory Scene Analysis
- Bregman, A.S.¹

4
- 85143191655
- Chou, W., Gu, L., May 2001. Robust singing detection in speech/music discriminator design. In: Proc. ICASSP 2001, Vol. II, pp. 865-868.

5
- 0002629270
- Maximum likelihood from incomplete data via the EM algorithm
- Dempster A., Laird N., and Rubin D. Maximum likelihood from incomplete data via the EM algorithm. J. Roy. Statist. Soc. Ser. B 39 1 (1977) 1-38
- (1977) J. Roy. Statist. Soc. Ser. B , vol.39 , Issue.1 , pp. 1-38
- Dempster, A.¹ Laird, N.² Rubin, D.³

6
- 0027154415
- Depalle, P., Garcia, G., Rodet, X., 1993. Tracking of partials for additive sound synthesis using hidden Markov models. In: ICASSP-93, Vol. 1, pp. 225-228.

7
- 0028831004
- Temporal envelope and fine structure cues for speech intelligibility
- Drullman R. Temporal envelope and fine structure cues for speech intelligibility. J. Acoust. Soc. Amer. 97 (1995) 585-592
- (1995) J. Acoust. Soc. Amer. , vol.97 , pp. 585-592
- Drullman, R.¹

8
- 45849089726
- Goto, M., Hashiguchi, H., Nishimura, T., Oka, R., 2002. RWC music database: popular, classical, and jazz music databases. In: Proc. 3rd Internat. Conf. on Music Information Retrieval (ISMIR 2002), pp. 287-288.

9
- 45849095873
- Goto, M., Hashiguchi, H., Nishimura, T., Oka, R., 2003. RWC music database: music genre database and musical instrument sound database. In: Proc. 4th Internat. Conf. on Music Information Retrieval (ISMIR 2003), pp. 229-230.

10
- 0003962869
- MacMillan
- Hogg R., and Ledolter J. Engineering Statistics (1987), MacMillan
- (1987) Engineering Statistics
- Hogg, R.¹ Ledolter, J.²

11
- 0032644224
- JNAS: Japanese speech corpus for large vocabulary continuous speech recognition research
- Itou K., Yamamoto M., Takeda K., Takezawa T., Matsuoka T., Kobayashi T., Shikano K., and Itahashi S. JNAS: Japanese speech corpus for large vocabulary continuous speech recognition research. J. Acoust. Soc. Jpn. (E) 20 3 (1999) 199-206
- (1999) J. Acoust. Soc. Jpn. (E) , vol.20 , Issue.3 , pp. 199-206
- Itou, K.¹ Yamamoto, M.² Takeda, K.³ Takezawa, T.⁴ Matsuoka, T.⁵ Kobayashi, T.⁶ Shikano, K.⁷ Itahashi, S.⁸

12
- 0003658046
- John Wiley and Sons p. 592
- Jackson J. A User's Guide to Principal Components (1991), John Wiley and Sons p. 592
- (1991) A User's Guide to Principal Components
- Jackson, J.¹

13
- 0037347128
- Signal representation including waveform envelope by clustered line-spectrum modeling
- Kazama M., Yoshida K., and Tohyama M. Signal representation including waveform envelope by clustered line-spectrum modeling. J. Audio Eng. Soc. 51 3 (2003) 123-137
- (2003) J. Audio Eng. Soc. , vol.51 , Issue.3 , pp. 123-137
- Kazama, M.¹ Yoshida, K.² Tohyama, M.³

14
- 45849084436
- Kim, H., Burred, J., Sikora, T., 2004. How efficient is MPEG-7 for general sound recognition? In: 25th Internat. Audio Engineering Society Conference Metadata For Audio.

15
- 85037142779
- Maekawa, K., Koiso, H., Furui, S., Isahara, H., 2000. Spontaneous speech corpus of Japanese. In: Proc. 2nd Internat. Conf. Language Resources and Evaluation (LREC2000), pp. 947-952.

16
- 29844444090
- Marks, S., Gonzalez, R., 2005. Techniques for improving the accuracy of sinusoidal tracking. In: Proc. Internet and Multimedia Systems and Applications 2005, pp. 299-304.

17
- 84863772450
- Speech analysis/synthesis based on a sinusoidal representation
- McAulay R., and Quatieri T. Speech analysis/synthesis based on a sinusoidal representation. IEEE Trans. ASSP ASSP-34 4 (1986) 744-754
- (1986) IEEE Trans. ASSP , vol.ASSP-34 , Issue.4 , pp. 744-754
- McAulay, R.¹ Quatieri, T.²

18
- 84904439718
- Melih, K., Gonzalez, R., 1999. Audio source type segmentation using a perceptually based representation. In: Proc. 5th Internat. Symposium on Signal Processing and Its Applications, 1999, ISSPA'99, Vol. 1, pp. 51-54.

19
- 0034502302
- Melih, K., Gonzalez, R., 2000. Source segmentation for structured audio. In: IEEE Internat. Conf. on Multimedia and Expo, ICME 2000, Vol. 2, pp. 811-814.

20
- 0003789815
- Elsevier Academic Press pp. 269-298 (Chapter 8)
- Moore B.C.J. An Introduction to the Psychology of Hearing. fifth ed. (2004), Elsevier Academic Press pp. 269-298 (Chapter 8)
- (2004) An Introduction to the Psychology of Hearing. fifth ed.
- Moore, B.C.J.¹

21
- 45849098098
- Nawab, S.H., Espy-Wilson, C.Y., Mani, R., Bitar, N.N., 1998. Knowledge-based analysis of speech mixed with sporadic environmental sounds. In: Computational Auditory Scene Analysis. Lawrence Erlbaum Associates, pp. 177-194 (Chapter 12).

22
- 45849084435
- Plante, F., Meyer, G., Ainsworth, W.A., 1995. A pitch extraction reference database. In: EUROSPEECH'95. pp. 837-840.

23
- 0003425258
- Prentice-Hall pp. 274-277 (Chapter 6.1.5)
- Rabiner L., and Schafer R. Digital Processing of Speech Signals (1978), Prentice-Hall pp. 274-277 (Chapter 6.1.5)
- (1978) Digital Processing of Speech Signals
- Rabiner, L.¹ Schafer, R.²

24
- 45849148672
- Sakakibara, K.-I., Osaka, N., 1998. On concatenation of musical sounds using a sinusoidal model. In: Technical Report of IEICE, Vol. SP97-108, pp. 1-6 (in Japanese).

25
- 0029765670
- Saunders, J., 1996. Real-time discrimination of broadcast speech/music. In: Proc. ICASSP'96, Vol. 2, pp. 993-996.

26
- 0030648077
- Scheirer, E., Slaney, M., 1997. Construction and evaluation of a robust multifeature speech/music discriminator. In: Proc. ICASSP'97, Vol. II, pp. 1331-1334.

27
- 45849137620
- Takeuchi, S., Yamashita, M., Uchida, T., Sugiyama, M., 2001. Optimization of voice/music detection in sound data. In: Consistent and Reliable Acoustic Cues for sound analysis (CRAC Workshop).

28
- 33745184907
- Taniguchi, T., Adachi, A., Okawa, S., Honda, M., Shirai, K., 2005. Discrimination of speech, musical instruments and singing voices using the temporal patterns of sinusoidal segments in audio signals. In: Proc. Interspeech2005, pp. 589-592.

29
- 44449143006
- Taniguchi, T., Tohyama, M., Shirai, K., 2006. Spectral frequency tracking for classifying audio signals. In: IEEE Internat. Symposium on Signal Processing and Information Technology, 2006, pp. 300-303.

30
- 45849099961
- Torkkola, K., 1999. Blind separation for audio signals - are we there yet? In: Proc. Internat. Workshop on Independent Component Analysis and Signal Separation (ICA'99).

31
- 45849129969
- Virtanen, T., 2003. Sound source separation using sparse coding with temporal continuity objective. In: Proc. ICMC, pp. 231-234.

32
- 0033707902
- Virtanen, T., Klapuri, A., 2000. Separation of harmonic sound sources using sinusoidal modeling. In: Proc. IEEE Internat. Conf. on Acoust. Speech Signal Process, ICASSP'00, Vol. 2, pp. 765-768.

33
- 45849143454
- Xiong, Z., Radhakrishnan, R., Divakaran, A., Huang, T., 2003. Comparing MFCC and MPEG-7 audio features for feature extraction, maximum likelihood HMM and entropic prior HMM for sports audio classification. In: Proc. Internat. Conf. on Multimedia and Expo, 2003, ICME'03, Vol. 3, pp. 397-400.

* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS DB.