Volume 19, Issue 5, 2011, Pages 1091-1102

HMM-Based Multipitch Tracking for Noisy and Reverberant Speech

Author keywords

Hidden Markov model (HMM) tracking; multipitch tracking; pitch detection algorithm (PDA); room reverberation


EID: 85008056718     PISSN: 15587916     EISSN: 15587924     Source Type: Journal    
DOI: 10.1109/TASL.2010.2077280     Document Type: Article
Times cited: 63

References (33)
  • 1
    • 0018455820
    • Image method for efficiently simulating small-room acoustics
    • J. B. Allen and D. A. Berkley, “Image method for efficiently simulating small-room acoustics,” J. Acoust. Soc. Amer., vol. 65, pp. 943–950, 1979.
    • (1979) J. Acoust. Soc. Amer. , vol.65 , pp. 943-950
    • Allen, J.B.1    Berkley, D.A.2
  • 2
    • 33646773610
    • Discriminative training of hidden Markov models for multiple pitch tracking
    • F. Bach and M. Jordan, “Discriminative training of hidden Markov models for multiple pitch tracking,” in Proc. IEEE ICASSP, 2005, pp. 489–492.
    • (2005) Proc. IEEE ICASSP , pp. 489-492
    • Bach, F.1    Jordan, M.2
  • 4
    • 0029745579
    • Neural correlates of the pitch of complex tones. I. Pitch and pitch salience
    • P. A. Cariani and B. Delgutte, “Neural correlates of the pitch of complex tones. I. Pitch and pitch salience,” J. Neurophysiol., vol. 76, pp. 1698–1716, 1996.
    • (1996) J. Neurophysiol. , vol.76 , pp. 1698-1716
    • Cariani, P.A.1    Delgutte, B.2
  • 7
    • 0036214787
    • YIN, a fundamental frequency estimator for speech and music
    • A. de Cheveigne and H. Kawahara, “YIN, a fundamental frequency estimator for speech and music,” J. Acoust. Soc. Amer., vol. 111, pp. 1917–1930, 2002.
    • (2002) J. Acoust. Soc. Amer. , vol.111 , pp. 1917-1930
    • de Cheveigne, A.1    Kawahara, H.2
  • 8
    • 56149096580
    • Robust F0 estimation based on a multichannel periodicity function for distant-talking speech
    • F. Flego and M. Omologo, “Robust F0 estimation based on a multichannel periodicity function for distant-talking speech,” in Proc. EUSIPCO, 2006.
    • (2006) Proc. EUSIPCO
    • Flego, F.1    Omologo, M.2
  • 9
    • 34248183857
    • DARPA TIMIT Acoustic Phonetic Continuous Speech Corpus
    • [Online]. Available: http://www.ldc.upenn.edu/Catalog/LDC93S1.html
    • J. S. Garofolo, L. F. Lamel, W. M. Fisher, J. G. Fiscus, D. S. Pallett, and N. L. Dahlgren, “DARPA TIMIT Acoustic Phonetic Continuous Speech Corpus,” CDROM 1993 [Online]. Available: http://www.ldc.upenn.edu/Catalog/LDC93S1.html
    • (1993) CDROM
    • Garofolo, J.S.1    Lamel, L.F.2    Fisher, W.M.3    Fiscus, J.G.4    Pallett, D.S.5    Dahlgren, N.L.6
  • 11
    • 4644265990
    • Monaural speech segregation based on pitch tracking and amplitude modulation
    • Sep.
    • G. Hu and D. L. Wang, “Monaural speech segregation based on pitch tracking and amplitude modulation,” IEEE Trans. Neural Netw., vol. 15, no. 5, pp. 1135–1150, Sep. 2004.
    • (2004) IEEE Trans. Neural Netw. , vol.15 , Issue.5 , pp. 1135-1150
    • Hu, G.1    Wang, D.L.2
  • 12
    • 77955695149
    • A tandem algorithm for pitch estimation and voiced speech segregation
    • Nov.
    • G. Hu and D. L. Wang, “A tandem algorithm for pitch estimation and voiced speech segregation,” IEEE Trans. Audio, Speech, Lang. Process., vol. 18, no. 8, pp. 2067–2079, Nov. 2010.
    • (2010) IEEE Trans. Audio, Speech, Lang. Process. , vol.18 , Issue.8 , pp. 2067-2079
    • Hu, G.1    Wang, D.L.2
  • 14
    • 65249103478
    • A supervised learning approach to monaural segregation of reverberant speech
    • May
    • Z. Jin and D. L. Wang, “A supervised learning approach to monaural segregation of reverberant speech,” IEEE Trans. Audio, Speech, Lang. Process., vol. 17, no. 4, pp. 625–638, May 2009.
    • (2009) IEEE Trans. Audio, Speech, Lang. Process. , vol.17 , Issue.4 , pp. 625-638
    • Jin, Z.1    Wang, D.L.2
  • 15
    • 39649094860
    • Multipitch analysis of polyphonic music and speech signals using an auditory model
    • Feb.
    • A. Klapuri, “Multipitch analysis of polyphonic music and speech signals using an auditory model,” IEEE Trans. Audio, Speech, Lang. Process., vol. 16, no. 2, pp. 255–266, Feb. 2008.
    • (2008) IEEE Trans. Audio, Speech, Lang. Process. , vol.16 , Issue.2 , pp. 255-266
    • Klapuri, A.1
  • 18
    • 0023944462
    • Simulation of auditory-neural transduction: Further studies
    • R. Meddis, “Simulation of auditory-neural transduction: Further studies,” J. Acoust. Soc. Amer., vol. 83, pp. 1056–1063, 1988.
    • (1988) J. Acoust. Soc. Amer. , vol.83 , pp. 1056-1063
    • Meddis, R.1
  • 19
    • 0030846123
    • A unitary model of pitch perception
    • R. Meddis and L. P. O'Mard, “A unitary model of pitch perception,” J. Acoust. Soc. Amer., vol. 102, pp. 1811–1820, 1997.
    • (1997) J. Acoust. Soc. Amer. , vol.102 , pp. 1811-1820
    • Meddis, R.1    O'Mard, L.P.2
  • 20
    • 0141624530
    • An efficient auditory filterbank based on the gammatone function
    • Cambridge, U.K., APU Rep. 2341
    • R. D. Patterson, I. Nimmo-Smith, J. Holdsworth, and P. Rice, “An efficient auditory filterbank based on the gammatone function,” in Appl. Psychol. Unit, Cambridge, U.K., 1988, APU Rep. 2341.
    • (1988) Appl. Psychol. Unit
    • Patterson, R.D.1    Nimmo-Smith, I.2    Holdsworth, J.3    Rice, P.4
  • 21
    • 4544369752
    • Extraction of pitch in adverse conditions
    • S. R. M. Prasanna and B. Yegnanarayana, “Extraction of pitch in adverse conditions,” in Proc. IEEE ICASSP, 2004, pp. 109–112.
    • (2004) Proc. IEEE ICASSP , pp. 109-112
    • Prasanna, S.R.M.1    Yegnanarayana, B.2
  • 22
    • 0031124228
    • A pitch determination and voiced/unvoiced decision algorithm for noisy speech
    • J. Rouat, Y. C. Liu, and D. Morissette, “A pitch determination and voiced/unvoiced decision algorithm for noisy speech,” Speech Commun., pp. 191–207, 1997.
    • (1997) Speech Commun. , pp. 191-207
    • Rouat, J.1    Liu, Y.C.2    Morissette, D.3
  • 23
    • 50249167077
    • Single and multiple F0 contour estimation through parametric spectrogram modeling of speech in noisy environments
    • May
    • J. L. Roux, H. Kameoka, N. Ono, A. de Cheveigne, and S. Sagayama, “Single and multiple F0 contour estimation through parametric spectrogram modeling of speech in noisy environments,” IEEE Trans. Audio, Speech, Lang. Process., vol. 15, no. 4, pp. 1135–1145, May 2007.
    • (2007) IEEE Trans. Audio, Speech, Lang. Process. , vol.15 , Issue.4 , pp. 1135-1145
    • Roux, J.L.1    Kameoka, H.2    Ono, N.3    de Cheveigne, A.4    Sagayama, S.5
  • 24
    • 44649176982
    • Reverberation challenges the temporal representation of the pitch of complex sounds
    • M. Sayles and I. M. Winter, “Reverberation challenges the temporal representation of the pitch of complex sounds,” Neuron, vol. 58, pp. 789–801, 2008.
    • (2008) Neuron , vol.58 , pp. 789-801
    • Sayles, M.1    Winter, I.M.2
  • 25
    • 0022341184
    • Speech processing in the auditory system I: The representation of speech sounds in the responses of the auditory nerve
    • S. A. Shamma, “Speech processing in the auditory system I: The representation of speech sounds in the responses of the auditory nerve,” J. Acoust. Soc. Amer., vol. 78, pp. 1613–1621, 1985.
    • (1985) J. Acoust. Soc. Amer. , vol.78 , pp. 1613-1621
    • Shamma, S.A.1
  • 26
    • 0032678076
    • Hidden Markov models based on multi-space probability distribution for pitch pattern modeling
    • K. Tokuda, T. Masuko, N. Miyazaki, and T. Kobayashi, “Hidden Markov models based on multi-space probability distribution for pitch pattern modeling,” in Proc. IEEE ICASSP, 1999, pp. 229–232.
    • (1999) Proc. IEEE ICASSP , pp. 229-232
    • Tokuda, K.1    Masuko, T.2    Miyazaki, N.3    Kobayashi, T.4
  • 27
    • 0034319894
    • A computationally efficient multipitch analysis model
    • Nov.
    • T. Tolonen and M. Karjalainen, “A computationally efficient multipitch analysis model,” IEEE Trans. Speech Audio Process., vol. 8, no. 6, pp. 708–716, Nov. 2000.
    • (2000) IEEE Trans. Speech Audio Process. , vol.8 , Issue.6 , pp. 708-716
    • Tolonen, T.1    Karjalainen, M.2
  • 28
    • 4344685385
    • An improved method based on the MTF concept for restoring the power envelope from a reverberant signal
    • M. Unoki, M. Furukawa, K. Sakata, and M. Akagi, “An improved method based on the MTF concept for restoring the power envelope from a reverberant signal,” Acoust. Sci. Technol., vol. 25, pp. 232–242, 2004.
    • (2004) Acoust. Sci. Technol. , vol.25 , pp. 232-242
    • Unoki, M.1    Furukawa, M.2    Sakata, K.3    Akagi, M.4
  • 29
    • 0027623210
    • Assessment for automatic speech recognition II: NOISEX-92: A database and an experiment to study the effect of additive noise on speech recognition systems
    • A. Varga and H. J. M. Steeneken, “Assessment for automatic speech recognition II: NOISEX-92: A database and an experiment to study the effect of additive noise on speech recognition systems,” Speech Commun., vol. 12, pp. 247–251, 1993.
    • (1993) Speech Commun. , vol.12 , pp. 247-251
    • Varga, A.1    Steeneken, H.J.M.2
  • 30
    • 84892233308
    • On ideal binary mask as the computational goal of auditory scene analysis
    • P. Divenyi, Ed. Norwell, MA: Kluwer
    • D. L. Wang, “On ideal binary mask as the computational goal of auditory scene analysis,” in Speech Separation by Humans and Machines, P. Divenyi, Ed. Norwell, MA: Kluwer, 2005, pp. 181–197.
    • (2005) Speech Separation by Humans and Machines , pp. 181-197
    • Wang, D.L.1
  • 32
    • 0037767686
    • A multipitch tracking algorithm for noisy speech
    • May
    • M. Wu, D. L. Wang, and G. J. Brown, “A multipitch tracking algorithm for noisy speech,” IEEE Trans. Speech Audio Process., vol. 11, no. 3, pp. 229–241, May 2003.
    • (2003) IEEE Trans. Speech Audio Process. , vol.11 , Issue.3 , pp. 229-241
    • Wu, M.1    Wang, D.L.2    Brown, G.J.3


* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.