SCOPUS Information Retrieval Platform

IEEE Transactions on Audio, Speech and Language Processing

Volume 19, Issue 5, 2011, Pages 1091-1102

HMM-Based Multipitch Tracking for Noisy and Reverberant Speech

Authors (2): Jin, Zhaozhang (a); Wang, DeLiang (a)

(a) Ohio State University (United States)

Author keywords

Hidden Markov model (HMM) tracking; multipitch tracking; pitch detection algorithm (PDA); room reverberation

EID: 85008056718 PISSN: 15587916 EISSN: 15587924 Source Type: Journal
DOI: 10.1109/TASL.2010.2077280 Document Type: Article

Times cited: 63

References (33)

1
- 0018455820
- Image method for efficiently simulating small-room acoustics
- J. B. Allen and D. A. Berkley, “Image method for efficiently simulating small-room acoustics,” J. Acoust. Soc. Amer., vol. 65, pp. 943–950, 1979.
- (1979) J. Acoust. Soc. Amer. , vol.65 , pp. 943-950
- Allen, J.B.¹ Berkley, D.A.²

2
- 33646773610
- Discriminative training of hidden Markov models for multiple pitch tracking
- F. Bach and M. Jordan, “Discriminative training of hidden Markov models for multiple pitch tracking,” in Proc. IEEE ICASSP, 2005, pp. 489–492.
- (2005) Proc. IEEE ICASSP , pp. 489-492
- Bach, F.¹ Jordan, M.²

3
- 85008004589
- Reverberation
- D. L. Wang and G. J. Brown, Eds. Hoboken, NJ: Wiley/IEEE Press
- G. J. Brown and K. J. Palomaki, “Reverberation,” in Computational Auditory Scene Analysis: Principles, Algorithms, and Applications, D. L. Wang and G. J. Brown, Eds. Hoboken, NJ: Wiley/IEEE Press, 2006, pp. 209–250.
- (2006) Computational Auditory Scene Analysis: Principles, Algorithms, and Applications , pp. 209-250
- Brown, G.J.¹ Palomaki, K.J.²

4
- 0029745579
- Neural correlates of the pitch of complex tones. I. Pitch and pitch salience
- P. A. Cariani and B. Delgutte, “Neural correlates of the pitch of complex tones. I. Pitch and pitch salience,” J. Neurophysiol., vol. 76, pp. 1698–1716, 1996.
- (1996) J. Neurophysiol. , vol.76 , pp. 1698-1716
- Cariani, P.A.¹ Delgutte, B.²

5
- 0003479143
- Cambridge, U.K.: Cambridge Univ. Press
- M. P. Cooke, Modeling Auditory Processing and Organization. Cambridge, U.K.: Cambridge Univ. Press, 1993.
- (1993) Modeling Auditory Processing and Organization
- Cooke, M.P.¹

6
- 85008537526
- Multiple F0 estimation
- D. L. Wang and G. J. Brown, Eds. Hoboken, NJ: Wiley/IEEE Press
- A. de Cheveigne, “Multiple F0 estimation,” in Computational Auditory Scene Analysis: Principles, Algorithms, and Applications, D. L. Wang and G. J. Brown, Eds. Hoboken, NJ: Wiley/IEEE Press, 2006, pp. 45–78.
- (2006) Computational Auditory Scene Analysis: Principles, Algorithms, and Applications , pp. 45-78
- de Cheveigne, A.¹

7
- 0036214787
- YIN, a fundamental frequency estimator for speech and music
- A. de Cheveigne and H. Kawahara, “YIN, a fundamental frequency estimator for speech and music,” J. Acoust. Soc. Amer., vol. 111, pp. 1917–1930, 2002.
- (2002) J. Acoust. Soc. Amer. , vol.111 , pp. 1917-1930
- de Cheveigne, A.¹ Kawahara, H.²

8
- 56149096580
- Robust F0 estimation based on a multichannel periodicity function for distant-talking speech
- F. Flego and M. Omologo, “Robust F0 estimation based on a multichannel periodicity function for distant-talking speech,” in Proc. EUSIPCO, 2006.
- (2006) Proc. EUSIPCO
- Flego, F.¹ Omologo, M.²

9
- 34248183857
- DARPA TIMIT Acoustic Phonetic Continuous Speech Corpus
- [Online]. Available: http://www.ldc.upenn.edu/Catalog/LDC93S1.html
- J. S. Garofolo, L. F. Lamel, W. M. Fisher, J. G. Fiscus, D. S. Pallett, and N. L. Dahlgren, “DARPA TIMIT Acoustic Phonetic Continuous Speech Corpus,” CDROM 1993 [Online]. Available: http://www.ldc.upenn.edu/Catalog/LDC93S1.html
- (1993) CDROM
- Garofolo, J.S.¹ Lamel, L.F.² Fisher, W.M.³ Fiscus, J.G.⁴ Pallett, D.S.⁵ Dahlgren, N.L.⁶

10
- 0003391579
- Berlin, Germany: Springer-Verlag
- W. Hess, Pitch Determination of Speech Signals. Berlin, Germany: Springer-Verlag, 1983.
- (1983) Pitch Determination of Speech Signals
- Hess, W.¹

11
- 4644265990
- Monaural speech segregation based on pitch tracking and amplitude modulation
- Sep.
- G. Hu and D. L. Wang, “Monaural speech segregation based on pitch tracking and amplitude modulation,” IEEE Trans. Neural Netw., vol. 15, no. 5, pp. 1135–1150, Sep. 2004.
- (2004) IEEE Trans. Neural Netw. , vol.15 , Issue.5 , pp. 1135-1150
- Hu, G.¹ Wang, D.L.²

12
- 77955695149
- A tandem algorithm for pitch estimation and voiced speech segregation
- Nov.
- G. Hu and D. L. Wang, “A tandem algorithm for pitch estimation and voiced speech segregation,” IEEE Trans. Audio, Speech, Lang. Process., vol. 18, no. 8, pp. 2067–2079, Nov. 2010.
- (2010) IEEE Trans. Audio, Speech, Lang. Process. , vol.18 , Issue.8 , pp. 2067-2079
- Hu, G.¹ Wang, D.L.²

13
- 85008062922
- Upper Saddle River, NJ: Prentice Hall
- X. Huang, A. Acero, and H. Hon, Spoken Language Processing. Upper Saddle River, NJ: Prentice Hall, 2001.
- (2001) Spoken Language Processing
- Huang, X.¹ Acero, A.² Hon, H.³

14
- 65249103478
- A supervised learning approach to monaural segregation of reverberant speech
- May
- Z. Jin and D. L. Wang, “A supervised learning approach to monaural segregation of reverberant speech,” IEEE Trans. Audio, Speech, Lang. Process., vol. 17, no. 4, pp. 625–638, May 2009.
- (2009) IEEE Trans. Audio, Speech, Lang. Process. , vol.17 , Issue.4 , pp. 625-638
- Jin, Z.¹ Wang, D.L.²

15
- 39649094860
- Multipitch analysis of polyphonic music and speech signals using an auditory model
- Feb.
- A. Klapuri, “Multipitch analysis of polyphonic music and speech signals using an auditory model,” IEEE Trans. Audio, Speech, Lang. Process., vol. 16, no. 2, pp. 255–266, Feb. 2008.
- (2008) IEEE Trans. Audio, Speech, Lang. Process. , vol.16 , Issue.2 , pp. 255-266
- Klapuri, A.¹

16
- 84880729403
- [Online]. Available: http://www.eric-lehmann.com/ism_code.html
- E. Lehmann, “Image-source method: Matlab code implementation,” 2008 [Online]. Available: http://www.eric-lehmann.com/ism_code.html
- (2008) Image-source method: Matlab code implementation
- Lehmann, E.¹

17
- 0016601945
- A semiautomatic pitch detector (SAPD)
- Dec.
- C. A. McGonegal, L. R. Rabiner, and A. E. Rosenberg, “A semiautomatic pitch detector (SAPD),” IEEE Trans. Acoust., Speech, Signal Process., vol. ASSP-23, no. 6, pp. 570–574, Dec. 1975.
- (1975) IEEE Trans. Acoust., Speech, Signal Process. , vol.ASSP-23 , Issue.6 , pp. 570-574
- McGonegal, C.A.¹ Rabiner, L.R.² Rosenberg, A.E.³

18
- 0023944462
- Simulation of auditory-neural transduction: Further studies
- R. Meddis, “Simulation of auditory-neural transduction: Further studies,” J. Acoust. Soc. Amer., vol. 83, pp. 1056–1063, 1988.
- (1988) J. Acoust. Soc. Amer. , vol.83 , pp. 1056-1063
- Meddis, R.¹

19
- 0030846123
- A unitary model of pitch perception
- R. Meddis and L. P. O'Mard, “A unitary model of pitch perception,” J. Acoust. Soc. Amer., vol. 102, pp. 1811–1820, 1997.
- (1997) J. Acoust. Soc. Amer. , vol.102 , pp. 1811-1820
- Meddis, R.¹ O'Mard, L.P.²

20
- 0141624530
- An efficient auditory filterbank based on the gammatone function
- Cambridge, U.K., APU Rep. 2341
- R. D. Patterson, I. Nimmo-Smith, J. Holdsworth, and P. Rice, “An efficient auditory filterbank based on the gammatone function,” in Appl. Psychol. Unit, Cambridge, U.K., 1988, APU Rep. 2341.
- (1988) Appl. Psychol. Unit
- Patterson, R.D.¹ Nimmo-Smith, I.² Holdsworth, J.³ Rice, P.⁴

21
- 4544369752
- Extraction of pitch in adverse conditions
- S. R. M. Prasanna and B. Yegnanarayana, “Extraction of pitch in adverse conditions,” in Proc. IEEE ICASSP, 2004, pp. 109–112.
- (2004) Proc. IEEE ICASSP , pp. 109-112
- Prasanna, S.R.M.¹ Yegnanarayana, B.²

22
- 0031124228
- A pitch determination and voiced/unvoiced decision algorithm for noisy speech
- J. Rouat, Y. C. Liu, and D. Morissette, “A pitch determination and voiced/unvoiced decision algorithm for noisy speech,” Speech Commun., pp. 191–207, 1997.
- (1997) Speech Commun. , pp. 191-207
- Rouat, J.¹ Liu, Y.C.² Morissette, D.³

23
- 50249167077
- Single and multiple F0 contour estimation through parametric spectrogram modeling of speech in noisy environments
- May
- J. Le Roux, H. Kameoka, N. Ono, A. de Cheveigne, and S. Sagayama, “Single and multiple F0 contour estimation through parametric spectrogram modeling of speech in noisy environments,” IEEE Trans. Audio, Speech, Lang. Process., vol. 15, no. 4, pp. 1135–1145, May 2007.
- (2007) IEEE Trans. Audio, Speech, Lang. Process. , vol.15 , Issue.4 , pp. 1135-1145
- Le Roux, J.¹ Kameoka, H.² Ono, N.³ de Cheveigne, A.⁴ Sagayama, S.⁵

24
- 44649176982
- Reverberation challenges the temporal representation of the pitch of complex sounds
- M. Sayles and I. M. Winter, “Reverberation challenges the temporal representation of the pitch of complex sounds,” Neuron, vol. 58, pp. 789–801, 2008.
- (2008) Neuron , vol.58 , pp. 789-801
- Sayles, M.¹ Winter, I.M.²

25
- 0022341184
- Speech processing in the auditory system I: The representation of speech sounds in the responses of the auditory nerve
- S. A. Shamma, “Speech processing in the auditory system I: The representation of speech sounds in the responses of the auditory nerve,” J. Acoust. Soc. Amer., vol. 78, pp. 1613–1621, 1985.
- (1985) J. Acoust. Soc. Amer. , vol.78 , pp. 1613-1621
- Shamma, S.A.¹

26
- 0032678076
- Hidden Markov models based on multi-space probability distribution for pitch pattern modeling
- K. Tokuda, T. Masuko, N. Miyazaki, and T. Kobayashi, “Hidden Markov models based on multi-space probability distribution for pitch pattern modeling,” in Proc. IEEE ICASSP, 1999, pp. 229–232.
- (1999) Proc. IEEE ICASSP , pp. 229-232
- Tokuda, K.¹ Masuko, T.² Miyazaki, N.³ Kobayashi, T.⁴

27
- 0034319894
- A computationally efficient multipitch analysis model
- Nov.
- T. Tolonen and M. Karjalainen, “A computationally efficient multipitch analysis model,” IEEE Trans. Speech Audio Process., vol. 8, no. 6, pp. 708–716, Nov. 2000.
- (2000) IEEE Trans. Speech Audio Process. , vol.8 , Issue.6 , pp. 708-716
- Tolonen, T.¹ Karjalainen, M.²

28
- 4344685385
- An improved method based on the MTF concept for restoring the power envelope from a reverberant signal
- M. Unoki, M. Furukawa, K. Sakata, and M. Akagi, “An improved method based on the MTF concept for restoring the power envelope from a reverberant signal,” Acoust. Sci. Technol., vol. 25, pp. 232–242, 2004.
- (2004) Acoust. Sci. Technol. , vol.25 , pp. 232-242
- Unoki, M.¹ Furukawa, M.² Sakata, K.³ Akagi, M.⁴

29
- 0027623210
- Assessment for automatic speech recognition II: NOISEX-92: A database and an experiment to study the effect of additive noise on speech recognition systems
- A. Varga and H. J. M. Steeneken, “Assessment for automatic speech recognition II: NOISEX-92: A database and an experiment to study the effect of additive noise on speech recognition systems,” Speech Commun., vol. 12, pp. 247–251, 1993.
- (1993) Speech Commun. , vol.12 , pp. 247-251
- Varga, A.¹ Steeneken, H.J.M.²

30
- 84892233308
- On ideal binary mask as the computational goal of auditory scene analysis
- P. Divenyi, Ed. Norwell, MA: Kluwer
- D. L. Wang, “On ideal binary mask as the computational goal of auditory scene analysis,” in Speech Separation by Humans and Machines, P. Divenyi, Ed. Norwell, MA: Kluwer, 2005, pp. 181–197.
- (2005) Speech Separation by Humans and Machines , pp. 181-197
- Wang, D.L.¹

31
- 82255178542
- Hoboken, NJ: Wiley-IEEE Press
- D. L. Wang and G. J. Brown, Eds., Computational Auditory Scene Analysis: Principles, Algorithms and Applications. Hoboken, NJ: Wiley-IEEE Press, 2006.
- (2006) Computational Auditory Scene Analysis: Principles, Algorithms and Applications
- Wang, D.L.¹ Brown, G.J.²

32
- 0037767686
- A multipitch tracking algorithm for noisy speech
- May
- M. Wu, D. L. Wang, and G. J. Brown, “A multipitch tracking algorithm for noisy speech,” IEEE Trans. Speech Audio Process., vol. 11, no. 3, pp. 229–241, May 2003.
- (2003) IEEE Trans. Speech Audio Process. , vol.11 , Issue.3 , pp. 229-241
- Wu, M.¹ Wang, D.L.² Brown, G.J.³

33
- 0004256316
- San Diego, CA: Academic
- W. A. Yost, Fundamentals of Hearing. San Diego, CA: Academic, 2000.
- (2000) Fundamentals of Hearing
- Yost, W.A.¹

* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.