SCOPUS Information Retrieval Platform

Journal of the Acoustical Society of America

Volume 120, Issue 1, 2006, Pages 443-452

Speech feature extraction method using subband-based periodicity and nonperiodicity decomposition

(4) Ishizuka, Kentaro (a); Nakatani, Tomohiro (a); Minami, Yasuhiro (a); Miyazaki, Noboru (b)

a NTT Corporation (Japan)

b NTT CORPORATION (Japan)

Author keywords

[No Author keywords available]

Indexed keywords

ACOUSTIC NOISE; DECOMPOSITION; SENSORY PERCEPTION; SPEECH PROCESSING; SPEECH RECOGNITION; SPEECH SYNTHESIS;

AUDITORY COMB FILTERING; NOISE SPECTRUM; PARALLEL DISTRIBUTED PROCESSING; SPEECH PERCEPTION;

FEATURE EXTRACTION;

ARTICLE; HUMAN; HUMAN EXPERIMENT; MATHEMATICAL MODEL; NOISE REDUCTION; NORMAL HUMAN; PERIODICITY; PRIORITY JOURNAL; PSYCHOPHYSICS; SIGNAL NOISE RATIO; SPEECH DISCRIMINATION; SPEECH PERCEPTION;

ALGORITHMS; FEMALE; HUMANS; MALE; MODELS, BIOLOGICAL; NORMAL DISTRIBUTION; PERIODICITY; PSYCHOACOUSTICS; SPEECH; SPEECH PERCEPTION; SPEECH PRODUCTION MEASUREMENT; TIME FACTORS;

EID: 33745738849 PISSN: 00014966 EISSN: None Source Type: Journal
DOI: 10.1121/1.2205131 Document Type: Article

Times cited : (9)

References (48)

1
- 0030037151
- Cepstral representation of speech motivated by time-frequency masking: An application to speech recognition
- Aikawa, K., Singer, H., Kawahara, H., and Tohkura, Y. (1996). "Cepstral representation of speech motivated by time-frequency masking: An application to speech recognition," J. Acoust. Soc. Am. 100, 603-614.
- (1996) J. Acoust. Soc. Am. , vol.100 , pp. 603-614
- Aikawa, K.¹ Singer, H.² Kawahara, H.³ Tohkura, Y.⁴

2
- 0036649309
- Robust auditory-based speech processing using the average localized synchrony detection
- Ali, A. M., Spiegel, J. V., and Mueller, P. (2002). "Robust auditory-based speech processing using the average localized synchrony detection," IEEE Trans. Speech Audio Process. 10, 279-292.
- (2002) IEEE Trans. Speech Audio Process. , vol.10 , pp. 279-292
- Ali, A.M.¹ Spiegel, J.V.² Mueller, P.³

3
- 0016067897
- Effectiveness of linear prediction characteristics of the speech wave for automatic speaker identification and verification
- Atal, B. S. (1974). "Effectiveness of linear prediction characteristics of the speech wave for automatic speaker identification and verification," J. Acoust. Soc. Am. 55, 1304-1312.
- (1974) J. Acoust. Soc. Am. , vol.55 , pp. 1304-1312
- Atal, B.S.¹

4
- 4544310318
- Can back-ends be more robust than front-ends? Investigation over the Aurora-2 database
- Bernard, A., Gong, Y., and Cui, X. (2004). "Can back-ends be more robust than front-ends? Investigation over the Aurora-2 database," Proc. of the 29th International Conference on Acoustics, Speech, and Signal Processing (ICASSP), Vol. 1, pp. 1025-1028.
- (2004) Proc. of the 29th International Conference on Acoustics, Speech, and Signal Processing (ICASSP) , vol.1 , pp. 1025-1028
- Bernard, A.¹ Gong, Y.² Cui, X.³

5
- 0002161311
- The quefrency analysis of time series for echoes
- edited by M. Rosenblatt (Wiley, New York)
- Bogert, B., Healy, M., and Tukey, J. (1963). "The quefrency analysis of time series for echoes," in Proc. Symp. on Time Series Analysis, edited by M. Rosenblatt (Wiley, New York), pp. 209-243.
- (1963) Proc. Symp. on Time Series Analysis , pp. 209-243
- Bogert, B.¹ Healy, M.² Tukey, J.³

6
- 0018455310
- Suppression of acoustic noise in speech using spectral subtraction
- Boll, S. (1979). "Suppression of acoustic noise in speech using spectral subtraction," IEEE Trans. Acoust., Speech, Signal Process. ASSP-27, 113-120.
- (1979) IEEE Trans. Acoust., Speech, Signal Process. , vol.ASSP-27 , pp. 113-120
- Boll, S.¹

7
- 0001698589
- Auditory grouping
- edited by B. C. J. Moore (Academic, San Diego)
- Darwin, C. J., and Carlyon, R.P. (1995). "Auditory grouping," in Hearing, edited by B. C. J. Moore (Academic, San Diego), pp. 387-424.
- (1995) Hearing , pp. 387-424
- Darwin, C.J.¹ Carlyon, R.P.²

8
- 0019053271
- Comparison of parametric representations for monosyllabic word recognition in continuously spoken sentences
- Davis, S. B., and Mermelstein, P. (1980). "Comparison of parametric representations for monosyllabic word recognition in continuously spoken sentences," IEEE Trans. Acoust., Speech, Signal Process. ASSP-28, 357-366.
- (1980) IEEE Trans. Acoust., Speech, Signal Process. , vol.ASSP-28 , pp. 357-366
- Davis, S.B.¹ Mermelstein, P.²

9
- 0027298253
- Separation of concurrent harmonic sounds: Fundamental frequency estimation and a time-domain cancellation model of auditory processing
- de Cheveigné, A. (1993). "Separation of concurrent harmonic sounds: Fundamental frequency estimation and a time-domain cancellation model of auditory processing," J. Acoust. Soc. Am. 93, 3271-3290.
- (1993) J. Acoust. Soc. Am. , vol.93 , pp. 3271-3290
- De Cheveigné, A.¹

10
- 0031009840
- Concurrent vowel identification. III. A neural model of harmonic interference cancellation
- de Cheveigné, A. (1997). "Concurrent vowel identification. III. A neural model of harmonic interference cancellation," J. Acoust. Soc. Am. 101, 2857-2865.
- (1997) J. Acoust. Soc. Am. , vol.101 , pp. 2857-2865
- De Cheveigné, A.¹

11
- 0011902657
- Acoustic features and distance measure to reduce vulnerability of ASR performance due to the presence of a communication channel and/or background noise
- edited by J.-C. Junqua and G. van Noord (Kluwer Academic, Dordrecht, Netherlands)
- de Veth, J., Cranen, B., and Boves, L. (2001). "Acoustic features and distance measure to reduce vulnerability of ASR performance due to the presence of a communication channel and/or background noise," in Robustness in Language and Speech Technology, edited by J.-C. Junqua and G. van Noord (Kluwer Academic, Dordrecht, Netherlands), pp. 9-45.
- (2001) Robustness in Language and Speech Technology , pp. 9-45
- De Veth, J.¹ Cranen, B.² Boves, L.³

12
- 85135375893
- HMM recognition in noise using parallel model combination
- Gales, M., and Young, S. (1993). "HMM recognition in noise using parallel model combination," Proc. of the 3rd European Conference on Speech Communication and Technology (Eurospeech), pp. 837-840.
- (1993) Proc. of the 3rd European Conference on Speech Communication and Technology (Eurospeech) , pp. 837-840
- Gales, M.¹ Young, S.²

13
- 84871624012
- Auditory model based speech processing
- Gao, Y., Huang, T., Chen, S., and Haton, J.-P. (1992). "Auditory model based speech processing," Proc. of the 2nd International Conference on Spoken Language Processing (ICSLP), pp. 73-76.
- (1992) Proc. of the 2nd International Conference on Spoken Language Processing (ICSLP) , pp. 73-76
- Gao, Y.¹ Huang, T.² Chen, S.³ Haton, J.-P.⁴

14
- 84928838192
- Temporal non-place information in the auditory nerve firing patterns as a front-end for speech recognition in a noisy environment
- Ghitza, O. (1988). "Temporal non-place information in the auditory nerve firing patterns as a front-end for speech recognition in a noisy environment," J. Phonetics 16, 109-124.
- (1988) J. Phonetics , vol.16 , pp. 109-124
- Ghitza, O.¹

15
- 0028312802
- Auditory models and human performance in tasks related to speech coding and speech recognition
- Ghitza, O. (1994). "Auditory models and human performance in tasks related to speech coding and speech recognition," IEEE Trans. Speech Audio Process. 2, 115-132.
- (1994) IEEE Trans. Speech Audio Process. , vol.2 , pp. 115-132
- Ghitza, O.¹

16
- 0029288202
- Speech recognition in noisy environments: A survey
- Gong, Y. (1995). "Speech recognition in noisy environments: A survey," Speech Commun. 16, 261-291.
- (1995) Speech Commun. , vol.16 , pp. 261-291
- Gong, Y.¹

17
- 33745742672
- Speech processing in the auditory system: An overview
- edited by S. Greenberg, W. A. Ainsworth, A. N. Popper, and R. R. Fay (Springer-Verlag, New York)
- Greenberg, S. (2004). "Speech processing in the auditory system: An overview," in Speech Processing in the Auditory System, edited by S. Greenberg, W. A. Ainsworth, A. N. Popper, and R. R. Fay (Springer-Verlag, New York).
- (2004) Speech Processing in the Auditory System
- Greenberg, S.¹

18
- 0025041264
- Perceptual Linear Predictive (PLP) analysis of speech
- Hermansky, H. (1990). "Perceptual Linear Predictive (PLP) analysis of speech," J. Acoust. Soc. Am. 87, 1738-1752.
- (1990) J. Acoust. Soc. Am. , vol.87 , pp. 1738-1752
- Hermansky, H.¹

19
- 0022112505
- Low-dimensional representation of vowels based on all-pole modeling in the psychophysical domain
- Hermansky, H., Hanson, B., and Wakita, H. (1985). "Low-dimensional representation of vowels based on all-pole modeling in the psychophysical domain," Speech Commun. 4, 181-187.
- (1985) Speech Commun. , vol.4 , pp. 181-187
- Hermansky, H.¹ Hanson, B.² Wakita, H.³

20
- 33745741500
- Short-term analysis pitch determination
- Springer-Verlag, New York
- Hess, W. (1983). "Short-term analysis pitch determination," in Pitch Determination of Speech Signals (Springer-Verlag, New York).
- (1983) Pitch Determination of Speech Signals
- Hess, W.¹

21
- 0038669544
- The AURORA experimental framework for the performance evaluation of speech recognition systems under noisy conditions
- Hirsch, H. G., and Pearce, D. (2000). "The AURORA experimental framework for the performance evaluation of speech recognition systems under noisy conditions," Proc. of the ISCA Tutorial and Research Workshop on Automatic Speech Recognition (ISCA ITRW ASR), pp. 181-188.
- (2000) Proc. of the ISCA Tutorial and Research Workshop on Automatic Speech Recognition (ISCA ITRW ASR) , pp. 181-188
- Hirsch, H.G.¹ Pearce, D.²

22
- 84870238333
- Speech signal representation
- Prentice-Hall, Englewood Cliffs, NJ
- Huang, X., Acero, A., and Hon, H. (2001). "Speech signal representation," in Spoken Language Processing (Prentice-Hall, Englewood Cliffs, NJ).
- (2001) Spoken Language Processing
- Huang, X.¹ Acero, A.² Hon, H.³

23
- 85009122855
- Effect of F0 fluctuation and amplitude modulation of natural vowels on vowel identification in noisy environments
- Ishizuka, K., and Aikawa, K. (2002). "Effect of F0 fluctuation and amplitude modulation of natural vowels on vowel identification in noisy environments," Proc. of the 7th International Conference on Spoken Language Processing (ICSLP), pp. 1633-1636.
- (2002) Proc. of the 7th International Conference on Spoken Language Processing (ICSLP) , pp. 1633-1636
- Ishizuka, K.¹ Aikawa, K.²

24
- 0016467604
- Minimum prediction residual principle applied to speech recognition
- Itakura, F. (1975). "Minimum prediction residual principle applied to speech recognition," IEEE Trans. Acoust., Speech, Signal Process. ASSP-23, 67-72.
- (1975) IEEE Trans. Acoust., Speech, Signal Process. , vol.ASSP-23 , pp. 67-72
- Itakura, F.¹

25
- 0035472456
- Pitch-scaled estimation of simultaneous voiced and turbulence-noise components in speech
- Jackson, P. J. B., and Shadle, C. H. (2001). "Pitch-scaled estimation of simultaneous voiced and turbulence-noise components in speech," IEEE Trans. Speech Audio Process. 9, 713-726.
- (2001) IEEE Trans. Speech Audio Process. , vol.9 , pp. 713-726
- Jackson, P.J.B.¹ Shadle, C.H.²

26
- 85009168054
- Covariation and weighting of harmonically decomposed streams for ASR
- Jackson, P. J. B., Moreno, D. M., Russell, M. J., and Hernando, J. (2003). "Covariation and weighting of harmonically decomposed streams for ASR," Proc. of the 8th European Conference on Speech Communication and Technology (Eurospeech), pp. 2321-2324.
- (2003) Proc. of the 8th European Conference on Speech Communication and Technology (Eurospeech) , pp. 2321-2324
- Jackson, P.J.B.¹ Moreno, D.M.² Russell, M.J.³ Hernando, J.⁴

27
- 0019142263
- The relationship between spike rate and synchrony in responses of auditory-nerve fibers to single tones
- Johnson, D. H. (1980). "The relationship between spike rate and synchrony in responses of auditory-nerve fibers to single tones," J. Acoust. Soc. Am. 68, 1115-1122.
- (1980) J. Acoust. Soc. Am. , vol.68 , pp. 1115-1122
- Johnson, D.H.¹

28
- 0028996914
- Robust feature extraction using SBCOR analysis
- Kajita, S., and Itakura, F. (1995). "Robust feature extraction using SBCOR analysis," Proc. of the 20th International Conference on Acoustics, Speech, and Signal Processing (ICASSP), pp. 421-424.
- (1995) Proc. of the 20th International Conference on Acoustics, Speech, and Signal Processing (ICASSP) , pp. 421-424
- Kajita, S.¹ Itakura, F.²

29
- 0032785783
- Auditory processing of speech signals for robust speech recognition in real-world noisy environments
- Kim, D. S., Lee, S. Y., and Kil, R. M. (1999). "Auditory processing of speech signals for robust speech recognition in real-world noisy environments," IEEE Trans. Speech Audio Process. 7, 55-69.
- (1999) IEEE Trans. Speech Audio Process. , vol.7 , pp. 55-69
- Kim, D.S.¹ Lee, S.Y.² Kil, R.M.³

30
- 0024879192
- Filtering of colored noise for speech enhancement and coding
- Koo, B., Gibson, J., and Gray, S. (1989). "Filtering of colored noise for speech enhancement and coding," Proc. of the 14th International Conference on Acoustics, Speech, and Signal Processing (ICASSP), pp. 349-352.
- (1989) Proc. of the 14th International Conference on Acoustics, Speech, and Signal Processing (ICASSP) , pp. 349-352
- Koo, B.¹ Gibson, J.² Gray, S.³

31
- 0026142334
- A study on speaker adaptation of the parameters of continuous density hidden Markov models
- Lee, C.-H., Lin, C.-H., and Juang, B.-H. (1991). "A study on speaker adaptation of the parameters of continuous density hidden Markov models," IEEE Trans. Signal Process. 39, 806-814.
- (1991) IEEE Trans. Signal Process. , vol.39 , pp. 806-814
- Lee, C.-H.¹ Lin, C.-H.² Juang, B.-H.³

32
- 85009115888
- An auditory system-based feature for robust speech recognition
- Li, Q., Soong, F. K., and Siohan, O. (2001). "An auditory system-based feature for robust speech recognition," Proc. of the 7th European Conference on Speech Communication and Technology (Eurospeech), pp. 619-621.
- (2001) Proc. of the 7th European Conference on Speech Communication and Technology (Eurospeech) , pp. 619-621
- Li, Q.¹ Soong, F.K.² Siohan, O.³

33
- 0017980972
- All-pole modeling of degraded speech
- Lim, J., and Oppenheim, A. (1978). "All-pole modeling of degraded speech," IEEE Trans. Acoust., Speech, Signal Process. ASSP-26, 197-210.
- (1978) IEEE Trans. Acoust., Speech, Signal Process. , vol.ASSP-26 , pp. 197-210
- Lim, J.¹ Oppenheim, A.²

34
- 0026882842
- Experiments with a Nonlinear Spectral Subtractor (NSS), hidden Markov models and the projection, for robust speech recognition in cars
- Lockwood, P., and Boudy, J. (1992). "Experiments with a Nonlinear Spectral Subtractor (NSS), hidden Markov models and the projection, for robust speech recognition in cars," Speech Commun. 11, 215-228.
- (1992) Speech Commun. , vol.11 , pp. 215-228
- Lockwood, P.¹ Boudy, J.²

35
- 0028996915
- A maximum likelihood procedure for a universal adaptation method
- Minami, Y., and Furui, S. (1995). "A maximum likelihood procedure for a universal adaptation method," Proc. of the 20th International Conference on Acoustics, Speech, and Signal Processing (ICASSP), pp. 129-132.
- (1995) Proc. of the 20th International Conference on Acoustics, Speech, and Signal Processing (ICASSP) , pp. 129-132
- Minami, Y.¹ Furui, S.²

36
- 0020816083
- Suggested formula for calculating auditory-filter bandwidths and excitation patterns
- Moore, B. C. J., and Glasberg, B. R. (1983). "Suggested formula for calculating auditory-filter bandwidths and excitation patterns," J. Acoust. Soc. Am. 74, 750-753.
- (1983) J. Acoust. Soc. Am. , vol.74 , pp. 750-753
- Moore, B.C.J.¹ Glasberg, B.R.²

37
- 33646799204
- Data collection and evaluation of AURORA-2 Japanese corpus
- Nakamura, S., Yamamoto, K., Takeda, K., Kuroiwa, S., Kitaoka, N., Yamada, T., Mizumachi, M., Nishiura, T., Fujimoto, M., Sasou, A., and Endo, T. (2003). "Data collection and evaluation of AURORA-2 Japanese corpus," Proc. of the 8th IEEE Workshop on Automatic Speech Recognition and Understanding (ASRU), pp. 619-623.
- (2003) Proc. of the 8th IEEE Workshop on Automatic Speech Recognition and Understanding (ASRU) , pp. 619-623
- Nakamura, S.¹ Yamamoto, K.² Takeda, K.³ Kuroiwa, S.⁴ Kitaoka, N.⁵ Yamada, T.⁶ Mizumachi, M.⁷ Nishiura, T.⁸ Fujimoto, M.⁹ Sasou, A.¹⁰ Endo, T.¹¹

38
- 24144494616
- AURORA-2J: An evaluation framework for Japanese noisy speech recognition
- Nakamura, S., Takeda, K., Yamamoto, K., Yamada, T., Kuroiwa, S., Kitaoka, N., Nishiura, T., Sasou, A., Mizumachi, M., Miyajima, C., Fujimoto, M., and Endo, T. (2005). "AURORA-2J: An evaluation framework for Japanese noisy speech recognition," IEICE Trans. Inf. Syst. E88-D, 535-544.
- (2005) IEICE Trans. Inf. Syst. , vol.E88-D , pp. 535-544
- Nakamura, S.¹ Takeda, K.² Yamamoto, K.³ Yamada, T.⁴ Kuroiwa, S.⁵ Kitaoka, N.⁶ Nishiura, T.⁷ Sasou, A.⁸ Mizumachi, M.⁹ Miyajima, C.¹⁰ Fujimoto, M.¹¹ Endo, T.¹²

39
- 0016938506
- Auditory filter shapes derived with noise stimuli
- Patterson, R. D. (1976). "Auditory filter shapes derived with noise stimuli," J. Acoust. Soc. Am. 59, 640-654.
- (1976) J. Acoust. Soc. Am. , vol.59 , pp. 640-654
- Patterson, R.D.¹

40
- 0001050571
- Auditory filters and excitation patterns as representations of frequency resolution
- edited by B. C. J. Moore (Academic, London)
- Patterson, R. D., and Moore, B. C. J. (1986). "Auditory filters and excitation patterns as representations of frequency resolution," in Frequency Selectivity in Hearing, edited by B. C. J. Moore (Academic, London), pp. 123-177.
- (1986) Frequency Selectivity in Hearing , pp. 123-177
- Patterson, R.D.¹ Moore, B.C.J.²

41
- 84987702417
- The AURORA experimental framework for the performance evaluation of speech recognition systems under noise conditions
- Pearce, D., and Hirsch, H. G. (2000). "The AURORA experimental framework for the performance evaluation of speech recognition systems under noise conditions," Proc. of the 6th International Conference on Spoken Language Processing (ICSLP), Vol. 4, pp. 29-32.
- (2000) Proc. of the 6th International Conference on Spoken Language Processing (ICSLP) , vol.4 , pp. 29-32
- Pearce, D.¹ Hirsch, H.G.²

42
- 0017367712
- On the use of autocorrelation analysis for pitch detection
- Rabiner, L. R. (1977). "On the use of autocorrelation analysis for pitch detection," IEEE Trans. Acoust., Speech, Signal Process. 25, 24-33.
- (1977) IEEE Trans. Acoust., Speech, Signal Process. , vol.25 , pp. 24-33
- Rabiner, L.R.¹

43
- 0015084215
- Some effects of the stimulus intensity on response of auditory nerve fibers in the squirrel monkey
- Rose, J. E., Hind, J. E., Anderson, D. J., and Brugge, J. F. (1971). "Some effects of the stimulus intensity on response of auditory nerve fibers in the squirrel monkey," J. Neurophysiol. 34, 685-699.
- (1971) J. Neurophysiol. , vol.34 , pp. 685-699
- Rose, J.E.¹ Hind, J.E.² Anderson, D.J.³ Brugge, J.F.⁴

44
- 84928837806
- A joint synchrony/mean-rate model of auditory speech processing
- Seneff, S. (1988). "A joint synchrony/mean-rate model of auditory speech processing," J. Phonetics 16, 55-76.
- (1988) J. Phonetics , vol.16 , pp. 55-76
- Seneff, S.¹

45
- 0032123832
- A parametric formulation of the generalized spectral subtraction method
- Sim, B. L., Tong, Y. C., Chang, J. S., and Tan, C. T. (1998). "A parametric formulation of the generalized spectral subtraction method," IEEE Trans. Speech Audio Process. 6, 328-337.
- (1998) IEEE Trans. Speech Audio Process. , vol.6 , pp. 328-337
- Sim, B.L.¹ Tong, Y.C.² Chang, J.S.³ Tan, C.T.⁴

46
- 85009217371
- Signal and feature compensation methods for robust speech recognition
- edited by G. M. Davis (CRC, Boca Raton, FL)
- Singh, R., Stern, R. M., and Raj, B. (2002). "Signal and feature compensation methods for robust speech recognition," in Noise Reduction in Speech Applications, edited by G. M. Davis (CRC, Boca Raton, FL), pp. 219-244.
- (2002) Noise Reduction in Speech Applications , pp. 219-244
- Singh, R.¹ Stern, R.M.² Raj, B.³

47
- 0025681008
- Hidden Markov model decomposition of speech and noise
- Varga, A., and Moore, R. (1990). "Hidden Markov model decomposition of speech and noise," Proc. of the 15th International Conference on Acoustics, Speech, and Signal Processing (ICASSP), pp. 845-848.
- (1990) Proc. of the 15th International Conference on Acoustics, Speech, and Signal Processing (ICASSP) , pp. 845-848
- Varga, A.¹ Moore, R.²

48
- 0027151535
- Noisy speech recognition based on HMMs, Wiener filters and re-evaluation of most likely candidates
- Vaseghi, S., and Milner, B. (1993). "Noisy speech recognition based on HMMs, Wiener filters and re-evaluation of most likely candidates," Proc. of the 18th International Conference on Acoustics, Speech, and Signal Processing (ICASSP), pp. 437-440.
- (1993) Proc. of the 18th International Conference on Acoustics, Speech, and Signal Processing (ICASSP) , pp. 437-440
- Vaseghi, S.¹ Milner, B.²

* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.