Volume 17, Issue 3, 2007, Pages 578-616

Short-time phase spectrum in speech processing: A review and some experimental results

Author keywords

Automatic speech recognition; Magnitude spectrum; Overlap add procedure; Phase spectrum; Short time Fourier transform; Speech perception

Indexed keywords

FOURIER TRANSFORMS; INTELLIGENT CONTROL; SPECTRUM ANALYSIS; SPEECH RECOGNITION; SPEECH SYNTHESIS

EID: 34047258851     PISSN: 10512004     EISSN: None     Source Type: Journal    
DOI: 10.1016/j.dsp.2006.06.007     Document Type: Article
Times cited: 112

References (71)
  • 1
    • 0018642851
    • Enhancement and bandwidth compression of noisy speech
    • Lim J.S., and Oppenheim A.V. Enhancement and bandwidth compression of noisy speech. Proc. IEEE 67 12 (1979) 1586-1604
    • (1979) Proc. IEEE , vol.67 , Issue.12 , pp. 1586-1604
    • Lim, J.S.1    Oppenheim, A.V.2
  • 3
    • 0027659197
    • Signal modeling techniques in speech recognition
    • Picone J.W. Signal modeling techniques in speech recognition. Proc. IEEE 81 9 (1993) 1215-1247
    • (1993) Proc. IEEE , vol.81 , Issue.9 , pp. 1215-1247
    • Picone, J.W.1
  • 4
    • 0016555855
    • Models of hearing
    • Schroeder M.R. Models of hearing. Proc. IEEE 63 (1975) 1332-1350
    • (1975) Proc. IEEE , vol.63 , pp. 1332-1350
    • Schroeder, M.R.1
  • 5
    • 0019569248
    • The importance of phase in signals
    • Oppenheim A.V., and Lim J.S. The importance of phase in signals. Proc. IEEE 69 (1981) 529-541
    • (1981) Proc. IEEE , vol.69 , pp. 529-541
    • Oppenheim, A.V.1    Lim, J.S.2
  • 6
    • 0031220487
    • Effects of phase on the perception of intervocalic stop consonants
    • Liu L., He J., and Palm G. Effects of phase on the perception of intervocalic stop consonants. Speech Commun. 22 4 (1997) 403-417
    • (1997) Speech Commun. , vol.22 , Issue.4 , pp. 403-417
    • Liu, L.1    He, J.2    Palm, G.3
  • 8
    • 0016962212
    • Implementation of the digital phase vocoder using the fast Fourier transform
    • Portnoff M.R. Implementation of the digital phase vocoder using the fast Fourier transform. IEEE Trans. Acoust. Speech Signal Process. ASSP-24 3 (1976) 243-248
    • (1976) IEEE Trans. Acoust. Speech Signal Process. , vol.ASSP-24 , Issue.3 , pp. 243-248
    • Portnoff, M.R.1
  • 9
    • 0032638660
    • H. Pobloth, W.B. Kleijn, On phase perception in speech, in: Proc. International Conf. Acoustics, Speech, Signal Processing, 1999, pp. 29-32
  • 10
    • 0033676787
    • D.S. Kim, Perceptual phase redundancy in speech, in: Proc. International Conf. Acoustics, Speech, Signal Processing, 2000, pp. 1383-1386
  • 11
    • 0036286580
    • The effect of group delay spectrum on timbre
    • Banno H., Takeda K., and Itakura F. The effect of group delay spectrum on timbre. Acoust. Sci. Tech. 23 (2002) 1-9
    • (2002) Acoust. Sci. Tech. , vol.23 , pp. 1-9
    • Banno, H.1    Takeda, K.2    Itakura, F.3
  • 12
    • 0032673049
    • Restructuring speech representations using a pitch-adaptive time-frequency smoothing and an instantaneous frequency based F0 extraction: Possible role of a repetitive structure in sounds
    • Kawahara H., Katsuse I.M., and Cheveigne A.D. Restructuring speech representations using a pitch-adaptive time-frequency smoothing and an instantaneous frequency based F0 extraction: Possible role of a repetitive structure in sounds. Speech Commun. 27 (1999) 187-207
    • (1999) Speech Commun. , vol.27 , pp. 187-207
    • Kawahara, H.1    Katsuse, I.M.2    Cheveigne, A.D.3
  • 13
    • 0024965810
    • Formant extraction from phase using weighted group delay function
    • Murthy H.A., Murthy K.V.M., and Yegnanarayana B. Formant extraction from phase using weighted group delay function. Electron. Lett. 25 23 (1989) 1609-1611
    • (1989) Electron. Lett. , vol.25 , Issue.23 , pp. 1609-1611
    • Murthy, H.A.1    Murthy, K.V.M.2    Yegnanarayana, B.3
  • 14
    • 0026923568
    • Significance of group delay functions in spectrum estimation
    • Yegnanarayana B., and Murthy H.A. Significance of group delay functions in spectrum estimation. IEEE Trans. Signal Process. 40 9 (1992) 2281-2289
    • (1992) IEEE Trans. Signal Process. , vol.40 , Issue.9 , pp. 2281-2289
    • Yegnanarayana, B.1    Murthy, H.A.2
  • 15
    • 84980066893
    • Über die Definition des Tones, nebst daran geknüpfter Theorie der Sirene und ähnlicher tonbildender Vorrichtungen
    • Ohm G.S. Über die Definition des Tones, nebst daran geknüpfter Theorie der Sirene und ähnlicher tonbildender Vorrichtungen. Ann. Phys. Chem. 59 (1843) 513-565
    • (1843) Ann. Phys. Chem. , vol.59 , pp. 513-565
    • Ohm, G.S.1
  • 16
    • 0004220068
    • On the Sensations of Tone
    • von Helmholtz H.L.F. On the Sensations of Tone. (English translation by A.J. Ellis) (1912), Longmans, Green and Co., London (original work published 1875)
    • (1912) On the Sensations of Tone
    • von Helmholtz, H.L.F.1
  • 17
    • 0006927594
    • A note on phase distortion in hearing
    • de Boer E. A note on phase distortion in hearing. Acustica 11 (1961) 182
    • (1961) Acustica , vol.11 , pp. 182
    • de Boer, E.1
  • 18
    • 13544259544
    • On the usefulness of STFT phase spectrum in human listening tests
    • Paliwal K.K., and Alsteris L.D. On the usefulness of STFT phase spectrum in human listening tests. Speech Commun. 45 2 (2005) 153-170
    • (2005) Speech Commun. , vol.45 , Issue.2 , pp. 153-170
    • Paliwal, K.K.1    Alsteris, L.D.2
  • 19
    • 0141480080
    • H.A. Murthy, V. Gadde, The modified group delay function and its application to phoneme recognition, in: Proc. International Conf. Acoustics, Speech, Signal Processing, April 2003, pp. 68-71
  • 20
    • 0017552006
    • A unified approach to short-time Fourier analysis and synthesis
    • Allen J.B., and Rabiner L.R. A unified approach to short-time Fourier analysis and synthesis. Proc. IEEE 65 11 (1977) 1558-1564
    • (1977) Proc. IEEE , vol.65 , Issue.11 , pp. 1558-1564
    • Allen, J.B.1    Rabiner, L.R.2
  • 21
    • 0017626192
    • Short-term spectral analysis, synthesis, and modification by discrete Fourier transform
    • Allen J.B. Short-term spectral analysis, synthesis, and modification by discrete Fourier transform. IEEE Trans. Acoust. Speech Signal Process. ASSP-25 3 (1977) 235-238
    • (1977) IEEE Trans. Acoust. Speech Signal Process. , vol.ASSP-25 , Issue.3 , pp. 235-238
    • Allen, J.B.1
  • 22
    • 0018983179
    • A weighted overlap-add method of short-time Fourier analysis/synthesis
    • Crochiere R.E. A weighted overlap-add method of short-time Fourier analysis/synthesis. IEEE Trans. Acoust. Speech Signal Process. ASSP-28 1 (1980) 99-102
    • (1980) IEEE Trans. Acoust. Speech Signal Process. , vol.ASSP-28 , Issue.1 , pp. 99-102
    • Crochiere, R.E.1
  • 23
    • 0021407831
    • Signal estimation from modified short-time Fourier transform
    • Griffin D.W., and Lim J.S. Signal estimation from modified short-time Fourier transform. IEEE Trans. Acoust. Speech Signal Process. ASSP-32 2 (1984) 236-243
    • (1984) IEEE Trans. Acoust. Speech Signal Process. , vol.ASSP-32 , Issue.2 , pp. 236-243
    • Griffin, D.W.1    Lim, J.S.2
  • 25
    • 0018298311
    • M.R. Portnoff, Magnitude-phase relationships for short-time Fourier transforms based on Gaussian analysis windows, in: Proc. International Conf. Acoustics, Speech, Signal Processing, April 1979, pp. 186-189
  • 26
    • 0018982701
    • Time-frequency representation of digital signals and systems based on short-time Fourier analysis
    • Portnoff M.R. Time-frequency representation of digital signals and systems based on short-time Fourier analysis. IEEE Trans. Acoust. Speech Signal Process. ASSP-28 1 (1980) 55-69
    • (1980) IEEE Trans. Acoust. Speech Signal Process. , vol.ASSP-28 , Issue.1 , pp. 55-69
    • Portnoff, M.R.1
  • 27
    • 0019582149
    • Short-time Fourier analysis of sampled speech
    • Portnoff M.R. Short-time Fourier analysis of sampled speech. IEEE Trans. Acoust. Speech Signal Process. ASSP-29 3 (1981) 364-373
    • (1981) IEEE Trans. Acoust. Speech Signal Process. , vol.ASSP-29 , Issue.3 , pp. 364-373
    • Portnoff, M.R.1
  • 28
    • 0019582545
    • Time-scale modification of speech based on short-time Fourier analysis
    • Portnoff M.R. Time-scale modification of speech based on short-time Fourier analysis. IEEE Trans. Acoust. Speech Signal Process. ASSP-29 3 (1981) 374-390
    • (1981) IEEE Trans. Acoust. Speech Signal Process. , vol.ASSP-29 , Issue.3 , pp. 374-390
    • Portnoff, M.R.1
  • 31
    • 0015699024
    • Design and simulation of a speech analysis-synthesis system based on short-time Fourier analysis
    • Schafer R.W., and Rabiner L.R. Design and simulation of a speech analysis-synthesis system based on short-time Fourier analysis. IEEE Trans. Audio Electroacoust. AU-21 (1973) 165-174
    • (1973) IEEE Trans. Audio Electroacoust. , vol.AU-21 , pp. 165-174
    • Schafer, R.W.1    Rabiner, L.R.2
  • 36
    • 0037333359
    • Using the matrix pencil method to solve phase unwrapping
    • Nico G., and Fortuny J. Using the matrix pencil method to solve phase unwrapping. IEEE Trans. Signal Process. 51 3 (2003)
    • (2003) IEEE Trans. Signal Process. , vol.51 , Issue.3
    • Nico, G.1    Fortuny, J.2
  • 37
    • 84866792444
    • L.D. Alsteris, K.K. Paliwal, ASR on speech reconstructed from short-time Fourier phase spectra, in: Proc. International Conf. Spoken Language Processing, October 2004
  • 38
    • 0035278964
    • Time-frequency distributions for automatic speech recognition
    • Potamianos A., and Maragos P. Time-frequency distributions for automatic speech recognition. IEEE Trans. Speech Audio Process. 9 (2001) 196-200
    • (2001) IEEE Trans. Speech Audio Process. , vol.9 , pp. 196-200
    • Potamianos, A.1    Maragos, P.2
  • 39
    • 85009224932
    • D. Dimitriadis, P. Maragos, Robust energy demodulation based on continuous models with application to speech recognition, in: Proc. European Conf. Speech Communication and Technology, September 2003, pp. 2853-2856
  • 40
    • 85009192384
    • K.K. Paliwal, B.S. Atal, Frequency-related representation of speech, in: Proc. European Conf. Speech Communication and Technology, September 2003, pp. 65-68
  • 41
    • 85009230349
    • Y. Wang, J. Hansen, G.K. Allu, R. Kumaresan, Average instantaneous frequency and average log envelopes for ASR with the aurora 2 database, in: Proc. European Conf. Speech Communication and Technology, September 2003, pp. 25-28
  • 42
    • 0028997045
    • T. Abe, T. Kobayashi, S. Imai, Harmonics tracking and pitch extraction based on instantaneous frequency, in: Proc. International Conf. Acoustics, Speech, Signal Processing, May 1995, pp. 756-759
  • 43
    • 0022907822
    • F.J. Charpentier, Pitch detection using the short-term phase spectrum, in: Proc. International Conf. Acoustics, Speech, Signal Processing, April 1986, pp. 113-116
  • 44
    • 85009168546
    • 0 estimation, in: Proc. European Conf. Speech Communication and Technology, September 2003, pp. 2313-2316
  • 45
    • 0029375490
    • Determination of instants of significant excitation in speech using group delay function
    • Smits R., and Yegnanarayana B. Determination of instants of significant excitation in speech using group delay function. IEEE Trans. Speech Audio Process. 3 5 (1995) 325-333
    • (1995) IEEE Trans. Speech Audio Process. , vol.3 , Issue.5 , pp. 325-333
    • Smits, R.1    Yegnanarayana, B.2
  • 46
    • 0000668614
    • Robustness of group-delay based method for extraction of significant instants of excitation from speech signals
    • Murthy P.S., and Yegnanarayana B. Robustness of group-delay based method for extraction of significant instants of excitation from speech signals. IEEE Trans. Speech Audio Process. 7 6 (1999) 609-619
    • (1999) IEEE Trans. Speech Audio Process. , vol.7 , Issue.6 , pp. 609-619
    • Murthy, P.S.1    Yegnanarayana, B.2
  • 47
    • 0024879901
    • H.A. Murthy, K.V.M. Murthy, B. Yegnanarayana, Formant extraction from Fourier transform phase, in: Proc. International Conf. Acoustics, Speech, Signal Processing, May 1989, pp. 484-487
  • 48
    • 0024900419
    • G. Duncan, B. Yegnanarayana, H.A. Murthy, A nonparametric method of formant estimation using group delay spectra, in: Proc. International Conf. Acoustics, Speech, Signal Processing, May 1989, pp. 572-575
  • 49
    • 0030008906
    • Speech formant frequency and bandwidth tracking using multiband energy demodulation
    • Potamianos A., and Maragos P. Speech formant frequency and bandwidth tracking using multiband energy demodulation. J. Acoust. Soc. Amer. 99 6 (1996) 3795-3806
    • (1996) J. Acoust. Soc. Amer. , vol.99 , Issue.6 , pp. 3795-3806
    • Potamianos, A.1    Maragos, P.2
  • 50
    • 0022236938
    • D.H. Friedman, Instantaneous-frequency distribution vs time: An interpretation of the phase structure of speech, in: Proc. International Conf. Acoustics, Speech, Signal Processing, March 1985, pp. 1121-1124
  • 53
    • 0035165763
    • Cross-spectral methods for processing speech
    • Nelson D.J. Cross-spectral methods for processing speech. J. Acoust. Soc. Amer. 110 5 (2001) 2575-2592
    • (2001) J. Acoust. Soc. Amer. , vol.110 , Issue.5 , pp. 2575-2592
    • Nelson, D.J.1
  • 54
    • 4644266117
    • D.J. Nelson, Cross-spectral based formant estimation and alignment, in: Proc. International Conf. Acoustics, Speech, Signal Processing, May 2004, pp. 621-624
  • 55
    • 0028997020
    • B. Yegnanarayana, R. Smits, A robust method for determining instants of major excitations in voiced speech, in: Proc. International Conf. Acoustics, Speech, Signal Processing, May 1995, pp. 776-779
  • 56
    • 34047271998
    • L.D. Alsteris, K.K. Paliwal, Intelligibility of speech from phase spectrum, in: Proc. Microelectronic Engineering Research Conf., November 2003
  • 57
    • 4544293686
    • L.D. Alsteris, K.K. Paliwal, Importance of window shape for phase-only reconstruction of speech, in: Proc. International Conf. Acoustics, Speech, Signal Processing, May 2004, pp. 573-576
  • 58
    • 34047273057
    • L.D. Alsteris, K.K. Paliwal, Further intelligibility results from human listening tests using the short-time phase spectrum, Speech Commun., in press
  • 59
    • 0022079951
    • Derivative of phase spectrum of truncated autoregressive signals
    • Reddy N.S., and Swamy M.N.S. Derivative of phase spectrum of truncated autoregressive signals. IEEE Trans. Circuits Syst. CAS-32 6 (1985) 2575-2592
    • (1985) IEEE Trans. Circuits Syst. , vol.CAS-32 , Issue.6 , pp. 2575-2592
    • Reddy, N.S.1    Swamy, M.N.S.2
  • 60
    • 0019053271
    • Comparison of parametric representations for monosyllabic word recognition in continuously spoken sentences
    • Davis S.B., and Mermelstein P. Comparison of parametric representations for monosyllabic word recognition in continuously spoken sentences. IEEE Trans. Acoust. Speech Signal Process. 28 4 (1980) 357-366
    • (1980) IEEE Trans. Acoust. Speech Signal Process. , vol.28 , Issue.4 , pp. 357-366
    • Davis, S.B.1    Mermelstein, P.2
  • 61
    • 0003822743
    • The HTK Book
    • Young S. The HTK Book (2001), Cambridge University Engineering Department, Cambridge, UK
    • (2001) The HTK Book
    • Young, S.1
  • 62
    • 33847165718
    • L.D. Alsteris, K.K. Paliwal, Evaluation of the modified group delay feature for isolated word recognition, in: Int. Symposium on Signal Processing and Its Applications, August 2005, pp. 715-718
  • 63
    • 85009083403
    • B. Bozkurt, B. Doval, C. D'Alessandro, T. Dutoit, Zeros of z-transform (ZZT) decomposition of speech for source-tract separation, in: Proc. International Conf. Speech, Language Processing, October 2004
  • 64
    • 34047244425
    • B. Bozkurt, B. Doval, C. D'Alessandro, T. Dutoit, Appropriate windowing for group delay analysis and roots of z-transform of speech signals, in: European Signal Processing Conf., 2004
  • 65
    • 34047266694
    • K.K. Paliwal, Decorrelated and liftered filter-bank energies for robust speech recognition, in: Proc. European Conf. Speech Communication and Technology, September 1999, pp. 85-88
  • 68
    • 0031630638
    • G. Rigoll, D. Willett, A NN/HMM hybrid for continuous speech recognition with a discriminant nonlinear feature extraction, in: Proc. International Conf. Acoustics, Speech, Signal Processing, May 1998, pp. 9-12
  • 69
    • 0033709098
    • H. Hermansky, D.P.W. Ellis, S. Sharma, Tandem connectionist feature extraction for conventional HMM systems, in: Proc. International Conf. Acoustics, Speech, Signal Processing, June 2000, pp. 1635-1638
  • 70
    • 0034848926
    • D.P.W. Ellis, R. Singh, S. Sivadas, Tandem acoustic modeling in large-vocabulary recognition, in: Proc. International Conf. Acoustics, Speech, Signal Processing, May 2001, pp. 517-520
  • 71
    • 0034856452
    • B. Yegnanarayana, K.S. Reddy, S.P. Kishore, Source and system features for speaker recognition using AANN model, in: Proc. International Conf. Acoustics, Speech, Signal Processing, May 2001, pp. 409-412


* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.