Volume 132, Issue 3, 2012, Pages 1732-1746

Kalman-based autoregressive moving average modeling and inference for formant and antiformant tracking

Author keywords

[No Author keywords available]

Indexed keywords

ACOUSTIC SPEECH; AUTOREGRESSIVE MOVING AVERAGE; AUTOREGRESSIVE MOVING AVERAGE MODELING; BIAS AND VARIANCE; CANDIDATE SELECTION; CENTER FREQUENCY; CEPSTRAL COEFFICIENTS; DYNAMIC PROGRAMMING METHODS; FORMANT FREQUENCY; LINEARIZED MAPPING; POINT ESTIMATE; ROOT-MEAN SQUARE ERRORS; STATE-SPACE FRAMEWORK; SYNTHESIZED SPEECH; TIME VARYING; TRACKING PERFORMANCE; TRADE OFF; VOCAL TRACT RESONANCES;

EID: 84866306536     PISSN: 00014966     EISSN: None     Source Type: Journal    
DOI: 10.1121/1.4739462     Document Type: Article
Times cited: 75

References (51)
  • 1
    • 79959582948
    • Automatic estimation of the first subglottal resonance
    • 10.1121/1.3567004
    • Arsikere, H., Lulich, S. M., and Alwan, A. (2011). Automatic estimation of the first subglottal resonance. J. Acoust. Soc. Am. 129, EL197-EL203. 10.1121/1.3567004
    • (2011) J. Acoust. Soc. Am. , vol.129
    • Arsikere, H.1    Lulich, S.M.2    Alwan, A.3
  • 2
    • 0015112070
    • Speech analysis and synthesis by linear prediction of the speech wave
    • 10.1121/1.1912679
    • Atal, B. S., and Hanauer, S. L. (1971). Speech analysis and synthesis by linear prediction of the speech wave. J. Acoust. Soc. Am. 50, 637-655. 10.1121/1.1912679
    • (1971) J. Acoust. Soc. Am. , vol.50 , pp. 637-655
    • Atal, B.S.1    Hanauer, S.L.2
  • 3
    • 0018034848
    • Linear prediction analysis of speech based on a pole-zero representation
    • 10.1121/1.382117
    • Atal, B. S., and Schroeder, M. R. (1978). Linear prediction analysis of speech based on a pole-zero representation. J. Acoust. Soc. Am. 64, 1310-1318. 10.1121/1.382117
    • (1978) J. Acoust. Soc. Am. , vol.64 , pp. 1310-1318
    • Atal, B.S.1    Schroeder, M.R.2
  • 5
    • 0037567933
    • Formant estimation by linear transformation of the LPC cepstrum
    • 10.1121/1.398581
    • Broad, D. J., and Clermont, F. (1989). Formant estimation by linear transformation of the LPC cepstrum. J. Acoust. Soc. Am. 86, 2013-2017. 10.1121/1.398581
    • (1989) J. Acoust. Soc. Am. , vol.86 , pp. 2013-2017
    • Broad, D.J.1    Clermont, F.2
  • 6
    • 0017542202
    • The cepstrum: A guide to processing
    • 10.1109/PROC.1977.10747
    • Childers, D. G., Skinner, D. P., and Kemerait, R. C. (1977). The cepstrum: A guide to processing. Proc. IEEE 65, 1428-1443. 10.1109/PROC.1977.10747
    • (1977) Proc. IEEE , vol.65 , pp. 1428-1443
    • Childers, D.G.1    Skinner, D.P.2    Kemerait, R.C.3
  • 7
    • 0017103003
    • A comparison of three methods of extracting resonance information from predictor-coefficient coded speech
    • 10.1109/TASSP.1976.1162767
    • Christensen, R., Strong, W., and Palmer, E. (1976). A comparison of three methods of extracting resonance information from predictor-coefficient coded speech. IEEE Trans. Acoust. 24, 8-14. 10.1109/TASSP.1976.1162767
    • (1976) IEEE Trans. Acoust. , vol.24 , pp. 8-14
    • Christensen, R.1    Strong, W.2    Palmer, E.3
  • 8
    • 33746456716
    • Tracking vocal tract resonances using a quantized nonlinear function embedded in a temporal constraint
    • 10.1109/TSA.2005.855841
    • Deng, L., Acero, A., and Bazzi, I. (2006a). Tracking vocal tract resonances using a quantized nonlinear function embedded in a temporal constraint. IEEE Trans. Audio Speech Lang. Process. 14, 425-434. 10.1109/TSA.2005.855841
    • (2006) IEEE Trans. Audio Speech Lang. Process. , vol.14 , pp. 425-434
    • Deng, L.1    Acero, A.2    Bazzi, I.3
  • 10
    • 4544323815
    • A structured speech model with continuous hidden dynamics and prediction-residual training for tracking vocal tract resonances
    • Deng, L., Lee, L. J., Attias, H., and Acero, A. (2004). A structured speech model with continuous hidden dynamics and prediction-residual training for tracking vocal tract resonances, in Proc. IEEE Int. Conf. Acoust. Speech Signal Process., pp. I-557-560.
    • (2004) Proc. IEEE Int. Conf. Acoust. Speech Signal Process. , pp. 557-560
    • Deng, L.1    Lee, L.J.2    Attias, H.3    Acero, A.4
  • 11
    • 34547517867
    • Adaptive Kalman filtering and smoothing for tracking vocal tract resonances using a continuous-valued hidden dynamic model
    • 10.1109/TASL.2006.876724
    • Deng, L., Lee, L. J., Attias, H., and Acero, A. (2007). Adaptive Kalman filtering and smoothing for tracking vocal tract resonances using a continuous-valued hidden dynamic model. IEEE Trans. Audio Speech Lang. Process. 15, 13-23. 10.1109/TASL.2006.876724
    • (2007) IEEE Trans. Audio Speech Lang. Process. , vol.15 , pp. 13-23
    • Deng, L.1    Lee, L.J.2    Attias, H.3    Acero, A.4
  • 12
    • 0033623527
    • Spontaneous speech recognition using a statistical coarticulatory model for the vocal-tract-resonance dynamics
    • 10.1121/1.1315288
    • Deng, L., and Ma, J. (2000). Spontaneous speech recognition using a statistical coarticulatory model for the vocal-tract-resonance dynamics. J. Acoust. Soc. Am. 108, 3036-3048. 10.1121/1.1315288
    • (2000) J. Acoust. Soc. Am. , vol.108 , pp. 3036-3048
    • Deng, L.1    Ma, J.2
  • 13
    • 33744966561
    • A bidirectional target-filtering model of speech coarticulation and reduction: Two-stage implementation for phonetic recognition
    • 10.1109/TSA.2005.854107
    • Deng, L., Yu, D., and Acero, A. (2006c). A bidirectional target-filtering model of speech coarticulation and reduction: Two-stage implementation for phonetic recognition. IEEE Trans. Audio Speech Lang. Process. 14, 256-265. 10.1109/TSA.2005.854107
    • (2006) IEEE Trans. Audio Speech Lang. Process. , vol.14 , pp. 256-265
    • Deng, L.1    Yu, D.2    Acero, A.3
  • 14
  • 15
    • 0021472532
    • On the computation of the Cramer-Rao bound for ARMA parameter estimation
    • 10.1109/TASSP.1984.1164391
    • Friedlander, B. (1984). On the computation of the Cramer-Rao bound for ARMA parameter estimation. IEEE Trans. Acoust. 32, 721-727. 10.1109/TASSP.1984.1164391
    • (1984) IEEE Trans. Acoust. , vol.32 , pp. 721-727
    • Friedlander, B.1
  • 16
    • 0024480323
    • The exact Cramer-Rao bound for Gaussian autoregressive processes
    • 10.1109/7.18656
    • Friedlander, B., and Porat, B. (1989). The exact Cramer-Rao bound for Gaussian autoregressive processes. IEEE Trans. Aerosp. Electron. Syst. 25, 3-7. 10.1109/7.18656
    • (1989) IEEE Trans. Aerosp. Electron. Syst. , vol.25 , pp. 3-7
    • Friedlander, B.1    Porat, B.2
  • 17
    • 77954039336
    • Accuracy of formant measurement for synthesized vowels using the reassigned spectrogram and comparison with linear prediction
    • 10.1121/1.3308476
    • Fulop, S. A. (2010). Accuracy of formant measurement for synthesized vowels using the reassigned spectrogram and comparison with linear prediction. J. Acoust. Soc. Am. 127, 2114-2117. 10.1121/1.3308476
    • (2010) J. Acoust. Soc. Am. , vol.127 , pp. 2114-2117
    • Fulop, S.A.1
  • 19
    • 0027580559
    • Novel approach to nonlinear/non-Gaussian Bayesian state estimation
    • 10.1049/ip-f-2.1993.0015
    • Gordon, N. J., Salmond, D. J., and Smith, A. F. M. (1993). Novel approach to nonlinear/non-Gaussian Bayesian state estimation. IEE Proc. F, Radar Signal Process. 140, 107-113. 10.1049/ip-f-2.1993.0015
    • (1993) IEE Proc. F, Radar Signal Process. , vol.140 , pp. 107-113
    • Gordon, N.J.1    Salmond, D.J.2    Smith, A.F.M.3
  • 20
    • 0003410290
    • Time Series Analysis
    • Hamilton, J. D. (1994). Time Series Analysis (Princeton University Press, Princeton, NJ), Sec. (11.1).
    • (1994) Time Series Analysis
    • Hamilton, J.D.1
  • 21
    • 0032797217
    • Glottal characteristics of male speakers: Acoustic correlates and comparison with female data
    • 10.1121/1.427116
    • Hanson, H. M., and Chuang, E. S. (1999). Glottal characteristics of male speakers: Acoustic correlates and comparison with female data. J. Acoust. Soc. Am. 106, 1064-1077. 10.1121/1.427116
    • (1999) J. Acoust. Soc. Am. , vol.106 , pp. 1064-1077
    • Hanson, H.M.1    Chuang, E.S.2
  • 22
    • 34047246313
    • Age, sex, and vowel dependencies of acoustic measures related to the voice source
    • 10.1121/1.2697522
    • Iseli, M., Shue, Y.-L., and Alwan, A. (2007). Age, sex, and vowel dependencies of acoustic measures related to the voice source. J. Acoust. Soc. Am. 121, 2283-2295. 10.1121/1.2697522
    • (2007) J. Acoust. Soc. Am. , vol.121 , pp. 2283-2295
    • Iseli, M.1    Shue, Y.-L.2    Alwan, A.3
  • 24
    • 21244437999
    • Unscented filtering and nonlinear estimation
    • 10.1109/JPROC.2003.823141
    • Julier, S., and Uhlmann, J. (2004). Unscented filtering and nonlinear estimation. Proc. IEEE 92, 401-422. 10.1109/JPROC.2003.823141
    • (2004) Proc. IEEE , vol.92 , pp. 401-422
    • Julier, S.1    Uhlmann, J.2
  • 25
    • 85024429815
    • A new approach to linear filtering and prediction problems
    • 10.1115/1.3662552
    • Kalman, R. E. (1960). A new approach to linear filtering and prediction problems. Trans. ASME J. Basic Eng. 82, 35-45. 10.1115/1.3662552
    • (1960) Trans. ASME J. Basic Eng. , vol.82 , pp. 35-45
    • Kalman, R.E.1
  • 26
    • 0018986665
    • Software for a cascade/parallel formant synthesizer
    • 10.1121/1.383940
    • Klatt, D. H. (1980). Software for a cascade/parallel formant synthesizer. J. Acoust. Soc. Am. 67, 971-995. 10.1121/1.383940
    • (1980) J. Acoust. Soc. Am. , vol.67 , pp. 971-995
    • Klatt, D.H.1
  • 27
    • 4544367684
    • Formant tracking using hidden Markov models and vector quantization
    • 10.1109/TASSP.1986.1164908
    • Kopec, G. (1986). Formant tracking using hidden Markov models and vector quantization. IEEE Trans. Acoust. 34, 709-729. 10.1109/TASSP.1986.1164908
    • (1986) IEEE Trans. Acoust. , vol.34 , pp. 709-729
    • Kopec, G.1
  • 29
    • 0001523807
    • A path-stack algorithm for optimizing dynamic regimes in a statistical hidden dynamic model of speech
    • 10.1006/csla.1999.0136
    • Ma, J. Z., and Deng, L. (2000). A path-stack algorithm for optimizing dynamic regimes in a statistical hidden dynamic model of speech. Comput. Speech Lang. 14, 101-114. 10.1006/csla.1999.0136
    • (2000) Comput. Speech Lang. , vol.14 , pp. 101-114
    • Ma, J.Z.1    Deng, L.2
  • 30
    • 0742307392
    • Target-directed mixture dynamic models for spontaneous speech recognition
    • 10.1109/TSA.2003.818074
    • Ma, J. Z., and Deng, L. (2004). Target-directed mixture dynamic models for spontaneous speech recognition. IEEE Trans. Speech Audio Process. 12, 47-58. 10.1109/TSA.2003.818074
    • (2004) IEEE Trans. Speech Audio Process. , vol.12 , pp. 47-58
    • Ma, J.Z.1    Deng, L.2
  • 31
    • 72949099498
    • On pole-zero model estimation methods minimizing a logarithmic criterion for speech analysis
    • 10.1109/TASL.2009.2025544
    • Marelli, D., and Balazs, P. (2010). On pole-zero model estimation methods minimizing a logarithmic criterion for speech analysis. IEEE Trans. Audio Speech Lang. Process. 18, 237-248. 10.1109/TASL.2009.2025544
    • (2010) IEEE Trans. Audio Speech Lang. Process. , vol.18 , pp. 237-248
    • Marelli, D.1    Balazs, P.2
  • 32
    • 0016049328
    • An algorithm for automatic formant extraction using linear prediction spectra
    • 10.1109/TASSP.1974.1162559
    • McCandless, S. (1974). An algorithm for automatic formant extraction using linear prediction spectra. IEEE Trans. Acoust. 22, 134-141. 10.1109/TASSP.1974.1162559
    • (1974) IEEE Trans. Acoust. , vol.22 , pp. 134-141
    • McCandless, S.1
  • 34
    • 0022737369
    • Adaptive identification of a time-varying ARMA speech model
    • 10.1109/TASSP.1986.1164831
    • Miyanaga, Y., Miki, N., and Nagai, N. (1986). Adaptive identification of a time-varying ARMA speech model. IEEE Trans. Acoust. 34, 423-433. 10.1109/TASSP.1986.1164831
    • (1986) IEEE Trans. Acoust. , vol.34 , pp. 423-433
    • Miyanaga, Y.1    Miki, N.2    Nagai, N.3
  • 35
    • 0024522014
    • Static, dynamic, and relational properties in vowel perception
    • 10.1121/1.397861
    • Nearey, T. M. (1989). Static, dynamic, and relational properties in vowel perception. J. Acoust. Soc. Am. 85, 2088-2113. 10.1121/1.397861
    • (1989) J. Acoust. Soc. Am. , vol.85 , pp. 2088-2113
    • Nearey, T.M.1
  • 36
    • 0015100168
    • Automatic formant tracking by a Newton-Raphson technique
    • 10.1121/1.1912681
    • Olive, J. P. (1971). Automatic formant tracking by a Newton-Raphson technique. J. Acoust. Soc. Am. 50, 661-670. 10.1121/1.1912681
    • (1971) J. Acoust. Soc. Am. , vol.50 , pp. 661-670
    • Olive, J.P.1
  • 37
    • 34249901270
    • Simulation and analysis of nasalized vowels based on magnetic resonance imaging data
    • 10.1121/1.2722220
    • Pruthi, T., Espy-Wilson, C. Y., and Story, B. H. (2007). Simulation and analysis of nasalized vowels based on magnetic resonance imaging data. J. Acoust. Soc. Am. 121, 3858-3873. 10.1121/1.2722220
    • (2007) J. Acoust. Soc. Am. , vol.121 , pp. 3858-3873
    • Pruthi, T.1    Espy-Wilson, C.Y.2    Story, B.H.3
  • 38
    • 0022859239
    • A new algorithm for estimation of formant trajectories directly from the speech signal based on an extended Kalman-filter
    • Rigoll, G. (1986). A new algorithm for estimation of formant trajectories directly from the speech signal based on an extended Kalman-filter, in Proc. IEEE Int. Conf. Acoust. Speech Signal Process., pp. 1229-1232.
    • (1986) Proc. IEEE Int. Conf. Acoust. Speech Signal Process. , pp. 1229-1232
    • Rigoll, G.1
  • 39
    • 0015008817
    • Effect of glottal pulse shape on the quality of natural vowels
    • 10.1121/1.1912389
    • Rosenberg, A. E. (1971). Effect of glottal pulse shape on the quality of natural vowels. J. Acoust. Soc. Am. 49, 583-590. 10.1121/1.1912389
    • (1971) J. Acoust. Soc. Am. , vol.49 , pp. 583-590
    • Rosenberg, A.E.1
  • 40
    • 79953288197
    • Time-varying autoregressions in speech: Detection theory and applications
    • 10.1109/TASL.2010.2073704
    • Rudoy, D., Quatieri, T. F., and Wolfe, P. J. (2011). Time-varying autoregressions in speech: Detection theory and applications. IEEE Trans. Audio Speech Lang. Process. 19, 977-989. 10.1109/TASL.2010.2073704
    • (2011) IEEE Trans. Audio Speech Lang. Process. , vol.19 , pp. 977-989
    • Rudoy, D.1    Quatieri, T.F.2    Wolfe, P.J.3
  • 41
    • 69249139982
    • Conditionally linear Gaussian models for estimating vocal tract resonances
    • Rudoy, D., Spendley, D. N., and Wolfe, P. J. (2007). Conditionally linear Gaussian models for estimating vocal tract resonances, in Proc. INTERSPEECH, pp. 526-529.
    • (2007) Proc. INTERSPEECH , pp. 526-529
    • Rudoy, D.1    Spendley, D.N.2    Wolfe, P.J.3
  • 42
    • 0014730929
    • System for automatic formant analysis of voiced speech
    • 10.1121/1.1911939
    • Schafer, R. W., and Rabiner, L. R. (1970). System for automatic formant analysis of voiced speech. J. Acoust. Soc. Am. 47, 634-648. 10.1121/1.1911939
    • (1970) J. Acoust. Soc. Am. , vol.47 , pp. 634-648
    • Schafer, R.W.1    Rabiner, L.R.2
  • 45
    • 0017626193
    • On the simultaneous estimation of poles and zeros in speech analysis
    • 10.1109/TASSP.1977.1162939
    • Steiglitz, K. (1977). On the simultaneous estimation of poles and zeros in speech analysis. IEEE Trans. Acoust. 25, 229-234. 10.1109/TASSP.1977.1162939
    • (1977) IEEE Trans. Acoust. , vol.25 , pp. 229-234
    • Steiglitz, K.1
  • 46
  • 47
    • 43549113834
    • Nonlinear source-filter coupling in phonation: Theory
    • 10.1121/1.2832337
    • Titze, I. R. (2008). Nonlinear source-filter coupling in phonation: Theory. J. Acoust. Soc. Am. 123, 2733-2749. 10.1121/1.2832337
    • (2008) J. Acoust. Soc. Am. , vol.123 , pp. 2733-2749
    • Titze, I.R.1
  • 48
    • 0041831111
    • On the statistics of estimated reflection and cepstrum coefficients of an autoregressive process
    • 10.1016/0165-1684(95)00004-W
    • Tourneret, J.-Y., and Lacaze, B. (1995). On the statistics of estimated reflection and cepstrum coefficients of an autoregressive process. Signal Process. 43, 253-267. 10.1016/0165-1684(95)00004-W
    • (1995) Signal Process. , vol.43 , pp. 253-267
    • Tourneret, J.-Y.1    Lacaze, B.2
  • 50
    • 0017969757
    • Formant extraction from linear-prediction phase spectra
    • 10.1121/1.381864
    • Yegnanarayana, B. (1978). Formant extraction from linear-prediction phase spectra. J. Acoust. Soc. Am. 63, 1638-1640. 10.1121/1.381864
    • (1978) J. Acoust. Soc. Am. , vol.63 , pp. 1638-1640
    • Yegnanarayana, B.1


* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.