Volume 131, Issue 3, 2012, Pages 2270-2287

Recognizing articulatory gestures from speech for robust speech recognition

Author keywords

[No Author keywords available]

Indexed keywords

ARTICULATORY GESTURES; AUTOMATIC SPEECH RECOGNITION SYSTEM; DATA SETS; DIGIT RECOGNITION; DYNAMIC BAYESIAN NETWORK; DYNAMIC PARAMETERS; NATURAL SPEECH; RECOGNITION PERFORMANCE; RECOGNITION RATES; ROBUST SPEECH RECOGNITION; SPEECH RECOGNITION SYSTEMS; SPEECH SIGNALS; SYNTHETIC SPEECH; THREE STAGES; WORD RECOGNITION;

EID: 84858976368     PISSN: 00014966     EISSN: None     Source Type: Journal    
DOI: 10.1121/1.3682038     Document Type: Article
Times cited: 27

References (63)
  • 1
    • 0020602364
    • Efficient coding of LPC parameters by temporal decomposition
    • Boston, MA
    • Atal, B. S. (1983), Efficient coding of LPC parameters by temporal decomposition., Proceedings of ICASSP, Boston, MA, pp. 81-84.
    • (1983) Proceedings of ICASSP , pp. 81-84
    • Atal, B.S.1
  • 2
    • 34547975052
    • Scaling learning algorithms toward AI
    • edited by L. Bottou, O. Chapelle, D. DeCoste, and J. Weston (MIT Press, Cambridge, MA)
    • Bengio, Y., and Le Cun, Y. (2007), Scaling learning algorithms toward AI., in Large Scale Kernel Machines, edited by L. Bottou, O. Chapelle, D. DeCoste, and J. Weston (MIT Press, Cambridge, MA), pp. 321-360.
    • (2007) Large Scale Kernel Machines , pp. 321-360
    • Bengio, Y.1    Le Cun, Y.2
  • 3
    • 0036293559
    • The graphical models Toolkit: An open source software system for speech and time-series processing
    • Orlando, FL
    • Bilmes, J., and Zweig, G. (2002), The graphical models Toolkit: An open source software system for speech and time-series processing., Proceedings of ICASSP, Orlando, FL, Vol. 4, pp. 3916-3919.
    • (2002) Proceedings of ICASSP , vol.4 , pp. 3916-3919
    • Bilmes, J.1    Zweig, G.2
  • 4
    • 84971737266
    • Articulatory gestures as phonological units
    • 10.1017/S0952675700001019
    • Browman, C., and Goldstein, L. (1989), Articulatory gestures as phonological units., Phonology 6, 201-251. 10.1017/S0952675700001019
    • (1989) Phonology , vol.6 , pp. 201-251
    • Browman, C.1    Goldstein, L.2
  • 5
    • 0027024362
    • Articulatory phonology: An overview
    • 10.1159/000261913
    • Browman, C., and Goldstein, L. (1992), Articulatory phonology: An overview., Phonetica 49, 155-180. 10.1159/000261913
    • (1992) Phonetica , vol.49 , pp. 155-180
    • Browman, C.1    Goldstein, L.2
  • 6
    • 42549139762
    • MVA processing of speech features
    • 10.1109/TASL.2006.876717
    • Chen, C., and Bilmes, J. (2007), MVA processing of speech features., IEEE Trans. Audio Speech Lang. Processing 15 (1), 257-270. 10.1109/TASL.2006.876717
    • (2007) IEEE Trans. Audio Speech Lang. Processing , vol.15 , Issue.1 , pp. 257-270
    • Chen, C.1    Bilmes, J.2
  • 8
    • 33750368310
    • An audio-visual corpus for speech perception and automatic speech recognition
    • DOI 10.1121/1.2229005
    • Cooke, M., Barker, J., Cunningham, S., and Shao, X. (2006), An audio-visual corpus for speech perception and automatic speech recognition., J. Acoust. Soc. Am. 120, 2421-2424. 10.1121/1.2229005 (Pubitemid 44631681)
    • (2006) Journal of the Acoustical Society of America , vol.120 , Issue.5 , pp. 2421-2424
    • Cooke, M.1    Barker, J.2    Cunningham, S.3    Shao, X.4
  • 9
    • 27744539597
    • Noise robust speech recognition using feature compensation based on polynomial regression of utterance SNR
    • DOI 10.1109/TSA.2005.853002
    • Cui, X., and Alwan, A. (2005), Noise robust speech recognition using feature compensation based on polynomial regression of utterance SNR., IEEE Trans. Speech Audio Processing 13 (6), 1161-1172. 10.1109/TSA.2005.853002 (Pubitemid 41605019)
    • (2005) IEEE Transactions on Speech and Audio Processing , vol.13 , Issue.6 , pp. 1161-1172
    • Cui, X.1    Alwan, A.2
  • 11
    • 0003396255
    • The MathWorks Inc., Natick, MA. (Last viewed June 28, 2010)
    • Demuth, H., Beale, M., and Hagan, M. (2008), Neural Network Toolbox™ 6, User's Guide., The MathWorks Inc., Natick, MA. www.mathworks.com/access/helpdesk/help/pdf-doc/nnet/nnet.pdf (Last viewed June 28, 2010).
    • (2008) Neural Network Toolbox™ 6, User's Guide
    • Demuth, H.1    Beale, M.2    Hagan, M.3
  • 12
    • 0028234947
    • A statistical approach to automatic speech recognition using the atomic speech units constructed from overlapping articulatory features
    • DOI 10.1121/1.409839
    • Deng, L., and Sun, D. (1994), A statistical approach to automatic speech recognition using atomic units constructed from overlapping articulatory features., J. Acoust. Soc. Am. 95 (5), 2702-2719. 10.1121/1.409839 (Pubitemid 24152864)
    • (1994) Journal of the Acoustical Society of America , vol.95 , Issue.5 , pp. 2702-2719
    • Deng, L.1    Sun, D.X.2
  • 13
    • 27644525945
    • Use of temporal information: Detection of periodicity, aperiodicity, and pitch in speech
    • DOI 10.1109/TSA.2005.851910
    • Deshmukh, O., Espy-Wilson, C., Salomon, A., and Singh, J. (2005), Use of temporal information: Detection of the periodicity and aperiodicity profile of speech., IEEE Trans. Speech Audio Process. 13 (5), 776-786. 10.1109/TSA.2005.851910 (Pubitemid 41558894)
    • (2005) IEEE Transactions on Speech and Audio Processing , vol.13 , Issue.5 , pp. 776-786
    • Deshmukh, O.1    Espy-Wilson, C.Y.2    Salomon, A.3    Singh, J.4
  • 15
    • 52949093125
    • Combined speech enhancement and auditory modelling for robust distributed speech recognition
    • 10.1016/j.specom.2008.05.004
    • Flynn, R., and Jones, E. (2008), Combined speech enhancement and auditory modelling for robust distributed speech recognition., Speech Comm. 50, 797-809. 10.1016/j.specom.2008.05.004
    • (2008) Speech Comm. , vol.50 , pp. 797-809
    • Flynn, R.1    Jones, E.2
  • 16
    • 58849145971
    • ASR-Articulatory speech recognition
    • Aalborg, Denmark
    • Frankel, J., and King, S. (2001), ASR-Articulatory speech recognition., Proceedings of Eurospeech, Aalborg, Denmark, pp. 599-602.
    • (2001) Proceedings of Eurospeech , pp. 599-602
    • Frankel, J.1    King, S.2
  • 17
    • 33745225408
    • A hybrid ANN/DBN approach to articulatory feature recognition
    • Lisbon, Portugal
    • Frankel, J., and King, S. (2005), A hybrid ANN/DBN approach to articulatory feature recognition., Proc. of Eurospeech, Interspeech, Lisbon, Portugal, pp. 3045-3048.
    • (2005) Proc. of Eurospeech, Interspeech , pp. 3045-3048
    • Frankel, J.1    King, S.2
  • 18
    • 85009088992
    • Articulatory feature recognition using dynamic Bayesian networks
    • Jeju, Korea
    • Frankel, J., Wester, M., and King, S. (2004), Articulatory feature recognition using dynamic Bayesian networks., Proc. of ICSLP, Jeju, Korea, pp. 1202-1205.
    • (2004) Proc. of ICSLP , pp. 1202-1205
    • Frankel, J.1    Wester, M.2    King, S.3
  • 19
    • 0002049440
    • Learning Dynamic Bayesian Networks
    • Adaptive Processing of Sequences and Data Structures
    • Ghahramani, Z. (1998), Learning dynamic Bayesian networks., in Adaptive Processing of Temporal Information, edited by C. L. Giles and M. Gori (Springer-Verlag, Berlin), pp. 168-197. (Pubitemid 128056031)
    • (1998) Lecture Notes in Computer Science , Issue.1387 , pp. 168-197
    • Ghahramani, Z.1
  • 22
    • 0036711819
    • A quasiarticulatory approach to controlling acoustic source parameters in a Klatt-type formant synthesizer using HLsyn
    • DOI 10.1121/1.1498851
    • Hanson, H. M., and Stevens, K. N. (2002), A quasiarticulatory approach to controlling acoustic source parameters in a Klatt-type formant synthesizer using HLsyn., J. Acoust. Soc. Am. 112 (3), 1158-1182. 10.1121/1.1498851 (Pubitemid 35006671)
    • (2002) Journal of the Acoustical Society of America , vol.112 , Issue.3 , pp. 1158-1182
    • Hanson, H.M.1    Stevens, K.N.2
  • 24
    • 0033709098
    • Tandem connectionist feature stream extraction for conventional HMM systems
    • Istanbul, Turkey
    • Hermansky, H., Ellis, D., and Sharma, S. (2000), Tandem connectionist feature stream extraction for conventional HMM systems., Proceedings of ICASSP, Istanbul, Turkey, pp. 1635-1638.
    • (2000) Proceedings of ICASSP , pp. 1635-1638
    • Hermansky, H.1    Ellis, D.2    Sharma, S.3
  • 26
    • 79959812754
    • FSM-based pronunciation modeling using articulatory phonological code
    • Hu, C., Zhuang, X., and Hasegawa-Johnson, M. (2010), FSM-based pronunciation modeling using articulatory phonological code., Proceedings of Interspeech, pp. 2274-2277.
    • (2010) Proceedings of Interspeech , pp. 2274-2277
    • Hu, C.1    Zhuang, X.2    Hasegawa-Johnson, M.3
  • 27
    • 0004149277
    • Preliminaries to speech analysis: The distinctive features and their correlates
    • (MIT Press, Cambridge, MA)
    • Jakobson, R., Fant, C. G. M., and Halle, M. (1952), Preliminaries to speech analysis: The distinctive features and their correlates., MIT Acoustics Laboratory Technical Report 13 (MIT Press, Cambridge, MA).
    • (1952) MIT Acoustics Laboratory Technical Report 13
    • Jakobson, R.1    Fant, C.G.M.2    Halle, M.3
  • 28
    • 44049116478
    • Forward models-supervised learning with a distal teacher
    • 10.1207/s15516709cog1603-1
    • Jordan, M. I., and Rumelhart, D. E. (1992), Forward models-supervised learning with a distal teacher., Cogn. Sci. 16, 307-354. 10.1207/s15516709cog1603-1
    • (1992) Cogn. Sci. , vol.16 , pp. 307-354
    • Jordan, M.I.1    Rumelhart, D.E.2
  • 30
    • 0029753859
    • Deriving gestural scores from articulator-movement records using weighted temporal decomposition
    • 10.1109/TSA.1996.481448
    • Jung, T. P., Krishnamurthy, A. K., Ahalt, S. C., Beckman, M. E., and Lee, S. H. (1996), Deriving gestural scores from articulator-movement records using weighted temporal decomposition., IEEE Trans. Speech Audio Process. 4 (1), 2-18. 10.1109/TSA.1996.481448
    • (1996) IEEE Trans. Speech Audio Process. , vol.4 , Issue.1 , pp. 2-18
    • Jung, T.P.1    Krishnamurthy, A.K.2    Ahalt, S.C.3    Beckman, M.E.4    Lee, S.H.5
  • 34
    • 82955170227
    • Technical Report LA-UR-88-418, Los Alamos National Laboratory, Los Alamos, NM
    • Lapedes, A., and Farber, R. (1988), How neural networks work., Technical Report LA-UR-88-418, Los Alamos National Laboratory, Los Alamos, NM.
    • (1988) How Neural Networks Work
    • Lapedes, A.1    Farber, R.2
  • 35
    • 0031187171
    • Speech recognition by machines and humans
    • PII S0167639397000216
    • Lippmann, R. (1997), Speech recognition by machines and humans., Speech Comm. 22, 1-15. 10.1016/S0167-6393(97)00021-6 (Pubitemid 127403436)
    • (1997) Speech Communication , vol.22 , Issue.1 , pp. 1-15
    • Lippmann, R.P.1
  • 36
    • 29444436962
    • Integration of articulatory and spectrum features based on the hybrid HMM/BN modeling framework
    • DOI 10.1016/j.specom.2005.07.003, PII S0167639305001731
    • Markov, K., Dang, J., and Nakamura, S. (2006), Integration of articulatory and spectrum features based on the hybrid HMM/BN modeling framework., Speech Comm. 48, 161-175. 10.1016/j.specom.2005.07.003 (Pubitemid 43012029)
    • (2006) Speech Communication , vol.48 , Issue.2 , pp. 161-175
    • Markov, K.1    Dang, J.2    Nakamura, S.3
  • 37
    • 0028375762
    • Recovering articulatory movement from formant frequency trajectories using task dynamics and a genetic algorithm: Preliminary model tests
    • 10.1016/0167-6393(94)90055-8
    • McGowan, R. S. (1994), Recovering articulatory movement from formant frequency trajectories using task dynamics and a genetic algorithm: Preliminary model tests., Speech Comm. 14 (1), 19-48. 10.1016/0167-6393(94)90055-8
    • (1994) Speech Comm. , vol.14 , Issue.1 , pp. 19-48
    • McGowan, R.S.1
  • 38
    • 34848837678
    • The Essential Role of Premotor Cortex in Speech Perception
    • DOI 10.1016/j.cub.2007.08.064, PII S0960982207019690
    • Meister, I. G., Wilson, S. M., Deblieck, C., Wu, A. D., and Iacoboni, M. (2007), The essential role of premotor cortex in speech perception., Curr. Biol. 17, 1692-1696. 10.1016/j.cub.2007.08.064 (Pubitemid 47503812)
    • (2007) Current Biology , vol.17 , Issue.19 , pp. 1692-1696
    • Meister, I.G.1    Wilson, S.M.2    Deblieck, C.3    Wu, A.D.4    Iacoboni, M.5
  • 39
    • 70349207706
    • TaDA: An enhanced, portable task dynamics model in MATLAB
    • Nam, H., Goldstein, L., Saltzman, E., and Byrd, D. (2004), TaDA: An enhanced, portable task dynamics model in MATLAB., J. Acoust. Soc. Am. 115 (5), pp. 2430.
    • (2004) J. Acoust. Soc. Am. , vol.115 , Issue.5 , pp. 2430
    • Nam, H.1    Goldstein, L.2    Saltzman, E.3    Byrd, D.4
  • 41
    • 78649390043
    • Retrieving tract variables from acoustics: A comparison of different machine learning strategies
    • 10.1109/JSTSP.2010.2076013
    • Mitra, V., Nam, H., Espy-Wilson, C., Saltzman, E., and Goldstein, L. (2010), Retrieving tract variables from acoustics: A comparison of different machine learning strategies., IEEE J. Selected Topics Signal Process. 4, 1027-1045. 10.1109/JSTSP.2010.2076013
    • (2010) IEEE J. Selected Topics Signal Process. , vol.4 , pp. 1027-1045
    • Mitra, V.1    Nam, H.2    Espy-Wilson, C.3    Saltzman, E.4    Goldstein, L.5
  • 44
    • 84867222549
    • The acoustic to articulation mapping: Non-linear or non-unique?
    • Brisbane, Australia
    • Neiberg, D., Ananthakrishnan, G., and Engwall, O. (2008), The acoustic to articulation mapping: Non-linear or non-unique?, Proceedings of Interspeech, Brisbane, Australia, pp. 1485-1488.
    • (2008) Proceedings of Interspeech , pp. 1485-1488
    • Neiberg, D.1    Ananthakrishnan, G.2    Engwall, O.3
  • 46
    • 84987702417
    • The aurora experimental framework for the performance evaluation of speech recognition systems under noisy conditions
    • Beijing, China
    • Pearce, D., and Hirsch, H. G. (2000), The aurora experimental framework for the performance evaluation of speech recognition systems under noisy conditions., Proceedings of ICSLP, ASR, Beijing, China, pp. 181-188.
    • (2000) Proceedings of ICSLP, ASR , pp. 181-188
    • Pearce, D.1    Hirsch, H.G.2
  • 48
    • 51449098747
    • An empirical investigation of the nonuniqueness in the acoustic-to-articulatory mapping
    • Antwerp, Belgium
    • Qin, C., and Carreira-Perpiñán, M. (2007), An empirical investigation of the nonuniqueness in the acoustic-to-articulatory mapping., Proceedings of Interspeech, Antwerp, Belgium, pp. 74-77.
    • (2007) Proceedings of Interspeech , pp. 74-77
    • Qin, C.1    Carreira-Perpiñán, M.2
  • 49
    • 70449345905
    • Analysis of pausing behavior in spontaneous speech using real-time magnetic resonance imaging of articulation
    • 10.1121/1.3213452
    • Ramanarayanan, V., Bresch, E., Byrd, D., Goldstein, L., and Narayanan, S. (2009), Analysis of pausing behavior in spontaneous speech using real-time magnetic resonance imaging of articulation., J. Acoust. Soc. Am. 126 (5), EL160-EL165. 10.1121/1.3213452
    • (2009) J. Acoust. Soc. Am. , vol.126 , Issue.5
    • Ramanarayanan, V.1    Bresch, E.2    Byrd, D.3    Goldstein, L.4    Narayanan, S.5
  • 50
    • 0037697284
    • Hidden-articulator Markov models for speech recognition
    • 10.1016/S0167-6393(03)00031-1
    • Richardson, M., Bilmes, J., and Diorio, C. (2003), Hidden-articulator Markov models for speech recognition., Speech Comm. 41 (2-3), 511-529. 10.1016/S0167-6393(03)00031-1
    • (2003) Speech Comm. , vol.41 , Issue.2-3 , pp. 511-529
    • Richardson, M.1    Bilmes, J.2    Diorio, C.3
  • 52
    • 67650105018
    • Trajectory mixture density network with multiple mixtures for acoustic-articulatory inversion
    • Paris, France
    • Richmond, K. (2007), Trajectory mixture density network with multiple mixtures for acoustic-articulatory inversion., ITRW on Non-Linear Speech Processing, NOLISP-07, Paris, France, pp. 67-70.
    • (2007) ITRW on Non-Linear Speech Processing, NOLISP-07 , pp. 67-70
    • Richmond, K.1
  • 54
    • 77956779481
    • A dynamical approach to gestural patterning in speech production
    • 10.1207/s15326969eco0104-2
    • Saltzman, E., and Munhall, K. (1989), A dynamical approach to gestural patterning in speech production., Ecol. Psychol. 1 (4), 332-382. 10.1207/s15326969eco0104-2
    • (1989) Ecol. Psychol. , vol.1 , Issue.4 , pp. 332-382
    • Saltzman, E.1    Munhall, K.2
  • 56
    • 84939672029
    • Toward a model for speech recognition
    • 10.1121/1.1907874
    • Stevens, K. (1960), Toward a model for speech recognition., J. Acoust. Soc. Am. 32, 47-55. 10.1121/1.1907874
    • (1960) J. Acoust. Soc. Am. , vol.32 , pp. 47-55
    • Stevens, K.1
  • 57
    • 0036165806
    • An overlapping-feature-based phonological model incorporating linguistic constraints: Applications to speech recognition
    • DOI 10.1121/1.1420380
    • Sun, J. P., and Deng, L. (2002), An overlapping-feature-based phonological model incorporating linguistic constraints: Applications to speech recognition., J. Acoust. Soc. Am. 111 (2), 1086-1101. 10.1121/1.1420380 (Pubitemid 34127489)
    • (2002) Journal of the Acoustical Society of America , vol.111 , Issue.2 , pp. 1086-1101
    • Sun, J.1    Deng, L.2
  • 58
    • 33745288610
    • A support vector approach to the acoustic-to-articulatory mapping
    • Lisbon, Portugal
    • Toutios, A., and Margaritis, K. (2005), A support vector approach to the acoustic-to-articulatory mapping., Proceedings of Interspeech, Lisbon, Portugal, pp. 3221-3224.
    • (2005) Proceedings of Interspeech , pp. 3221-3224
    • Toutios, A.1    Margaritis, K.2
  • 59
    • 0033097443
    • Single channel speech enhancement based on masking properties of the human auditory system
    • 10.1109/89.748118
    • Virag, N. (1999), Single channel speech enhancement based on masking properties of the human auditory system., IEEE Trans. Speech Audio Process. 7 (2), 126-137. 10.1109/89.748118
    • (1999) IEEE Trans. Speech Audio Process. , vol.7 , Issue.2 , pp. 126-137
    • Virag, N.1
  • 61
    • 0037503670
    • A multichannel articulatory database and its application for automatic speech recognition
    • Bavaria, Germany
    • Wrench, A. A., and Hardcastle, W. J. (2000), A multichannel articulatory database and its application for automatic speech recognition., in 5th Seminar on Speech Production: Models and Data, Bavaria, Germany, pp. 305-308.
    • (2000) 5th Seminar on Speech Production: Models and Data , pp. 305-308
    • Wrench, A.A.1    Hardcastle, W.J.2
  • 62
    • 77955810460
    • A study on the generalization capability of acoustic models for robust speech recognition
    • 10.1109/TASL.2009.2031236
    • Xiao, X., Li, J., Chng, E. S., Li, H., and Lee, C. (2010), A study on the generalization capability of acoustic models for robust speech recognition., IEEE Trans. Audio Speech Lang. Process. 18 (6), 1158-1169. 10.1109/TASL.2009.2031236
    • (2010) IEEE Trans. Audio Speech Lang. Process. , vol.18 , Issue.6 , pp. 1158-1169
    • Xiao, X.1    Li, J.2    Chng, E.S.3    Li, H.4    Lee, C.5


* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.