SCOPUS Information Retrieval Platform

IEEE Transactions on Audio, Speech and Language Processing

Volume 21, Issue 5, 2013, Pages 1023-1034

Computing MMSE estimates and residual uncertainty directly in the feature domain of ASR using STFT domain speech distortion models

(2) Astudillo, Ramón Fernández a Orglmeister, Reinhold b

a INESC (Portugal)

b TECHNISCHE UNIVERSITÄT BERLIN (Germany)

Author keywords

MMSE; uncertainty decoding; uncertainty propagation; Wiener filter

Indexed keywords

AUTOMATIC SPEECH RECOGNITION; CEPSTRAL COEFFICIENTS; DISTORTION MODEL; DYNAMIC COMPENSATION; FEATURE DOMAIN; FEATURE EXTRACTION METHODS; MINIMUM MEAN SQUARE ERRORS (MMSE); MMSE; MMSE ESTIMATORS; OBSERVATION UNCERTAINTIES; POSTERIOR DISTRIBUTIONS; ROBUST ASR; SHORT TIME FOURIER TRANSFORMS; SPEECH DISTORTION; UNCERTAINTY DECODING; UNCERTAINTY PROPAGATION; WIENER FILTERS;

FEATURE EXTRACTION; SPEECH RECOGNITION; UNCERTAINTY ANALYSIS;

ESTIMATION;

EID: 84873901811 PISSN: 15587916 EISSN: None Source Type: Journal
DOI: 10.1109/TASL.2013.2244085 Document Type: Article

Times cited : (23)

References (49)

1
- 84873885459
- Jan.
- Speech Processing, Transmission and Quality Aspects (STQ); Distributed speech recognition; Front-end feature extraction algorithm; Compression algorithms, ETSI ES 202 050 v1.1.5 (2007-01), ETSI, Jan. 2007.
- (2007) Speech Processing Transmission Quality Aspects (STQ); Distributed Speech Recognition; Front-end Feature Extraction Algorithm; Compression Algorithms ETSI ES 202 050 V1.1.5 (2007-01), ETSI

2
- 0003671941
- Ph.D. dissertation, Gonville and Caius College, Cambridge, U.K.
- M. J. F. Gales, "Model-based technique for noise robust speech recognition," Ph.D. dissertation, Gonville and Caius College, Cambridge, U.K., 1995.
- (1995) Model-based Technique for Noise Robust Speech Recognition
- Gales, M.J.F.¹

3
- 85009067687
- Using observation uncertainty in HMM decoding
- J. Arrowood and M. Clements, "Using observation uncertainty in HMM decoding," in Proc. Int. Conf. Spoken Lang. Process. (ICSLP), 2002.
- (2002) Proc. Int. Conf. Spoken Lang. Process. (ICSLP)
- Arrowood, J.¹ Clements, M.²

4
- 0036293930
- Probabilistic and Statistical Inference Group, University of Toronto Tech. Rep
- T. T. Kristjansson and B. J. Frey, "Accounting for uncertainty in observations: A new paradigm for robust automatic speech recognition," Probabilistic and Statistical Inference Group, University of Toronto, 2002, Tech. Rep.
- (2002) Accounting for Uncertainty in Observations: A New Paradigm for Robust Automatic Speech Recognition
- Kristjansson, T.T.¹ Frey, B.J.²

5
- 0036291376
- Uncertainty decoding with SPLICE for noise robust speech recognition
- 1
- J. Droppo, A. Acero, and L. Deng, "Uncertainty decoding with SPLICE for noise robust speech recognition," in Proc. IEEE Int. Conf. Acoust., Speech, Signal Process. (ICASSP), 2002, vol. 1, pp. I-57-I-60.
- (2002) Proc. IEEE Int. Conf. Acoust., Speech, Signal Process. (ICASSP) , vol.1
- Droppo, J.¹ Acero, A.² Deng, L.³

6
- 85009275141
- Exploiting variances in robust feature extraction based on a parametric model of speech distortion
- L. Deng, J. Droppo, and A. Acero, "Exploiting variances in robust feature extraction based on a parametric model of speech distortion," in Proc. Int. Conf. Spoken Lang. Process. (ICSLP), 2002.
- (2002) Proc. Int. Conf. Spoken Lang. Process. (ICSLP)
- Deng, L.¹ Droppo, J.² Acero, A.³

7
- 33749058582
- Separation and robust recognition of noisy, convolutive speech mixtures using time-frequency masking and missing data techniques
- Oct.
- D. Kolossa, A. Klimas, and R. Orglmeister, "Separation and robust recognition of noisy, convolutive speech mixtures using time-frequency masking and missing data techniques," in Proc. Workshop Applicat. Signal Process. Audio Acoust. (WASPAA), Oct. 2005, pp. 82-85.
- (2005) Proc. Workshop Applicat. Signal Process. Audio Acoust. (WASPAA) , pp. 82-85
- Kolossa, D.¹ Klimas, A.² Orglmeister, R.³

8
- 40249103761
- Issues with uncertainty decoding for noise robust automatic speech recognition
- H. Liao and M. Gales, "Issues with uncertainty decoding for noise robust automatic speech recognition," Speech Commun., vol. 50, no. 4, pp. 265-277, 2008.
- (2008) Speech Commun. , vol.50 , Issue.4 , pp. 265-277
- Liao, H.¹ Gales, M.²

9
- 84867337739
- New York NY USA: Springer
- D. Kolossa and R. Haeb-Umbach, Eds., Robust Speech Recognition of Uncertain or Missing Data-Theory and Applications. New York, NY, USA: Springer, 2011.
- (2011) Robust Speech Recognition of Uncertain or Missing Data-Theory and Applications
- Kolossa, D.¹ Haeb-Umbach, R.²

10
- 84873420347
- GMM-based classification from noisy features
- A. Ozerov, M. Lagrange, and E. Vincent, "GMM-based classification from noisy features," in Proc. 1st Int. Workshop Mach. Listening in Multisource Environ. (CHiME), 2011, pp. 30-35.
- (2011) Proc. 1st Int. Workshop Mach. Listening in Multisource Environ. (CHiME) , pp. 30-35
- Ozerov, A.¹ Lagrange, M.² Vincent, E.³

11
- 70450180986
- Model based feature enhancement for automatic speech recognition in reverberant environments
- A. Krueger and R. Haeb-Umbach, "Model based feature enhancement for automatic speech recognition in reverberant environments," in Proc. Interspeech, 2009, pp. 1231-1234.
- (2009) Proc. Interspeech , pp. 1231-1234
- Krueger, A.¹ Haeb-Umbach, R.²

12
- 0036508276
- Speaker verification in noise using a stochastic version of the weighted Viterbi algorithm
- Mar
- N. Yoma and M. Villar, "Speaker verification in noise using a stochastic version of the weighted Viterbi algorithm," IEEE Trans. Speech Audio Process., vol. 10, no. 3, pp. 158-166, Mar. 2002.
- (2002) IEEE Trans. Speech Audio Process. , vol.10 , Issue.3 , pp. 158-166
- Yoma, N.¹ Villar, M.²

13
- 33947644911
- A supervised learning approach to uncertainty decoding for robust speech recognition
- S. Srinivasan and D. Wang, "A supervised learning approach to uncertainty decoding for robust speech recognition," in Proc. IEEE Int. Conf. Acoust., Speech, Signal Process. (ICASSP), vol. 1, pp. I-I, 14-19 2006.
- (2006) Proc. IEEE Int. Conf. Acoust., Speech, Signal Process. (ICASSP) , vol.1 , pp. 14-19
- Srinivasan, S.¹ Wang, D.²

14
- 56249136428
- Transforming binary uncertainties for robust speech recognition
- Sep.
- S. Srinivasan and D. Wang, "Transforming binary uncertainties for robust speech recognition," IEEE Trans. Audio, Speech, Lang. Process., vol. 15, no. 7, pp. 2130-2140, Sep. 2007.
- (2007) IEEE Trans. Audio, Speech, Lang. Process. , vol.15 , Issue.7 , pp. 2130-2140
- Srinivasan, S.¹ Wang, D.²

15
- 84867602659
- Integration of beamforming and automatic speech recognition through propagation of the Wiener posterior
- R. F. Astudillo, A. Abad, and J. P. Neto, "Integration of beamforming and automatic speech recognition through propagation of the Wiener posterior," in Proc. ICASSP, Apr. 2012, pp. 4909-4912.
- (2012) Proc. ICASSP, Apr. , pp. 4909-4912
- Astudillo, R.F.¹ Abad, A.² Neto, J.P.³

16
- 77954583785
- Independent component analysis and time-frequency masking for speech recognition in multi-talker conditions
- D. Kolossa, R. F. Astudillo, E. Hoffmann, and R. Orglmeister, "Independent component analysis and time-frequency masking for speech recognition in multi-talker conditions," EURASIP J. Audio, Speech, Music Process., pp. 1-13, 2010.
- (2010) EURASIP J. Audio, Speech, Music Process. , pp. 1-13
- Kolossa, D.¹ Astudillo, R.F.² Hoffmann, E.³ Orglmeister, R.⁴

17
- 77956717352
- An uncertainty propagation approach to robust ASR using the ETSI advanced front-end
- Oct
- R. F. Astudillo, D. Kolossa, P. Mandelartz, and R. Orglmeister, "An uncertainty propagation approach to robust ASR using the ETSI advanced front-end," IEEE J. Sel. Topics Signal Process., vol. 4, no. 5, pp. 824-833, Oct. 2010.
- (2010) IEEE J. Sel. Topics Signal Process. , vol.4 , Issue.5 , pp. 824-833
- Astudillo, R.F.¹ Kolossa, D.² Mandelartz, P.³ Orglmeister, R.⁴

18
- 79959836811
- A MMSE estimator in mel-cepstral domain for robust large vocabulary automatic speech recognition using uncertainty propagation
- R. F. Astudillo and R. Orglmeister, "A MMSE estimator in mel-cepstral domain for robust large vocabulary automatic speech recognition using uncertainty propagation," in Proc. Interspeech, 2010.
- (2010) Proc. Interspeech
- Astudillo, R.F.¹ Orglmeister, R.²

19
- 66149101303
- Robust speech recognition using a cepstral minimum-mean-square-error-motivated noise suppressor
- Jul
- D. Yu, L. Deng, J. Droppo, J. Wu, Y. Gong, and A. Acero, "Robust speech recognition using a cepstral minimum-mean-square-error-motivated noise suppressor," IEEE Trans. Audio, Speech, Lang. Process., vol. 16, no. 5, pp. 1061-1070, Jul. 2008.
- (2008) IEEE Trans. Audio, Speech, Lang. Process. , vol.16 , Issue.5 , pp. 1061-1070
- Yu, D.¹ Deng, L.² Droppo, J.³ Wu, J.⁴ Gong, Y.⁵ Acero, A.⁶

20
- 79551496435
- MMSE estimation of log-filterbank energies for robust speech recognition
- A. Stark and K. Paliwal, "MMSE estimation of log-filterbank energies for robust speech recognition," Speech Commun., vol. 53, no. 3, pp. 403-416, 2011.
- (2011) Speech Commun. , vol.53 , Issue.3 , pp. 403-416
- Stark, A.¹ Paliwal, K.²

21
- 44949190747
- Improved source modeling and predictive classification for channel robust speech recognition
- V. Ion and R. Haeb-Umbach, "Improved source modeling and predictive classification for channel robust speech recognition," in Proc. Interspeech, 2006.
- (2006) Proc. Interspeech
- Ion, V.¹ Haeb-Umbach, R.²

22
- 0019009880
- Speech enhancement using a soft-decision noise suppression filter
- Apr.
- R. McAulay and M. Malpass, "Speech enhancement using a soft-decision noise suppression filter," IEEE Trans. Acoust., Speech, Signal Process., vol. ASSP-28, no. 2, pp. 137-145, Apr. 1980.
- (1980) IEEE Trans. Acoust., Speech, Signal Process. , vol.ASSP-28 , Issue.2 , pp. 137-145
- McAulay, R.¹ Malpass, M.²

23
- 0021645331
- Speech enhancement using a minimum-mean square error short-time spectral amplitude estimator
- Dec.
- Y. Ephraim and D. Malah, "Speech enhancement using a minimum-mean square error short-time spectral amplitude estimator," IEEE Trans. Acoust., Speech, Signal Process., vol. ASSP-32, no. 6, pp. 1109-1121, Dec. 1984.
- (1984) IEEE Trans. Acoust., Speech, Signal Process. , vol.ASSP-32 , Issue.6 , pp. 1109-1121
- Ephraim, Y.¹ Malah, D.²

24
- 0021892216
- Speech enhancement using a minimum mean square error log-spectral amplitude estimator
- Apr.
- Y. Ephraim and D. Malah, "Speech enhancement using a minimum mean square error log-spectral amplitude estimator," IEEE Trans. Acoust., Speech, Signal Process., vol. ASSP-33, no. 2, pp. 443-445, Apr. 1985.
- (1985) IEEE Trans. Acoust., Speech, Signal Process. , vol.ASSP-33 , Issue.2 , pp. 443-445
- Ephraim, Y.¹ Malah, D.²

25
- 51449084820
- Ph.D. dissertation, Technische Univ. Eindhoven, Eindhoven, The Netherlands
- E. A. P. Habets, "Single- and multi-microphone speech dereverberation using spectral enhancement," Ph.D. dissertation, Technische Univ. Eindhoven, Eindhoven, The Netherlands, 2007.
- (2007) Single- and Multi-microphone Speech Dereverberation Using Spectral Enhancement
- Habets, E.A.P.¹

26
- 0023773764
- A microphone array with adaptive post-filtering for noise reduction in reverberant rooms
- Apr.
- R. Zelinski, "A microphone array with adaptive post-filtering for noise reduction in reverberant rooms," in Proc. Int. Conf. Acoust., Speech, Signal Process., Apr. 1988, vol. 5, pp. 2578-2581.
- (1988) Proc. Int. Conf. Acoust., Speech, Signal Process , vol.5 , pp. 2578-2581
- Zelinski, R.¹

27
- 7544226792
- Speech enhancement based on the general transfer function GSC and postfiltering
- Nov
- S. Gannot and I. Cohen, "Speech enhancement based on the general transfer function GSC and postfiltering," IEEE Trans. Speech Audio Process., vol. 12, no. 6, pp. 561-571, Nov. 2004.
- (2004) IEEE Trans. Speech Audio Process. , vol.12 , Issue.6 , pp. 561-571
- Gannot, S.¹ Cohen, I.²

28
- 84873926851
- Robust automatic speech recognition through on-line semi blind source extraction
- F. Nesta and M. Matassoni, "Robust automatic speech recognition through on-line semi blind source extraction," in Proc. Int. Workshop Mach. Listening in Multisource Environ., 2011, pp. 18-23.
- (2011) Proc. Int. Workshop Mach. Listening in Multisource Environ. , pp. 18-23
- Nesta, F.¹ Matassoni, M.²

29
- 0041360463
- Noise spectrum estimation in adverse environments: Improved minima controlled recursive averaging
- Sept.
- I. Cohen, "Noise spectrum estimation in adverse environments: Improved minima controlled recursive averaging," IEEE Trans. Speech Audio Process., vol. 11, no. 5, pp. 466-475, Sept. 2003.
- (2003) IEEE Trans. Speech Audio Process. , vol.11 , Issue.5 , pp. 466-475
- Cohen, I.¹

30
- 0003837293
- Englewood Cliffs, NJ, USA: Prentice-Hall
- S. M. Kay, Fundamentals of Statistical Signal Processing, ser. Signal Processing Series. Englewood Cliffs, NJ, USA: Prentice-Hall, 1993.
- (1993) Fundamentals of Statistical Signal Processing, Ser. Signal Processing Series
- Kay, S.M.¹

31
- 70450180510
- Accounting for the uncertainty of speech estimates in the complex domain for minimum mean square error speech enhancement
- R. F. Astudillo, D. Kolossa, and R. Orglmeister, "Accounting for the uncertainty of speech estimates in the complex domain for minimum mean square error speech enhancement," in Proc. Interspeech, 2009.
- (2009) Proc. Interspeech
- Astudillo, R.F.¹ Kolossa, D.² Orglmeister, R.³

32
- 0004236421
- Amsterdam The Netherlands: Elsevier
- I. S. Gradshteyn and I. Ryzhik, Table of Integrals. Amsterdam, The Netherlands: Elsevier, 2007.
- (2007) Table of Integrals
- Gradshteyn, I.S.¹ Ryzhik, I.²

33
- 33947683765
- MMSE speech spectral amplitude estimators with chi and gamma speech priors
- May.
- I. Andrianakis and P. White, "MMSE speech spectral amplitude estimators with chi and gamma speech priors," in Proc. IEEE Int. Conf. Acoust., Speech Signal Process. (ICASSP), May 2006, vol. 3, pp. III-1068-III-1071.
- (2006) Proc. IEEE Int. Conf. Acoust., Speech Signal Process. (ICASSP) , vol.3
- Andrianakis, I.¹ White, P.²

34
- 51449104842
- Minimum mean-square error estimation of discrete Fourier coefficients with generalized gamma priors
- Aug
- J. Erkelens, R. Hendriks, R. Heusdens, and J. Jensen, "Minimum mean-square error estimation of discrete Fourier coefficients with generalized gamma priors," IEEE Trans. Audio, Speech, Lang. Process., vol. 15, no. 6, pp. 1741-1752, Aug. 2007.
- (2007) IEEE Trans. Audio, Speech, Lang. Process. , vol.15 , Issue.6 , pp. 1741-1752
- Erkelens, J.¹ Hendriks, R.² Heusdens, R.³ Jensen, J.⁴

35
- 0141957802
- Efficient alternatives to the Ephraim and Malah suppression rule for audio signal enhancement
- P. J. Wolfe and S. J. Godsill, "Efficient alternatives to the Ephraim and Malah suppression rule for audio signal enhancement," EURASIP J. Appl. Signal Process., vol. 10, pp. 1043-1051, 2003.
- (2003) EURASIP J. Appl. Signal Process. , vol.10 , pp. 1043-1051
- Wolfe, P.J.¹ Godsill, S.J.²

36
- 79959819066
- Ph.D. dissertation, Technische Univ. Berlin, Berlin, Germany
- R. F. Astudillo, "Integration of short-time Fourier domain speech enhancement and observation uncertainty techniques for robust automatic speech recognition," Ph.D. dissertation, Technische Univ. Berlin, Berlin, Germany, 2010.
- (2010) Integration of Short-time Fourier Domain Speech Enhancement and Observation Uncertainty Techniques for Robust Automatic Speech Recognition
- Astudillo, R.F.¹

37
- 51749122132
- Propagation of statistical information through non-linear feature extractions for robust speech recognition
- R. F. Astudillo, D. Kolossa, and R. Orglmeister, "Propagation of statistical information through non-linear feature extractions for robust speech recognition," in Proc. MaxEnt2007, 2007.
- (2007) Proc. MaxEnt2007
- Astudillo, R.F.¹ Kolossa, D.² Orglmeister, R.³

38
- 84872036128
- Uncertainty propagation for speech recognition using RASTA features in highly nonstationary noisy environments
- R. F. Astudillo, D. Kolossa, and R. Orglmeister, "Uncertainty propagation for speech recognition using RASTA features in highly nonstationary noisy environments," in Proc. ITG Workshop for Speech Commun., 2008.
- (2008) Proc. ITG Workshop for Speech Commun.
- Astudillo, R.F.¹ Kolossa, D.² Orglmeister, R.³

39
- 84865725710
- Propagation of uncertainty through multilayer perceptrons for robust automatic speech recognition
- R. F. Astudillo and J. P. Neto, "Propagation of uncertainty through multilayer perceptrons for robust automatic speech recognition," in Proc. Interspeech, 2011, pp. 461-464.
- (2011) Proc. Interspeech , pp. 461-464
- Astudillo, R.F.¹ Neto, J.P.²

40
- 84873928960
- Some applications of Dirac's delta function in statistics for more than one random variable
- S. Chakraborty, "Some applications of Dirac's delta function in statistics for more than one random variable," Applicat. Appl. Math., vol. 3, no. 1, pp. 42-54, 2008.
- (2008) Applicat. Appl. Math. , vol.3 , Issue.1 , pp. 42-54
- Chakraborty, S.¹

41
- 0003712010
- Oxford U.K. Tech. Rep
- S. Julier and J. Uhlmann, A general method for approximating nonlinear transformations of probability distributions Univ. of Oxford, Oxford, U.K., 1996, Tech. Rep.
- (1996) A General Method for Approximating Nonlinear Transformations of Probability Distributions Univ. of Oxford
- Julier, S.¹ Uhlmann, J.²

42
- 84887135566
- The sum of log-normal probability distributions in scattered transmission systems
- L. Fenton, "The sum of log-normal probability distributions in scattered transmission systems," IRE Trans. Commun. Syst., vol. 8, pp. 57-67, 1960.
- (1960) IRE Trans. Commun. Syst. , vol.8 , pp. 57-67
- Fenton, L.¹

43
- 85009074657
- Iterating Laplace's method to remove multiple types of acoustic distortion for robust speech recognition
- Sep.
- B. Frey, L. Deng, A. Acero, and T. T. Kristjansson, "Iterating Laplace's method to remove multiple types of acoustic distortion for robust speech recognition," in Proc. Eurospeech, Aalborg, Denmark, Sep. 2001.
- (2001) Proc. Eurospeech, Aalborg, Denmark
- Frey, B.¹ Deng, L.² Acero, A.³ Kristjansson, T.T.⁴

44
- 33745202806
- Joint uncertainty decoding for noise robust speech recognition
- H. Liao and M. J. F. Gales, "Joint uncertainty decoding for noise robust speech recognition," in Proc. Interspeech, 2005, pp. 3129-3132.
- (2005) Proc. Interspeech , pp. 3129-3132
- Liao, H.¹ Gales, M.J.F.²

45
- 85032752225
- Missing-feature approaches in speech recognition
- Sep.
- B. Raj and R. Stern, "Missing-feature approaches in speech recognition," IEEE Signal Process. Mag., vol. 22, no. 5, pp. 101-116, Sep. 2005.
- (2005) IEEE Signal Process. Mag. , vol.22 , Issue.5 , pp. 101-116
- Raj, B.¹ Stern, R.²

46
- 33646677283
- Niederrhein Univ. of Appl. Sci., Nov.
- G. Hirsch, Experimental framework for the performance evaluation of speech recognition front-ends on a large vocabulary task Germany, Niederrhein Univ. of Appl. Sci., Nov. 2002.
- (2002) Experimental Framework for the Performance Evaluation of Speech Recognition Front-ends on A Large Vocabulary Task Germany
- Hirsch, G.¹

47
- 0003571976
- Cambridge, U.K.: Cambridge Univ. Engineering Department.
- S. Young, The HTK Book (for HTK Version 3.4). Cambridge, U.K.: Cambridge Univ. Engineering Department., 2006.
- (2006) The HTK Book (For HTK Version 3.4)
- Young, S.¹

48
- 84873907572
- K. Vertanen, HTK Wall Street Journal Training recipe, 2006 [Online]. Available: http://www.inference.phy.cam.ac.uk/kv227/htk/
- (2006) HTK Wall Street Journal Training Recipe
- Vertanen, K.¹

49
- 84873895491
- R. F. Astudillo, Aurora4 benchmark for stft-up, 2013 [Online]. Available: http://www.astudillo.com/ramon/research/stft-up/
- (2013) Aurora4 Benchmark for Stft-up
- Astudillo, R.F.¹

* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.