SCOPUS Information Retrieval Platform

ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings

Volume and issue not listed, 2013, Pages 7092-7096

Ideal ratio mask estimation using deep neural networks for robust speech recognition

Authors (2): Narayanan, Arun (a); Wang, Deliang (a, b)

a Ohio State University (United States)

b Ohio State University (United States)

Author keywords

Aurora 4; Computational Auditory Scene Analysis; instantaneous SNR; noise robust ASR

Indexed keywords

AURORA-4; COMPUTATIONAL AUDITORY SCENE ANALYSIS; DEEP NEURAL NETWORKS; INSTANTANEOUS SNR; MULTI-CONDITION TRAININGS; NOISE ROBUST ASR; ROBUST AUTOMATIC SPEECH RECOGNITIONS (ASR); ROBUST SPEECH RECOGNITION;

ACOUSTIC NOISE; ALGORITHMS; ESTIMATION; FEATURE EXTRACTION; NEURAL NETWORKS; SIGNAL PROCESSING; SIGNAL TO NOISE RATIO; SPEECH PROCESSING; SPEECH RECOGNITION;

FREQUENCY ESTIMATION;

EID: 84890493989 PISSN: 15206149 EISSN: None Source Type: Conference Proceeding
DOI: 10.1109/ICASSP.2013.6639038 Document Type: Conference Paper

Times cited: 596

References (25)

1
- 84891583985
- John Wiley & Sons, West Sussex, UK
- T. Virtanen, B. Raj, and R. Singh, Eds., Techniques for Noise Robustness in Automatic Speech Recognition, John Wiley & Sons, West Sussex, UK, 2012
- (2012) Techniques for Noise Robustness in Automatic Speech Recognition
- Virtanen, T.¹ Raj, B.² Singh, R.³

2
- 0028517164
- RASTA processing of speech
- H. Hermansky and N. Morgan, "RASTA processing of speech," IEEE Transactions on Speech and Audio Processing, vol. 2, no. 4, pp. 578-589, 1994
- (1994) IEEE Transactions on Speech and Audio Processing , vol.2 , Issue.4 , pp. 578-589
- Hermansky, H.¹ Morgan, N.²

3
- 0442317754
- ETSI, ES 202 050 V1.1.4
- ETSI, ES 202 050 V1.1.4, "Speech processing transmission and quality aspects (STQ); Distributed speech recognition; Advanced front-end feature extraction algorithm; Compression algorithms," 2005
- (2005) Speech Processing Transmission and Quality Aspects (STQ); Distributed Speech Recognition; Advanced Front-end Feature Extraction Algorithm; Compression Algorithms

4
- 0032050110
- Maximum likelihood linear transformations for HMM-based speech recognition
- M. J. F. Gales, "Maximum likelihood linear transformations for HMM-based speech recognition," Computer Speech and Language, vol. 12, no. 2, pp. 75-98, 1998
- (1998) Computer Speech and Language , vol.12 , Issue.2 , pp. 75-98
- Gales, M.J.F.¹

5
- 0029725301
- A vector Taylor series approach for environment-independent speech recognition
- P. J. Moreno, B. Raj, and R. M. Stern, "A vector Taylor series approach for environment-independent speech recognition," in Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing, 1996, pp. 733-736
- (1996) Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing , pp. 733-736
- Moreno, P.J.¹ Raj, B.² Stern, R.M.³

6
- 62249130045
- A unified framework of HMM adaptation with joint compensation of additive and convolutive distortions
- J. Li, L. Deng, D. Yu, Y. Gong, and A. Acero, "A unified framework of HMM adaptation with joint compensation of additive and convolutive distortions," Computer Speech and Language, vol. 23, pp. 389-405, 2009
- (2009) Computer Speech and Language , vol.23 , pp. 389-405
- Li, J.¹ Deng, L.² Yu, D.³ Gong, Y.⁴ Acero, A.⁵

7
- 85032752225
- Missing-feature approaches in speech recognition
- B. Raj and R. Stern, "Missing-feature approaches in speech recognition," IEEE Signal Processing Magazine, vol. 22, no. 5, pp. 101-116, 2005
- (2005) IEEE Signal Processing Magazine , vol.22 , Issue.5 , pp. 101-116
- Raj, B.¹ Stern, R.²

8
- 34447100796
- CRC Press, Boca Raton, Florida
- P. C. Loizou, Speech Enhancement: Theory and Practice, CRC Press, Boca Raton, Florida, 2007
- (2007) Speech Enhancement: Theory and Practice
- Loizou, P.C.¹

9
- 84867584623
- Improvements to VTS feature enhancement
- J. Droppo, L. Deng, and A. Acero, "Improvements to VTS feature enhancement," in Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing, 2012, pp. 4677-4680
- (2012) Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing , pp. 4677-4680
- Droppo, J.¹ Deng, L.² Acero, A.³

10
- 82255178542
- Wiley/ IEEE Press, Hoboken, NJ
- D. L. Wang and G. J. Brown, Eds., Computational Auditory Scene Analysis: Principles, Algorithms, and Applications, Wiley/ IEEE Press, Hoboken, NJ, 2006
- (2006) Computational Auditory Scene Analysis: Principles, Algorithms, and Applications
- Wang, D.L.¹ Brown, G.J.²

11
- 84892233308
- On ideal binary masks as the computational goal of auditory scene analysis
- P. Divenyi, Ed., Kluwer Academic, Boston, MA
- D. L. Wang, "On ideal binary masks as the computational goal of auditory scene analysis," in Speech Separation by Humans and Machines, P. Divenyi, Ed., pp. 181-197. Kluwer Academic, Boston, MA, 2005
- (2005) Speech Separation by Humans and Machines , pp. 181-197
- Wang, D.L.¹

12
- 84877594942
- Tech. Rep. OSU-CISRC-7/11-TR21, Department of Computer Science and Engineering, The Ohio State University, Columbus, Ohio, USA
- W. Hartmann, A. Narayanan, E. Fosler-Lussier, and D. L. Wang, "Nothing doing: Re-evaluating missing feature ASR," Tech. Rep. OSU-CISRC-7/11-TR21, Department of Computer Science and Engineering, The Ohio State University, Columbus, Ohio, USA, 2011, Available: ftp://ftp.cse.ohio-state.edu/pub/tech-report/2011
- (2011) Nothing Doing: Re-evaluating Missing Feature ASR
- Hartmann, W.¹ Narayanan, A.² Fosler-Lussier, E.³ Wang, D.L.⁴

13
- 0142026377
- Speech segregation based on sound localization
- N. Roman, D. L. Wang, and G. J. Brown, "Speech segregation based on sound localization," Journal of the Acoustical Society of America, vol. 114, no. 4, pp. 2236-2252, 2003
- (2003) Journal of the Acoustical Society of America , vol.114 , Issue.4 , pp. 2236-2252
- Roman, N.¹ Wang, D.L.² Brown, G.J.³

14
- 4644317224
- A Bayesian classifier for spectrographic mask estimation for missing feature speech recognition
- M. L. Seltzer, B. Raj, and R. M. Stern, "A Bayesian classifier for spectrographic mask estimation for missing feature speech recognition," Speech Communication, vol. 43, no. 4, pp. 379-393, 2004
- (2004) Speech Communication , vol.43 , Issue.4 , pp. 379-393
- Seltzer, M.L.¹ Raj, B.² Stern, R.M.³

15
- 33750311718
- Binary and ratio time-frequency masks for robust speech recognition
- S. Srinivasan, N. Roman, and D. L. Wang, "Binary and ratio time-frequency masks for robust speech recognition," Speech Communication, vol. 48, pp. 1486-1501, 2006
- (2006) Speech Communication , vol.48 , pp. 1486-1501
- Srinivasan, S.¹ Roman, N.² Wang, D.L.³

16
- 85009063707
- Soft decisions in missing data techniques for robust automatic speech recognition
- J. Barker, L. Josifovski, M. P. Cooke, and P. D. Green, "Soft decisions in missing data techniques for robust automatic speech recognition," in Proceedings of the International Conference on Spoken Language Processing, Beijing, China, 2000, pp. 373-376
- (2000) Proceedings of the International Conference on Spoken Language Processing, Beijing, China , pp. 373-376
- Barker, J.¹ Josifovski, L.² Cooke, M.P.³ Green, P.D.⁴

17
- 84867596016
- A novel approach to soft-mask estimation and log-spectral enhancement for robust speech recognition
- J. van Hout and A. Alwan, "A novel approach to soft-mask estimation and log-spectral enhancement for robust speech recognition," in Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing, 2012, pp. 4105-4108
- (2012) Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing , pp. 4105-4108
- Van Hout, J.¹ Alwan, A.²

18
- 0038712550
- SNR estimation based on amplitude modulation analysis with applications to noise suppression
- J. Tchorz and B. Kollmeier, "SNR estimation based on amplitude modulation analysis with applications to noise suppression," IEEE Transactions on Audio, Speech, and Signal Processing, vol. 11, pp. 184-192, 2003
- (2003) IEEE Transactions on Audio, Speech, and Signal Processing , vol.11 , pp. 184-192
- Tchorz, J.¹ Kollmeier, B.²

19
- 64649103540
- Speech intelligibility in background noise with ideal binary time-frequency masking
- D. L. Wang, U. Kjems, M. S. Pedersen, J. B. Boldt, and T. Lunner, "Speech intelligibility in background noise with ideal binary time-frequency masking," Journal of the Acoustical Society of America, vol. 125, pp. 2336-2347, 2009
- (2009) Journal of the Acoustical Society of America , vol.125 , pp. 2336-2347
- Wang, D.L.¹ Kjems, U.² Pedersen, M.S.³ Boldt, J.B.⁴ Lunner, T.⁵

20
- 84870477511
- Exploring monaural features for classification-based speech segregation
- Y. Wang, K. Han, and D. Wang, "Exploring monaural features for classification-based speech segregation," IEEE Transactions on Audio, Speech, and Language Processing, vol. 21, pp. 270-279, 2013
- (2013) IEEE Transactions on Audio, Speech, and Language Processing , vol.21 , pp. 270-279
- Wang, Y.¹ Han, K.² Wang, D.³

21
- 84875678689
- Towards scaling up classification-based speech separation
- in press
- Y. Wang and D. Wang, "Towards scaling up classification-based speech separation," IEEE Transactions on Audio, Speech, and Language Processing, 2013, in press
- (2013) IEEE Transactions on Audio, Speech, and Language Processing
- Wang, Y.¹ Wang, D.²

22
- 33745805403
- A fast learning algorithm for deep belief nets
- G. E. Hinton, S. Osindero, and Y. W. Teh, "A fast learning algorithm for deep belief nets," Neural Computation, vol. 18, no. 7, pp. 1527-1554, 2006
- (2006) Neural Computation , vol.18 , Issue.7 , pp. 1527-1554
- Hinton, G.E.¹ Osindero, S.² Teh, Y.W.³

23
- 85009227702
- Analysis of the Aurora large vocabulary evaluations
- N. Parihar and J. Picone, "Analysis of the Aurora large vocabulary evaluations," in Proceedings of the European Conference on Speech Communication and Technology, 2003, pp. 337-340
- (2003) Proceedings of the European Conference on Speech Communication and Technology , pp. 337-340
- Parihar, N.¹ Picone, J.²

24
- 0003548585
- J. S. Garofolo, L. F. Lamel, W. M. Fisher, J. G. Fiscus, D. S. Pallett, and N. L. Dahlgren, "DARPA TIMIT acoustic phonetic continuous speech corpus," 1993 [Online]. Available: http://www.ldc.upenn.edu/Catalog/LDC93S1.html
- (1993) DARPA TIMIT Acoustic Phonetic Continuous Speech Corpus
- Garofolo, J.S.¹ Lamel, L.F.² Fisher, W.M.³ Fiscus, J.G.⁴ Pallett, D.S.⁵ Dahlgren, N.L.⁶

25
- 0003822743
- Cambridge University Publishing Department
- S. Young, G. Evermann, T. Hain, D. Kershaw, G. Moore, J. Odell, D. Ollason, D. Povey, V. Valtchev, and P. Woodland, The HTK Book, Cambridge University Publishing Department, 2002, [Online]. Available: http://htk.eng.cam.ac.uk.
- (2002) The HTK Book
- Young, S.¹ Evermann, G.² Hain, T.³ Kershaw, D.⁴ Moore, G.⁵ Odell, J.⁶ Ollason, D.⁷ Povey, D.⁸ Valtchev, V.⁹ Woodland, P.¹⁰

* This information was extracted from Elsevier's SCOPUS database and analyzed by KISTI.