SCOPUS Information Retrieval Platform

IEEE Transactions on Audio, Speech and Language Processing

Volume 22, Issue 4, 2014, Pages 826-835

Investigation of speech separation as a front-end for noise robust speech recognition

(2) Narayanan, Arun ᵃ; Wang, DeLiang ᵃ

ᵃ The Ohio State University (United States)

Author keywords

Aurora 4; Deep neural networks; Feature mapping; Robust ASR; Time frequency masking

Indexed keywords

AURORA-4; DEEP NEURAL NETWORKS; FEATURE MAPPING; ROBUST ASR; TIME-FREQUENCY MASKING;

FUNCTIONS; NEURAL NETWORKS; SPEECH RECOGNITION; ADDITIVE NOISE; SEPARATION; SPEECH; SPEECH ANALYSIS;


EID: 84898075497 PISSN: 15587916 EISSN: None Source Type: Journal
DOI: 10.1109/TASLP.2014.2305833 Document Type: Article

Times cited : (126)

References (48)

1
- 84873901811
- Computing MMSE estimates and residual uncertainty directly in the feature domain of ASR using STFT domain speech distortion models
- R. Astudillo and R. Orglmeister, "Computing MMSE estimates and residual uncertainty directly in the feature domain of ASR using STFT domain speech distortion models," IEEE Trans. Audio, Speech, Lang. Process., vol. 21, no. 5, pp. 1023-1034, May 2013.
- (2013) IEEE Trans. Audio, Speech, Lang. Process. , vol.21 , Issue.5 , pp. 1023-1034
- Astudillo, R.¹ Orglmeister, R.²

2
- 85162035281
- The tradeoffs of large-scale learning
- L. Bottou and O. Bousquet, "The tradeoffs of large-scale learning," Adv. Neural Inf. Process. Syst. 20, pp. 161-168, 2008.
- (2008) Adv. Neural Inf. Process. Syst. , vol.20 , pp. 161-168
- Bottou, L.¹ Bousquet, O.²

3
- 42549139762
- MVA processing of speech features
- C.-P. Chen and J. A. Bilmes, "MVA processing of speech features," IEEE Trans. Audio, Speech, Lang. Process., vol. 15, no. 1, pp. 257-270, Jan. 2007.
- (2007) IEEE Trans. Audio, Speech, Lang. Process. , vol.15 , Issue.1 , pp. 257-270
- Chen, C.-P.¹ Bilmes, J.A.²

4
- 84890527827
- Improving deep neural networks for LVCSR using rectified linear units and dropout
- G. E. Dahl, T. N. Sainath, and G. Hinton, "Improving deep neural networks for LVCSR using rectified linear units and dropout," in Proc. IEEE ICASSP, 2013, pp. 8609-8613.
- Proc. IEEE ICASSP, 2013 , pp. 8609-8613
- Dahl, G.E.¹ Sainath, T.N.² Hinton, G.³

5
- 84055222005
- Context-dependent pre-trained deep neural networks for large-vocabulary speech recognition
- G. Dahl, D. Yu, L. Deng, and A. Acero, "Context-dependent pre-trained deep neural networks for large-vocabulary speech recognition," IEEE Trans. Audio, Speech, Lang. Process., vol. 20, no. 1, pp. 30-42, Mar. 2012.
- (2012) IEEE Trans. Audio, Speech, Lang. Process. , vol.20 , Issue.1 , pp. 30-42
- Dahl, G.¹ Yu, D.² Deng, L.³ Acero, A.⁴

6
- 84906222220
- Is speech enhancement pre-processing still relevant when using deep neural networks for acoustic modeling?
- M. Delcroix, Y. Kubo, T. Nakatani, and A. Nakamura, "Is speech enhancement pre-processing still relevant when using deep neural networks for acoustic modeling?," in Proc. Interspeech, 2013, pp. 2992-2996.
- Proc. Interspeech, 2013 , pp. 2992-2996
- Delcroix, M.¹ Kubo, Y.² Nakatani, T.³ Nakamura, A.⁴

7
- 0034855352
- High-performance robust speech recognition using stereo training data
- L. Deng, A. Acero, L. Jiang, J. Droppo, and X. Huang, "High-performance robust speech recognition using stereo training data," in Proc. IEEE ICASSP, 2001, pp. 301-304.
- Proc. IEEE ICASSP, 2001 , pp. 301-304
- Deng, L.¹ Acero, A.² Jiang, L.³ Droppo, J.⁴ Huang, X.⁵

8
- 18744401086
- Dynamic compensation of HMM variances using the feature enhancement uncertainty computed from a parametric model of speech distortion
- L. Deng, J. Droppo, and A. Acero, "Dynamic compensation of HMM variances using the feature enhancement uncertainty computed from a parametric model of speech distortion," IEEE Trans. Speech Audio Process., vol. 13, no. 3, pp. 412-421, May 2005.
- (2005) IEEE Trans. Speech Audio Process. , vol.13 , Issue.3 , pp. 412-421
- Deng, L.¹ Droppo, J.² Acero, A.³

9
- 84886120743
- Feature compensation
- J. Droppo, "Feature compensation," in Techniques for Noise Robustness in Automatic Speech Recognition, T. Virtanen, B. Raj, and R. Singh, Eds. West Sussex, U.K.: Wiley, 2012, ch. 9, pp. 229-250.
- (2012) Techniques for Noise Robustness in Automatic Speech Recognition , pp. 229-250
- Droppo, J.¹

10
- 78049390326
- HMM-based pseudo-clean speech synthesis for splice algorithm
- J. Du, Y. Hu, L.-R. Dai, and R.-H. Wang, "HMM-based pseudo-clean speech synthesis for splice algorithm," in Proc. IEEE ICASSP, 2010, pp. 4570-4573.
- Proc. IEEE ICASSP, 2010 , pp. 4570-4573
- Du, J.¹ Hu, Y.² Dai, L.-R.³ Wang, R.-H.⁴

11
- 80052250414
- Adaptive subgradient methods for online learning and stochastic optimization
- J. Duchi, E. Hazan, and Y. Singer, "Adaptive subgradient methods for online learning and stochastic optimization," J. Mach. Learn. Res., vol. 12, pp. 2121-2159, 2011.
- (2011) J. Mach. Learn. Res. , vol.12 , pp. 2121-2159
- Duchi, J.¹ Hazan, E.² Singer, Y.³

12
- 36049044257
- D. P. W. Ellis, "PLP and RASTA (and MFCC, and inversion) in Matlab," 2005 [Online]. Available: http://www.ee.columbia.edu/dpwe/resources/matlab/rastamat/
- (2005) PLP and RASTA (and MFCC, and Inversion) in Matlab
- Ellis, D.P.W.¹

13
- 0442317754
- Speech processing transmission and quality aspects (STQ); Distributed speech recognition; Advanced front-end feature extraction algorithm; Compression algorithms, ES 202 050 V1.1.4, ETSI, 2005.
- (2005) Speech Processing Transmission and Quality Aspects (STQ); Distributed Speech Recognition; Advanced Front-end Feature Extraction Algorithm; Compression Algorithms

14
- 77949378972
- Discriminative adaptive training with VTS and JUD
- F. Flego and M. J. F. Gales, "Discriminative adaptive training with VTS and JUD," in Proc. IEEE ASRU, 2009, pp. 170-175.
- Proc. IEEE ASRU, 2009 , pp. 170-175
- Flego, F.¹ Gales, M.J.F.²

15
- 0032050110
- Maximum likelihood linear transformations for HMM-based speech recognition
- M. J. F. Gales, "Maximum likelihood linear transformations for HMM-based speech recognition," Comput. Speech Lang., vol. 12, no. 2, pp. 75-98, 1998.
- (1998) Comput. Speech Lang. , vol.12 , Issue.2 , pp. 75-98
- Gales, M.J.F.¹

16
- 84893710272
- Maxout networks
- I. J. Goodfellow, D. Warde-Farley, M. Mirza, A. Courville, and Y. Bengio, "Maxout networks," J. Mach. Learn. Res. Workshop Conf. Proc., vol. 28, no. 3, pp. 1319-1327, 2013.
- (2013) J. Mach. Learn. Res. Workshop Conf. Proc. , vol.28 , Issue.3 , pp. 1319-1327
- Goodfellow, I.J.¹ Warde-Farley, D.² Mirza, M.³ Courville, A.⁴ Bengio, Y.⁵

17
- 84869105129
- A classification based approach to speech segregation
- K. Han and D. L. Wang, "A classification based approach to speech segregation," J. Acoust. Soc. Amer., vol. 132, no. 5, pp. 3475-3483, 2012.
- (2012) J. Acoust. Soc. Amer. , vol.132 , Issue.5 , pp. 3475-3483
- Han, K.¹ Wang, D.L.²

18
- 84881088302
- A direct masking approach to robust ASR
- W. Hartmann, A. Narayanan, E. Fosler-Lussier, and D. L. Wang, "A direct masking approach to robust ASR," IEEE Trans. Audio, Speech, Lang. Process., vol. 21, no. 10, pp. 1993-2005, Oct. 2013.
- (2013) IEEE Trans. Audio, Speech, Lang. Process. , vol.21 , Issue.10 , pp. 1993-2005
- Hartmann, W.¹ Narayanan, A.² Fosler-Lussier, E.³ Wang, D.L.⁴

19
- 0028517164
- RASTA processing of speech
- H. Hermansky and N. Morgan, "RASTA processing of speech," IEEE Trans. Speech Audio Process., vol. 2, no. 4, pp. 578-589, Oct. 1994.
- (1994) IEEE Trans. Speech Audio Process. , vol.2 , Issue.4 , pp. 578-589
- Hermansky, H.¹ Morgan, N.²

20
- 84867720412
- G. E. Hinton, N. Srivastava, A. Krizhevsky, I. Sutskever, and R. Salakhutdinov, "Improving neural networks by preventing co-adaptation of feature detectors," arXiv preprint arXiv:1207.0580, 2012.
- (2012) Improving Neural Networks by Preventing Co-adaptation of Feature Detectors
- Hinton, G.E.¹ Srivastava, N.² Krizhevsky, A.³ Sutskever, I.⁴ Salakhutdinov, R.⁵

21
- 33745805403
- A fast learning algorithm for deep belief nets
- G. Hinton, S. Osindero, and Y. Teh, "A fast learning algorithm for deep belief nets," Neural Comput., vol. 18, no. 7, pp. 1527-1554, 2006.
- (2006) Neural Comput. , vol.18 , Issue.7 , pp. 1527-1554
- Hinton, G.¹ Osindero, S.² Teh, Y.³

22
- 70349093614
- An algorithm that improves speech intelligibility in noise for normal-hearing listeners
- G. Kim, Y. Lu, Y. Hu, and P. Loizou, "An algorithm that improves speech intelligibility in noise for normal-hearing listeners," J. Acoust. Soc. Amer., vol. 126, no. 3, pp. 1486-1494, 2009.
- (2009) J. Acoust. Soc. Amer. , vol.126 , Issue.3 , pp. 1486-1494
- Kim, G.¹ Lu, Y.² Hu, Y.³ Loizou, P.⁴

23
- 78649325568
- Mask classification for missing-feature reconstruction for robust speech recognition in unknown background noise
- W. Kim and R. Stern, "Mask classification for missing-feature reconstruction for robust speech recognition in unknown background noise," Speech Commun., vol. 53, pp. 1-11, 2011.
- (2011) Speech Commun. , vol.53 , pp. 1-11
- Kim, W.¹ Stern, R.²

24
- 84878919540
- Imagenet classification with deep convolutional neural networks
- A. Krizhevsky, I. Sutskever, and G. E. Hinton, "Imagenet classification with deep convolutional neural networks," Adv. Neural Inf. Process. Syst., vol. 25, pp. 1106-1114, 2012.
- (2012) Adv. Neural Inf. Process. Syst. , vol.25 , pp. 1106-1114
- Krizhevsky, A.¹ Sutskever, I.² Hinton, G.E.³

25
- 84878409063
- Recurrent neural networks for noise reduction in robust ASR
- A. L. Maas, Q. V. Le, T. M. O'Neil, O. Vinyals, P. Nguyen, and A. Y. Ng, "Recurrent neural networks for noise reduction in robust ASR," in Proc. Interspeech, 2012.
- Proc. Interspeech, 2012
- Maas, A.L.¹ Le, Q.V.² O'Neil, T.M.³ Vinyals, O.⁴ Nguyen, P.⁵ Ng, A.Y.⁶

26
- 84055211743
- Acoustic modeling using deep belief networks
- A. Mohamed, G. Dahl, and G. Hinton, "Acoustic modeling using deep belief networks," IEEE Trans. Audio, Speech, Lang. Process., vol. 20, no. 1, pp. 14-22, Jan. 2012.
- (2012) IEEE Trans. Audio, Speech, Lang. Process. , vol.20 , Issue.1 , pp. 14-22
- Mohamed, A.¹ Dahl, G.² Hinton, G.³

27
- 77956509090
- Rectified linear units improve restricted Boltzmann machines
- V. Nair and G. E. Hinton, "Rectified linear units improve restricted Boltzmann machines," in Proc. ICML 27, 2010, pp. 807-814.
- Proc. ICML 27, 2010 , pp. 807-814
- Nair, V.¹ Hinton, G.E.²

28
- 84890475416
- Coupling binary masking and robust ASR
- A. Narayanan and D. Wang, "Coupling binary masking and robust ASR," in Proc. IEEE ICASSP, 2013, pp. 6817-6821.
- Proc. IEEE ICASSP, 2013 , pp. 6817-6821
- Narayanan, A.¹ Wang, D.²

29
- 84890493989
- Ideal ratio mask estimation using deep neural networks for robust speech recognition
- A. Narayanan and D. Wang, "Ideal ratio mask estimation using deep neural networks for robust speech recognition," in Proc. IEEE ICASSP, 2013, pp. 7092-7096.
- Proc. IEEE ICASSP, 2013 , pp. 7092-7096
- Narayanan, A.¹ Wang, D.²

30
- 85009227702
- Analysis of the Aurora large vocabulary evaluations
- N. Parihar and J. Picone, "Analysis of the Aurora large vocabulary evaluations," in Proc. Eurospeech, 2003, pp. 337-340.
- Proc. Eurospeech, 2003 , pp. 337-340
- Parihar, N.¹ Picone, J.²

31
- 84890448307
- An evaluation of posterior modeling techniques for phonetic recognition
- R. Prabhavalkar, T. N. Sainath, D. Nahamoo, B. Ramabhadran, and D. Kanevsky, "An evaluation of posterior modeling techniques for phonetic recognition," in Proc. IEEE ICASSP, 2013, pp. 7165-7169.
- Proc. IEEE ICASSP, 2013 , pp. 7165-7169
- Prabhavalkar, R.¹ Sainath, T.N.² Nahamoo, D.³ Ramabhadran, B.⁴ Kanevsky, D.⁵

32
- 85032752225
- Missing-feature approaches in speech recognition
- B. Raj and R. Stern, "Missing-feature approaches in speech recognition," IEEE Signal Process. Mag., vol. 22, no. 5, pp. 101-116, 2005.
- (2005) IEEE Signal Process. Mag. , vol.22 , Issue.5 , pp. 101-116
- Raj, B.¹ Stern, R.²

33
- 0142026377
- Speech segregation based on sound localization
- N. Roman, D. L. Wang, and G. J. Brown, "Speech segregation based on sound localization," J. Acoust. Soc. Amer., vol. 114, no. 4, pp. 2236-2252, 2003.
- (2003) J. Acoust. Soc. Amer. , vol.114 , Issue.4 , pp. 2236-2252
- Roman, N.¹ Wang, D.L.² Brown, G.J.³

34
- 82255167374
- Intelligibility of reverberant noisy speech with ideal binary masking
- N. Roman and J. Woodruff, "Intelligibility of reverberant noisy speech with ideal binary masking," J. Acoust. Soc. Amer., vol. 130, no. 4, pp. 2153-2161, 2011.
- (2011) J. Acoust. Soc. Amer. , vol.130 , Issue.4 , pp. 2153-2161
- Roman, N.¹ Woodruff, J.²

35
- 84858976070
- Feature engineering in context-dependent deep neural networks for conversational speech transcription
- F. Seide, G. Li, X. Chen, and D. Yu, "Feature engineering in context-dependent deep neural networks for conversational speech transcription," in Proc. IEEE ASRU, 2011, pp. 24-29.
- Proc. IEEE ASRU, 2011 , pp. 24-29
- Seide, F.¹ Li, G.² Chen, X.³ Yu, D.⁴

36
- 4644317224
- A Bayesian classifier for spectrographic mask estimation for missing feature speech recognition
- M. L. Seltzer, B. Raj, and R. M. Stern, "A Bayesian classifier for spectrographic mask estimation for missing feature speech recognition," Speech Commun., vol. 43, no. 4, pp. 379-393, 2004.
- (2004) Speech Commun. , vol.43 , Issue.4 , pp. 379-393
- Seltzer, M.L.¹ Raj, B.² Stern, R.M.³

37
- 84890492030
- An investigation of deep neural networks for noise robust speech recognition
- M. L. Seltzer, D. Yu, and Y.-Q. Wang, "An investigation of deep neural networks for noise robust speech recognition," in Proc. IEEE ICASSP, 2013, pp. 7398-7402.
- Proc. IEEE ICASSP, 2013 , pp. 7398-7402
- Seltzer, M.L.¹ Yu, D.² Wang, Y.-Q.³

38
- 33750311718
- Binary and ratio time-frequency masks for robust speech recognition
- S. Srinivasan, N. Roman, and D. L. Wang, "Binary and ratio time-frequency masks for robust speech recognition," Speech Commun., vol. 48, pp. 1486-1501, 2006.
- (2006) Speech Commun. , vol.48 , pp. 1486-1501
- Srinivasan, S.¹ Roman, N.² Wang, D.L.³

39
- 84898081070
- K. Vertanen, "HTK Wall Street Journal training recipe," 2005 [Online]. Available: http://www.keithv.com/software/htk/
- (2005) HTK Wall Street Journal Training Recipe
- Vertanen, K.¹

40
- 84892233308
- On ideal binary masks as the computational goal of auditory scene analysis
- D. L. Wang, "On ideal binary masks as the computational goal of auditory scene analysis," in Speech Separation by Humans and Machines, P. Divenyi, Ed. Boston, MA, USA: Kluwer, 2005, pp. 181-197.
- (2005) Speech Separation by Humans and Machines , pp. 181-197
- Wang, D.L.¹

41
- 64649103540
- Speech intelligibility in background noise with ideal binary time-frequency masking
- D. L. Wang, U. Kjems, M. S. Pedersen, J. B. Boldt, and T. Lunner, "Speech intelligibility in background noise with ideal binary time-frequency masking," J. Acoust. Soc. Amer., vol. 125, pp. 2336-2347, 2009.
- (2009) J. Acoust. Soc. Amer. , vol.125 , pp. 2336-2347
- Wang, D.L.¹ Kjems, U.² Pedersen, M.S.³ Boldt, J.B.⁴ Lunner, T.⁵

42
- 84870477511
- Exploring monaural features for classification-based speech segregation
- Y. Wang, K. Han, and D. L. Wang, "Exploring monaural features for classification-based speech segregation," IEEE Trans. Audio, Speech, Lang. Process., vol. 21, pp. 270-279, 2013.
- (2013) IEEE Trans. Audio, Speech, Lang. Process. , vol.21 , pp. 270-279
- Wang, Y.¹ Han, K.² Wang, D.L.³

43
- 84875678689
- Towards scaling up classification-based speech separation
- Y. Wang and D. L. Wang, "Towards scaling up classification-based speech separation," IEEE Trans. Audio, Speech, Lang. Process., vol. 21, no. 7, pp. 1381-1390, Jul. 2013.
- (2013) IEEE Trans. Audio, Speech, Lang. Process. , vol.21 , Issue.7 , pp. 1381-1390
- Wang, Y.¹ Wang, D.L.²

44
- 84890523904
- Feature denoising for speech separation in unknown noisy environments
- Y. Wang and D. L. Wang, "Feature denoising for speech separation in unknown noisy environments," in Proc. IEEE ICASSP, 2013, pp. 7472-7476.
- Proc. IEEE ICASSP, 2013 , pp. 7472-7476
- Wang, Y.¹ Wang, D.L.²

45
- 84862293102
- Speaker and noise factorization for robust speech recognition
- Y.-Q. Wang and M. J. F. Gales, "Speaker and noise factorization for robust speech recognition," IEEE Trans. Audio, Speech, Lang. Process., vol. 20, no. 7, pp. 2149-2158, 2012.
- (2012) IEEE Trans. Audio, Speech, Lang. Process. , vol.20 , Issue.7 , pp. 2149-2158
- Wang, Y.-Q.¹ Gales, M.J.F.²

46
- 84900537286
- The Munich feature enhancement approach to the 2nd CHiME challenge using BLSTM recurrent neural networks
- F. Weninger, J. Geiger, M. Wöllmer, B. Schuller, and G. Rigoll, "The Munich feature enhancement approach to the 2nd CHiME challenge using BLSTM recurrent neural networks," in Proc. 2nd CHiME Mach. Listen. Multisource Environ. Workshop, 2013, pp. 86-90.
- Proc. 2nd CHiME Mach. Listen. Multisource Environ. Workshop, 2013 , pp. 86-90
- Weninger, F.¹ Geiger, J.² Wöllmer, M.³ Schuller, B.⁴ Rigoll, G.⁵

47
- 0003822743
- S. Young, G. Evermann, T. Hain, D. Kershaw, G. Moore, J. Odell, D. Ollason, D. Povey, V. Valtchev, and P. Woodland, The HTK Book. Cambridge, U.K.: Cambridge Univ. Press, 2002 [Online]. Available: http://htk.eng.cam.ac.uk.
- (2002) The HTK Book
- Young, S.¹ Evermann, G.² Hain, T.³ Kershaw, D.⁴ Moore, G.⁵ Odell, J.⁶ Ollason, D.⁷ Povey, D.⁸ Valtchev, V.⁹ Woodland, P.¹⁰

48
- 85083953021
- Feature learning in deep neural networks - studies on speech recognition tasks
- D. Yu, M. L. Seltzer, J. Li, J.-T. Huang, and F. Seide, "Feature learning in deep neural networks - studies on speech recognition tasks," in Proc. ICLR, 2013.
- Proc. ICLR, 2013
- Yu, D.¹ Seltzer, M.L.² Li, J.³ Huang, J.-T.⁴ Seide, F.⁵

* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.