Volume 22, Issue 4, 2014, Pages 826-835

Investigation of speech separation as a front-end for noise robust speech recognition

Author keywords

Aurora 4; Deep neural networks; Feature mapping; Robust ASR; Time frequency masking

Indexed keywords

AURORA-4; DEEP NEURAL NETWORKS; FEATURE MAPPING; ROBUST ASR; TIME-FREQUENCY MASKING;

EID: 84898075497     PISSN: 15587916     EISSN: None     Source Type: Journal    
DOI: 10.1109/TASLP.2014.2305833     Document Type: Article
Times cited : (126)

References (48)
  • 1
    • 84873901811
    • Computing MMSE estimates and residual uncertainty directly in the feature domain of ASR using STFT domain speech distortion models
    • May
    • R. Astudillo and R. Orglmeister, "Computing MMSE estimates and residual uncertainty directly in the feature domain of ASR using STFT domain speech distortion models," IEEE Trans. Audio, Speech, Lang. Process., vol. 21, no. 5, pp. 1023-1034, May 2013.
    • (2013) IEEE Trans. Audio, Speech, Lang. Process. , vol.21 , Issue.5 , pp. 1023-1034
    • Astudillo, R.1    Orglmeister, R.2
  • 4
    • 84890527827
    • Improving deep neural networks for LVCSR using rectified linear units and dropout
    • G. E. Dahl, T. N. Sainath, and G. Hinton, "Improving deep neural networks for LVCSR using rectified linear units and dropout," in Proc. IEEE ICASSP, 2013, pp. 8609-8613.
    • Proc. IEEE ICASSP, 2013 , pp. 8609-8613
    • Dahl, G.E.1    Sainath, T.N.2    Hinton, G.3
  • 5
    • 84055222005
    • Context-dependent pre-trained deep neural networks for large-vocabulary speech recognition
    • Mar.
    • G. Dahl, D. Yu, L. Deng, and A. Acero, "Context-dependent pre-trained deep neural networks for large-vocabulary speech recognition," IEEE Trans. Audio, Speech, Lang. Process., vol. 20, no. 1, pp. 30-42, Mar. 2012.
    • (2012) IEEE Trans. Audio, Speech, Lang. Process. , vol.20 , Issue.1 , pp. 30-42
    • Dahl, G.1    Yu, D.2    Deng, L.3    Acero, A.4
  • 6
    • 84906222220
    • Is speech enhancement pre-processing still relevant when using deep neural networks for acoustic modeling?
    • M. Delcroix, Y. Kubo, T. Nakatani, and A. Nakamura, "Is speech enhancement pre-processing still relevant when using deep neural networks for acoustic modeling?," in Proc. Interspeech, 2013, pp. 2992-2996.
    • Proc. Interspeech, 2013 , pp. 2992-2996
    • Delcroix, M.1    Kubo, Y.2    Nakatani, T.3    Nakamura, A.4
  • 8
    • 18744401086
    • Dynamic compensation of HMM variances using the feature enhancement uncertainty computed from a parametric model of speech distortion
    • May
    • L. Deng, J. Droppo, and A. Acero, "Dynamic compensation of HMM variances using the feature enhancement uncertainty computed from a parametric model of speech distortion," IEEE Trans. Speech Audio Process., vol. 13, no. 3, pp. 412-421, May 2005.
    • (2005) IEEE Trans. Speech Audio Process. , vol.13 , Issue.3 , pp. 412-421
    • Deng, L.1    Droppo, J.2    Acero, A.3
  • 9
    • 84886120743
    • Feature compensation
    • T. Virtanen, B. Raj, and R. Singh, Eds. West Sussex, U.K.: Wiley, ch. 9
    • J. Droppo, "Feature compensation," in Techniques for Noise Robustness in Automatic Speech Recognition, T. Virtanen, B. Raj, and R. Singh, Eds. West Sussex, U.K.: Wiley, 2012, ch. 9, pp. 229-250.
    • (2012) Techniques for Noise Robustness in Automatic Speech Recognition , pp. 229-250
    • Droppo, J.1
  • 10
    • 78049390326
    • HMM-based pseudo-clean speech synthesis for splice algorithm
    • J. Du, Y. Hu, L.-R. Dai, and R.-H. Wang, "HMM-based pseudo-clean speech synthesis for splice algorithm," in Proc. IEEE ICASSP, 2010, pp. 4570-4573.
    • Proc. IEEE ICASSP, 2010 , pp. 4570-4573
    • Du, J.1    Hu, Y.2    Dai, L.-R.3    Wang, R.-H.4
  • 11
    • 80052250414
    • Adaptive subgradient methods for online learning and stochastic optimization
    • J. Duchi, E. Hazan, and Y. Singer, "Adaptive subgradient methods for online learning and stochastic optimization," J. Mach. Learn. Res., vol. 12, pp. 2121-2159, 2011.
    • (2011) J. Mach. Learn. Res. , vol.12 , pp. 2121-2159
    • Duchi, J.1    Hazan, E.2    Singer, Y.3
  • 14
    • 77949378972
    • Discriminative adaptive training with VTS and JUD
    • F. Flego and M. J. F. Gales, "Discriminative adaptive training with VTS and JUD," in Proc. IEEE ASRU, 2009, pp. 170-175.
    • Proc. IEEE ASRU, 2009 , pp. 170-175
    • Flego, F.1    Gales, M.J.F.2
  • 15
    • 0032050110
    • Maximum likelihood linear transformations for HMM-based speech recognition
    • M. J. F. Gales, "Maximum likelihood linear transformations for HMM-based speech recognition," Comput. Speech Lang., vol. 12, no. 2, pp. 75-98, 1998.
    • (1998) Comput. Speech Lang. , vol.12 , Issue.2 , pp. 75-98
    • Gales, M.J.F.1
  • 17
    • 84869105129
    • A classification based approach to speech segregation
    • K. Han and D. L. Wang, "A classification based approach to speech segregation," J. Acoust. Soc. Amer., vol. 132, no. 5, pp. 3475-3483, 2012.
    • (2012) J. Acoust. Soc. Amer. , vol.132 , Issue.5 , pp. 3475-3483
    • Han, K.1    Wang, D.L.2
  • 21
    • 33745805403
    • A fast learning algorithm for deep belief nets
    • G. Hinton, S. Osindero, and Y. Teh, "A fast learning algorithm for deep belief nets," Neural Comput., vol. 18, no. 7, pp. 1527-1554, 2006.
    • (2006) Neural Comput. , vol.18 , Issue.7 , pp. 1527-1554
    • Hinton, G.1    Osindero, S.2    Teh, Y.3
  • 22
    • 70349093614
    • An algorithm that improves speech intelligibility in noise for normal-hearing listeners
    • G. Kim, Y. Lu, Y. Hu, and P. Loizou, "An algorithm that improves speech intelligibility in noise for normal-hearing listeners," J. Acoust. Soc. Amer., vol. 126, no. 3, pp. 1486-1494, 2009.
    • (2009) J. Acoust. Soc. Amer. , vol.126 , Issue.3 , pp. 1486-1494
    • Kim, G.1    Lu, Y.2    Hu, Y.3    Loizou, P.4
  • 23
    • 78649325568
    • Mask classification for missing-feature reconstruction for robust speech recognition in unknown background noise
    • W. Kim and R. Stern, "Mask classification for missing-feature reconstruction for robust speech recognition in unknown background noise," Speech Commun., vol. 53, pp. 1-11, 2011.
    • (2011) Speech Commun. , vol.53 , pp. 1-11
    • Kim, W.1    Stern, R.2
  • 24
    • 84878919540
    • Imagenet classification with deep convolutional neural networks
    • A. Krizhevsky, I. Sutskever, and G. E. Hinton, "Imagenet classification with deep convolutional neural networks," Adv. Neural Inf. Process. Syst., vol. 25, pp. 1106-1114, 2012.
    • (2012) Adv. Neural Inf. Process. Syst. , vol.25 , pp. 1106-1114
    • Krizhevsky, A.1    Sutskever, I.2    Hinton, G.E.3
  • 27
    • 77956509090
    • Rectified linear units improve restricted Boltzmann machines
    • V. Nair and G. E. Hinton, "Rectified linear units improve restricted Boltzmann machines," in Proc. ICML 27, 2010, pp. 807-814.
    • Proc. ICML 27, 2010 , pp. 807-814
    • Nair, V.1    Hinton, G.E.2
  • 28
  • 29
    • 84890493989
    • Ideal ratio mask estimation using deep neural networks for robust speech recognition
    • A. Narayanan and D. Wang, "Ideal ratio mask estimation using deep neural networks for robust speech recognition," in Proc. IEEE ICASSP, 2013, pp. 7092-7096.
    • Proc. IEEE ICASSP, 2013 , pp. 7092-7096
    • Narayanan, A.1    Wang, D.2
  • 30
    • 85009227702
    • Analysis of the Aurora large vocabulary evaluations
    • N. Parihar and J. Picone, "Analysis of the Aurora large vocabulary evaluations," in Proc. Eurospeech, 2003, pp. 337-340.
    • Proc. Eurospeech, 2003 , pp. 337-340
    • Parihar, N.1    Picone, J.2
  • 32
    • 85032752225
    • Missing-feature approaches in speech recognition
    • B. Raj and R. Stern, "Missing-feature approaches in speech recognition," IEEE Signal Process. Mag., vol. 22, no. 5, pp. 101-116, 2005.
    • (2005) IEEE Signal Process. Mag. , vol.22 , Issue.5 , pp. 101-116
    • Raj, B.1    Stern, R.2
  • 33
    • 0142026377
    • Speech segregation based on sound localization
    • N. Roman, D. L. Wang, and G. J. Brown, "Speech segregation based on sound localization," J. Acoust. Soc. Amer., vol. 114, no. 4, pp. 2236-2252, 2003.
    • (2003) J. Acoust. Soc. Amer. , vol.114 , Issue.4 , pp. 2236-2252
    • Roman, N.1    Wang, D.L.2    Brown, G.J.3
  • 34
    • 82255167374
    • Intelligibility of reverberant noisy speech with ideal binary masking
    • N. Roman and J. Woodruff, "Intelligibility of reverberant noisy speech with ideal binary masking," J. Acoust. Soc. Amer., vol. 130, no. 4, pp. 2153-2161, 2011.
    • (2011) J. Acoust. Soc. Amer. , vol.130 , Issue.4 , pp. 2153-2161
    • Roman, N.1    Woodruff, J.2
  • 35
    • 84858976070
    • Feature engineering in context-dependent deep neural networks for conversational speech transcription
    • F. Seide, G. Li, X. Chen, and D. Yu, "Feature engineering in context-dependent deep neural networks for conversational speech transcription," in Proc. IEEE ASRU, 2011, pp. 24-29.
    • Proc. IEEE ASRU, 2011 , pp. 24-29
    • Seide, F.1    Li, G.2    Chen, X.3    Yu, D.4
  • 36
    • 4644317224
    • A Bayesian classifier for spectrographic mask estimation for missing feature speech recognition
    • M. L. Seltzer, B. Raj, and R. M. Stern, "A Bayesian classifier for spectrographic mask estimation for missing feature speech recognition," Speech Commun., vol. 43, no. 4, pp. 379-393, 2004.
    • (2004) Speech Commun. , vol.43 , Issue.4 , pp. 379-393
    • Seltzer, M.L.1    Raj, B.2    Stern, R.M.3
  • 37
    • 84890492030
    • An investigation of deep neural networks for noise robust speech recognition
    • M. L. Seltzer, D. Yu, and Y.-Q. Wang, "An investigation of deep neural networks for noise robust speech recognition," in Proc. IEEE ICASSP, 2013, pp. 7398-7402.
    • Proc. IEEE ICASSP, 2013 , pp. 7398-7402
    • Seltzer, M.L.1    Yu, D.2    Wang, Y.-Q.3
  • 38
    • 33750311718
    • Binary and ratio time-frequency masks for robust speech recognition
    • S. Srinivasan, N. Roman, and D. L. Wang, "Binary and ratio time-frequency masks for robust speech recognition," Speech Commun., vol. 48, pp. 1486-1501, 2006.
    • (2006) Speech Commun. , vol.48 , pp. 1486-1501
    • Srinivasan, S.1    Roman, N.2    Wang, D.L.3
  • 40
    • 84892233308
    • On ideal binary masks as the computational goal of auditory scene analysis
    • P. Divenyi, Ed. Boston, MA, USA: Kluwer
    • D. L. Wang, "On ideal binary masks as the computational goal of auditory scene analysis," in Speech Separation by Humans and Machines, P. Divenyi, Ed. Boston, MA, USA: Kluwer, 2005, pp. 181-197.
    • (2005) Speech Separation by Humans and Machines , pp. 181-197
    • Wang, D.L.1
  • 41
    • 64649103540
    • Speech intelligibility in background noise with ideal binary time-frequency masking
    • D. L. Wang, U. Kjems, M. S. Pedersen, J. B. Boldt, and T. Lunner, "Speech intelligibility in background noise with ideal binary time-frequency masking," J. Acoust. Soc. Amer., vol. 125, pp. 2336-2347, 2009.
    • (2009) J. Acoust. Soc. Amer. , vol.125 , pp. 2336-2347
    • Wang, D.L.1    Kjems, U.2    Pedersen, M.S.3    Boldt, J.B.4    Lunner, T.5
  • 42
    • 84870477511
    • Exploring monaural features for classification-based speech segregation
    • Y. Wang, K. Han, and D. L. Wang, "Exploring monaural features for classification-based speech segregation," IEEE Trans. Audio, Speech, Lang. Process., vol. 21, pp. 270-279, 2013.
    • (2013) IEEE Trans. Audio, Speech, Lang. Process. , vol.21 , pp. 270-279
    • Wang, Y.1    Han, K.2    Wang, D.L.3
  • 43
    • 84875678689
    • Towards scaling up classification-based speech separation
    • Jul.
    • Y. Wang and D. L. Wang, "Towards scaling up classification-based speech separation," IEEE Trans. Audio, Speech, Lang. Process., vol. 21, no. 7, pp. 1381-1390, Jul. 2013.
    • (2013) IEEE Trans. Audio, Speech, Lang. Process. , vol.21 , Issue.7 , pp. 1381-1390
    • Wang, Y.1    Wang, D.L.2
  • 44
    • 84890523904
    • Feature denoising for speech separation in unknown noisy environments
    • Y. Wang and D. L. Wang, "Feature denoising for speech separation in unknown noisy environments," in Proc. IEEE ICASSP, 2013, pp. 7472-7476.
    • Proc. IEEE ICASSP, 2013 , pp. 7472-7476
    • Wang, Y.1    Wang, D.L.2
  • 45
    • 84862293102
    • Speaker and noise factorization for robust speech recognition
    • Y.-Q. Wang and M. J. F. Gales, "Speaker and noise factorization for robust speech recognition," IEEE Trans. Audio, Speech, Lang. Process., vol. 20, no. 7, pp. 2149-2158, 2012.
    • (2012) IEEE Trans. Audio, Speech, Lang. Process. , vol.20 , Issue.7 , pp. 2149-2158
    • Wang, Y.-Q.1    Gales, M.J.F.2
  • 48
    • 85083953021
    • Feature learning in deep neural networks - studies on speech recognition tasks
    • D. Yu, M. L. Seltzer, J. Li, J.-T. Huang, and F. Seide, "Feature learning in deep neural networks - studies on speech recognition tasks," in Proc. ICLR, 2013.
    • Proc. ICLR, 2013
    • Yu, D.1    Seltzer, M.L.2    Li, J.3    Huang, J.-T.4    Seide, F.5


* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS DB.