Volume 22, Issue 6, 2014, Pages 1037-1046

Memory-enhanced neural networks and NMF for robust ASR

Author keywords

Long short term memory; Multi stream recognition; Noise robust speech recognition; Non negative matrix factorization

Indexed keywords

ACOUSTIC NOISE; ARTS COMPUTING; BRAIN; FACTORIZATION; MATRIX ALGEBRA; RECURRENT NEURAL NETWORKS; REVERBERATION; SPEECH; SPEECH ENHANCEMENT;

EID: 84910095643     PISSN: 15587916     EISSN: None     Source Type: Journal    
DOI: 10.1109/TASLP.2014.2318514     Document Type: Article
Times cited: 30

References (44)
  • 2
    • 0035396555
    • Noise power spectral density estimation based on optimal smoothing and minimum statistics
    • R. Martin, "Noise power spectral density estimation based on optimal smoothing and minimum statistics," IEEE Trans. Speech Audio Process., vol. 9, no. 5, pp. 504-512, Jul. 2001.
    • (2001) IEEE Trans. Speech Audio Process. , vol.9 , Issue.5 , pp. 504-512
    • Martin, R.1
  • 3
    • 51449100115
    • Efficient model-based speech separation and denoising using non-negative subspace analysis
    • S. J. Rennie, J. R. Hershey, and P. A. Olsen, "Efficient model-based speech separation and denoising using non-negative subspace analysis," in Proc. ICASSP, Las Vegas, NV, USA, 2008, pp. 1833-1836.
    • Proc. ICASSP, Las Vegas, NV, USA, 2008 , pp. 1833-1836
    • Rennie, S.J.1    Hershey, J.R.2    Olsen, P.A.3
  • 4
    • 38049021850
    • Convolutive speech bases and their application to supervised speech separation
    • P. Smaragdis, "Convolutive speech bases and their application to supervised speech separation," IEEE Trans. Audio, Speech, Lang. Process., vol. 15, no. 1, pp. 1-14, Jan. 2007.
    • (2007) IEEE Trans. Audio, Speech, Lang. Process. , vol.15 , Issue.1 , pp. 1-14
    • Smaragdis, P.1
  • 6
    • 77955673019
    • Model-based feature enhancement for reverberant speech recognition
    • A. Krueger and R. Haeb-Umbach, "Model-based feature enhancement for reverberant speech recognition," IEEE Trans. Audio, Speech, Lang. Process., vol. 18, no. 7, pp. 1692-1707, Sep. 2010.
    • (2010) IEEE Trans. Audio, Speech, Lang. Process. , vol.18 , Issue.7 , pp. 1692-1707
    • Krueger, A.1    Haeb-Umbach, R.2
  • 7
    • 85017287487
    • Linear discriminant analysis for improved large vocabulary continuous speech recognition
    • R. Haeb-Umbach and H. Ney, "Linear discriminant analysis for improved large vocabulary continuous speech recognition," in Proc. ICASSP, San Francisco, CA, USA, 1992, pp. 13-16.
    • Proc. ICASSP, San Francisco, CA, USA, 1992 , pp. 13-16
    • Haeb-Umbach, R.1    Ney, H.2
  • 9
    • 0032048385
    • Speech recognition in noisy environments using first-order vector Taylor series
    • D. Y. Kim, C. Kwan Un, and N. S. Kim, "Speech recognition in noisy environments using first-order vector Taylor series," Speech Commun., vol. 24, no. 1, pp. 39-49, 1998.
    • (1998) Speech Commun. , vol.24 , Issue.1 , pp. 39-49
    • Kim, D.Y.1    Kwan Un, C.2    Kim, N.S.3
  • 11
    • 84890492030
    • An investigation of deep neural networks for noise robust speech recognition
    • M. Seltzer, D. Yu, and Y. Wang, "An investigation of deep neural networks for noise robust speech recognition," in Proc. ICASSP, Vancouver, BC, Canada, 2013, pp. 7398-7402.
    • Proc. ICASSP, Vancouver, BC, Canada, 2013 , pp. 7398-7402
    • Seltzer, M.1    Yu, D.2    Wang, Y.3
  • 13
    • 84890543083
    • Speech recognition with deep recurrent neural networks
    • A. Graves, A.-R. Mohamed, and G. Hinton, "Speech recognition with deep recurrent neural networks," in Proc. ICASSP, 2013, pp. 6645-6649.
    • Proc. ICASSP, 2013 , pp. 6645-6649
    • Graves, A.1    Mohamed, A.-R.2    Hinton, G.3
  • 14
    • 0141741840
    • Gradient flow in recurrent nets: The difficulty of learning long-term dependencies
    • S. Hochreiter, Y. Bengio, P. Frasconi, and J. Schmidhuber, "Gradient flow in recurrent nets: The difficulty of learning long-term dependencies," in Field Guide to Dynamical Recurrent Networks, S. C. Kremer and J. F. Kolen, Eds. Piscataway, NJ, USA: IEEE Press, 2001.
    • (2001) Field Guide to Dynamical Recurrent Networks
    • Hochreiter, S.1    Bengio, Y.2    Frasconi, P.3    Schmidhuber, J.4
  • 15
    • 0031573117
    • Long short-term memory
    • S. Hochreiter and J. Schmidhuber, "Long short-term memory," Neural Comput., vol. 9, no. 8, pp. 1735-1780, 1997.
    • (1997) Neural Comput. , vol.9 , Issue.8 , pp. 1735-1780
    • Hochreiter, S.1    Schmidhuber, J.2
  • 18
    • 85032752364
    • Graphical model architectures for speech recognition
    • J. A. Bilmes and C. Bartels, "Graphical model architectures for speech recognition," IEEE Signal Process. Mag., vol. 22, no. 5, pp. 89-100, Sep. 2005.
    • (2005) IEEE Signal Process. Mag. , vol.22 , Issue.5 , pp. 89-100
    • Bilmes, J.A.1    Bartels, C.2
  • 19
    • 9644308136
    • Recent advances in the multi-stream HMM/ANN hybrid approach to noise robust ASR
    • A. Hagen and A. Morris, "Recent advances in the multi-stream HMM/ANN hybrid approach to noise robust ASR," Comput. Speech Lang., vol. 19, no. 1, pp. 3-30, 2005.
    • (2005) Comput. Speech Lang. , vol.19 , Issue.1 , pp. 3-30
    • Hagen, A.1    Morris, A.2
  • 21
    • 84878543263
    • The PASCAL CHiME speech separation and recognition challenge
    • J. P. Barker, E. Vincent, N. Ma, H. Christensen, and P. D. Green, "The PASCAL CHiME speech separation and recognition challenge," Comput. Speech Lang., vol. 27, no. 3, pp. 621-633, 2013.
    • (2013) Comput. Speech Lang. , vol.27 , Issue.3 , pp. 621-633
    • Barker, J.P.1    Vincent, E.2    Ma, N.3    Christensen, H.4    Green, P.D.5
  • 25
    • 79960657803
    • Exemplar-based sparse representations for noise robust automatic speech recognition
    • J. Gemmeke, T. Virtanen, and A. Hurmalainen, "Exemplar-based sparse representations for noise robust automatic speech recognition," IEEE Trans. Audio, Speech, Lang. Process., vol. 19, no. 7, pp. 2067-2080, Sep. 2011.
    • (2011) IEEE Trans. Audio, Speech, Lang. Process. , vol.19 , Issue.7 , pp. 2067-2080
    • Gemmeke, J.1    Virtanen, T.2    Hurmalainen, A.3
  • 26
    • 84906222220
    • Is speech enhancement pre-processing still relevant when using deep neural networks for acoustic modeling?
    • M. Delcroix, Y. Kubo, T. Nakatani, and A. Nakamura, "Is speech enhancement pre-processing still relevant when using deep neural networks for acoustic modeling?," in Proc. Interspeech, Lyon, France, 2013, pp. 2992-2996.
    • Proc. Interspeech, Lyon, France, 2013 , pp. 2992-2996
    • Delcroix, M.1    Kubo, Y.2    Nakatani, T.3    Nakamura, A.4
  • 27
    • 84890503970
    • Effectiveness of discriminative training and feature transformation for reverberated and noisy speech
    • Y. Tachioka, S. Watanabe, and J. R. Hershey, "Effectiveness of discriminative training and feature transformation for reverberated and noisy speech," in Proc. ICASSP, Vancouver, BC, Canada, 2013, pp. 6935-6939.
    • Proc. ICASSP, Vancouver, BC, Canada, 2013 , pp. 6935-6939
    • Tachioka, Y.1    Watanabe, S.2    Hershey, J.R.3
  • 30
    • 0032050110
    • Maximum likelihood linear transformations for HMM-based speech recognition
    • M. J. Gales, "Maximum likelihood linear transformations for HMM-based speech recognition," Comput. Speech Lang., vol. 12, no. 2, pp. 75-98, 1998.
    • (1998) Comput. Speech Lang. , vol.12 , Issue.2 , pp. 75-98
    • Gales, M.J.1
  • 33
  • 34
    • 84878390904
    • Combining Bottleneck-BLSTM and Semi-Supervised Sparse NMF for Recognition of Conversational Speech in Highly Instationary Noise
    • F. Weninger, M. Wöllmer, and B. Schuller, "Combining Bottleneck-BLSTM and Semi-Supervised Sparse NMF for Recognition of Conversational Speech in Highly Instationary Noise," in Proc. Interspeech, Portland, OR, USA, 2012, pp. 302-305.
    • Proc. Interspeech, Portland, OR, USA, 2012 , pp. 302-305
    • Weninger, F.1    Wöllmer, M.2    Schuller, B.3
  • 35
    • 0031268931
    • Bidirectional recurrent neural networks
    • M. Schuster and K. K. Paliwal, "Bidirectional recurrent neural networks," IEEE Trans. Signal Process., vol. 45, no. 11, pp. 2673-2681, Nov. 1997.
    • (1997) IEEE Trans. Signal Process. , vol.45 , Issue.11 , pp. 2673-2681
    • Schuster, M.1    Paliwal, K.K.2
  • 36
    • 27744588611
    • Framewise phoneme classification with bidirectional LSTM and other neural network architectures
    • A. Graves and J. Schmidhuber, "Framewise phoneme classification with bidirectional LSTM and other neural network architectures," Neural Netw., vol. 18, no. 5-6, pp. 602-610, 2005.
    • (2005) Neural Netw. , vol.18 , Issue.5-6 , pp. 602-610
    • Graves, A.1    Schmidhuber, J.2
  • 42
    • 84886818613
    • Active-set Newton algorithm for overcomplete non-negative representations of audio
    • T. Virtanen, J. Gemmeke, and B. Raj, "Active-set Newton algorithm for overcomplete non-negative representations of audio," IEEE Trans. Audio, Speech, Lang. Process., vol. 21, no. 11, pp. 2277-2289, Nov. 2013.
    • (2013) IEEE Trans. Audio, Speech, Lang. Process. , vol.21 , Issue.11 , pp. 2277-2289
    • Virtanen, T.1    Gemmeke, J.2    Raj, B.3
  • 43
    • 84893685019
    • A flexible spatial blind source extraction framework for robust speech recognition in noisy environments
    • F. Nesta, M. Matassoni, and R. F. Astudillo, "A flexible spatial blind source extraction framework for robust speech recognition in noisy environments," in Proc. CHiME Workshop, Vancouver, BC, Canada, 2013, pp. 33-38.
    • Proc. CHiME Workshop, Vancouver, BC, Canada, 2013 , pp. 33-38
    • Nesta, F.1    Matassoni, M.2    Astudillo, R.F.3


* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.