Volume 27, Issue 3, 2013, Pages 780-797

Noise robust ASR in reverberated multisource environments applying convolutive NMF and Long Short-Term Memory

Author keywords

Automatic speech recognition; Long Short Term Memory; Non negative matrix factorization; Tandem feature extraction

Indexed keywords

BRAIN; EXTRACTION; FACTORIZATION; FEATURE EXTRACTION; HIDDEN MARKOV MODELS; MARKOV PROCESSES; MATRIX ALGEBRA; MEMORY ARCHITECTURE; REVERBERATION; SIGNAL TO NOISE RATIO; SOURCE SEPARATION; SPEECH RECOGNITION;

EID: 84883396653     PISSN: 08852308     EISSN: 10958363     Source Type: Journal    
DOI: 10.1016/j.csl.2012.05.002     Document Type: Article
Times cited: 12

References (45)
  • 2
    • 0028392483
    • Learning long-term dependencies with gradient descent is difficult
    • Y. Bengio, P. Simard, and P. Frasconi Learning long-term dependencies with gradient descent is difficult IEEE Transactions on Neural Networks 5 2 1994 157 166
    • (1994) IEEE Transactions on Neural Networks , vol.5 , Issue.2 , pp. 157-166
    • Bengio, Y.1    Simard, P.2    Frasconi, P.3
  • 6
    • 0000259511
    • Approximate statistical tests for comparing supervised classification learning algorithms
    • T.G. Dietterich Approximate statistical tests for comparing supervised classification learning algorithms Neural Computation 10 1998 1895 1923
    • (1998) Neural Computation , vol.10 , pp. 1895-1923
    • Dietterich, T.G.1
  • 8
    • 38149014113
    • An application of recurrent neural networks to discriminative keyword spotting
    • Porto, Portugal
    • S. Fernandez, A. Graves, and J. Schmidhuber An application of recurrent neural networks to discriminative keyword spotting Proc. of ICANN Porto, Portugal 2007 220 229
    • (2007) Proc. of ICANN , pp. 220-229
    • Fernandez, S.1    Graves, A.2    Schmidhuber, J.3
  • 10
    • 84890521030
    • Exemplar-based speech enhancement and its application to noise-robust automatic speech recognition
    • Florence, Italy
    • J.F. Gemmeke, T. Virtanen, and A. Hurmalainen Exemplar-based speech enhancement and its application to noise-robust automatic speech recognition Proc. of CHiME Workshop Florence, Italy 2011 53 57
    • (2011) Proc. of CHiME Workshop , pp. 53-57
    • Gemmeke, J.F.1    Virtanen, T.2    Hurmalainen, A.3
  • 11
    • 0034293152
    • Learning to forget: continual prediction with LSTM
    • F. Gers, J. Schmidhuber, and F. Cummins Learning to forget: continual prediction with LSTM Neural Computation 12 10 2000 2451 2471
    • (2000) Neural Computation , vol.12 , Issue.10 , pp. 2451-2471
    • Gers, F.1    Schmidhuber, J.2    Cummins, F.3
  • 12
    • 33749259827
    • Connectionist temporal classification: labelling unsegmented data with recurrent neural networks
    • Pittsburgh, USA
    • A. Graves, S. Fernandez, F. Gomez, and J. Schmidhuber Connectionist temporal classification: labelling unsegmented data with recurrent neural networks Proc. of ICML Pittsburgh, USA 2006 369 376
    • (2006) Proc. of ICML , pp. 369-376
    • Graves, A.1    Fernandez, S.2    Gomez, F.3    Schmidhuber, J.4
  • 14
    • 27744588611
    • Framewise phoneme classification with bidirectional LSTM and other neural network architectures
    • A. Graves, and J. Schmidhuber Framewise phoneme classification with bidirectional LSTM and other neural network architectures Neural Networks 18 5-6 2005 602 610
    • (2005) Neural Networks , vol.18 , Issue.5-6 , pp. 602-610
    • Graves, A.1    Schmidhuber, J.2
  • 16
    • 84863690059
    • Separation of drums from polyphonic music using non-negative matrix factorization and support vector machine
    • Antalya, Turkey
    • M. Helen, and T. Virtanen Separation of drums from polyphonic music using non-negative matrix factorization and support vector machine Proc. of EUSIPCO Antalya, Turkey 2005
    • (2005) Proc. of EUSIPCO
    • Helen, M.1    Virtanen, T.2
  • 17
    • 0033709098
    • Tandem connectionist feature extraction for conventional HMM systems
    • Istanbul, Turkey
    • H. Hermansky, D.P.W. Ellis, and S. Sharma Tandem connectionist feature extraction for conventional HMM systems Proc. of ICASSP Istanbul, Turkey 2000 1635 1638
    • (2000) Proc. of ICASSP , pp. 1635-1638
    • Hermansky, H.1    Ellis, D.P.W.2    Sharma, S.3
  • 21
    • 1842436050
    • The echo state approach to analyzing and training recurrent neural networks
    • Tech. Rep. German National Research Center for Information Technology, Bremen
    • Jaeger, H., 2001. The echo state approach to analyzing and training recurrent neural networks. Tech. Rep. German National Research Center for Information Technology, Bremen (Tech. Rep. No. 148).
    • (2001) Tech. Rep. No. 148
    • Jaeger, H.1
  • 22
    • 0025254722
    • A time-delay neural network architecture for isolated word recognition
    • K.J. Lang, A.H. Waibel, and G.E. Hinton A time-delay neural network architecture for isolated word recognition Neural Networks 3 1 1990 23 43
    • (1990) Neural Networks , vol.3 , Issue.1 , pp. 23-43
    • Lang, K.J.1    Waibel, A.H.2    Hinton, G.E.3
  • 25
    • 84865736185
    • Phoneme-dependent NMF for speech enhancement in monaural mixtures
    • ISCA, Florence, Italy
    • B. Raj, R. Singh, and T. Virtanen Phoneme-dependent NMF for speech enhancement in monaural mixtures Proc. of Interspeech ISCA, Florence, Italy 2011 1217 1220
    • (2011) Proc. of Interspeech , pp. 1217-1220
    • Raj, B.1    Singh, R.2    Virtanen, T.3
  • 26
    • 79959818117
    • Non-negative matrix factorization based compensation of music for automatic speech recognition
    • Makuhari, Japan
    • B. Raj, T. Virtanen, S. Chaudhuri, and R. Singh Non-negative matrix factorization based compensation of music for automatic speech recognition Proc. of Interspeech Makuhari, Japan 2010 717 720
    • (2010) Proc. of Interspeech , pp. 717-720
    • Raj, B.1    Virtanen, T.2    Chaudhuri, S.3    Singh, R.4
  • 27
    • 51449100115
    • Efficient model-based speech separation and denoising using non-negative subspace analysis
    • Las Vegas, NV, USA
    • S.J. Rennie, J.R. Hershey, and P.A. Olsen Efficient model-based speech separation and denoising using non-negative subspace analysis Proc. of ICASSP Las Vegas, NV, USA 2008 1833 1836
    • (2008) Proc. of ICASSP , pp. 1833-1836
    • Rennie, S.J.1    Hershey, J.R.2    Olsen, P.A.3
  • 28
    • 56449109755
    • Learning long-term dependencies with recurrent neural networks
    • A.M. Schaefer, S. Udluft, and H.G. Zimmermann Learning long-term dependencies with recurrent neural networks Neurocomputing 71 13-15 2008 2481 2488
    • (2008) Neurocomputing , vol.71 , Issue.13-15 , pp. 2481-2488
    • Schaefer, A.M.1    Udluft, S.2    Zimmermann, H.G.3
  • 29
    • 0001033889
    • Learning complex extended sequences using the principle of history compression
    • J. Schmidhuber Learning complex extended sequences using the principle of history compression Neural Computation 4 2 1992 234 242
    • (1992) Neural Computation , vol.4 , Issue.2 , pp. 234-242
    • Schmidhuber, J.1
  • 30
    • 44949110218
    • Single-channel speech separation using sparse non-negative matrix factorization
    • Pittsburgh, PA, USA
    • M.N. Schmidt, and R.K. Olsson Single-channel speech separation using sparse non-negative matrix factorization Proc. of Interspeech Pittsburgh, PA, USA 2006
    • (2006) Proc. of Interspeech
    • Schmidt, M.N.1    Olsson, R.K.2
  • 33
    • 78049383291
    • Discovering auditory objects through non-negativity constraints
    • Jeju, Korea
    • P. Smaragdis Discovering auditory objects through non-negativity constraints Proc. of SAPA Jeju, Korea 2004
    • (2004) Proc. of SAPA
    • Smaragdis, P.1
  • 34
    • 38049021850
    • Convolutive speech bases and their application to supervised speech separation
    • P. Smaragdis Convolutive speech bases and their application to supervised speech separation IEEE Transactions on Audio, Speech and Language Processing 15 1 2007 1 14
    • (2007) IEEE Transactions on Audio, Speech and Language Processing , vol.15 , Issue.1 , pp. 1-14
    • Smaragdis, P.1
  • 35
    • 67650142420
    • A multiplicative algorithm for convolutive non-negative matrix factorization based on squared Euclidean distance
    • W. Wang, A. Cichocki, and J.A. Chambers A multiplicative algorithm for convolutive non-negative matrix factorization based on squared Euclidean distance IEEE Transactions on Signal Processing 57 7 2009 July 2858 2864
    • (2009) IEEE Transactions on Signal Processing , vol.57 , Issue.7 , pp. 2858-2864
    • Wang, W.1    Cichocki, A.2    Chambers, J.A.3
  • 37
    • 80051618211
    • openBliSSART: design and evaluation of a research toolkit for blind source separation in audio recognition tasks
    • Prague, Czech Republic
    • F. Weninger, A. Lehmann, and B. Schuller openBliSSART: design and evaluation of a research toolkit for blind source separation in audio recognition tasks Proc. of ICASSP Prague, Czech Republic 2011 1625 1628
    • (2011) Proc. of ICASSP , pp. 1625-1628
    • Weninger, F.1    Lehmann, A.2    Schuller, B.3
  • 39
    • 51449092704
    • Speech denoising using nonnegative matrix factorization with priors
    • Las Vegas, NV, USA
    • K.W. Wilson, B. Raj, P. Smaragdis, and A. Divakaran Speech denoising using nonnegative matrix factorization with priors Proc. of ICASSP Las Vegas, NV, USA 2008 4029 4032
    • (2008) Proc. of ICASSP , pp. 4029-4032
    • Wilson, K.W.1    Raj, B.2    Smaragdis, P.3    Divakaran, A.4
  • 41
    • 78651563436
    • Bidirectional LSTM networks for context-sensitive keyword detection in a cognitive virtual agent framework
    • M. Wöllmer, F. Eyben, A. Graves, B. Schuller, and G. Rigoll Bidirectional LSTM networks for context-sensitive keyword detection in a cognitive virtual agent framework Cognitive Computation 2 3 2010 180 190
    • (2010) Cognitive Computation , vol.2 , Issue.3 , pp. 180-190
    • Wöllmer, M.1    Eyben, F.2    Graves, A.3    Schuller, B.4    Rigoll, G.5
  • 42
    • 80051637579
    • A multi-stream ASR framework for BLSTM modeling of conversational speech
    • Prague, Czech Republic
    • M. Wöllmer, F. Eyben, B. Schuller, and G. Rigoll A multi-stream ASR framework for BLSTM modeling of conversational speech Proc. of ICASSP Prague, Czech Republic 2011 4860 4863
    • (2011) Proc. of ICASSP , pp. 4860-4863
    • Wöllmer, M.1    Eyben, F.2    Schuller, B.3    Rigoll, G.4
  • 43
    • 81155123235
    • Enhancing spontaneous speech recognition with BLSTM features
    • Las Palmas de Gran Canaria, Spain
    • M. Wöllmer, and B. Schuller Enhancing spontaneous speech recognition with BLSTM features Proc. of NOLISP Las Palmas de Gran Canaria, Spain 2011 17 24
    • (2011) Proc. of NOLISP , pp. 17-24
    • Wöllmer, M.1    Schuller, B.2
  • 44
    • 77956721304
    • Combining long short-term memory and dynamic Bayesian networks for incremental emotion-sensitive artificial listening
    • M. Wöllmer, B. Schuller, F. Eyben, and G. Rigoll Combining long short-term memory and dynamic Bayesian networks for incremental emotion-sensitive artificial listening IEEE Journal of Selected Topics in Signal Processing 4 5 2010 867 881
    • (2010) IEEE Journal of Selected Topics in Signal Processing , vol.4 , Issue.5 , pp. 867-881
    • Wöllmer, M.1    Schuller, B.2    Eyben, F.3    Rigoll, G.4
  • 45
    • 84865748400
    • Feature frame stacking in RNN-based Tandem ASR systems - learned vs. predefined context
    • Florence, Italy
    • M. Wöllmer, B. Schuller, and G. Rigoll Feature frame stacking in RNN-based Tandem ASR systems - learned vs. predefined context Proc. of Interspeech Florence, Italy 2011 1233 1236
    • (2011) Proc. of Interspeech , pp. 1233-1236
    • Wöllmer, M.1    Schuller, B.2    Rigoll, G.3


* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.