Volume 28, Issue 4, 2014, Pages 888-902

Feature enhancement by deep LSTM networks for ASR in reverberant multisource environments

Author keywords

Automatic speech recognition; Deep neural networks; Feature enhancement; Long Short Term Memory

Indexed keywords

AUDITION; BRAIN; ELECTRIC NETWORK TOPOLOGY; RECURRENT NEURAL NETWORKS; REVERBERATION;

EID: 84900534601     PISSN: 08852308     EISSN: 10958363     Source Type: Journal    
DOI: 10.1016/j.csl.2014.01.001     Document Type: Article
Times cited: 58

References (38)
  • 1
    • 0030677475
    • Speaker adaptive training: A maximum likelihood approach to speaker normalization
    • IEEE
    • T. Anastasakos, J. McDonough, and J. Makhoul Speaker adaptive training: a maximum likelihood approach to speaker normalization Proc. of ICASSP 1997 IEEE 1043 1046
    • (1997) Proc. of ICASSP , pp. 1043-1046
    • Anastasakos, T.1    McDonough, J.2    Makhoul, J.3
  • 3
    • 0028392483
    • Learning long-term dependencies with gradient descent is difficult
    • Y. Bengio, P. Simard, and P. Frasconi Learning long-term dependencies with gradient descent is difficult IEEE Transactions on Neural Networks 5 2 1994 157 166
    • (1994) IEEE Transactions on Neural Networks , vol.5 , Issue.2 , pp. 157-166
    • Bengio, Y.1    Simard, P.2    Frasconi, P.3
  • 6
    • 0032050110
    • Maximum likelihood linear transformations for HMM-based speech recognition
    • M.J. Gales Maximum likelihood linear transformations for HMM-based speech recognition Computer Speech & Language 12 2 1998 75 98 (Pubitemid 128383747)
    • (1998) Computer Speech and Language , vol.12 , Issue.2 , pp. 75-98
    • Gales, M.J.F.1
  • 9
    • 84962920708
    • Evaluating long-term spectral subtraction for reverberant ASR
    • IEEE Madonna di Campiglio, Italy
    • D. Gelbart, and N. Morgan Evaluating long-term spectral subtraction for reverberant ASR Proc. of ASRU 2001 IEEE Madonna di Campiglio, Italy 103 106
    • (2001) Proc. of ASRU , pp. 103-106
    • Gelbart, D.1    Morgan, N.2
  • 10
    • 0034293152
    • Learning to forget: Continual prediction with LSTM
    • F. Gers, J. Schmidhuber, and F. Cummins Learning to forget: continual prediction with LSTM Neural Computation 12 10 2000 2451 2471
    • (2000) Neural Computation , vol.12 , Issue.10 , pp. 2451-2471
    • Gers, F.1    Schmidhuber, J.2    Cummins, F.3
  • 12
    • 84890543083
    • Speech recognition with deep recurrent neural networks
    • May IEEE Vancouver, Canada
    • A. Graves, A. Mohamed, and G. Hinton Speech recognition with deep recurrent neural networks Proc. of ICASSP May 2013 IEEE Vancouver, Canada 6645 6649
    • (2013) Proc. of ICASSP , pp. 6645-6649
    • Graves, A.1    Mohamed, A.2    Hinton, G.3
  • 13
    • 0017097474
    • Distance measures for speech processing
    • A. Gray, and J. Markel Distance measures for speech processing IEEE Transactions on Acoustics, Speech and Signal Processing 24 5 1976 380 391 (Pubitemid 8091024)
    • (1976) IEEE Trans. Acoust. Speech Sign. Proc. , vol.24 , Issue.5 , pp. 380-391
    • Gray Jr., A.H.1    Markel, J.D.2
  • 18
    • 84869432703
    • A two-channel acoustic front-end for robust automatic speech recognition in noisy and reverberant environments
    • R. Maas, A. Schwarz, Y. Zheng, K. Reindl, S. Meier, A. Sehr, and W. Kellermann A two-channel acoustic front-end for robust automatic speech recognition in noisy and reverberant environments Proc. of CHiME 2011 41 46
    • (2011) Proc. of CHiME , pp. 41-46
    • Maas, R.1    Schwarz, A.2    Zheng, Y.3    Reindl, K.4    Meier, S.5    Sehr, A.6    Kellermann, W.7
  • 19
    • 84867585919
    • Understanding how deep belief networks perform acoustic modelling
    • Kyoto, Japan
    • A. Mohamed, G. Hinton, and G. Penn Understanding how deep belief networks perform acoustic modelling Proc. of ICASSP Kyoto, Japan 2012 4273 4276
    • (2012) Proc. of ICASSP , pp. 4273-4276
    • Mohamed, A.1    Hinton, G.2    Penn, G.3
  • 20
    • 84893685019
    • A flexible spatial blind source extraction framework for robust speech recognition in noisy environments
    • Vancouver, Canada
    • F. Nesta, M. Matassoni, and R.F. Astudillo A flexible spatial blind source extraction framework for robust speech recognition in noisy environments Proc. of CHiME Vancouver, Canada 2013 33 38
    • (2013) Proc. of CHiME , pp. 33-38
    • Nesta, F.1    Matassoni, M.2    Astudillo, R.F.3
  • 21
    • 4544354701
    • Speech enhancement with missing data techniques using recurrent neural networks
    • Montreal, Canada
    • S. Parveen, and P. Green Speech enhancement with missing data techniques using recurrent neural networks Proc. of ICASSP Montreal, Canada 2004
    • (2004) Proc. of ICASSP
    • Parveen, S.1    Green, P.2
  • 24
    • 79959818117
    • Non-negative matrix factorization based compensation of music for automatic speech recognition
    • Makuhari, Japan
    • B. Raj, T. Virtanen, S. Chaudhuri, and R. Singh Non-negative matrix factorization based compensation of music for automatic speech recognition Proc. of Interspeech Makuhari, Japan 2010 717 720
    • (2010) Proc. of Interspeech , pp. 717-720
    • Raj, B.1    Virtanen, T.2    Chaudhuri, S.3    Singh, R.4
  • 25
    • 51449100115
    • Efficient model-based speech separation and denoising using non-negative subspace analysis
    • Las Vegas, NV, USA
    • S.J. Rennie, J.R. Hershey, and P.A. Olsen Efficient model-based speech separation and denoising using non-negative subspace analysis Proc. of ICASSP Las Vegas, NV, USA 2008 1833 1836
    • (2008) Proc. of ICASSP , pp. 1833-1836
    • Rennie, S.J.1    Hershey, J.R.2    Olsen, P.A.3
  • 27
    • 67650135931
    • Recognition of noisy speech: A comparative survey of robust model architecture and feature enhancement
    • (Article ID: 942617)
    • B. Schuller, M. Wöllmer, T. Moosmayr, and G. Rigoll Recognition of noisy speech: a comparative survey of robust model architecture and feature enhancement. EURASIP Journal on Audio, Speech, and Music Processing 2009 2009 1 17 (Article ID: 942617)
    • (2009) EURASIP Journal on Audio, Speech, and Music Processing , vol.2009 , pp. 1-17
    • Schuller, B.1    Wöllmer, M.2    Moosmayr, T.3    Rigoll, G.4
  • 28
    • 84890545600
    • Multi-task learning in deep neural networks for improved phoneme recognition
    • IEEE Vancouver, Canada
    • M.L. Seltzer, and J. Droppo Multi-task learning in deep neural networks for improved phoneme recognition Proc. of ICASSP 2013 IEEE Vancouver, Canada 6965 6969
    • (2013) Proc. of ICASSP , pp. 6965-6969
    • Seltzer, M.L.1    Droppo, J.2
  • 29
    • 84890492030
    • An investigation of deep neural networks for noise robust speech recognition
    • Vancouver, Canada
    • M.L. Seltzer, D. Yu, and Y. Wang An investigation of deep neural networks for noise robust speech recognition Proc. of ICASSP Vancouver, Canada 2013 7398 7402
    • (2013) Proc. of ICASSP , pp. 7398-7402
    • Seltzer, M.L.1    Yu, D.2    Wang, Y.3
  • 30
    • 84890503970
    • Effectiveness of discriminative training and feature transformation for reverberated and noisy speech
    • Vancouver, Canada
    • Y. Tachioka, S. Watanabe, and J.R. Hershey Effectiveness of discriminative training and feature transformation for reverberated and noisy speech Proc. of ICASSP Vancouver, Canada 2013 6935 6939
    • (2013) Proc. of ICASSP , pp. 6935-6939
    • Tachioka, Y.1    Watanabe, S.2    Hershey, J.R.3
  • 32
    • 84890541701
    • The second 'CHiME' speech separation and recognition challenge: Datasets, tasks and baselines
    • Vancouver, Canada
    • E. Vincent, J. Barker, S. Watanabe, J. Le Roux, F. Nesta, and M. Matassoni The second 'CHiME' speech separation and recognition challenge: datasets, tasks and baselines Proc. of ICASSP Vancouver, Canada 2013 126 130
    • (2013) Proc. of ICASSP , pp. 126-130
    • Vincent, E.1    Barker, J.2    Watanabe, S.3    Le Roux, J.4    Nesta, F.5    Matassoni, M.6
  • 35
    • 81355147535
    • Multi-stream LSTM-HMM decoding and histogram equalization for noise robust keyword spotting
    • M. Wöllmer, E. Marchi, S. Squartini, and B. Schuller Multi-stream LSTM-HMM decoding and histogram equalization for noise robust keyword spotting Cognitive Neurodynamics 5 3 2011 253 264
    • (2011) Cognitive Neurodynamics , vol.5 , Issue.3 , pp. 253-264
    • Wöllmer, M.1    Marchi, E.2    Squartini, S.3    Schuller, B.4
  • 36
    • 84865748400
    • Feature frame stacking in RNN-based tandem ASR systems - Learned vs. Predefined context
    • Florence, Italy
    • M. Wöllmer, B. Schuller, and G. Rigoll Feature frame stacking in RNN-based tandem ASR systems - learned vs. predefined context Proc. of Interspeech Florence, Italy 2011 1233 1236
    • (2011) Proc. of Interspeech , pp. 1233-1236
    • Wöllmer, M.1    Schuller, B.2    Rigoll, G.3
  • 37
    • 84890489927
    • Feature enhancement by bidirectional LSTM networks for conversational speech recognition in highly non-stationary noise
    • Vancouver, Canada
    • M. Wöllmer, Z. Zhang, F. Weninger, B. Schuller, and G. Rigoll Feature enhancement by bidirectional LSTM networks for conversational speech recognition in highly non-stationary noise Proc. of ICASSP Vancouver, Canada 2013 6822 6826
    • (2013) Proc. of ICASSP , pp. 6822-6826
    • Wöllmer, M.1    Zhang, Z.2    Weninger, F.3    Schuller, B.4    Rigoll, G.5


* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.