SCOPUS information retrieval platform

Computer Speech and Language

Volume 27, Issue 3, 2013, Pages 780-797

Noise robust ASR in reverberated multisource environments applying convolutive NMF and Long Short-Term Memory

Authors (5): Wöllmer, Martin (a); Weninger, Felix (a); Geiger, Jürgen (a); Schuller, Björn (a); Rigoll, Gerhard (a)

(a) Technical University of Munich (Germany)

Author keywords

Automatic speech recognition; Long Short-Term Memory; Non-negative matrix factorization; Tandem feature extraction

Indexed keywords

BRAIN; EXTRACTION; FACTORIZATION; FEATURE EXTRACTION; HIDDEN MARKOV MODELS; MARKOV PROCESSES; MATRIX ALGEBRA; MEMORY ARCHITECTURE; REVERBERATION; SIGNAL TO NOISE RATIO; SOURCE SEPARATION; SPEECH RECOGNITION

AUTOMATIC SPEECH RECOGNITION; CONTEXT SENSITIVE; FEATURE GENERATION; MULTI-SOURCE ENVIRONMENT; NONNEGATIVE MATRIX FACTORIZATION; SPARSE CLASSIFICATION; TEMPORAL CLASSIFICATION; TEMPORAL CONTEXT MODELS

LONG SHORT-TERM MEMORY

EID: 84883396653 PISSN: 08852308 EISSN: 10958363 Source Type: Journal
DOI: 10.1016/j.csl.2012.05.002 Document Type: Article

Times cited: 12

References (45)

1
- 84874909204
- Barker, J.P., Vincent, E., Ma, N., Christensen, H., Green, P.D. The PASCAL CHiME speech separation and recognition challenge. Computer Speech and Language, submitted for publication.

2
- 0028392483
- Bengio, Y., Simard, P., Frasconi, P. (1994). Learning long-term dependencies with gradient descent is difficult. IEEE Transactions on Neural Networks, vol. 5, issue 2, pp. 157-166.

3
- 33750368310
- Cooke, M., Barker, J., Cunningham, S., Shao, X. (2006). An audio-visual corpus for speech perception and automatic speech recognition. The Journal of the Acoustical Society of America, vol. 120, issue 5, pp. 2421-2424.

4
- 69249202377
- Cooke, M., Hershey, J.R., Rennie, S.J. (2010). Monaural speech separation and recognition challenge. Computer Speech and Language, vol. 24, pp. 1-15.

5
- 84873898784
- Delcroix, M., Kinoshita, K., Nakatani, T., Araki, S., Ogawa, A., Hori, T., Watanabe, S., Fujimoto, M., Yoshioka, T., Oba, T., Kubo, Y., Souden, M., Hahm, S.J., Nakamura, A. (2011). Speech recognition in the presence of highly non-stationary noise based on spatial, spectral and temporal speech/noise modeling combined with dynamic variance adaptation. Proc. of Machine Listening in Multisource Environments (CHiME 2011), Satellite Workshop of Interspeech 2011, Florence, Italy, pp. 12-17.

6
- 0000259511
- Dietterich, T.G. (1998). Approximate statistical tests for comparing supervised classification learning algorithms. Neural Computation, vol. 10, pp. 1895-1923.

7
- 84886063988
- Evangelista, G., Marchand, S., Plumbley, M., Vincent, E. (2011). Sound source separation. In U. Zölzer (Ed.), DAFX - Digital Audio Effects, 2nd edition. Wiley.

8
- 38149014113
- Fernandez, S., Graves, A., Schmidhuber, J. (2007). An application of recurrent neural networks to discriminative keyword spotting. Proc. of ICANN, Porto, Portugal, pp. 220-229.

9
- 79960657803
- Gemmeke, J., Virtanen, T., Hurmalainen, A. (2011). Exemplar-based sparse representations for noise robust automatic speech recognition. IEEE Transactions on Audio, Speech, and Language Processing, vol. 19, issue 7, pp. 2067-2080.

10
- 84890521030
- Gemmeke, J.F., Virtanen, T., Hurmalainen, A. (2011). Exemplar-based speech enhancement and its application to noise-robust automatic speech recognition. Proc. of CHiME Workshop, Florence, Italy, pp. 53-57.

11
- 0034293152
- Gers, F., Schmidhuber, J., Cummins, F. (2000). Learning to forget: continual prediction with LSTM. Neural Computation, vol. 12, issue 10, pp. 2451-2471.

12
- 33749259827
- Graves, A., Fernandez, S., Gomez, F., Schmidhuber, J. (2006). Connectionist temporal classification: labelling unsegmented data with recurrent neural networks. Proc. of ICML, Pittsburgh, USA, pp. 369-376.

13
- 85161980569
- Graves, A., Fernandez, S., Liwicki, M., Bunke, H., Schmidhuber, J. (2008). Unconstrained online handwriting recognition with recurrent neural networks. Advances in Neural Information Processing Systems, vol. 20, pp. 1-8.

14
- 27744588611
- Graves, A., Schmidhuber, J. (2005). Framewise phoneme classification with bidirectional LSTM and other neural network architectures. Neural Networks, vol. 18, issue 5-6, pp. 602-610.

15
- 34547548235
- Grezl, F., Karafiat, M., Kontar, S., Cernocky, J. (2007). Probabilistic and bottle-neck features for LVCSR of meetings. Proc. of ICASSP, pp. 757-760.

16
- 84863690059
- Helen, M., Virtanen, T. (2005). Separation of drums from polyphonic music using non-negative matrix factorization and support vector machine. Proc. of EUSIPCO, Antalya, Turkey.

17
- 0033709098
- Hermansky, H., Ellis, D.P.W., Sharma, S. (2000). Tandem connectionist feature extraction for conventional HMM systems. Proc. of ICASSP, Istanbul, Turkey, pp. 1635-1638.

18
- 0041914606
- Hochreiter, S., Bengio, Y., Frasconi, P., Schmidhuber, J. (2001). Gradient flow in recurrent nets: the difficulty of learning long-term dependencies. In S.C. Kremer, J.F. Kolen (Eds.), A Field Guide to Dynamical Recurrent Neural Networks. IEEE Press, pp. 1-15.

19
- 0031573117
- Hochreiter, S., Schmidhuber, J. (1997). Long short-term memory. Neural Computation, vol. 9, issue 8, pp. 1735-1780.

20
- 84869754173
- Hurmalainen, A., Mahkonen, K., Gemmeke, J.F., Virtanen, T. (2011). Exemplar-based recognition of speech in highly variable noise. Proc. of Machine Listening in Multisource Environments (CHiME 2011), Satellite Workshop of Interspeech 2011, Florence, Italy, pp. 1-5.

21
- 1842436050
- Jaeger, H. (2001). The echo state approach to analyzing and training recurrent neural networks. Tech. Rep. No. 148, German National Research Center for Information Technology, Bremen.

22
- 0025254722
- Lang, K.J., Waibel, A.H., Hinton, G.E. (1990). A time-delay neural network architecture for isolated word recognition. Neural Networks, vol. 3, issue 1, pp. 23-43.

23
- 33646241633
- Lin, T., Horne, B.G., Tino, P., Giles, C.L. (1996). Learning long-term dependencies in NARX recurrent neural networks. IEEE Transactions on Neural Networks, vol. 7, issue 6, pp. 1329-1338.

24
- 84940458837
- Ma, N., Barker, J., Christensen, H., Green, P. (2010). Distant microphone speech recognition in a noisy indoor environment: combining soft missing data and speech fragment decoding. Proc. of ISCA Workshop on Statistical and Perceptual Audition (SAPA), Makuhari, Japan.

25
- 84865736185
- Raj, B., Singh, R., Virtanen, T. (2011). Phoneme-dependent NMF for speech enhancement in monaural mixtures. Proc. of Interspeech, ISCA, Florence, Italy, pp. 1217-1220.

26
- 79959818117
- Raj, B., Virtanen, T., Chaudhuri, S., Singh, R. (2010). Non-negative matrix factorization based compensation of music for automatic speech recognition. Proc. of Interspeech, Makuhari, Japan, pp. 717-720.

27
- 51449100115
- Rennie, S.J., Hershey, J.R., Olsen, P.A. (2008). Efficient model-based speech separation and denoising using non-negative subspace analysis. Proc. of ICASSP, Las Vegas, NV, USA, pp. 1833-1836.

28
- 56449109755
- Schaefer, A.M., Udluft, S., Zimmermann, H.G. (2008). Learning long-term dependencies with recurrent neural networks. Neurocomputing, vol. 71, issue 13-15, pp. 2481-2488.

29
- 0001033889
- Schmidhuber, J. (1992). Learning complex extended sequences using the principle of history compression. Neural Computation, vol. 4, issue 2, pp. 234-242.

30
- 44949110218
- Schmidt, M.N., Olsson, R.K. (2006). Single-channel speech separation using sparse non-negative matrix factorization. Proc. of Interspeech, Pittsburgh, PA, USA.

31
- 67650135931
- Schuller, B., Wöllmer, M., Moosmayr, T., Rigoll, G. (2009). Recognition of noisy speech: a comparative survey of robust model architecture and feature enhancement. EURASIP Journal on Audio, Speech, and Music Processing, Article ID 942617.

32
- 0031268931
- Schuster, M., Paliwal, K.K. (1997). Bidirectional recurrent neural networks. IEEE Transactions on Signal Processing, vol. 45, pp. 2673-2681.

33
- 78049383291
- Smaragdis, P. (2004). Discovering auditory objects through non-negativity constraints. Proc. of SAPA, Jeju, Korea.

34
- 38049021850
- Smaragdis, P. (2007). Convolutive speech bases and their application to supervised speech separation. IEEE Transactions on Audio, Speech and Language Processing, vol. 15, issue 1, pp. 1-14.

35
- 67650142420
- Wang, W., Cichocki, A., Chambers, J.A. (2009, July). A multiplicative algorithm for convolutive non-negative matrix factorization based on squared Euclidean distance. IEEE Transactions on Signal Processing, vol. 57, issue 7, pp. 2858-2864.

36
- 84857258863
- Weninger, F., Geiger, J., Wöllmer, M., Schuller, B., Rigoll, G. (2011). The Munich 2011 CHiME Challenge Contribution: NMF-BLSTM Speech Enhancement and Recognition for Reverberated Multisource Environments. Proc. of Machine Listening in Multisource Environments (CHiME 2011), Satellite Workshop of Interspeech 2011, Florence, Italy, pp. 24-29.

37
- 80051618211
- Weninger, F., Lehmann, A., Schuller, B. (2011). openBliSSART: design and evaluation of a research toolkit for blind source separation in audio recognition tasks. Proc. of ICASSP, Prague, Czech Republic, pp. 1625-1628.

38
- 84867600087
- Weninger, F., Wöllmer, M., Geiger, J., Schuller, B., Gemmeke, J., Hurmalainen, A., Virtanen, T., Rigoll, G. (2012). Non-Negative Matrix Factorization for Highly Noise-Robust ASR: to Enhance or to Recognize? Proc. of ICASSP, Kyoto, Japan, pp. 4681-4684.

39
- 51449092704
- Wilson, K.W., Raj, B., Smaragdis, P., Divakaran, A. (2008). Speech denoising using nonnegative matrix factorization with priors. Proc. of ICASSP, Las Vegas, NV, USA, pp. 4029-4032.

40
- 79958176949
- Wöllmer, M., Blaschke, C., Schindl, T., Schuller, B., Färber, B., Mayer, S., Trefflich, B. (2011). On-line driver distraction detection using long short-term memory. IEEE Transactions on Intelligent Transportation Systems, vol. 12, issue 2, pp. 574-582.

41
- 78651563436
- Wöllmer, M., Eyben, F., Graves, A., Schuller, B., Rigoll, G. (2010). Bidirectional LSTM networks for context-sensitive keyword detection in a cognitive virtual agent framework. Cognitive Computation, vol. 2, issue 3, pp. 180-190.

42
- 80051637579
- Wöllmer, M., Eyben, F., Schuller, B., Rigoll, G. (2011). A multi-stream ASR framework for BLSTM modeling of conversational speech. Proc. of ICASSP, Prague, Czech Republic, pp. 4860-4863.

43
- 81155123235
- Wöllmer, M., Schuller, B. (2011). Enhancing spontaneous speech recognition with BLSTM features. Proc. of NOLISP, Las Palmas de Gran Canaria, Spain, pp. 17-24.

44
- 77956721304
- Wöllmer, M., Schuller, B., Eyben, F., Rigoll, G. (2010). Combining long short-term memory and dynamic Bayesian networks for incremental emotion-sensitive artificial listening. IEEE Journal of Selected Topics in Signal Processing, vol. 4, issue 5, pp. 867-881.

45
- 84865748400
- Wöllmer, M., Schuller, B., Rigoll, G. (2011). Feature frame stacking in RNN-based Tandem ASR systems - learned vs. predefined context. Proc. of Interspeech, Florence, Italy, pp. 1233-1236.

* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.