SCOPUS Information Retrieval Platform

Computer Speech and Language

Volume 28, Issue 4, 2014, Pages 888-902

Feature enhancement by deep LSTM networks for ASR in reverberant multisource environments

Authors (5): Weninger, Felix (a); Geiger, Jürgen (a); Wöllmer, Martin (a, b); Schuller, Björn (a, c); Rigoll, Gerhard (a)

a TECHNICAL UNIVERSITY OF MUNICH (Germany)

b BMW GROUP (Germany)

c IMPERIAL COLLEGE LONDON (United Kingdom)

Author keywords

Automatic speech recognition; Deep neural networks; Feature enhancement; Long Short Term Memory

Indexed keywords

AUDITION; BRAIN; ELECTRIC NETWORK TOPOLOGY; RECURRENT NEURAL NETWORKS; REVERBERATION;

AUTOMATIC SPEECH RECOGNITION; BIDIRECTIONAL RECURRENT NEURAL NETWORKS; DEEP NEURAL NETWORKS; FEATURE ENHANCEMENT; LONG SHORT-TERM MEMORY; MULTI-SOURCE ENVIRONMENT; SPEECH FEATURE ENHANCEMENT; SPEECH RECOGNITION PERFORMANCE;

SPEECH RECOGNITION;

EID: 84900534601 PISSN: 08852308 EISSN: 10958363 Source Type: Journal
DOI: 10.1016/j.csl.2014.01.001 Document Type: Article

Times cited: 58

References (38)

1
- 0030677475
- Speaker adaptive training: A maximum likelihood approach to speaker normalization
- IEEE
- T. Anastasakos, J. McDonough, and J. Makhoul Speaker adaptive training: a maximum likelihood approach to speaker normalization Proc. of ICASSP 1997 IEEE 1043 1046
- (1997) Proc. of ICASSP , pp. 1043-1046
- Anastasakos, T.¹ McDonough, J.² Makhoul, J.³

2
- 84878543263
- The PASCAL CHiME speech separation and recognition challenge
- J.P. Barker, E. Vincent, N. Ma, H. Christensen, and P.D. Green The PASCAL CHiME speech separation and recognition challenge Computer Speech & Language 27 3 2013 621 633
- (2013) Computer Speech & Language , vol.27 , Issue.3 , pp. 621-633
- Barker, J.P.¹ Vincent, E.² Ma, N.³ Christensen, H.⁴ Green, P.D.⁵

3
- 0028392483
- Learning long-term dependencies with gradient descent is difficult
- Y. Bengio, P. Simard, and P. Frasconi Learning long-term dependencies with gradient descent is difficult IEEE Transactions on Neural Networks 5 2 1994 157 166
- (1994) IEEE Transactions on Neural Networks , vol.5 , Issue.2 , pp. 157-166
- Bengio, Y.¹ Simard, P.² Frasconi, P.³

4
- 18744371585
- Histogram equalization of speech representation for robust speech recognition
- DOI 10.1109/TSA.2005.845805
- A. de la Torre, A.M. Peinado, J.C. Segura, J.L. Perez-Cordoba, M.C. Benitez, and A.J. Rubio Histogram equalization of speech representation for robust speech recognition IEEE Transactions on Speech and Audio Processing 13 3 2005 355 366 (Pubitemid 40666170)
- (2005) IEEE Transactions on Speech and Audio Processing , vol.13 , Issue.3 , pp. 355-366
- De La Torre, A.¹ Peinado, A.M.² Segura, J.C.³ Perez-Cordoba, J.L.⁴ Benitez, Ma.C.⁵ Rubio, A.J.⁶

5
- 84890443834
- Real-life voice activity detection with LSTM recurrent neural networks and an application to Hollywood movies
- IEEE Vancouver, Canada
- F. Eyben, F. Weninger, S. Squartini, and B. Schuller Real-life voice activity detection with LSTM recurrent neural networks and an application to Hollywood movies Proceedings 38th IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2013 2013 May IEEE Vancouver, Canada 483 487
- (2013) Proceedings 38th IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2013 , pp. 483-487
- Eyben, F.¹ Weninger, F.² Squartini, S.³ Schuller, B.⁴

6
- 0032050110
- Maximum likelihood linear transformations for HMM-based speech recognition
- M.J. Gales Maximum likelihood linear transformations for HMM-based speech recognition Computer Speech & Language 12 2 1998 75 98 (Pubitemid 128383747)
- (1998) Computer Speech and Language , vol.12 , Issue.2 , pp. 75-98
- Gales, M.J.F.¹

7
- 79961153040
- Model-based approaches to handling additive noise in reverberant environments
- Edinburgh, UK
- M.J.F. Gales, and Y.Q. Wang Model-based approaches to handling additive noise in reverberant environments Proc. IEEE Workshop on Hands-free Speech Communication and Microphone Arrays Edinburgh, UK 2011 121 126
- (2011) Proc. IEEE Workshop on Hands-free Speech Communication and Microphone Arrays , pp. 121-126
- Gales, M.J.F.¹ Wang, Y.Q.²

8
- 84893675434
- The TUM + TUT + KUL approach to the CHiME challenge 2013: Multi-stream ASR exploiting BLSTM networks and sparse NMF
- June IEEE Vancouver, Canada
- J.T. Geiger, F. Weninger, A. Hurmalainen, J.F. Gemmeke, M. Wöllmer, B. Schuller, G. Rigoll, and T. Virtanen The TUM + TUT + KUL approach to the CHiME challenge 2013: multi-stream ASR exploiting BLSTM networks and sparse NMF Proceedings of the 2nd CHiME Workshop on Machine Listening in Multisource Environments held in Conjunction with ICASSP 2013 June 2013 IEEE Vancouver, Canada 25 30
- (2013) Proceedings of the 2nd CHiME Workshop on Machine Listening in Multisource Environments Held in Conjunction with ICASSP 2013 , pp. 25-30
- Geiger, J.T.¹ Weninger, F.² Hurmalainen, A.³ Gemmeke, J.F.⁴ Wöllmer, M.⁵ Schuller, B.⁶ Rigoll, G.⁷ Virtanen, T.⁸

9
- 84962920708
- Evaluating long-term spectral subtraction for reverberant ASR
- IEEE Madonna di Campiglio, Italy
- D. Gelbart, and N. Morgan Evaluating long-term spectral subtraction for reverberant ASR Proc. of ASRU 2001 IEEE Madonna di Campiglio, Italy 103 106
- (2001) Proc. of ASRU , pp. 103-106
- Gelbart, D.¹ Morgan, N.²

10
- 0034293152
- Learning to forget: Continual prediction with LSTM
- F. Gers, J. Schmidhuber, and F. Cummins Learning to forget: continual prediction with LSTM Neural Computation 12 10 2000 2451 2471
- (2000) Neural Computation , vol.12 , Issue.10 , pp. 2451-2471
- Gers, F.¹ Schmidhuber, J.² Cummins, F.³

11
- 70349284484
- Technische Universität München (Ph.D. thesis)
- A. Graves Supervised sequence labelling with recurrent neural networks 2008 Technische Universität München (Ph.D. thesis)
- (2008) Supervised Sequence Labelling with Recurrent Neural Networks
- Graves, A.¹

12
- 84890543083
- Speech recognition with deep recurrent neural networks
- May IEEE Vancouver, Canada
- A. Graves, A. Mohamed, and G. Hinton Speech recognition with deep recurrent neural networks Proc. of ICASSP May 2013 IEEE Vancouver, Canada 6645 6649
- (2013) Proc. of ICASSP , pp. 6645-6649
- Graves, A.¹ Mohamed, A.² Hinton, G.³

13
- 0017097474
- Distance measures for speech processing
- A. Gray, and J. Markel Distance measures for speech processing IEEE Transactions on Acoustics, Speech and Signal Processing 24 5 1976 380 391 (Pubitemid 8091024)
- (1976) IEEE Transactions on Acoustics, Speech and Signal Processing , vol.24 , Issue.5 , pp. 380-391
- Gray Jr., A.H.¹ Markel, J.D.²

14
- 85016663198
- RASTA-PLP speech analysis technique
- H. Hermansky, N. Morgan, A. Bayya, and P. Kohn RASTA-PLP speech analysis technique Proceedings of International Conference on Acoustics, Speech, and Signal Processing, vol. 1 1992 121 124
- (1992) Proceedings of International Conference on Acoustics, Speech, and Signal Processing, Vol. 1 , pp. 121-124
- Hermansky, H.¹ Morgan, N.² Bayya, A.³ Kohn, P.⁴

15
- 85032751458
- Deep neural networks for acoustic modeling in speech recognition
- G. Hinton, L. Deng, D. Yu, G.E. Dahl, A. Mohamed, N. Jaitly, A. Senior, V. Vanhoucke, P. Nguyen, T.N. Sainath, and B. Kingsbury Deep neural networks for acoustic modeling in speech recognition IEEE Signal Processing Magazine 29 6 2012 82 97
- (2012) IEEE Signal Processing Magazine , vol.29 , Issue.6 , pp. 82-97
- Hinton, G.¹ Deng, L.² Yu, D.³ Dahl, G.E.⁴ Mohamed, A.⁵ Jaitly, N.⁶ Senior, A.⁷ Vanhoucke, V.⁸ Nguyen, P.⁹ Sainath, T.N.¹⁰ Kingsbury, B.¹¹

16
- 84869754173
- Exemplar-based recognition of speech in highly variable noise
- Florence, Italy
- A. Hurmalainen, K. Mahkonen, J.F. Gemmeke, and T. Virtanen Exemplar-based recognition of speech in highly variable noise Proc. of Machine Listening in Multisource Environments (CHiME 2011), Satellite Workshop of Interspeech 2011 Florence, Italy 2011 1 5
- (2011) Proc. of Machine Listening in Multisource Environments (CHiME 2011), Satellite Workshop of Interspeech 2011 , pp. 1-5
- Hurmalainen, A.¹ Mahkonen, K.² Gemmeke, J.F.³ Virtanen, T.⁴

17
- 84900542109
- Recurrent neural network feature enhancement: The 2nd CHiME challenge
- June IEEE Vancouver, Canada
- A.L. Maas, T.M. O'Neil, A.Y. Hannun, and A.Y. Ng Recurrent neural network feature enhancement: the 2nd CHiME challenge Proceedings of the 2nd CHiME Workshop on Machine Listening in Multisource Environments held in Conjunction with ICASSP 2013 June 2013 IEEE Vancouver, Canada 79 80
- (2013) Proceedings of the 2nd CHiME Workshop on Machine Listening in Multisource Environments Held in Conjunction with ICASSP 2013 , pp. 79-80
- Maas, A.L.¹ O'Neil, T.M.² Hannun, A.Y.³ Ng, A.Y.⁴

18
- 84869432703
- A two-channel acoustic front-end for robust automatic speech recognition in noisy and reverberant environments
- R. Maas, A. Schwarz, Y. Zheng, K. Reindl, S. Meier, A. Sehr, and W. Kellermann A two-channel acoustic front-end for robust automatic speech recognition in noisy and reverberant environments Proc. of CHiME 2011 41 46
- (2011) Proc. of CHiME , pp. 41-46
- Maas, R.¹ Schwarz, A.² Zheng, Y.³ Reindl, K.⁴ Meier, S.⁵ Sehr, A.⁶ Kellermann, W.⁷

19
- 84867585919
- Understanding how deep belief networks perform acoustic modelling
- Kyoto, Japan
- A. Mohamed, G. Hinton, and G. Penn Understanding how deep belief networks perform acoustic modelling Proc. of ICASSP Kyoto, Japan 2012 4273 4276
- (2012) Proc. of ICASSP , pp. 4273-4276
- Mohamed, A.¹ Hinton, G.² Penn, G.³

20
- 84893685019
- A flexible spatial blind source extraction framework for robust speech recognition in noisy environments
- Vancouver, Canada
- F. Nesta, M. Matassoni, and R.F. Astudillo A flexible spatial blind source extraction framework for robust speech recognition in noisy environments Proc. of CHiME Vancouver, Canada 2013 33 38
- (2013) Proc. of CHiME , pp. 33-38
- Nesta, F.¹ Matassoni, M.² Astudillo, R.F.³

21
- 4544354701
- Speech enhancement with missing data techniques using recurrent neural networks
- Montreal, Canada
- S. Parveen, and P. Green Speech enhancement with missing data techniques using recurrent neural networks Proc. of ICASSP Montreal, Canada 2004
- (2004) Proc. of ICASSP
- Parveen, S.¹ Green, P.²

22
- 84858953642
- The Kaldi speech recognition toolkit
- D. Povey, A. Ghoshal, G. Boulianne, L. Burget, O. Glembek, N. Goel, M. Hannemann, P. Motlícek, Y. Qian, and P. Schwarz et al. The Kaldi speech recognition toolkit Proc. of ASRU 2011
- (2011) Proc. of ASRU
- Povey, D.¹ Ghoshal, A.² Boulianne, G.³ Burget, L.⁴ Glembek, O.⁵ Goel, N.⁶ Hannemann, M.⁷ Motlícek, P.⁸ Qian, Y.⁹ Schwarz, P.¹⁰

23
- 51449120120
- Boosted MMI for model and feature-space discriminative training
- IEEE
- D. Povey, D. Kanevsky, B. Kingsbury, B. Ramabhadran, G. Saon, and K. Visweswariah Boosted MMI for model and feature-space discriminative training Proc. of ICASSP 2008 IEEE 4057 4060
- (2008) Proc. of ICASSP , pp. 4057-4060
- Povey, D.¹ Kanevsky, D.² Kingsbury, B.³ Ramabhadran, B.⁴ Saon, G.⁵ Visweswariah, K.⁶

24
- 79959818117
- Non-negative matrix factorization based compensation of music for automatic speech recognition
- Makuhari, Japan
- B. Raj, T. Virtanen, S. Chaudhuri, and R. Singh Non-negative matrix factorization based compensation of music for automatic speech recognition Proc. of Interspeech Makuhari, Japan 2010 717 720
- (2010) Proc. of Interspeech , pp. 717-720
- Raj, B.¹ Virtanen, T.² Chaudhuri, S.³ Singh, R.⁴

25
- 51449100115
- Efficient model-based speech separation and denoising using non-negative subspace analysis
- Las Vegas, NV, USA
- S.J. Rennie, J.R. Hershey, and P.A. Olsen Efficient model-based speech separation and denoising using non-negative subspace analysis Proc. of ICASSP Las Vegas, NV, USA 2008 1833 1836
- (2008) Proc. of ICASSP , pp. 1833-1836
- Rennie, S.J.¹ Hershey, J.R.² Olsen, P.A.³

26
- 0033677121
- Maximum likelihood discriminant feature spaces
- IEEE
- G. Saon, M. Padmanabhan, R. Gopinath, and S. Chen Maximum likelihood discriminant feature spaces Proc. of ICASSP, vol. 2 2000 IEEE 1129 1132
- (2000) Proc. of ICASSP, Vol. 2 , pp. 1129-1132
- Saon, G.¹ Padmanabhan, M.² Gopinath, R.³ Chen, S.⁴

27
- 67650135931
- Recognition of noisy speech: A comparative survey of robust model architecture and feature enhancement
- (Article ID: 942617)
- B. Schuller, M. Wöllmer, T. Moosmayr, and G. Rigoll Recognition of noisy speech: a comparative survey of robust model architecture and feature enhancement. EURASIP Journal on Audio, Speech, and Music Processing 2009 2009 1 17 (Article ID: 942617)
- (2009) EURASIP Journal on Audio, Speech, and Music Processing , vol.2009 , pp. 1-17
- Schuller, B.¹ Wöllmer, M.² Moosmayr, T.³ Rigoll, G.⁴

28
- 84890545600
- Multi-task learning in deep neural networks for improved phoneme recognition
- IEEE Vancouver, Canada
- M.L. Seltzer, and J. Droppo Multi-task learning in deep neural networks for improved phoneme recognition Proc. of ICASSP 2013 IEEE Vancouver, Canada 6965 6969
- (2013) Proc. of ICASSP , pp. 6965-6969
- Seltzer, M.L.¹ Droppo, J.²

29
- 84890492030
- An investigation of deep neural networks for noise robust speech recognition
- Vancouver, Canada
- M.L. Seltzer, D. Yu, and Y. Wang An investigation of deep neural networks for noise robust speech recognition Proc. of ICASSP Vancouver, Canada 2013 7398 7402
- (2013) Proc. of ICASSP , pp. 7398-7402
- Seltzer, M.L.¹ Yu, D.² Wang, Y.³

30
- 84890503970
- Effectiveness of discriminative training and feature transformation for reverberated and noisy speech
- Vancouver, Canada
- Y. Tachioka, S. Watanabe, and J.R. Hershey Effectiveness of discriminative training and feature transformation for reverberated and noisy speech Proc. of ICASSP Vancouver, Canada 2013 6935 6939
- (2013) Proc. of ICASSP , pp. 6935-6939
- Tachioka, Y.¹ Watanabe, S.² Hershey, J.R.³

31
- 51449115975
- Cavendish Laboratory, University of Cambridge (Tech. rep.)
- K. Vertanen Baseline WSJ acoustic models for HTK and Sphinx: training recipes and recognition experiments 2006 Cavendish Laboratory, University of Cambridge (Tech. rep.)
- (2006) Baseline WSJ Acoustic Models for HTK and Sphinx: Training Recipes and Recognition Experiments
- Vertanen, K.¹

32
- 84890541701
- The second 'CHiME' speech separation and recognition challenge: Datasets, tasks and baselines
- Vancouver, Canada
- E. Vincent, J. Barker, S. Watanabe, J. Le Roux, F. Nesta, and M. Matassoni The second 'CHiME' speech separation and recognition challenge: datasets, tasks and baselines Proc. of ICASSP Vancouver, Canada 2013 126 130
- (2013) Proc. of ICASSP , pp. 126-130
- Vincent, E.¹ Barker, J.² Watanabe, S.³ Le Roux, J.⁴ Nesta, F.⁵ Matassoni, M.⁶

33
- 84900537286
- The Munich feature enhancement approach to the 2013 CHiME challenge using BLSTM recurrent neural networks
- June IEEE Vancouver, Canada
- F. Weninger, J. Geiger, M. Wöllmer, B. Schuller, and G. Rigoll The Munich feature enhancement approach to the 2013 CHiME challenge using BLSTM recurrent neural networks Proceedings of the 2nd CHiME Workshop on Machine Listening in Multisource Environments held in Conjunction with ICASSP 2013 June 2013 IEEE Vancouver, Canada 86 90
- (2013) Proceedings of the 2nd CHiME Workshop on Machine Listening in Multisource Environments Held in Conjunction with ICASSP 2013 , pp. 86-90
- Weninger, F.¹ Geiger, J.² Wöllmer, M.³ Schuller, B.⁴ Rigoll, G.⁵

34
- 84867600087
- Non-negative matrix factorization for highly noise-robust ASR: To enhance or to recognize?
- IEEE Kyoto, Japan
- F. Weninger, M. Wöllmer, J. Geiger, B. Schuller, J.F. Gemmeke, A. Hurmalainen, T. Virtanen, and G. Rigoll Non-negative matrix factorization for highly noise-robust ASR: to enhance or to recognize? Proc. of ICASSP 2012 IEEE Kyoto, Japan 4681 4684
- (2012) Proc. of ICASSP , pp. 4681-4684
- Weninger, F.¹ Wöllmer, M.² Geiger, J.³ Schuller, B.⁴ Gemmeke, J.F.⁵ Hurmalainen, A.⁶ Virtanen, T.⁷ Rigoll, G.⁸

35
- 81355147535
- Multi-stream LSTM-HMM decoding and histogram equalization for noise robust keyword spotting
- M. Wöllmer, E. Marchi, S. Squartini, and B. Schuller Multi-stream LSTM-HMM decoding and histogram equalization for noise robust keyword spotting Cognitive Neurodynamics 5 3 2011 253 264
- (2011) Cognitive Neurodynamics , vol.5 , Issue.3 , pp. 253-264
- Wöllmer, M.¹ Marchi, E.² Squartini, S.³ Schuller, B.⁴

36
- 84865748400
- Feature frame stacking in RNN-based tandem ASR systems - Learned vs. Predefined context
- Florence, Italy
- M. Wöllmer, B. Schuller, and G. Rigoll Feature frame stacking in RNN-based tandem ASR systems - learned vs. predefined context Proc. of Interspeech Florence, Italy 2011 1233 1236
- (2011) Proc. of Interspeech , pp. 1233-1236
- Wöllmer, M.¹ Schuller, B.² Rigoll, G.³

37
- 84890489927
- Feature enhancement by bidirectional LSTM networks for conversational speech recognition in highly non-stationary noise
- Vancouver, Canada
- M. Wöllmer, Z. Zhang, F. Weninger, B. Schuller, and G. Rigoll Feature enhancement by bidirectional LSTM networks for conversational speech recognition in highly non-stationary noise Proc. of ICASSP Vancouver, Canada 2013 6822 6826
- (2013) Proc. of ICASSP , pp. 6822-6826
- Wöllmer, M.¹ Zhang, Z.² Weninger, F.³ Schuller, B.⁴ Rigoll, G.⁵

38
- 64849090257
- Cambridge University Press Cambridge, UK
- S. Young, G. Evermann, M. Gales, T. Hain, D. Kershaw, X. Liu, G. Moore, J. Odell, D. Ollason, D. Povey, V. Valtchev, and P. Woodland The HTK Book (v3.4) 2006 Cambridge University Press Cambridge, UK
- (2006) The HTK Book (v3.4)
- Young, S.¹ Evermann, G.² Gales, M.³ Hain, T.⁴ Kershaw, D.⁵ Liu, X.⁶ Moore, G.⁷ Odell, J.⁸ Ollason, D.⁹ Povey, D.¹⁰ Valtchev, V.¹¹ Woodland, P.¹²

* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.