SCOPUS Information Retrieval Platform

Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH

2013, pp. 2992-2996

Is speech enhancement pre-processing still relevant when using deep neural networks for acoustic modeling?

Authors (4): Delcroix, Marc (a); Kubo, Yotaro (a); Nakatani, Tomohiro (a); Nakamura, Atsushi (a)

(a) Nippon Telegraph and Telephone Corporation (Japan)

Author keywords

Deep neural network; Multi condition training; Robust speech recognition; Speech enhancement

Indexed keywords

HUMAN COMPUTER INTERACTION; SPEECH ENHANCEMENT

AUTOMATIC SPEECH RECOGNITION; DEEP NEURAL NETWORKS; DISTANT SPEECH RECOGNITION; GAUSSIAN MIXTURE MODEL (GMMS); MULTI-CONDITION TRAININGS; ROBUST SPEECH RECOGNITION; SPEAKER VARIABILITY; TRAINING AND TESTING

SPEECH RECOGNITION

EID: 84906222220; PISSN: 2308-457X; EISSN: 1990-9772; Source Type: Conference Proceeding
DOI: none; Document Type: Conference Paper

Times cited: 47

References (33)

1
- 84055211743
- Acoustic modeling using deep belief networks
- A. Mohamed, G. Dahl, and G. Hinton, "Acoustic modeling using deep belief networks," IEEE Transactions on Audio, Speech, and Language Processing, vol. 20, no. 1, pp. 14-22, 2012.
- (2012) IEEE Transactions on Audio, Speech, and Language Processing, vol. 20, no. 1, pp. 14-22
- Mohamed, A.; Dahl, G.; Hinton, G.

2
- 85032751458
- Deep neural networks for acoustic modeling in speech recognition: The shared views of four research groups
- G. Hinton, L. Deng, D. Yu, G. Dahl, A. Mohamed, N. Jaitly, A. Senior, V. Vanhoucke, P. Nguyen, T. Sainath, and B. Kingsbury, "Deep neural networks for acoustic modeling in speech recognition: The shared views of four research groups," IEEE Signal Processing Magazine, vol. 29, no. 6, pp. 82-97, 2012.
- (2012) IEEE Signal Processing Magazine, vol. 29, no. 6, pp. 82-97
- Hinton, G.; Deng, L.; Yu, D.; Dahl, G.; Mohamed, A.; Jaitly, N.; Senior, A.; Vanhoucke, V.; Nguyen, P.; Sainath, T.; Kingsbury, B.

3
- 78650474133
- A practical guide to training restricted Boltzmann machines
- G. Hinton, "A practical guide to training restricted Boltzmann machines," Tech. Rep., 2010.
- (2010) A Practical Guide to Training Restricted Boltzmann Machines
- Hinton, G.

4
- 78650904464
- Learning deep architectures for AI
- Y. Bengio, Learning deep architectures for AI. Hanover, MA, USA: Now Publishers Inc., 2009.
- (2009) Learning Deep Architectures for AI
- Bengio, Y.

5
- 84858976070
- Feature engineering in context-dependent deep neural networks for conversational speech transcription
- F. Seide, G. Li, X. Chen, and D. Yu, "Feature engineering in context-dependent deep neural networks for conversational speech transcription," in Proc. IEEE Workshop on Automatic Speech Recognition and Understanding (ASRU), 2011, pp. 24-29.
- (2011) Proc. IEEE Workshop on Automatic Speech Recognition and Understanding (ASRU), pp. 24-29
- Seide, F.; Li, G.; Chen, X.; Yu, D.

6
- 84867585919
- Understanding how deep belief networks perform acoustic modelling
- A. Mohamed, G. Hinton, and G. Penn, "Understanding how deep belief networks perform acoustic modelling," in Proc. IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2012, pp. 4273-4276.
- (2012) Proc. IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 4273-4276
- Mohamed, A.; Hinton, G.; Penn, G.

7
- 50449083999
- Distant speech recognition
- M. Woelfel and J. McDonough, Distant speech recognition. Chichester, UK: John Wiley and Sons Ltd, 2009.
- (2009) Distant Speech Recognition
- Woelfel, M.; McDonough, J.

8
- 84891583985
- Techniques for noise robustness in automatic speech recognition
- T. Virtanen, R. Singh, and B. Raj, Techniques for noise robustness in automatic speech recognition. Chichester, UK: John Wiley and Sons Ltd, 2012.
- (2012) Techniques for Noise Robustness in Automatic Speech Recognition
- Virtanen, T.; Singh, R.; Raj, B.

9
- 0030245128
- Robust continuous speech recognition using parallel model combination
- M. J. F. Gales and S. Young, "Robust continuous speech recognition using parallel model combination," IEEE Transactions on Speech and Audio Processing, vol. 4, no. 5, pp. 352-359, 1996.
- (1996) IEEE Transactions on Speech and Audio Processing, vol. 4, no. 5, pp. 352-359
- Gales, M.J.F.; Young, S.

10
- 0032048385
- Speech recognition in noisy environments using first-order vector Taylor series
- D. Y. Kim, C. K. Un, and N. S. Kim, "Speech recognition in noisy environments using first-order vector Taylor series," Speech Communication, pp. 39-49, 1998.
- (1998) Speech Communication, pp. 39-49
- Kim, D.Y.; Un, C.K.; Kim, N.S.

11
- 85009113852
- HMM adaptation using vector Taylor series for noisy speech recognition
- A. Acero, L. Deng, T. Kristjansson, and J. Zhang, "HMM adaptation using vector Taylor series for noisy speech recognition," in Proc. International Conference on Spoken Language Processing (ICSLP), 2000, pp. 869-872.
- (2000) Proc. International Conference on Spoken Language Processing (ICSLP), pp. 869-872
- Acero, A.; Deng, L.; Kristjansson, T.; Zhang, J.

12
- 0035396555
- Noise power spectral density estimation based on optimal smoothing and minimum statistics
- R. Martin, "Noise power spectral density estimation based on optimal smoothing and minimum statistics," IEEE Transactions on Speech and Audio Processing, vol. 9, no. 5, pp. 504-512, 2001.
- (2001) IEEE Transactions on Speech and Audio Processing, vol. 9, no. 5, pp. 504-512
- Martin, R.

13
- 77949352396
- Hierarchical variational loopy belief propagation for multi-talker speech recognition
- S. Rennie, J. Hershey, and P. Olsen, "Hierarchical variational loopy belief propagation for multi-talker speech recognition," in Proc. IEEE Workshop on Automatic Speech Recognition and Understanding (ASRU), 2009, pp. 176-181.
- (2009) Proc. IEEE Workshop on Automatic Speech Recognition and Understanding (ASRU), pp. 176-181
- Rennie, S.; Hershey, J.; Olsen, P.

14
- 79959854950
- Multichannel source separation based on source location cue with log-spectral shaping by hidden Markov source model
- T. Nakatani, S. Araki, T. Yoshioka, and M. Fujimoto, "Multichannel source separation based on source location cue with log-spectral shaping by hidden Markov source model," in Proc. Interspeech, 2010, pp. 2766-2769.
- (2010) Proc. Interspeech, pp. 2766-2769
- Nakatani, T.; Araki, S.; Yoshioka, T.; Fujimoto, M.

15
- 84865754161
- Reduction of highly nonstationary ambient noise by integrating spectral and locational characteristics of speech and noise for robust ASR
- T. Nakatani, S. Araki, M. Delcroix, T. Yoshioka, and M. Fujimoto, "Reduction of highly nonstationary ambient noise by integrating spectral and locational characteristics of speech and noise for robust ASR," in Proc. Interspeech, 2011, pp. 1785-1788.
- (2011) Proc. Interspeech, pp. 1785-1788
- Nakatani, T.; Araki, S.; Delcroix, M.; Yoshioka, T.; Fujimoto, M.

16
- 77955673019
- Model-based feature enhancement for reverberant speech recognition
- A. Krueger and R. Haeb-Umbach, "Model-based feature enhancement for reverberant speech recognition," IEEE Transactions on Audio, Speech, and Language Processing, vol. 18, no. 7, pp. 1692-1707, 2010.
- (2010) IEEE Transactions on Audio, Speech, and Language Processing, vol. 18, no. 7, pp. 1692-1707
- Krueger, A.; Haeb-Umbach, R.

17
- 85009070292
- Large-vocabulary speech recognition under adverse acoustic environments
- L. Deng, A. Acero, M. Plumpe, and X. Huang, "Large-vocabulary speech recognition under adverse acoustic environments," in Proc. International Conference on Spoken Language Processing (ICSLP), 2000, pp. 806-809.
- (2000) Proc. International Conference on Spoken Language Processing (ICSLP), pp. 806-809
- Deng, L.; Acero, A.; Plumpe, M.; Huang, X.

18
- 51449102822
- Combined static and dynamic variance adaptation for efficient interconnection of speech enhancement pre-processor with speech recognizer
- M. Delcroix, T. Nakatani, and S. Watanabe, "Combined static and dynamic variance adaptation for efficient interconnection of speech enhancement pre-processor with speech recognizer," in Proc. IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2008, pp. 4073-4076.
- (2008) Proc. IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 4073-4076
- Delcroix, M.; Nakatani, T.; Watanabe, S.

19
- 84945900998
- Best practices for convolutional neural networks applied to visual document analysis
- P. Simard, D. Steinkraus, and J. C. Platt, "Best practices for convolutional neural networks applied to visual document analysis," in Proc. International Conference on Document Analysis and Recognition, 2003, pp. 958-963.
- (2003) Proc. International Conference on Document Analysis and Recognition, pp. 958-963
- Simard, P.; Steinkraus, D.; Platt, J.C.

20
- 45749110924
- Representational power of restricted Boltzmann machines and deep belief networks
- N. Le Roux and Y. Bengio, "Representational power of restricted Boltzmann machines and deep belief networks," Neural Computation, vol. 20, no. 6, pp. 1631-1649, 2008.
- (2008) Neural Computation, vol. 20, no. 6, pp. 1631-1649
- Le Roux, N.; Bengio, Y.

21
- 84867591985
- LogMax observation model with MFCC-based spectral prior for reduction of highly nonstationary ambient noise
- T. Nakatani, T. Yoshioka, S. Araki, M. Delcroix, and M. Fujimoto, "LogMax observation model with MFCC-based spectral prior for reduction of highly nonstationary ambient noise," in Proc. IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2012, pp. 4029-4032.
- (2012) Proc. IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 4029-4032
- Nakatani, T.; Yoshioka, T.; Araki, S.; Delcroix, M.; Fujimoto, M.

22
- 84887382524
- Dominance based integration of spatial and spectral features for speech enhancement
- T. Nakatani, T. Yoshioka, S. Araki, M. Delcroix, and M. Fujimoto, "Dominance based integration of spatial and spectral features for speech enhancement," submitted to IEEE Transactions on Audio, Speech, and Language Processing, 2013.
- (2013) IEEE Transactions on Audio, Speech, and Language Processing
- Nakatani, T.; Yoshioka, T.; Araki, S.; Delcroix, M.; Fujimoto, M.

23
- 84887395149
- Speech recognition in living rooms: Integrated speech enhancement and recognition system based on spatial, spectral and temporal modeling of sounds
- M. Delcroix, K. Kinoshita, T. Nakatani, S. Araki, A. Ogawa, T. Hori, S. Watanabe, M. Fujimoto, T. Yoshioka, T. Oba, Y. Kubo, M. Souden, S.-J. Hahm, and A. Nakamura, "Speech recognition in living rooms: Integrated speech enhancement and recognition system based on spatial, spectral and temporal modeling of sounds," Computer Speech & Language, vol. 27, no. 3, pp. 851-873, 2013.
- (2013) Computer Speech & Language, vol. 27, no. 3, pp. 851-873
- Delcroix, M.; Kinoshita, K.; Nakatani, T.; Araki, S.; Ogawa, A.; Hori, T.; Watanabe, S.; Fujimoto, M.; Yoshioka, T.; Oba, T.; Kubo, Y.; Souden, M.; Hahm, S.-J.; Nakamura, A.

24
- 78650016939
- Underdetermined convolutive blind source separation via frequency bin-wise clustering and permutation alignment
- H. Sawada, S. Araki, and S. Makino, "Underdetermined convolutive blind source separation via frequency bin-wise clustering and permutation alignment," IEEE Transactions on Audio, Speech, and Language Processing, vol. 19, no. 3, pp. 516-527, 2010.
- (2010) IEEE Transactions on Audio, Speech, and Language Processing, vol. 19, no. 3, pp. 516-527
- Sawada, H.; Araki, S.; Makino, S.

25
- 84878543263
- The PASCAL CHiME speech separation and recognition challenge
- J. Barker, E. Vincent, N. Ma, H. Christensen, and P. Green, "The PASCAL CHiME speech separation and recognition challenge," Computer Speech & Language, vol. 27, no. 3, pp. 621-633, 2013.
- (2013) Computer Speech & Language, vol. 27, no. 3, pp. 621-633
- Barker, J.; Vincent, E.; Ma, N.; Christensen, H.; Green, P.

26
- 85018751865
- The PASCAL CHiME speech separation and recognition challenge
- J. Barker, E. Vincent, N. Ma, H. Christensen, and P. Green, "The PASCAL CHiME speech separation and recognition challenge," http://www.dcs.shef.ac.uk/spandh/chime/challenge.html, cited April 24, 2012.
- The PASCAL CHiME Speech Separation and Recognition Challenge
- Barker, J.; Vincent, E.; Ma, N.; Christensen, H.; Green, P.

27
- 45849093239
- Efficient WFST-based one-pass decoding with on-the-fly hypothesis rescoring in extremely large vocabulary continuous speech recognition
- T. Hori, C. Hori, Y. Minami, and A. Nakamura, "Efficient WFST-based one-pass decoding with on-the-fly hypothesis rescoring in extremely large vocabulary continuous speech recognition," IEEE Transactions on Audio, Speech, and Language Processing, vol. 15, no. 4, pp. 1352-1365, 2006.
- (2006) IEEE Transactions on Audio, Speech, and Language Processing, vol. 15, no. 4, pp. 1352-1365
- Hori, T.; Hori, C.; Minami, Y.; Nakamura, A.

28
- 70450194926
- Margin-space integration of MPE loss via differencing of MMI functionals for generalized error-weighted discriminative training
- E. McDermott, S. Watanabe, and A. Nakamura, "Margin-space integration of MPE loss via differencing of MMI functionals for generalized error-weighted discriminative training," in Proc. Interspeech, 2009, pp. 224-227.
- (2009) Proc. Interspeech, pp. 224-227
- McDermott, E.; Watanabe, S.; Nakamura, A.

29
- 0036296863
- Minimum phone error and I-smoothing for improved discriminative training
- D. Povey and P. Woodland, "Minimum phone error and I-smoothing for improved discriminative training," in Proc. International Conference on Acoustics, Speech, and Signal Processing (ICASSP), vol. 1. IEEE, 2002, pp. 105-108.
- (2002) Proc. International Conference on Acoustics, Speech, and Signal Processing (ICASSP), vol. 1, pp. 105-108
- Povey, D.; Woodland, P.

30
- 84874226579
- Adaptation of context-dependent deep neural networks for automatic speech recognition
- K. Yao, D. Yu, F. Seide, H. Su, L. Deng, and Y. Gong, "Adaptation of context-dependent deep neural networks for automatic speech recognition," in Proc. IEEE Spoken Language Technology Workshop (SLT), 2012, pp. 366-369.
- (2012) Proc. IEEE Spoken Language Technology Workshop (SLT), pp. 366-369
- Yao, K.; Yu, D.; Seide, F.; Su, H.; Deng, L.; Gong, Y.

31
- 84866720201
- Robust Boltzmann machines for recognition and denoising
- Y. Tang, R. Salakhutdinov, and G. E. Hinton, "Robust Boltzmann machines for recognition and denoising," in Proc. IEEE International Conference on Computer Vision and Pattern Recognition, 2012, pp. 2264-2271.
- (2012) Proc. IEEE International Conference on Computer Vision and Pattern Recognition, pp. 2264-2271
- Tang, Y.; Salakhutdinov, R.; Hinton, G.E.

32
- 56449089103
- Extracting and composing robust features with denoising autoencoders
- P. Vincent, H. Larochelle, Y. Bengio, and P.-A. Manzagol, "Extracting and composing robust features with denoising autoencoders," in Proc. International Conference on Machine Learning, 2008, pp. 1096-1103.
- (2008) Proc. International Conference on Machine Learning, pp. 1096-1103
- Vincent, P.; Larochelle, H.; Bengio, Y.; Manzagol, P.-A.

33
- 84878409063
- Recurrent neural networks for noise reduction in robust ASR
- A. Maas, Q. Le, T. O'Neil, O. Vinyals, P. Nguyen, and A. Ng, "Recurrent neural networks for noise reduction in robust ASR," in Proc. Interspeech, 2012.
- (2012) Proc. Interspeech
- Maas, A.; Le, Q.; O'Neil, T.; Vinyals, O.; Nguyen, P.; Ng, A.

* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.