Volume 2015-August, 2015, Pages 4360-4364

Far-field speech recognition using CNN-DNN-HMM with convolution in time

Author keywords

convolutional neural network; deep neural network; Far field speech recognition; reverberation

Indexed keywords

AUDIO SIGNAL PROCESSING; CONVOLUTION; DEEP NEURAL NETWORKS; NEURAL NETWORKS; REVERBERATION; SPEECH; SPEECH COMMUNICATION;

EID: 84946020145     ISSN: 1520-6149     Source Type: Conference Proceeding
DOI: 10.1109/ICASSP.2015.7178794     Document Type: Conference Paper
Times cited: 29

References (30)
  • 1
    • N. Morgan and H. Bourlard, Continuous speech recognition: an introduction to the hybrid HMM/connectionist approach, IEEE Signal Process. Mag., vol. 12, no. 3, pp. 24-42, 1995
  • 2
    • G. E. Dahl, D. Yu, L. Deng, and A. Acero, Context-dependent pre-trained deep neural networks for large-vocabulary speech recognition, IEEE Trans. Audio, Speech, Language Process., vol. 20, no. 1, pp. 30-42, 2012
  • 4
    • T. Yoshioka and M. J. F. Gales, Environmentally robust ASR front-end for deep neural network acoustic models, Comp. Speech, Language, vol. 31, no. 1, pp. 65-86, 2015
  • 7
    • T. Yoshioka, A. Sehr, M. Delcroix, K. Kinoshita, R. Maas, T. Nakatani, and W. Kellermann, Making machines understand us in reverberant rooms: robustness against reverberation for automatic speech recognition, IEEE Signal Process. Mag., vol. 29, no. 6, pp. 114-126, 2012
  • 8
    • T. Yoshioka and T. Nakatani, Generalization of multi-channel linear prediction methods for blind MIMO impulse response shortening, IEEE Trans. Audio, Speech, Language Process., vol. 20, no. 10, pp. 2707-2720, 2012
  • 9
    • A. Krueger and R. Haeb-Umbach, Model-based feature enhancement for reverberant speech recognition, IEEE Trans. Audio, Speech, Language Process., vol. 18, no. 7, pp. 1692-1707, 2010
  • 11
    • A. Sehr, R. Maas, and W. Kellermann, Reverberation model-based decoding in the log-melspec domain for robust distant-talking speech recognition, IEEE Trans. Audio, Speech, Language Process., vol. 18, no. 7, pp. 1676-1691, 2010
  • 13
  • 14
    • K. Lebart, J. M. Boucher, and P. N. Denbigh, A new method based on spectral subtraction for speech dereverberation, Acta Acustica United with Acustica, vol. 87, pp. 359-366, 2001
  • 16
    • K. Kumar and R. Stern, Maximum-likelihood-based cepstral inverse filtering for blind speech dereverberation, in Proc. Int. Conf. Acoust., Speech, Signal Process., 2010, pp. 4282-4285
  • 18
    • L. Tóth, Combining time- and frequency-domain convolution in convolutional neural network-based phone recognition, in Proc. Int. Conf. Acoust., Speech, Signal Process., 2014, pp. 190-194
  • 19
    • O. Abdel-Hamid, L. Deng, and D. Yu, Exploring convolutional neural network structures and optimization techniques for speech recognition, in Proc. Interspeech, 2013, pp. 3366-3370
  • 21
    • P. Swietojanski, A. Ghoshal, and S. Renals, Convolutional neural networks for distant speech recognition, IEEE Signal Process. Letters, vol. 21, no. 9, pp. 1120-1124, 2014
  • 25
  • 30
    • F. Seide, G. Li, X. Chen, and D. Yu, Feature engineering in context-dependent deep neural networks for conversational speech transcription, in Proc. Workshop Automat. Speech Recognition, Understanding, 2011, pp. 24-29


* This information was extracted by KISTI through analysis of Elsevier's SCOPUS database.