SCOPUS Information Search Platform

2015 IEEE Workshop on Automatic Speech Recognition and Understanding, ASRU 2015 - Proceedings

Volume , Issue , 2016, Pages 167-174

EESEN: End-to-end speech recognition using deep RNN models and WFST-based decoding

(3) Miao, Yajie ᵃ; Gowayyed, Mohammad ᵃ; Metze, Florian ᵃ

ᵃ Carnegie Mellon University (United States)

Author keywords

connectionist temporal classification; end to end ASR; Recurrent neural network

Indexed keywords

DECODING; RECURRENT NEURAL NETWORKS; SPEECH;

AUTOMATIC SPEECH RECOGNITION; CONTEXT INDEPENDENT; DEEP NEURAL NETWORKS; END TO END; OBJECTIVE FUNCTIONS; RECURRENT NEURAL NETWORK (RNN); TEMPORAL CLASSIFICATION; WEIGHTED FINITE-STATE TRANSDUCERS;

SPEECH RECOGNITION;

EID: 84964489732 PISSN: None EISSN: None Source Type: Conference Proceeding
DOI: 10.1109/ASRU.2015.7404790 Document Type: Conference Paper

Times cited : (792)

References (38)

1
- 84055222005
- Context-dependent pre-trained deep neural networks for large-vocabulary speech recognition
- George E Dahl, Dong Yu, Li Deng, and Alex Acero, "Context-dependent pre-trained deep neural networks for large-vocabulary speech recognition," Audio, Speech, and Language Processing, IEEE Transactions on, vol. 20, no. 1, pp. 30-42, 2012
- (2012) Audio, Speech, and Language Processing, IEEE Transactions on , vol.20 , Issue.1 , pp. 30-42
- Dahl, G.E.¹ Yu, D.² Deng, L.³ Acero, A.⁴

2
- 85032751458
- Deep neural networks for acoustic modeling in speech recognition: The shared views of four research groups
- Geoffrey Hinton, Li Deng, Dong Yu, George E Dahl, Abdel-rahman Mohamed, Navdeep Jaitly, Andrew Senior, Vincent Vanhoucke, Patrick Nguyen, Tara N Sainath, et al., "Deep neural networks for acoustic modeling in speech recognition: The shared views of four research groups," Signal Processing Magazine, IEEE, vol. 29, no. 6, pp. 82-97, 2012
- (2012) Signal Processing Magazine, IEEE , vol.29 , Issue.6 , pp. 82-97
- Hinton, G.¹ Deng, L.² Yu, D.³ Dahl, G.E.⁴ Mohamed, A.-R.⁵ Jaitly, N.⁶ Senior, A.⁷ Vanhoucke, V.⁸ Nguyen, P.⁹ Sainath, T.N.¹⁰

3
- 84858976070
- Feature engineering in context-dependent deep neural networks for conversational speech transcription
- Frank Seide, Gang Li, Xie Chen, and Dong Yu, "Feature engineering in context-dependent deep neural networks for conversational speech transcription," in Automatic Speech Recognition and Understanding (ASRU), 2011 IEEE Workshop on. IEEE, 2011, pp. 24-29
- (2011) Automatic Speech Recognition and Understanding (ASRU), 2011 IEEE Workshop On. IEEE , pp. 24-29
- Seide, F.¹ Li, G.² Chen, X.³ Yu, D.⁴

4
- 84910041921
- GMM-free DNN training
- Andrew Senior, Georg Heigold, Michiel Bacchiani, and Hank Liao, "GMM-free DNN training," in Acoustics, Speech and Signal Processing (ICASSP), 2014 IEEE International Conference on. IEEE, 2014, pp. 5639-5643
- (2014) Acoustics, Speech and Signal Processing (ICASSP), 2014 IEEE International Conference On. IEEE , pp. 5639-5643
- Senior, A.¹ Heigold, G.² Bacchiani, M.³ Liao, H.⁴

5
- 84910072495
- Asynchronous, online, GMM-free training of a context dependent acoustic model for speech recognition
- Michiel Bacchiani, Andrew Senior, and Georg Heigold, "Asynchronous, online, GMM-free training of a context dependent acoustic model for speech recognition," in Fifteenth Annual Conference of the International Speech Communication Association (INTERSPEECH). ISCA, 2014
- (2014) Fifteenth Annual Conference of the International Speech Communication Association (INTERSPEECH). ISCA
- Bacchiani, M.¹ Senior, A.² Heigold, G.³

6
- 84919832465
- Towards end-to-end speech recognition with recurrent neural networks
- Alex Graves and Navdeep Jaitly, "Towards end-to-end speech recognition with recurrent neural networks," in Proceedings of the 31st International Conference on Machine Learning (ICML-14), 2014, pp. 1764-1772
- (2014) Proceedings of the 31st International Conference on Machine Learning (ICML-14) , pp. 1764-1772
- Graves, A.¹ Jaitly, N.²

7
- 84928545733
- Deepspeech: Scaling up end-to-end speech recognition
- Awni Hannun, Carl Case, Jared Casper, Bryan Catanzaro, Greg Diamos, Erich Elsen, Ryan Prenger, Sanjeev Satheesh, Shubho Sengupta, Adam Coates, et al., "Deepspeech: Scaling up end-to-end speech recognition," arXiv preprint arXiv:1412.5567, 2014
- (2014) arXiv preprint arXiv:1412.5567
- Hannun, A.¹ Case, C.² Casper, J.³ Catanzaro, B.⁴ Diamos, G.⁵ Elsen, E.⁶ Prenger, R.⁷ Satheesh, S.⁸ Sengupta, S.⁹ Coates, A.¹⁰

8
- 84957716354
- First-pass large vocabulary continuous speech recognition using bi-directional recurrent DNNs
- Awni Y Hannun, Andrew L Maas, Daniel Jurafsky, and Andrew Y Ng, "First-pass large vocabulary continuous speech recognition using bi-directional recurrent DNNs," arXiv preprint arXiv:1408.2873, 2014
- (2014) arXiv preprint arXiv:1408.2873
- Hannun, A.Y.¹ Maas, A.L.² Jurafsky, D.³ Ng, A.Y.⁴

9
- 84959066041
- End-to-end continuous speech recognition using attention-based recurrent NN: First results
- Jan Chorowski, Dzmitry Bahdanau, Kyunghyun Cho, and Yoshua Bengio, "End-to-end continuous speech recognition using attention-based recurrent NN: First results," arXiv preprint arXiv:1412.1602, 2014
- (2014) arXiv preprint arXiv:1412.1602
- Chorowski, J.¹ Bahdanau, D.² Cho, K.³ Bengio, Y.⁴

10
- 84960153975
- Lexicon-free conversational speech recognition with neural networks
- Andrew L Maas, Ziang Xie, Dan Jurafsky, and Andrew Y Ng, "Lexicon-free conversational speech recognition with neural networks," in Proceedings of the 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2015
- (2015) Proceedings of the 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies
- Maas, A.L.¹ Xie, Z.² Jurafsky, D.³ Ng, A.Y.⁴

11
- 84994195882
- End-to-end attention-based large vocabulary speech recognition
- Dzmitry Bahdanau, Jan Chorowski, Dmitriy Serdyuk, Philemon Brakel, and Yoshua Bengio, "End-to-end attention-based large vocabulary speech recognition," arXiv preprint arXiv:1508.04395, 2015
- (2015) arXiv preprint arXiv:1508.04395
- Bahdanau, D.¹ Chorowski, J.² Serdyuk, D.³ Brakel, P.⁴ Bengio, Y.⁵

12
- 84994328213
- Listen, attend and spell
- William Chan, Navdeep Jaitly, Quoc V Le, and Oriol Vinyals, "Listen, attend and spell," arXiv preprint arXiv:1508.01211, 2015
- (2015) arXiv preprint arXiv:1508.01211
- Chan, W.¹ Jaitly, N.² Le, Q.V.³ Vinyals, O.⁴

13
- 33749259827
- Connectionist temporal classification: Labelling unsegmented sequence data with recurrent neural networks
- Alex Graves, Santiago Fernández, Faustino Gomez, and Jurgen Schmidhuber, "Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks," in Proceedings of the 23rd international conference on Machine learning. ACM, 2006, pp. 369-376
- (2006) Proceedings of the 23rd International Conference on Machine Learning. ACM , pp. 369-376
- Graves, A.¹ Fernández, S.² Gomez, F.³ Schmidhuber, J.⁴

14
- 84946084790
- Learning acoustic frame labeling for speech recognition with recurrent neural networks
- Hasim Sak, Andrew Senior, Kanishka Rao, Ozan Irsoy, Alex Graves, Francoise Beaufays, and Johan Schalkwyk, "Learning acoustic frame labeling for speech recognition with recurrent neural networks," in Acoustics, Speech and Signal Processing (ICASSP), 2015 IEEE International Conference on. IEEE, 2015, pp. 4280-4284
- (2015) Acoustics, Speech and Signal Processing (ICASSP), 2015 IEEE International Conference On. IEEE , pp. 4280-4284
- Sak, H.¹ Senior, A.² Rao, K.³ Irsoy, O.⁴ Graves, A.⁵ Beaufays, F.⁶ Schalkwyk, J.⁷

15
- 84890543083
- Speech recognition with deep recurrent neural networks
- Alex Graves, Abdel-rahman Mohamed, and Geoffrey Hinton, "Speech recognition with deep recurrent neural networks," in Acoustics, Speech and Signal Processing (ICASSP), 2013 IEEE International Conference on. IEEE, 2013, pp. 6645-6649
- (2013) Acoustics, Speech and Signal Processing (ICASSP), 2013 IEEE International Conference On. IEEE , pp. 6645-6649
- Graves, A.¹ Mohamed, A.-R.² Hinton, G.³

16
- 84893701254
- Hybrid speech recognition with deep bidirectional LSTM
- Alex Graves, Navdeep Jaitly, and Abdel-rahman Mohamed, "Hybrid speech recognition with deep bidirectional LSTM," in Automatic Speech Recognition and Understanding (ASRU), 2013 IEEE Workshop on. IEEE, 2013, pp. 273-278
- (2013) Automatic Speech Recognition and Understanding (ASRU), 2013 IEEE Workshop On. IEEE , pp. 273-278
- Graves, A.¹ Jaitly, N.² Mohamed, A.-R.³

17
- 0031573117
- Long short-term memory
- Sepp Hochreiter and Jurgen Schmidhuber, "Long short-term memory," Neural computation, vol. 9, no. 8, pp. 1735-1780, 1997
- (1997) Neural Computation , vol.9 , Issue.8 , pp. 1735-1780
- Hochreiter, S.¹ Schmidhuber, J.²

18
- 84910046405
- Long short-term memory recurrent neural network architectures for large scale acoustic modeling
- Hasim Sak, Andrew Senior, and Francoise Beaufays, "Long short-term memory recurrent neural network architectures for large scale acoustic modeling," in Fifteenth Annual Conference of the International Speech Communication Association (INTERSPEECH). ISCA, 2014
- (2014) Fifteenth Annual Conference of the International Speech Communication Association (INTERSPEECH). ISCA
- Sak, H.¹ Senior, A.² Beaufays, F.³

19
- 84946037134
- Convolutional, long short-term memory, fully connected deep neural networks
- Tara N Sainath, Oriol Vinyals, Andrew Senior, and Hasim Sak, "Convolutional, long short-term memory, fully connected deep neural networks," in Acoustics, Speech and Signal Processing (ICASSP), 2015 IEEE International Conference on. IEEE, 2015
- (2015) Acoustics, Speech and Signal Processing (ICASSP), 2015 IEEE International Conference On. IEEE
- Sainath, T.N.¹ Vinyals, O.² Senior, A.³ Sak, H.⁴

20
- 84938725974
- On speaker adaptation of long short-term memory recurrent neural networks
- Yajie Miao and Florian Metze, "On speaker adaptation of long short-term memory recurrent neural networks," in Sixteenth Annual Conference of the International Speech Communication Association (INTERSPEECH). ISCA, 2015
- (2015) Sixteenth Annual Conference of the International Speech Communication Association (INTERSPEECH). ISCA
- Miao, Y.¹ Metze, F.²

21
- 0028392483
- Learning long-term dependencies with gradient descent is difficult
- Yoshua Bengio, Patrice Simard, and Paolo Frasconi, "Learning long-term dependencies with gradient descent is difficult," Neural Networks, IEEE Transactions on, vol. 5, no. 2, pp. 157-166, 1994
- (1994) Neural Networks, IEEE Transactions on , vol.5 , Issue.2 , pp. 157-166
- Bengio, Y.¹ Simard, P.² Frasconi, P.³

22
- 0041965934
- Learning precise timing with LSTM recurrent networks
- Felix A Gers, Nicol N Schraudolph, and Jurgen Schmidhuber, "Learning precise timing with LSTM recurrent networks," The Journal of Machine Learning Research, vol. 3, pp. 115-143, 2003
- (2003) The Journal of Machine Learning Research , vol.3 , pp. 115-143
- Gers, F.A.¹ Schraudolph, N.N.² Schmidhuber, J.³

23
- 0024634603
- Phoneme recognition using time-delay neural networks
- Alex Waibel, Toshiyuki Hanazawa, Geoffrey Hinton, Kiyohiro Shikano, and Kevin J Lang, "Phoneme recognition using time-delay neural networks," Acoustics, Speech and Signal Processing, IEEE Transactions on, vol. 37, no. 3, pp. 328-339, 1989
- (1989) Acoustics, Speech and Signal Processing, IEEE Transactions on , vol.37 , Issue.3 , pp. 328-339
- Waibel, A.¹ Hanazawa, T.² Hinton, G.³ Shikano, K.⁴ Lang, K.J.⁵

24
- 0025449258
- A novel objective function for improved phoneme recognition using time-delay neural networks
- John B Hampshire, Alexander H Waibel, et al., "A novel objective function for improved phoneme recognition using time-delay neural networks," Neural Networks, IEEE Transactions on, vol. 1, no. 2, pp. 216-228, 1990
- (1990) Neural Networks, IEEE Transactions on , vol.1 , Issue.2 , pp. 216-228
- Hampshire, J.B.¹ Waibel, A.H.²

25
- 84951162898
- Deep convolutional neural networks for large-scale speech tasks
- Tara N Sainath, Brian Kingsbury, George Saon, Hagen Soltau, Abdel-rahman Mohamed, George Dahl, and Bhuvana Ramabhadran, "Deep convolutional neural networks for large-scale speech tasks," Neural Networks, 2014
- (2014) Neural Networks
- Sainath, T.N.¹ Kingsbury, B.² Saon, G.³ Soltau, H.⁴ Mohamed, A.-R.⁵ Dahl, G.⁶ Ramabhadran, B.⁷

26
- 0024610919
- A tutorial on hidden Markov models and selected applications in speech recognition
- Lawrence R. Rabiner, "A tutorial on hidden Markov models and selected applications in speech recognition," Proceedings of the IEEE, vol. 77, no. 2, pp. 257-286, 1989
- (1989) Proceedings of the IEEE , vol.77 , Issue.2 , pp. 257-286
- Rabiner, L.R.¹

27
- 0036460907
- Weighted finite-state transducers in speech recognition
- Mehryar Mohri, Fernando Pereira, and Michael Riley, "Weighted finite-state transducers in speech recognition," Computer Speech & Language, vol. 16, no. 1, pp. 69-88, 2002
- (2002) Computer Speech & Language , vol.16 , Issue.1 , pp. 69-88
- Mohri, M.¹ Pereira, F.² Riley, M.³

28
- 84858953642
- The Kaldi speech recognition toolkit
- Daniel Povey, Arnab Ghoshal, Gilles Boulianne, Lukáš Burget, Ondřej Glembek, Nagendra Goel, Mirko Hannemann, Petr Motlíček, Yanmin Qian, Petr Schwarz, Jan Silovský, Georg Stemmer, and Karel Veselý, "The Kaldi speech recognition toolkit," in Automatic Speech Recognition and Understanding (ASRU), 2011 IEEE Workshop on. IEEE, 2011, pp. 1-4
- (2011) Automatic Speech Recognition and Understanding (ASRU), 2011 IEEE Workshop On. IEEE , pp. 1-4
- Povey, D.¹ Ghoshal, A.² Boulianne, G.³ Burget, L.⁴ Glembek, O.⁵ Goel, N.⁶ Hannemann, M.⁷ Motlíček, P.⁸ Qian, Y.⁹ Schwarz, P.¹⁰ Silovský, J.¹¹ Stemmer, G.¹² Veselý, K.¹³

29
- 38149133882
- OpenFst: A general and efficient weighted finite-state transducer library
- Springer
- Cyril Allauzen, Michael Riley, Johan Schalkwyk, Wojciech Skut, and Mehryar Mohri, "OpenFst: A general and efficient weighted finite-state transducer library," in Implementation and Application of Automata, pp. 11-23. Springer, 2007
- (2007) Implementation and Application of Automata , pp. 11-23
- Allauzen, C.¹ Riley, M.² Schalkwyk, J.³ Skut, W.⁴ Mohri, M.⁵

30
- 33745805403
- A fast learning algorithm for deep belief nets
- Geoffrey Hinton, Simon Osindero, and Yee-Whye Teh, "A fast learning algorithm for deep belief nets," Neural computation, vol. 18, no. 7, pp. 1527-1554, 2006
- (2006) Neural Computation , vol.18 , Issue.7 , pp. 1527-1554
- Hinton, G.¹ Osindero, S.² Teh, Y.-W.³

31
- 84962868641
- A one-pass decoder based on polymorphic linguistic context assignment
- Hagen Soltau, Florian Metze, Christian Fugen, and Alex Waibel, "A one-pass decoder based on polymorphic linguistic context assignment," in Automatic Speech Recognition and Understanding, 2001. ASRU'01. IEEE Workshop on. IEEE, 2001, pp. 214-217
- (2001) Automatic Speech Recognition and Understanding, 2001. ASRU'01. IEEE Workshop On. IEEE , pp. 214-217
- Soltau, H.¹ Metze, F.² Fugen, C.³ Waibel, A.⁴

32
- 84910068044
- Distributed learning of multilingual DNN feature extractors using GPUs
- Yajie Miao, Hao Zhang, and Florian Metze, "Distributed learning of multilingual DNN feature extractors using GPUs," in Fifteenth Annual Conference of the International Speech Communication Association (INTERSPEECH). ISCA, 2014
- (2014) Fifteenth Annual Conference of the International Speech Communication Association (INTERSPEECH). ISCA
- Miao, Y.¹ Zhang, H.² Metze, F.³

33
- 84910028405
- Improving language-universal feature extraction with deep maxout and convolutional neural networks
- Yajie Miao and Florian Metze, "Improving language-universal feature extraction with deep maxout and convolutional neural networks," in Fifteenth Annual Conference of the International Speech Communication Association (INTERSPEECH). ISCA, 2014
- (2014) Fifteenth Annual Conference of the International Speech Communication Association (INTERSPEECH). ISCA
- Miao, Y.¹ Metze, F.²

34
- 84959146690
- Towards end-to-end speech recognition for Chinese Mandarin using long short-term memory recurrent neural networks
- Jie Li, Heng Zhang, Xinyuan Cai, and Bo Xu, "Towards end-to-end speech recognition for Chinese Mandarin using long short-term memory recurrent neural networks," in Sixteenth Annual Conference of the International Speech Communication Association (INTERSPEECH). ISCA, 2015
- (2015) Sixteenth Annual Conference of the International Speech Communication Association (INTERSPEECH). ISCA
- Li, J.¹ Zhang, H.² Cai, X.³ Xu, B.⁴

35
- 84890521103
- Speaker adaptation of context dependent deep neural networks
- Hank Liao, "Speaker adaptation of context dependent deep neural networks," in Acoustics, Speech and Signal Processing (ICASSP), 2013 IEEE International Conference on. IEEE, 2013, pp. 7947-7951
- (2013) Acoustics, Speech and Signal Processing (ICASSP), 2013 IEEE International Conference On. IEEE , pp. 7947-7951
- Liao, H.¹

36
- 84874226579
- Adaptation of context-dependent deep neural networks for automatic speech recognition
- Kaisheng Yao, Dong Yu, Frank Seide, Hang Su, Li Deng, and Yifan Gong, "Adaptation of context-dependent deep neural networks for automatic speech recognition," in 2012 IEEE Spoken Language Technology Workshop (SLT). IEEE, 2012
- (2012) 2012 IEEE Spoken Language Technology Workshop (SLT). IEEE
- Yao, K.¹ Yu, D.² Seide, F.³ Su, H.⁴ Deng, L.⁵ Gong, Y.⁶

37
- 84910031119
- Towards speaker adaptive training of deep neural network acoustic models
- Yajie Miao, Hao Zhang, and Florian Metze, "Towards speaker adaptive training of deep neural network acoustic models," in Fifteenth Annual Conference of the International Speech Communication Association (INTERSPEECH). ISCA, 2014
- (2014) Fifteenth Annual Conference of the International Speech Communication Association (INTERSPEECH). ISCA
- Miao, Y.¹ Zhang, H.² Metze, F.³

38
- 84938688160
- Speaker adaptive training of deep neural network acoustic models using i-vectors
- Yajie Miao, Hao Zhang, and Florian Metze, "Speaker adaptive training of deep neural network acoustic models using i-vectors," IEEE/ACM Transactions on Audio, Speech and Language Processing (TASLP), vol. 23, no. 11, pp. 1938-1949, 2015
- (2015) IEEE/ACM Transactions on Audio, Speech and Language Processing (TASLP) , vol.23 , Issue.11 , pp. 1938-1949
- Miao, Y.¹ Zhang, H.² Metze, F.³

* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.