SCOPUS Information Retrieval Platform

2017 IEEE Automatic Speech Recognition and Understanding Workshop, ASRU 2017 - Proceedings

Volume 2018-January, 2017, Pages 309-315

Attention-based Wav2Text with feature transfer learning

Authors (3): Tjandra, Andros (a); Sakti, Sakriani (a); Nakamura, Satoshi (a)

(a) Nara Institute of Science and Technology (Japan)

Author keywords

end to end neural network; raw speech waveform; Speech recognition

Indexed keywords

DECODING; DEEP NEURAL NETWORKS; SIGNAL ENCODING; SPEECH RECOGNITION;

AUTOMATIC SPEECH RECOGNITION; ENCODER-DECODER; END TO END; END-TO-END NEURAL NETWORK; FEATURE TRANSFERS; NEURAL-NETWORKS; RAW SPEECH WAVEFORM; SPEECH WAVEFORMS; TRANSFER LEARNING;

HIDDEN MARKOV MODELS;

EID: 85050532161 PISSN: None EISSN: None Source Type: Conference Proceeding
DOI: 10.1109/ASRU.2017.8268951 Document Type: Conference Paper

Times cited : (26)

References (31)

1
- 70349227947
- The application of hidden Markov models in speech recognition
- Mark Gales and Steve Young, "The application of hidden Markov models in speech recognition," Foundations and Trends in Signal Processing, vol. 1, no. 3, pp. 195-304, 2008.
- (2008) Foundations and Trends in Signal Processing , vol.1 , Issue.3 , pp. 195-304
- Gales, M.¹ Young, S.²

2
- 84936133512
- Dimitri Palaz, Ronan Collobert, Mathew Magimai Doss, "End-to-end phoneme sequence recognition using convolutional neural networks," arXiv preprint arXiv:1312.2137, 2013.
- (2013) End-to-end Phoneme Sequence Recognition Using Convolutional Neural Networks
- Palaz, D.¹ Collobert, R.² Magimai Doss, M.³

3
- 84946023646
- Convolutional neural networks-based continuous speech recognition using raw speech signal
- Dimitri Palaz, Mathew Magimai Doss, Ronan Collobert, "Convolutional neural networks-based continuous speech recognition using raw speech signal," in Acoustics, Speech and Signal Processing (ICASSP), 2015 IEEE International Conference on. IEEE, 2015, pp. 4295-4299.
- (2015) Acoustics, Speech and Signal Processing (ICASSP) 2015 IEEE International Conference On. IEEE , pp. 4295-4299
- Palaz, D.¹ Doss, M.M.² Collobert, R.³

4
- 84959168440
- Learning the speech front-end with raw waveform CLDNNs
- Tara N Sainath, Ron J Weiss, Andrew W Senior, Kevin W Wilson, Oriol Vinyals, "Learning the speech front-end with raw waveform CLDNNs," in Interspeech, 2015, vol. 2015.
- (2015) Interspeech , vol.2015
- Sainath, T.N.¹ Weiss, R.J.² Senior, A.W.³ Wilson, K.W.⁴ Vinyals, O.⁵

5
- 84994235770
- Acoustic modelling from the signal domain using CNNs
- Pegah Ghahremani, Vimal Manohar, Daniel Povey, Sanjeev Khudanpur, "Acoustic modelling from the signal domain using CNNs," in Interspeech, 2016, vol. 2016.
- (2016) Interspeech , vol.2016
- Ghahremani, P.¹ Manohar, V.² Povey, D.³ Khudanpur, S.⁴

6
- 0012330750
- The design for the Wall Street Journal-based CSR corpus
- HLT Association for Computational Linguistics
- Douglas B. Paul and Janet M. Baker, "The design for the Wall Street Journal-based CSR corpus," in Proceedings of the Workshop on Speech and Natural Language, Stroudsburg, PA, USA, 1992, HLT '91, pp. 357-362, Association for Computational Linguistics.
- (1992) Proceedings of the Workshop on Speech and Natural Language, Stroudsburg, PA, USA , HLT '91 , pp. 357-362
- Paul, D.B.¹ Baker, J.M.²

7
- 34250704813
- Connectionist temporal classification: Labelling unsegmented sequence data with recurrent neural networks
- Alex Graves, Santiago Fernández, Faustino Gomez, Jürgen Schmidhuber, "Connectionist temporal classification: Labelling unsegmented sequence data with recurrent neural networks," in Proceedings of the 23rd International Conference on Machine learning. ACM, 2006, pp. 369-376.
- (2006) Proceedings of the 23rd International Conference on Machine Learning. ACM , pp. 369-376
- Graves, A.¹ Fernández, S.² Gomez, F.³ Schmidhuber, J.⁴

8
- 84890543083
- Speech recognition with deep recurrent neural networks
- Alex Graves, Abdel Rahman Mohamed, Geoffrey Hinton, "Speech recognition with deep recurrent neural networks," in Acoustics, Speech and Signal processing (ICASSP), 2013 IEEE International Conference on. IEEE, 2013, pp. 6645-6649.
- (2013) Acoustics, Speech and Signal Processing (ICASSP) 2013 IEEE International Conference On. IEEE , pp. 6645-6649
- Graves, A.¹ Rahman Mohamed, A.² Hinton, G.³

9
- 84971463350
- Deep speech 2: End-to-end speech recognition in English and Mandarin
- Dario Amodei, Rishita Anubhai, Eric Battenberg, Carl Case, Jared Casper, Bryan Catanzaro, JingDong Chen, Mike Chrzanowski, Adam Coates, Greg Diamos, et al., "Deep speech 2: End-to-end speech recognition in English and Mandarin," in Proceedings of The 33rd International Conference on Machine Learning, 2016, pp. 173-182.
- (2016) Proceedings of the 33rd International Conference on Machine Learning , pp. 173-182
- Amodei, D.¹ Anubhai, R.² Battenberg, E.³ Case, C.⁴ Casper, J.⁵ Catanzaro, B.⁶ Chen, J.⁷ Chrzanowski, M.⁸ Coates, A.⁹ Diamos, G.¹⁰

10
- 84959066041
- Jan Chorowski, Dzmitry Bahdanau, Kyunghyun Cho, Yoshua Bengio, "End-to-end continuous speech recognition using attention-based recurrent NN: First results," arXiv preprint arXiv:1412.1602, 2014.
- (2014) End-to-end Continuous Speech Recognition Using Attention-based Recurrent NN: First Results
- Chorowski, J.¹ Bahdanau, D.² Cho, K.³ Bengio, Y.⁴

11
- 84973351869
- Listen, attend and spell: A neural network for large vocabulary conversational speech recognition
- William Chan, Navdeep Jaitly, Quoc Le, Oriol Vinyals, "Listen, attend and spell: A neural network for large vocabulary conversational speech recognition," in Acoustics, Speech and Signal Processing (ICASSP), 2016 IEEE International Conference on. IEEE, 2016, pp. 4960-4964.
- (2016) Acoustics, Speech and Signal Processing (ICASSP) 2016 IEEE International Conference On. IEEE , pp. 4960-4964
- Chan, W.¹ Jaitly, N.² Le, Q.³ Vinyals, O.⁴

12
- 85041759100
- Ronan Collobert, Christian Puhrsch, Gabriel Synnaeve, "Wav2letter: An end-to-end convnet-based speech recognition system," arXiv preprint arXiv:1609.03193, 2016.
- (2016) Wav2letter: An End-to-end Convnet-based Speech Recognition System
- Collobert, R.¹ Puhrsch, C.² Synnaeve, G.³

13
- 84922389693
- Dzmitry Bahdanau, Kyunghyun Cho, Yoshua Bengio, "Neural machine translation by jointly learning to align and translate," arXiv preprint arXiv:1409.0473, 2014.
- (2014) Neural Machine Translation by Jointly Learning to Align and Translate
- Bahdanau, D.¹ Cho, K.² Bengio, Y.³

14
- 84994358876
- Minh-Thang Luong, Hieu Pham, Christopher D Manning, "Effective approaches to attention-based neural machine translation," arXiv preprint arXiv:1508.04025, 2015.
- (2015) Effective Approaches to Attention-based Neural Machine Translation
- Luong, M.-T.¹ Pham, H.² Manning, C.D.³

15
- 0000359337
- Backpropagation applied to handwritten zip code recognition
- Yann LeCun, Bernhard Boser, John S Denker, Donnie Henderson, Richard E Howard, Wayne Hubbard, Lawrence D Jackel, "Backpropagation applied to handwritten zip code recognition," Neural computation, vol. 1, no. 4, pp. 541-551, 1989.
- (1989) Neural Computation , vol.1 , Issue.4 , pp. 541-551
- LeCun, Y.¹ Boser, B.² Denker, J.S.³ Henderson, D.⁴ Howard, R.E.⁵ Hubbard, W.⁶ Jackel, L.D.⁷

16
- 84939241380
- Min Lin, Qiang Chen, Shuicheng Yan, "Network in network," arXiv preprint arXiv:1312.4400, 2013.
- (2013) Network in Network
- Lin, M.¹ Chen, Q.² Yan, S.³

17
- 0031573117
- Long short-term memory
- Sepp Hochreiter and Jürgen Schmidhuber, "Long short-term memory," Neural computation, vol. 9, no. 8, pp. 1735-1780, 1997.
- (1997) Neural Computation , vol.9 , Issue.8 , pp. 1735-1780
- Hochreiter, S.¹ Schmidhuber, J.²

18
- 84962006941
- Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, Martin Riedmiller, "Striving for simplicity: The all convolutional net," arXiv preprint arXiv:1412.6806, 2014.
- (2014) Striving for Simplicity: The All Convolutional Net
- Springenberg, J.T.¹ Dosovitskiy, A.² Brox, T.³ Riedmiller, M.⁴

19
- 84858953642
- The Kaldi speech recognition toolkit
- Dec. 2011, IEEE Signal Processing Society, IEEE Catalog No.: CFP11SRW-USB
- Daniel Povey, Arnab Ghoshal, Gilles Boulianne, Lukas Burget, Ondrej Glembek, Nagendra Goel, Mirko Hannemann, Petr Motlicek, Yanmin Qian, Petr Schwarz, Jan Silovsky, Georg Stemmer, Karel Vesely, "The Kaldi speech recognition toolkit," in IEEE 2011 Workshop on Automatic Speech Recognition and Understanding. Dec. 2011, IEEE Signal Processing Society, IEEE Catalog No.: CFP11SRW-USB.
- IEEE 2011 Workshop on Automatic Speech Recognition and Understanding
- Povey, D.¹ Ghoshal, A.² Boulianne, G.³ Burget, L.⁴ Glembek, O.⁵ Goel, N.⁶ Hannemann, M.⁷ Motlicek, P.⁸ Qian, Y.⁹ Schwarz, P.¹⁰ Silovsky, J.¹¹ Stemmer, G.¹² Vesely, K.¹³

20
- 84957716354
- Awni Y Hannun, Andrew L Maas, Daniel Jurafsky, Andrew Y Ng, "First-pass large vocabulary continuous speech recognition using bi-directional recurrent DNNs," arXiv preprint arXiv:1408.2873, 2014.
- (2014) First-pass Large Vocabulary Continuous Speech Recognition Using Bi-directional Recurrent DNNs
- Hannun, A.Y.¹ Maas, A.L.² Jurafsky, D.³ Ng, A.Y.⁴

21
- 84893676344
- Rectifier nonlinearities improve neural network acoustic models
- Andrew L Maas, Awni Y Hannun, Andrew Y Ng, "Rectifier nonlinearities improve neural network acoustic models," in ICML Workshop on Deep Learning for Audio, Speech and Language Processing, 2013.
- (2013) ICML Workshop on Deep Learning for Audio, Speech and Language Processing
- Maas, A.L.¹ Hannun, A.Y.² Ng, A.Y.³

22
- 70349284484
- Supervised sequence labelling
- Springer
- Alex Graves, "Supervised sequence labelling," in Supervised Sequence Labelling with Recurrent Neural Networks, pp. 5-13. Springer, 2012.
- (2012) Supervised Sequence Labelling with Recurrent Neural Networks , pp. 5-13
- Graves, A.¹

23
- 84973293705
- End-to-end attention-based large vocabulary speech recognition
- Dzmitry Bahdanau, Jan Chorowski, Dmitriy Serdyuk, Philemon Brakel, Yoshua Bengio, "End-to-end attention-based large vocabulary speech recognition," in Acoustics, Speech and Signal Processing (ICASSP), 2016 IEEE International Conference on. IEEE, 2016, pp. 4945-4949.
- (2016) Acoustics, Speech and Signal Processing (ICASSP) 2016 IEEE International Conference On. IEEE , pp. 4945-4949
- Bahdanau, D.¹ Chorowski, J.² Serdyuk, D.³ Brakel, P.⁴ Bengio, Y.⁵

24
- 84941620184
- Diederik Kingma and Jimmy Ba, "Adam: A method for stochastic optimization," arXiv preprint arXiv:1412.6980, 2014.
- (2014) Adam: A Method for Stochastic Optimization
- Kingma, D.¹ Ba, J.²

25
- 85023752928
- Joint CTC-attention based end-to-end speech recognition using multi-task learning
- to appear
- Suyoun Kim, Takaaki Hori, Shinji Watanabe, "Joint CTC-attention based end-to-end speech recognition using multi-task learning," in Acoustics, Speech and Signal processing (ICASSP), 2017 IEEE International Conference on. IEEE, 2017, p. to appear.
- (2017) Acoustics, Speech and Signal Processing (ICASSP) 2017 IEEE International Conference On. IEEE
- Kim, S.¹ Hori, T.² Watanabe, S.³

26
- 84879854889
- Representation learning: A review and new perspectives
- Yoshua Bengio, Aaron Courville, Pascal Vincent, "Representation learning: A review and new perspectives," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 35, no. 8, pp. 1798-1828, 2013.
- (2013) IEEE Transactions on Pattern Analysis and Machine Intelligence , vol.35 , Issue.8 , pp. 1798-1828
- Bengio, Y.¹ Courville, A.² Vincent, P.³

27
- 84937508363
- How transferable are features in deep neural networks?
- Jason Yosinski, Jeff Clune, Yoshua Bengio, Hod Lipson, "How transferable are features in deep neural networks?" in Advances in neural information processing systems, 2014, pp. 3320-3328.
- (2014) Advances in Neural Information Processing Systems , pp. 3320-3328
- Yosinski, J.¹ Clune, J.² Bengio, Y.³ Lipson, H.⁴

28
- 84959115246
- Cross-lingual transfer learning during supervised training in low resource scenarios
- Amit Das and Mark Hasegawa-Johnson, "Cross-lingual transfer learning during supervised training in low resource scenarios," in INTERSPEECH, 2015, pp. 3531-3535.
- (2015) Interspeech , pp. 3531-3535
- Das, A.¹ Hasegawa-Johnson, M.²

29
- 34547548235
- Probabilistic and bottle-neck features for LVCSR of meetings
- Frantisek Grézl, Martin Karafiát, Stanislav Kontár, Jan Cernocky, "Probabilistic and bottle-neck features for LVCSR of meetings," in Acoustics, Speech and Signal Processing, 2007. ICASSP 2007. IEEE International Conference on. IEEE, 2007, vol. 4, pp. IV-757.
- (2007) Acoustics, Speech and Signal Processing 2007. ICASSP 2007. IEEE International Conference On. IEEE , vol.4 , pp. IV-757
- Grézl, F.¹ Karafiát, M.² Kontár, S.³ Cernocky, J.⁴

30
- 84906225757
- A scalable approach to using DNN-derived features in GMM-HMM based acoustic modeling for LVCSR
- Z.J. Yan, Q. Huo, J. Xu, "A scalable approach to using DNN-derived features in GMM-HMM based acoustic modeling for LVCSR," in Proc. INTERSPEECH, 2013.
- (2013) Proc. INTERSPEECH
- Yan, Z.J.¹ Huo, Q.² Xu, J.³

31
- 84858971297
- Convolutive bottleneck network features for LVCSR
- K. Vesely, M. Karafiát, F. Grézl, "Convolutive bottleneck network features for LVCSR," in Proc. ASRU, Waikoloa, USA, 2011, pp. 42-47.
- (2011) Proc. ASRU, Waikoloa, USA , pp. 42-47
- Vesely, K.¹ Karafiát, M.² Grézl, F.³

* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.