SCOPUS Information Search Platform

Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH

Volume 08-12-September-2016, 2016, Pages 385-389

Segmental recurrent neural networks for end-to-end speech recognition

(5) Lu, Liang a; Kong, Lingpeng b; Dyer, Chris b; Smith, Noah A. c; Renals, Steve a

a UNIVERSITY OF EDINBURGH (United Kingdom)

b Carnegie Mellon University ^* (United States)

c UNIVERSITY OF WASHINGTON (United States)

Author keywords

End to end speech recognition; Recurrent neural networks; Segmental CRF

Indexed keywords

DECODING; FEATURE EXTRACTION; RANDOM PROCESSES; RECURRENT NEURAL NETWORKS; SPEECH; SPEECH COMMUNICATION; SPEECH PROCESSING;

ACOUSTIC MODELLING; END TO END; EXTERNAL SYSTEMS; PRACTICAL TRAINING; RECURRENT NEURAL NETWORK (RNN); SEGMENTAL CONDITIONAL RANDOM FIELDS; SEGMENTAL CRF; SEGMENTATION BOUNDARIES;

SPEECH RECOGNITION;

EID: 84994242299 PISSN: 2308457X EISSN: 19909772 Source Type: Conference Proceeding
DOI: 10.21437/Interspeech.2016-40 Document Type: Conference Paper

Times cited : (59)

References (29)

1
- 84858952478
- Don't multiply lightly: Quantifying problems with the acoustic model assumptions in speech recognition
- IEEE
- D. Gillick, L. Gillick, and S. Wegmann, "Don't multiply lightly: Quantifying problems with the acoustic model assumptions in speech recognition," in Proc. ASRU. IEEE, 2011, pp. 71-76.
- (2011) Proc. ASRU , pp. 71-76
- Gillick, D.¹ Gillick, L.² Wegmann, S.³

2
- 0030245363
- From HMM's to segment models: A unified view of stochastic modeling for speech recognition
- M. Ostendorf, V. Digalakis, and O. Kimball, "From HMM's to segment models: A unified view of stochastic modeling for speech recognition," IEEE Transactions on Speech and Audio Processing, pp. 360-378, 1996.
- (1996) IEEE Transactions on Speech and Audio Processing , pp. 360-378
- Ostendorf, M.¹ Digalakis, V.² Kimball, O.³

3
- 0009588481
- Speech recognition using SVMs
- N. Smith and M. Gales, "Speech recognition using SVMs," in Advances in neural information processing systems, 2001, pp. 1197-1204.
- (2001) Advances in Neural Information Processing Systems , pp. 1197-1204
- Smith, N.¹ Gales, M.²

4
- 33745185781
- Hidden conditional random fields for phone classification
- A. Gunawardana, M. Mahajan, A. Acero, and J. C. Platt, "Hidden conditional random fields for phone classification," in INTERSPEECH, 2005, pp. 1117-1120.
- (2005) INTERSPEECH , pp. 1117-1120
- Gunawardana, A.¹ Mahajan, M.² Acero, A.³ Platt, J.C.⁴

5
- 70350435251
- Speech recognition using augmented conditional random fields
- Y. Hifny and S. Renals, "Speech recognition using augmented conditional random fields," Audio, Speech, and Language Processing, IEEE Transactions on, vol. 17, no. 2, pp. 354-365, 2009.
- (2009) Audio, Speech, and Language Processing, IEEE Transactions on , vol.17 , Issue.2 , pp. 354-365
- Hifny, Y.¹ Renals, S.²

6
- 84936143793
- Towards end-to-end speech recognition with recurrent neural networks
- A. Graves and N. Jaitly, "Towards end-to-end speech recognition with recurrent neural networks," in Proc. ICML, 2014, pp. 1764-1772.
- (2014) Proc. ICML , pp. 1764-1772
- Graves, A.¹ Jaitly, N.²

7
- 84928545733
- Deep Speech: Scaling up end-to-end speech recognition
- A. Hannun, C. Case, J. Casper, B. Catanzaro, G. Diamos, E. Elsen, R. Prenger et al., "Deep Speech: Scaling up end-to-end speech recognition," arXiv preprint arXiv:1412.5567, 2014.
- (2014) arXiv preprint arXiv:1412.5567
- Hannun, A.¹ Case, C.² Casper, J.³ Catanzaro, B.⁴ Diamos, G.⁵ Elsen, E.⁶ Prenger, R.⁷

8
- 84959112739
- Fast and accurate recurrent neural network acoustic models for speech recognition
- H. Sak, A. Senior, K. Rao, and F. Beaufays, "Fast and accurate recurrent neural network acoustic models for speech recognition," in Proc. INTERSPEECH, 2015.
- (2015) Proc. INTERSPEECH
- Sak, H.¹ Senior, A.² Rao, K.³ Beaufays, F.⁴

9
- 84964489732
- EESEN: End-to-end speech recognition using deep RNN models and WFST-based decoding
- Y. Miao, M. Gowayyed, and F. Metze, "EESEN: End-to-end speech recognition using deep RNN models and WFST-based decoding," in Proc. ASRU, 2015.
- (2015) Proc. ASRU
- Miao, Y.¹ Gowayyed, M.² Metze, F.³

10
- 85083953689
- Neural machine translation by jointly learning to align and translate
- D. Bahdanau, K. Cho, and Y. Bengio, "Neural machine translation by jointly learning to align and translate," in Proc. ICLR, 2015.
- (2015) Proc. ICLR
- Bahdanau, D.¹ Cho, K.² Bengio, Y.³

11
- 84965139600
- Attention-based models for speech recognition
- J. K. Chorowski, D. Bahdanau, D. Serdyuk, K. Cho, and Y. Bengio, "Attention-based models for speech recognition," in Advances in Neural Information Processing Systems, 2015, pp. 577-585.
- (2015) Advances in Neural Information Processing Systems , pp. 577-585
- Chorowski, J.K.¹ Bahdanau, D.² Serdyuk, D.³ Cho, K.⁴ Bengio, Y.⁵

12
- 84959173420
- A study of the recurrent neural network encoder-decoder for large vocabulary speech recognition
- L. Lu, X. Zhang, K. Cho, and S. Renals, "A study of the recurrent neural network encoder-decoder for large vocabulary speech recognition," in Proc. INTERSPEECH, 2015.
- (2015) Proc. INTERSPEECH
- Lu, L.¹ Zhang, X.² Cho, K.³ Renals, S.⁴

13
- 84994328213
- Listen, attend and spell
- W. Chan, N. Jaitly, Q. V. Le, and O. Vinyals, "Listen, attend and spell," arXiv preprint arXiv:1508.01211, 2015.
- (2015) arXiv preprint arXiv:1508.01211
- Chan, W.¹ Jaitly, N.² Le, Q.V.³ Vinyals, O.⁴

14
- 84994175117
- Segmental recurrent neural networks
- L. Kong, C. Dyer, and N. A. Smith, "Segmental recurrent neural networks," arXiv preprint arXiv:1511.06018, 2015.
- (2015) arXiv preprint arXiv:1511.06018
- Kong, L.¹ Dyer, C.² Smith, N.A.³

15
- 34047192804
- Semi-markov conditional random fields for information extraction
- S. Sarawagi and W. W. Cohen, "Semi-markov conditional random fields for information extraction," in Advances in neural information processing systems, 2004, pp. 1185-1192.
- (2004) Advances in Neural Information Processing Systems , pp. 1185-1192
- Sarawagi, S.¹ Cohen, W.W.²

16
- 0142192295
- Conditional random fields: Probabilistic models for segmenting and labeling sequence data
- J. Lafferty, A. McCallum, and F. Pereira, "Conditional random fields: Probabilistic models for segmenting and labeling sequence data," in Proc. ICML, 2001, pp. 282-289.
- (2001) Proc. ICML , pp. 282-289
- Lafferty, J.¹ McCallum, A.² Pereira, F.³

17
- 80051659716
- Speech recognition with segmental conditional random fields: A summary of the JHU CLSP 2010 summer workshop
- G. Zweig, P. Nguyen, D. Van Compernolle, K. Demuynck, L. Atlas, P. Clark et al., "Speech recognition with segmental conditional random fields: A summary of the JHU CLSP 2010 summer workshop," in Proc. ICASSP. IEEE, 2011, pp. 5044-5047.
- (2011) Proc. ICASSP. IEEE , pp. 5044-5047
- Zweig, G.¹ Nguyen, P.² Van Compernolle, D.³ Demuynck, K.⁴ Atlas, L.⁵ Clark, P.⁶

18
- 84876691724
- Conditional random fields in speech, audio, and language processing
- E. Fosler-Lussier, Y. He, P. Jyothi, and R. Prabhavalkar, "Conditional random fields in speech, audio, and language processing," Proceedings of the IEEE, vol. 101, no. 5, pp. 1054-1075, 2013.
- (2013) Proceedings of the IEEE , vol.101 , Issue.5 , pp. 1054-1075
- Fosler-Lussier, E.¹ He, Y.² Jyothi, P.³ Prabhavalkar, R.⁴

19
- 84906282118
- Deep segmental neural networks for speech recognition
- O. Abdel-Hamid, L. Deng, D. Yu, and H. Jiang, "Deep segmental neural networks for speech recognition," in Proc. INTERSPEECH, 2013, pp. 1849-1853.
- (2013) Proc. INTERSPEECH , pp. 1849-1853
- Abdel-Hamid, O.¹ Deng, L.² Yu, D.³ Jiang, H.⁴

20
- 84959175560
- Segmental conditional random fields with deep neural networks as acoustic models for first-pass word recognition
- Y. He and E. Fosler-Lussier, "Segmental conditional random fields with deep neural networks as acoustic models for first-pass word recognition," in Proc. INTERSPEECH, 2015.
- (2015) Proc. INTERSPEECH
- He, Y.¹ Fosler-Lussier, E.²

21
- 70349284484
- Hierarchical subsampling networks
- Springer
- A. Graves, "Hierarchical subsampling networks," in Supervised Sequence Labelling with Recurrent Neural Networks. Springer, 2012, pp. 109-131.
- (2012) Supervised Sequence Labelling with Recurrent Neural Networks , pp. 109-131
- Graves, A.¹

22
- 84858953642
- The Kaldi speech recognition toolkit
- D. Povey, A. Ghoshal, G. Boulianne, L. Burget, O. Glembek, N. Goel, M. Hannemann, P. Motlicek, Y. Qian, P. Schwarz, J. Silovský, G. Stemmer, and K. Veselý, "The Kaldi speech recognition toolkit," in Proc. ASRU, 2011.
- (2011) Proc. ASRU
- Povey, D.¹ Ghoshal, A.² Boulianne, G.³ Burget, L.⁴ Glembek, O.⁵ Goel, N.⁶ Hannemann, M.⁷ Motlicek, P.⁸ Qian, Y.⁹ Schwarz, P.¹⁰ Silovský, J.¹¹ Stemmer, G.¹² Veselý, K.¹³

23
- 0031573117
- Long short-term memory
- S. Hochreiter and J. Schmidhuber, "Long short-term memory," Neural computation, vol. 9, no. 8, pp. 1735-1780, 1997.
- (1997) Neural Computation , vol.9 , Issue.8 , pp. 1735-1780
- Hochreiter, S.¹ Schmidhuber, J.²

24
- 84904163933
- Dropout: A simple way to prevent neural networks from overfitting
- N. Srivastava, G. Hinton, A. Krizhevsky, I. Sutskever, and R. Salakhutdinov, "Dropout: A simple way to prevent neural networks from overfitting," The Journal of Machine Learning Research, vol. 15, no. 1, pp. 1929-1958, 2014.
- (2014) The Journal of Machine Learning Research , vol.15 , Issue.1 , pp. 1929-1958
- Srivastava, N.¹ Hinton, G.² Krizhevsky, A.³ Sutskever, I.⁴ Salakhutdinov, R.⁵

25
- 84944053926
- Recurrent neural network regularization
- W. Zaremba, I. Sutskever, and O. Vinyals, "Recurrent neural network regularization," arXiv preprint arXiv:1409.2329, 2014.
- (2014) arXiv preprint arXiv:1409.2329
- Zaremba, W.¹ Sutskever, I.² Vinyals, O.³

26
- 84867598637
- Classification and recognition with direct segment models
- IEEE
- G. Zweig, "Classification and recognition with direct segment models," in Proc. ICASSP. IEEE, 2012, pp. 4161-4164.
- (2012) Proc. ICASSP , pp. 4161-4164
- Zweig, G.¹

27
- 84878565391
- Efficient segmental conditional random fields for phone recognition
- Y. He and E. Fosler-Lussier, "Efficient segmental conditional random fields for phone recognition," in Proc. INTERSPEECH, 2012, pp. 1898-1901.
- (2012) Proc. INTERSPEECH , pp. 1898-1901
- He, Y.¹ Fosler-Lussier, E.²

28
- 84964454407
- Discriminative segmental cascades for feature-rich phone recognition
- H. Tang, W. Wang, K. Gimpel, and K. Livescu, "Discriminative segmental cascades for feature-rich phone recognition," in Proc. ASRU, 2015.
- (2015) Proc. ASRU
- Tang, H.¹ Wang, W.² Gimpel, K.³ Livescu, K.⁴

29
- 84890543083
- Speech recognition with deep recurrent neural networks
- IEEE
- A. Graves, A.-R. Mohamed, and G. Hinton, "Speech recognition with deep recurrent neural networks," in Proc. ICASSP. IEEE, 2013, pp. 6645-6649.
- (2013) Proc. ICASSP , pp. 6645-6649
- Graves, A.¹ Mohamed, A.-R.² Hinton, G.³

* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.