SCOPUS Information Retrieval Platform

Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH

Volume 2015-January, 2015, Pages 3249-3253

A study of the recurrent neural network encoder-decoder for large vocabulary speech recognition

(4) Lu, Liang (a); Zhang, Xingxing (a); Cho, Kyunghyun (b); Renals, Steve (a)

(a) UNIVERSITY OF EDINBURGH (United Kingdom)

(b) UNIVERSITÉ DE MONTRÉAL (Canada)

Author keywords

Deep neural networks; Encoder decoder; End to end speech recognition; Recurrent neural networks

Indexed keywords

DECODING; HIDDEN MARKOV MODELS; MARKOV PROCESSES; RECURRENT NEURAL NETWORKS; SPEECH; SPEECH COMMUNICATION; VOCABULARY CONTROL;

AUTOMATIC SPEECH RECOGNITION; DEEP NEURAL NETWORKS; ENCODER-DECODER; END TO END; FEATURE REPRESENTATION; HIDDEN MARKOV MODELS (HMMS); LARGE VOCABULARY SPEECH RECOGNITION; RECURRENT NEURAL NETWORK (RNNS);

SPEECH RECOGNITION;

EID: 84959173420 PISSN: 2308457X EISSN: 19909772 Source Type: Conference Proceeding
DOI: None Document Type: Conference Paper

Times cited : (89)

References (41)

1
- 85032751458
- Deep neural networks for acoustic modeling in speech recognition: The shared views of four research groups
- G. Hinton, L. Deng, D. Yu, G. E. Dahl, A.-r. Mohamed, N. Jaitly, A. Senior, V. Vanhoucke, P. Nguyen, T. N. Sainath, and B. Kingsbury, "Deep neural networks for acoustic modeling in speech recognition: The shared views of four research groups," Signal Processing Magazine, IEEE, vol. 29, no. 6, pp. 82-97, 2012.
- (2012) Signal Processing Magazine, IEEE , vol.29 , Issue.6 , pp. 82-97
- Hinton, G.¹ Deng, L.² Yu, D.³ Dahl, G.E.⁴ Mohamed, A.-R.⁵ Jaitly, N.⁶ Senior, A.⁷ Vanhoucke, V.⁸ Nguyen, P.⁹ Sainath, T.N.¹⁰ Kingsbury, B.¹¹

2
- 0003573244
- Springer
- H. A. Bourlard and N. Morgan, Connectionist speech recognition: A hybrid approach. Springer, 1994, vol. 247.
- (1994) Connectionist Speech Recognition: A Hybrid Approach , vol.247
- Bourlard, H.A.¹ Morgan, N.²

3
- 0029308753
- Neural networks for statistical recognition of continuous speech
- N. Morgan and H. A. Bourlard, "Neural networks for statistical recognition of continuous speech," Proceedings of the IEEE, vol. 83, no. 5, pp. 742-772, 1995.
- (1995) Proceedings of the IEEE , vol.83 , Issue.5 , pp. 742-772
- Morgan, N.¹ Bourlard, H.A.²

4
- 0028194709
- Connectionist probability estimators in HMM speech recognition
- S. Renals, N. Morgan, H. Bourlard, M. Cohen, and H. Franco, "Connectionist probability estimators in HMM speech recognition," IEEE Transactions on Speech and Audio Processing, vol. 2, no. 1, pp. 161-174, 1994.
- (1994) IEEE Transactions on Speech and Audio Processing , vol.2 , Issue.1 , pp. 161-174
- Renals, S.¹ Morgan, N.² Bourlard, H.³ Cohen, M.⁴ Franco, H.⁵

5
- 84055211743
- Acoustic modeling using deep belief networks
- A.-r. Mohamed, G. E. Dahl, and G. Hinton, "Acoustic modeling using deep belief networks," Audio, Speech, and Language Processing, IEEE Transactions on, vol. 20, no. 1, pp. 14-22, 2012.
- (2012) Audio, Speech, and Language Processing, IEEE Transactions on , vol.20 , Issue.1 , pp. 14-22
- Mohamed, A.-R.¹ Dahl, G.E.² Hinton, G.³

6
- 84865801985
- Conversational speech transcription using context-dependent deep neural networks
- F. Seide, G. Li, and D. Yu, "Conversational speech transcription using context-dependent deep neural networks," in Interspeech, 2011, pp. 437-440.
- (2011) Interspeech , pp. 437-440
- Seide, F.¹ Li, G.² Yu, D.³

7
- 0000329355
- A recurrent error propagation network speech recognition system
- T. Robinson and F. Fallside, "A recurrent error propagation network speech recognition system," Computer Speech and Language, vol. 5, pp. 259-274, 1991.
- (1991) Computer Speech and Language , vol.5 , pp. 259-274
- Robinson, T.¹ Fallside, F.²

8
- 0001592322
- The use of recurrent networks in continuous speech recognition
- C. Lee, K. Paliwal, and F. Soong, Eds. Kluwer Academic Publishers
- T. Robinson, M. Hochberg, and S. Renals, "The use of recurrent networks in continuous speech recognition," in Automatic Speech and Speaker Recognition-Advanced Topics, C. Lee, K. Paliwal, and F. Soong, Eds. Kluwer Academic Publishers, 1996, pp. 233-258.
- (1996) Automatic Speech and Speaker Recognition-Advanced Topics , pp. 233-258
- Robinson, T.¹ Hochberg, M.² Renals, S.³

9
- 0036567797
- Connectionist speech recognition of broadcast news
- A. Robinson, G. Cook, D. Ellis, E. Fosler-Lussier, S. Renals, and D. Williams, "Connectionist speech recognition of broadcast news," Speech Communication, vol. 37, pp. 27-45, 2002.
- (2002) Speech Communication , vol.37 , pp. 27-45
- Robinson, A.¹ Cook, G.² Ellis, D.³ Fosler-Lussier, E.⁴ Renals, S.⁵ Williams, D.⁶

10
- 0031268931
- Bidirectional recurrent neural networks
- M. Schuster and K. K. Paliwal, "Bidirectional recurrent neural networks," Signal Processing, IEEE Transactions on, vol. 45, no. 11, pp. 2673-2681, 1997.
- (1997) Signal Processing, IEEE Transactions on , vol.45 , Issue.11 , pp. 2673-2681
- Schuster, M.¹ Paliwal, K.K.²

11
- 84910072094
- Sequence discriminative distributed training of long short-term memory recurrent neural networks
- H. Sak, O. Vinyals, G. Heigold, A. Senior, E. McDermott, R. Monga, and M. Mao, "Sequence discriminative distributed training of long short-term memory recurrent neural networks," in Proc. INTERSPEECH, 2014.
- (2014) Proc. INTERSPEECH
- Sak, H.¹ Vinyals, O.² Heigold, G.³ Senior, A.⁴ McDermott, E.⁵ Monga, R.⁶ Mao, M.⁷

12
- 0016663359
- The DRAGON system-an overview
- J. Baker, "The DRAGON system-an overview," IEEE Transactions on Acoustics, Speech, and Signal Processing, vol. 23, pp. 24-29, 1975.
- (1975) IEEE Transactions on Acoustics, Speech, and Signal Processing , vol.23 , pp. 24-29
- Baker, J.¹

13
- 0020719320
- A maximum likelihood approach to speech recognition
- L. Bahl, F. Jelinek, and R. Mercer, "A maximum likelihood approach to speech recognition," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 5, pp. 179-190, 1983.
- (1983) IEEE Transactions on Pattern Analysis and Machine Intelligence , vol.5 , pp. 179-190
- Bahl, L.¹ Jelinek, F.² Mercer, R.³

14
- 0024610919
- A tutorial on hidden Markov models and selected applications in speech recognition
- L. Rabiner, "A tutorial on hidden Markov models and selected applications in speech recognition," Proceedings of the IEEE, vol. 77, no. 2, pp. 257-286, 1989.
- (1989) Proceedings of the IEEE , vol.77 , Issue.2 , pp. 257-286
- Rabiner, L.¹

15
- 33645791324
- What HMMs can do
- J. A. Bilmes, "What HMMs can do," IEICE Transactions on Information and Systems, vol. 89, no. 3, pp. 869-891, 2006.
- (2006) IEICE Transactions on Information and Systems , vol.89 , Issue.3 , pp. 869-891
- Bilmes, J.A.¹

16
- 0036460907
- Weighted finite-state transducers in speech recognition
- M. Mohri, F. Pereira, and M. Riley, "Weighted finite-state transducers in speech recognition," Computer Speech & Language, vol. 16, pp. 69-88, 2002.
- (2002) Computer Speech & Language , vol.16 , pp. 69-88
- Mohri, M.¹ Pereira, F.² Riley, M.³

17
- 4544265717
- Ph.D. thesis, Cambridge, UK: Cambridge University
- D. Povey, "Discriminative training for large vocabulary speech recognition," Ph.D. thesis, Cambridge, UK: Cambridge University, 2004.
- (2004) Discriminative Training for Large Vocabulary Speech Recognition
- Povey, D.¹

18
- 84878379108
- Scalable minimum Bayes risk training of deep neural network acoustic models using distributed Hessian-free optimization
- B. Kingsbury, T. N. Sainath, and H. Soltau, "Scalable minimum Bayes risk training of deep neural network acoustic models using distributed Hessian-free optimization," in INTERSPEECH, 2012.
- (2012) INTERSPEECH
- Kingsbury, B.¹ Sainath, T.N.² Soltau, H.³

19
- 84906274730
- Sequence-discriminative training of deep neural networks
- K. Veselý, A. Ghoshal, L. Burget, and D. Povey, "Sequence-discriminative training of deep neural networks," in Proc. INTERSPEECH, 2013.
- (2013) Proc. INTERSPEECH
- Veselý, K.¹ Ghoshal, A.² Burget, L.³ Povey, D.⁴

20
- 0030245363
- From HMM's to segment models: A unified view of stochastic modeling for speech recognition
- M. Ostendorf, V. Digalakis, and O. Kimball, "From HMM's to segment models: A unified view of stochastic modeling for speech recognition," IEEE Transactions on Speech and Audio Processing, pp. 360-378, 1996.
- (1996) IEEE Transactions on Speech and Audio Processing , pp. 360-378
- Ostendorf, M.¹ Digalakis, V.² Kimball, O.³

21
- 0009588481
- Speech recognition using SVMs
- N. Smith and M. Gales, "Speech recognition using SVMs," in Advances in Neural Information Processing Systems, 2001, pp. 1197-1204.
- (2001) Advances in Neural Information Processing Systems , pp. 1197-1204
- Smith, N.¹ Gales, M.²

22
- 33745185781
- Hidden conditional random fields for phone classification
- A. Gunawardana, M. Mahajan, A. Acero, and J. C. Platt, "Hidden conditional random fields for phone classification," in INTERSPEECH, 2005, pp. 1117-1120.
- (2005) INTERSPEECH , pp. 1117-1120
- Gunawardana, A.¹ Mahajan, M.² Acero, A.³ Platt, J.C.⁴

23
- 45549086638
- Template-based continuous speech recognition
- M. De Wachter, M. Matton, K. Demuynck, P. Wambacq, R. Cools, and D. Van Compernolle, "Template-based continuous speech recognition," Audio, Speech, and Language Processing, IEEE Transactions on, vol. 15, no. 4, pp. 1377-1390, 2007.
- (2007) Audio, Speech, and Language Processing, IEEE Transactions on , vol.15 , Issue.4 , pp. 1377-1390
- De Wachter, M.¹ Matton, M.² Demuynck, K.³ Wambacq, P.⁴ Cools, R.⁵ Van Compernolle, D.⁶

24
- 70350435251
- Speech recognition using augmented conditional random fields
- Y. Hifny and S. Renals, "Speech recognition using augmented conditional random fields," Audio, Speech, and Language Processing, IEEE Transactions on, vol. 17, no. 2, pp. 354-365, 2009.
- (2009) Audio, Speech, and Language Processing, IEEE Transactions on , vol.17 , Issue.2 , pp. 354-365
- Hifny, Y.¹ Renals, S.²

25
- 84897549167
- arXiv preprint arXiv:1211.3711
- A. Graves, "Sequence transduction with recurrent neural networks," arXiv preprint arXiv:1211.3711, 2012.
- (2012) Sequence Transduction with Recurrent Neural Networks
- Graves, A.¹

26
- 84928547704
- Sequence to sequence learning with neural networks
- I. Sutskever, O. Vinyals, and Q. V. Le, "Sequence to sequence learning with neural networks," in Advances in Neural Information Processing Systems, 2014, pp. 3104-3112.
- (2014) Advances in Neural Information Processing Systems , pp. 3104-3112
- Sutskever, I.¹ Vinyals, O.² Le, Q.V.³

27
- 84961291190
- Learning phrase representations using RNN encoder-decoder for statistical machine translation
- K. Cho, B. van Merrienboer, C. Gulcehre, F. Bougares, H. Schwenk, and Y. Bengio, "Learning phrase representations using RNN encoder-decoder for statistical machine translation," Proc. EMNLP, 2014.
- (2014) Proc. EMNLP
- Cho, K.¹ Van Merrienboer, B.² Gulcehre, C.³ Bougares, F.⁴ Schwenk, H.⁵ Bengio, Y.⁶

28
- 85083953689
- Neural machine translation by jointly learning to align and translate
- D. Bahdanau, K. Cho, and Y. Bengio, "Neural machine translation by jointly learning to align and translate," in Proc. ICLR, 2015.
- (2015) Proc. ICLR
- Bahdanau, D.¹ Cho, K.² Bengio, Y.³

29
- 84939821075
- arXiv preprint arXiv:1411.4555
- O. Vinyals, A. Toshev, S. Bengio, and D. Erhan, "Show and tell: A neural image caption generator," arXiv preprint arXiv:1411.4555, 2014.
- (2014) Show and Tell: A Neural Image Caption Generator
- Vinyals, O.¹ Toshev, A.² Bengio, S.³ Erhan, D.⁴

30
- 84939821074
- arXiv preprint arXiv:1502.03044
- K. Xu, J. Ba, R. Kiros, A. Courville, R. Salakhutdinov, R. Zemel, and Y. Bengio, "Show, attend and tell: Neural image caption generation with visual attention," arXiv preprint arXiv:1502.03044, 2015.
- (2015) Show, Attend and Tell: Neural Image Caption Generation with Visual Attention
- Xu, K.¹ Ba, J.² Kiros, R.³ Courville, A.⁴ Salakhutdinov, R.⁵ Zemel, R.⁶ Bengio, Y.⁷

31
- 84936143793
- Towards end-to-end speech recognition with recurrent neural networks
- A. Graves and N. Jaitly, "Towards end-to-end speech recognition with recurrent neural networks," in Proc. ICML, 2014, pp. 1764-1772.
- (2014) Proc. ICML , pp. 1764-1772
- Graves, A.¹ Jaitly, N.²

32
- 84957716354
- arXiv preprint arXiv:1408.2873
- A. L. Maas, A. Y. Hannun, D. Jurafsky, and A. Y. Ng, "First-Pass Large Vocabulary Continuous Speech Recognition using Bi-Directional Recurrent DNNs," arXiv preprint arXiv:1408.2873, 2014.
- (2014) First-Pass Large Vocabulary Continuous Speech Recognition Using Bi-Directional Recurrent DNNs
- Maas, A.L.¹ Hannun, A.Y.² Jurafsky, D.³ Ng, A.Y.⁴

33
- 84959066041
- arXiv preprint arXiv:1412.1602
- J. Chorowski, D. Bahdanau, K. Cho, and Y. Bengio, "End-to-end Continuous Speech Recognition using Attention-based Recurrent NN: First Results," arXiv preprint arXiv:1412.1602, 2014.
- (2014) End-to-end Continuous Speech Recognition Using Attention-based Recurrent NN: First Results
- Chorowski, J.¹ Bahdanau, D.² Cho, K.³ Bengio, Y.⁴

34
- 84928545733
- arXiv preprint arXiv:1412.5567
- A. Hannun, C. Case, J. Casper, B. Catanzaro, G. Diamos, E. Elsen, R. Prenger et al., "Deep Speech: Scaling up end-to-end speech recognition," in arXiv preprint arXiv:1412.5567, 2014.
- (2014) Deep Speech: Scaling Up End-to-end Speech Recognition
- Hannun, A.¹ Case, C.² Casper, J.³ Catanzaro, B.⁴ Diamos, G.⁵ Elsen, E.⁶ Prenger, R.⁷

35
- 84893382981
- arXiv preprint arXiv:1212.5701
- M. D. Zeiler, "Adadelta: An adaptive learning rate method," arXiv preprint arXiv:1212.5701, 2012.
- (2012) Adadelta: An Adaptive Learning Rate Method
- Zeiler, M.D.¹

36
- 84887388950
- An empirical study of learning rates in deep neural networks for speech recognition
- A. Senior, G. Heigold, M. Ranzato, and K. Yang, "An empirical study of learning rates in deep neural networks for speech recognition," in Proc. ICASSP. IEEE, 2013, pp. 6724-6728.
- (2013) Proc. ICASSP. IEEE , pp. 6724-6728
- Senior, A.¹ Heigold, G.² Ranzato, M.³ Yang, K.⁴

37
- 0028392483
- Learning long-term dependencies with gradient descent is difficult
- Y. Bengio, P. Simard, and P. Frasconi, "Learning long-term dependencies with gradient descent is difficult," Neural Networks, IEEE Transactions on, vol. 5, no. 2, pp. 157-166, 1994.
- (1994) Neural Networks, IEEE Transactions on , vol.5 , Issue.2 , pp. 157-166
- Bengio, Y.¹ Simard, P.² Frasconi, P.³

38
- 0031573117
- Long short-term memory
- S. Hochreiter and J. Schmidhuber, "Long short-term memory," Neural Computation, vol. 9, no. 8, pp. 1735-1780, 1997.
- (1997) Neural Computation , vol.9 , Issue.8 , pp. 1735-1780
- Hochreiter, S.¹ Schmidhuber, J.²

39
- 85016587886
- SWITCHBOARD: Telephone speech corpus for research and development
- J. J. Godfrey, E. C. Holliman, and J. McDaniel, "SWITCHBOARD: Telephone speech corpus for research and development," in Proc. ICASSP. IEEE, 1992, pp. 517-520.
- (1992) Proc. ICASSP. IEEE , pp. 517-520
- Godfrey, J.J.¹ Holliman, E.C.² McDaniel, J.³

40
- 84959109176
- arXiv preprint arXiv:1503.03535
- C. Gulcehre, O. Firat, K. Xu, K. Cho, L. Barrault, H.-C. Lin, F. Bougares, H. Schwenk, and Y. Bengio, "On using monolingual corpora in neural machine translation," arXiv preprint arXiv:1503.03535, 2015.
- (2015) On Using Monolingual Corpora in Neural Machine Translation
- Gulcehre, C.¹ Firat, O.² Xu, K.³ Cho, K.⁴ Barrault, L.⁵ Lin, H.-C.⁶ Bougares, F.⁷ Schwenk, H.⁸ Bengio, Y.⁹

41
- 0024634603
- Phoneme recognition using time-delay neural networks
- A. Waibel, T. Hanazawa, G. Hinton, K. Shikano, and K. J. Lang, "Phoneme recognition using time-delay neural networks," Acoustics, Speech and Signal Processing, IEEE Transactions on, vol. 37, no. 3, pp. 328-339, 1989.
- (1989) Acoustics, Speech and Signal Processing, IEEE Transactions on , vol.37 , Issue.3 , pp. 328-339
- Waibel, A.¹ Hanazawa, T.² Hinton, G.³ Shikano, K.⁴ Lang, K.J.⁵

* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.