SCOPUS Information Retrieval Platform

5th International Conference on Learning Representations, ICLR 2017 - Conference Track Proceedings

Volume , Issue , 2017, Pages

Recurrent batch normalization

(5) Cooijmans, Tim ᵃ; Ballas, Nicolas ᵃ; Laurent, César ᵃ; Gülçehre, Çağlar ᵃ; Courville, Aaron ᵃ

ᵃ UNIVERSITÉ DE MONTRÉAL (Canada)

Author keywords

[No Author keywords available]

Indexed keywords

MODELING LANGUAGES; NATURAL LANGUAGE PROCESSING SYSTEMS;

COVARIATE SHIFTS; FASTER CONVERGENCE; LANGUAGE MODEL; QUESTION ANSWERING; REPARAMETERIZATION; SEQUENCE CLASSIFICATION; SEQUENTIAL PROBLEMS; TIME STEP;

LONG SHORT-TERM MEMORY;

EID: 85088227437
PISSN: None
EISSN: None
Source Type: Conference Proceeding
DOI: None
Document Type: Conference Paper

Times cited : (107)

References (33)

1
- 84971463350
- D. Amodei et al. Deep speech 2: End-to-end speech recognition in english and mandarin. arXiv:1512.02595, 2015.
- (2015) Deep Speech 2: End-to-End Speech Recognition in English and Mandarin
- Amodei, D.¹

2
- 84989200019
- M. Arjovsky, A. Shah, and Y. Bengio. Unitary evolution recurrent neural networks. arXiv:1511.06464, 2015.
- (2015) Unitary Evolution Recurrent Neural Networks
- Arjovsky, M.¹ Shah, A.² Bengio, Y.³

3
- 85015772444
- Jimmy Lei Ba, Jamie Ryan Kiros, and Geoffrey E Hinton. Layer normalization. arXiv:1607.06450, 2016.
- (2016) Layer Normalization
- Ba, J.L.¹ Kiros, J.R.² Hinton, G.E.³

4
- 85083953689
- Neural machine translation by jointly learning to align and translate
- D. Bahdanau, K. Cho, and Y. Bengio. Neural machine translation by jointly learning to align and translate. ICLR, 2015.
- (2015) ICLR
- Bahdanau, D.¹ Cho, K.² Bengio, Y.³

5
- 0028392483
- Learning long-term dependencies with gradient descent is difficult
- Y. Bengio, P. Simard, and P. Frasconi. Learning long-term dependencies with gradient descent is difficult. Neural Networks, IEEE Transactions on, 1994.
- (1994) Neural Networks, IEEE Transactions on
- Bengio, Y.¹ Simard, P.² Frasconi, P.³

6
- 84961291190
- K. Cho, B. Van Merriënboer, C. Gulcehre, D. Bahdanau, F. Bougares, H. Schwenk, and Y. Bengio. Learning phrase representations using rnn encoder-decoder for statistical machine translation. arXiv:1406.1078, 2014.
- (2014) Learning Phrase Representations Using Rnn Encoder-Decoder for Statistical Machine Translation
- Cho, K.¹ Van Merriënboer, B.² Gulcehre, C.³ Bahdanau, D.⁴ Bougares, F.⁵ Schwenk, H.⁶ Bengio, Y.⁷

7
- 85014419424
- Junyoung Chung, Sungjin Ahn, and Yoshua Bengio. Hierarchical multiscale recurrent neural networks. arXiv:1609.01704, 2016.
- (2016) Hierarchical Multiscale Recurrent Neural Networks
- Chung, J.¹ Ahn, S.² Bengio, Y.³

8
- 84906979661
- A. Graves. Generating sequences with recurrent neural networks. arXiv:1308.0850, 2013.
- (2013) Generating Sequences with Recurrent Neural Networks
- Graves, A.¹

9
- 85030994434
- David Ha, Andrew Dai, and Quoc V Le. Hypernetworks. arXiv:1609.09106, 2016.
- (2016) Hypernetworks
- Ha, D.¹ Dai, A.² Le, Q.V.³

10
- 84965139942
- Teaching machines to read and comprehend
- K. M. Hermann, T. Kocisky, E. Grefenstette, L. Espeholt, W. Kay, M. Suleyman, and P. Blunsom. Teaching machines to read and comprehend. In NIPS, 2015.
- (2015) NIPS
- Hermann, K.M.¹ Kocisky, T.² Grefenstette, E.³ Espeholt, L.⁴ Kay, W.⁵ Suleyman, M.⁶ Blunsom, P.⁷

11
- 0003575034
- Master's thesis
- S. Hochreiter. Untersuchungen zu dynamischen neuronalen netzen. Master's thesis, 1991.
- (1991) Untersuchungen Zu Dynamischen Neuronalen Netzen
- Hochreiter, S.¹

12
- 0031573117
- Long short-term memory
- S. Hochreiter and J Schmidhuber. Long short-term memory. Neural computation, 1997.
- (1997) Neural Computation
- Hochreiter, S.¹ Schmidhuber, J.²

13
- 84964923476
- abs/1502.03167
- Sergey Ioffe and Christian Szegedy. Batch normalization: Accelerating deep network training by reducing internal covariate shift. abs/1502.03167, 2015.
- (2015) Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift
- Ioffe, S.¹ Szegedy, C.²

14
- 84941620184
- D. Kingma and J. Ba. Adam: A method for stochastic optimization. arXiv:1412.6980, 2014.
- (2014) Adam: A Method for Stochastic Optimization
- Kingma, D.¹ Ba, J.²

15
- 85083951713
- Regularizing rnns by stabilizing activations
- D Krueger and R. Memisevic. Regularizing rnns by stabilizing activations. ICLR, 2016.
- (2016) ICLR
- Krueger, D.¹ Memisevic, R.²

16
- 85018911798
- David Krueger, Tegan Maharaj, János Kramár, Mohammad Pezeshki, Nicolas Ballas, Nan Rosemary Ke, Anirudh Goyal, Yoshua Bengio, Hugo Larochelle, and Aaron Courville. Zoneout: Regularizing rnns by randomly preserving hidden activations. arXiv:1606.01305, 2016.
- (2016) Zoneout: Regularizing Rnns by Randomly Preserving Hidden Activations
- Krueger, D.¹ Maharaj, T.² Kramár, J.³ Pezeshki, M.⁴ Ballas, N.⁵ Ke, N.R.⁶ Goyal, A.⁷ Bengio, Y.⁸ Larochelle, H.⁹ Courville, A.¹⁰

17
- 84973326024
- Batch normalized recurrent neural networks
- C. Laurent, G. Pereyra, P. Brakel, Y. Zhang, and Y. Bengio. Batch normalized recurrent neural networks. ICASSP, 2016.
- (2016) ICASSP
- Laurent, C.¹ Pereyra, G.² Brakel, P.³ Zhang, Y.⁴ Bengio, Y.⁵

18
- 84939821079
- Quoc V Le, N. Jaitly, and G. Hinton. A simple way to initialize recurrent networks of rectified linear units. arXiv:1504.00941, 2015.
- (2015) A Simple Way to Initialize Recurrent Networks of Rectified Linear Units
- Le, Q.V.¹ Jaitly, N.² Hinton, G.³

19
- 84989340411
- Qianli Liao and Tomaso Poggio. Bridging the gaps between residual learning, recurrent neural networks and visual cortex. arXiv:1604.03640, 2016.
- (2016) Bridging the Gaps between Residual Learning, Recurrent Neural Networks and Visual Cortex
- Liao, Q.¹ Poggio, T.²

20
- 79953225681
- M. Mahoney. Large text compression benchmark. 2009.
- (2009) Large Text Compression Benchmark
- Mahoney, M.¹

21
- 34249852033
- Building a large annotated corpus of english: The penn treebank
- M. P. Marcus, M. Marcinkiewicz, and B. Santorini. Building a large annotated corpus of english: The penn treebank. Comput. Linguist., 1993.
- (1993) Comput. Linguist.
- Marcus, M.P.¹ Marcinkiewicz, M.² Santorini, B.³

22
- 80053451847
- Learning recurrent neural networks with hessian-free optimization
- J. Martens and I. Sutskever. Learning recurrent neural networks with hessian-free optimization. In ICML, 2011.
- (2011) ICML
- Martens, J.¹ Sutskever, I.²

23
- 84897527816
- preprint
- T. Mikolov, I. Sutskever, A. Deoras, H. Le, S. Kombrink, and J. Cernocky. Subword language modeling with neural networks. preprint, 2012.
- (2012) Subword Language Modeling with Neural Networks
- Mikolov, T.¹ Sutskever, I.² Deoras, A.³ Le, H.⁴ Kombrink, S.⁵ Cernocky, J.⁶

24
- 85070999675
- CoRR, abs/1306.0514
- Yann Ollivier. Persistent contextual neural networks for learning symbolic data sequences. CoRR, abs/1306.0514, 2013.
- (2013) Persistent Contextual Neural Networks for Learning Symbolic Data Sequences
- Ollivier, Y.¹

25
- 84910681334
- Marius Pachitariu and Maneesh Sahani. Regularization and nonlinearities for neural language models: when are they needed? arXiv:1301.5650, 2013.
- (2013) Regularization and Nonlinearities for Neural Language Models: When Are They Needed?
- Pachitariu, M.¹ Sahani, M.²

26
- 84892982833
- Razvan Pascanu, Tomas Mikolov, and Yoshua Bengio. On the difficulty of training recurrent neural networks. arXiv:1211.5063, 2012.
- (2012) On the Difficulty of Training Recurrent Neural Networks
- Pascanu, R.¹ Mikolov, T.² Bengio, Y.³

27
- 0037527188
- Improving predictive inference under covariate shift by weighting the log-likelihood function
- H. Shimodaira. Improving predictive inference under covariate shift by weighting the log-likelihood function. Journal of statistical planning and inference, 2000.
- (2000) Journal of Statistical Planning and Inference
- Shimodaira, H.¹

28
- 84979557463
- arXiv e-prints, abs/1605.02688, May
- The Theano Development Team et al. Theano: A Python framework for fast computation of mathematical expressions. arXiv e-prints, abs/1605.02688, May 2016.
- (2016) Theano: A Python Framework for Fast Computation of Mathematical Expressions

29
- 84893343292
- Lecture 6.5-rMSprop: Divide the gradient by a running average of its recent magnitude
- T. Tieleman and G. Hinton. Lecture 6.5-RmsProp: Divide the gradient by a running average of its recent magnitude. COURSERA: Neural Networks for Machine Learning, 2012.
- (2012) COURSERA: Neural Networks for Machine Learning
- Tieleman, T.¹ Hinton, G.²

30
- 84962564531
- CoRR, abs/1506.00619
- Bart van Merriënboer, Dzmitry Bahdanau, Vincent Dumoulin, Dmitriy Serdyuk, David Warde-Farley, Jan Chorowski, and Yoshua Bengio. Blocks and fuel: Frameworks for deep learning. CoRR, abs/1506.00619, 2015. URL http://arxiv.org/abs/1506.00619.
- (2015) Blocks and Fuel: Frameworks for Deep Learning
- Van Merriënboer, B.¹ Bahdanau, D.² Dumoulin, V.³ Serdyuk, D.⁴ Warde-Farley, D.⁵ Chorowski, J.⁶ Bengio, Y.⁷

31
- 84939821074
- K. Xu, J. Ba, R. Kiros, A. Courville, R. Salakhutdinov, R. Zemel, and Y. Bengio. Show, attend and tell: Neural image caption generation with visual attention. arXiv:1502.03044, 2015.
- (2015) Show, Attend and Tell: Neural Image Caption Generation with Visual Attention
- Xu, K.¹ Ba, J.² Kiros, R.³ Courville, A.⁴ Salakhutdinov, R.⁵ Zemel, R.⁶ Bengio, Y.⁷

32
- 84973884896
- Describing videos by exploiting temporal structure
- L. Yao, A. Torabi, K. Cho, N. Ballas, C. Pal, H. Larochelle, and A. Courville. Describing videos by exploiting temporal structure. In ICCV, 2015.
- (2015) ICCV
- Yao, L.¹ Torabi, A.² Cho, K.³ Ballas, N.⁴ Pal, C.⁵ Larochelle, H.⁶ Courville, A.⁷

33
- 85014939893
- S. Zhang, Y. Wu, T. Che, Z. Lin, R. Memisevic, R. Salakhutdinov, and Y. Bengio. Architectural complexity measures of recurrent neural networks. arXiv:1602.08210, 2016.
- (2016) Architectural Complexity Measures of Recurrent Neural Networks
- Zhang, S.¹ Wu, Y.² Che, T.³ Lin, Z.⁴ Memisevic, R.⁵ Salakhutdinov, R.⁶ Bengio, Y.⁷

* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.