1. Martín Abadi, Ashish Agarwal, Paul Barham, Eugene Brevdo, Zhifeng Chen, Craig Citro, Greg S. Corrado, Andy Davis, Jeffrey Dean, Matthieu Devin, et al. TensorFlow: Large-scale machine learning on heterogeneous distributed systems. arXiv preprint arXiv:1603.04467, 2016.
2. Shun-ichi Amari. Natural gradient works efficiently in learning. Neural Computation, 10(2):251-276, 1998.
3. James Bergstra, Olivier Breuleux, Frédéric Bastien, Pascal Lamblin, Razvan Pascanu, Guillaume Desjardins, Joseph Turian, David Warde-Farley, and Yoshua Bengio. Theano: A CPU and GPU math compiler in Python. In Proc. 9th Python in Science Conf., pages 1-7, 2010.
5. Richard H. Byrd, S. L. Hansen, Jorge Nocedal, and Yoram Singer. A stochastic quasi-Newton method for large-scale optimization. SIAM Journal on Optimization, 26(2):1008-1031, 2016.
6. Minhyung Cho, Chandra Dhir, and Jaehyung Lee. Hessian-free optimization for learning deep multidimensional recurrent neural networks. In Advances in Neural Information Processing Systems, pages 883-891, 2015.
8. Jeffrey Dean, Greg Corrado, Rajat Monga, Kai Chen, Matthieu Devin, Mark Mao, Andrew Senior, Paul Tucker, Ke Yang, Quoc V. Le, et al. Large scale distributed deep networks. In Advances in Neural Information Processing Systems, pages 1223-1231, 2012.
9. Guillaume Desjardins, Karen Simonyan, Razvan Pascanu, and Koray Kavukcuoglu. Natural neural networks. In Advances in Neural Information Processing Systems, pages 2071-2079, 2015.
10. John Duchi, Elad Hazan, and Yoram Singer. Adaptive subgradient methods for online learning and stochastic optimization. Journal of Machine Learning Research, 12(Jul):2121-2159, 2011.
15. Tom Heskes. On "natural" learning and pruning in multilayered perceptrons. Neural Computation, 12(4):881-901, 2000.
23. Yann LeCun, Léon Bottou, Yoshua Bengio, and Patrick Haffner. Gradient-based learning applied to document recognition. Proceedings of the IEEE, 86(11):2278-2324, 1998.
27. James Martens and Ilya Sutskever. Training deep and recurrent networks with Hessian-free optimization. In Neural Networks: Tricks of the Trade, pages 479-535. Springer, 2012.
33. Olga Russakovsky, Jia Deng, Hao Su, Jonathan Krause, Sanjeev Satheesh, Sean Ma, Zhiheng Huang, Andrej Karpathy, Aditya Khosla, Michael Bernstein, et al. ImageNet large scale visual recognition challenge. International Journal of Computer Vision, 115(3):211-252, 2015.
34. Nicol N. Schraudolph. Centering neural network gradient factors. In Genevieve B. Orr and Klaus-Robert Müller, editors, Neural Networks: Tricks of the Trade, volume 1524 of Lecture Notes in Computer Science, pages 207-226. Springer Verlag, Berlin, 1998.
35. Nicol N. Schraudolph. Fast curvature matrix-vector products for second-order gradient descent. Neural Computation, 14(7), 2002.
36. Nicol N. Schraudolph, Jin Yu, Simon Günter, et al. A stochastic quasi-Newton method for online convex optimization. In AISTATS, volume 7, pages 436-443, 2007.
37. Christian Szegedy, Wei Liu, Yangqing Jia, Pierre Sermanet, Scott Reed, Dragomir Anguelov, Dumitru Erhan, Vincent Vanhoucke, and Andrew Rabinovich. Going deeper with convolutions. arXiv preprint arXiv:1409.4842, 2014.
38. Christian Szegedy, Vincent Vanhoucke, Sergey Ioffe, Jonathon Shlens, and Zbigniew Wojna. Rethinking the inception architecture for computer vision. arXiv preprint arXiv:1512.00567, 2015.
39. Oriol Vinyals and Daniel Povey. Krylov subspace descent for deep learning. In AISTATS, pages 1261-1268, 2012.