



Volume 2017-December, 2017, Pages 1732-1742

Train longer, generalize better: Closing the generalization gap in large batch training of neural networks

Author keywords

[No Author keywords available]

Indexed keywords

STOCHASTIC MODELS; STOCHASTIC SYSTEMS;

EID: 85046996830     PISSN: 10495258     EISSN: None     Source Type: Conference Proceeding    
DOI: None     Document Type: Conference Paper
Times cited : (692)

References (46)
  • 2
    • 84904136037
    • Large-scale machine learning with stochastic gradient descent
    • Springer
    • Bottou, L. Large-scale machine learning with stochastic gradient descent. In Proceedings of COMPSTAT'2010, pp. 177-186. Springer, 2010.
    • (2010) Proceedings of COMPSTAT'2010 , pp. 177-186
    • Bottou, L.1
  • 3
    • 0040307478
    • Anomalous diffusion in disordered media: Statistical mechanisms, models and physical applications
    • Bouchaud, J. P. and Georges, A. Anomalous diffusion in disordered media: statistical mechanisms, models and physical applications. Physics reports, 195:127-293, 1990.
    • (1990) Physics Reports , vol.195 , pp. 127-293
    • Bouchaud, J.P.1    Georges, A.2
  • 4
    • 0023403821
    • Anomalous diffusion in random media of any dimensionality
    • Bouchaud, J. P. and Comtet, A. Anomalous diffusion in random media of any dimensionality. J. Physique, 48: 1445-1450, 1987.
    • (1987) J. Physique , vol.48 , pp. 1445-1450
    • Bouchaud, J.P.1    Comtet, A.2
  • 5
    • 34147142335
    • Statistics of critical points of Gaussian fields on large-dimensional spaces
    • Bray, A. J. and Dean, D. S. Statistics of critical points of Gaussian fields on large-dimensional spaces. Physical Review Letters, 98(15):1-5, 2007.
    • (2007) Physical Review Letters , vol.98 , Issue.15 , pp. 1-5
    • Bray, A.J.1    Dean, D.S.2
  • 8
    • 84945969537
    • RMSProp and equilibrated adaptive learning rates for non-convex optimization
    • Dauphin, Y., de Vries, H., Chung, J., and Bengio, Y. RMSProp and equilibrated adaptive learning rates for non-convex optimization. CoRR, abs/1502.04390, 2015.
    • (2015) CoRR , abs/1502.04390
    • Dauphin, Y.1    De Vries, H.2    Chung, J.3    Bengio, Y.4
  • 9
    • 84922386830
    • Identifying and attacking the saddle point problem in high-dimensional non-convex optimization
    • Dauphin, Y., Pascanu, R., and Gulcehre, C. Identifying and attacking the saddle point problem in high-dimensional non-convex optimization. In NIPS, pp. 1-9, 2014.
    • (2014) NIPS , pp. 1-9
    • Dauphin, Y.1    Pascanu, R.2    Gulcehre, C.3
  • 10
    • 84877760312
    • Large scale distributed deep networks
    • Dean, J., Corrado, G., Monga, R., et al. Large scale distributed deep networks. In NIPS, pp. 1223-1231, 2012.
    • (2012) NIPS , pp. 1223-1231
    • Dean, J.1    Corrado, G.2    Monga, R.3
  • 11
    • 72249100259
    • ImageNet: A large-scale hierarchical image database
    • Deng, J., Dong, W., Socher, R., et al. ImageNet: A Large-Scale Hierarchical Image Database. In CVPR09, 2009.
    • (2009) CVPR09
    • Deng, J.1    Dong, W.2    Socher, R.3
  • 13
    • 80052250414
    • Adaptive subgradient methods for online learning and stochastic optimization
    • Duchi, J., Hazan, E., and Singer, Y. Adaptive subgradient methods for online learning and stochastic optimization. Journal of Machine Learning Research, 12(Jul):2121-2159, 2011.
    • (2011) Journal of Machine Learning Research , vol.12 , Issue.JUL , pp. 2121-2159
    • Duchi, J.1    Hazan, E.2    Singer, Y.3
  • 14
    • 0040716207
    • Multidimensional random walks in random environments with subclassical limiting behavior
    • Durrett, R. Multidimensional random walks in random environments with subclassical limiting behavior. Communications in Mathematical Physics, 104(1):87-102, 1986.
    • (1986) Communications in Mathematical Physics , vol.104 , Issue.1 , pp. 87-102
    • Durrett, R.1
  • 15
    • 84998858755
    • Escaping from saddle points-online stochastic gradient for tensor decomposition
    • Ge, R., Huang, F., Jin, C., and Yuan, Y. Escaping from saddle points-online stochastic gradient for tensor decomposition. In COLT, pp. 797-842, 2015.
    • (2015) COLT , pp. 797-842
    • Ge, R.1    Huang, F.2    Jin, C.3    Yuan, Y.4
  • 16
    • 0001219859
    • Regularization theory and neural networks architectures
    • Girosi, F., Jones, M., and Poggio, T. Regularization theory and neural networks architectures. Neural computation, 7(2):219-269, 1995.
    • (1995) Neural Computation , vol.7 , Issue.2 , pp. 219-269
    • Girosi, F.1    Jones, M.2    Poggio, T.3
  • 18
    • 85015190946
    • Train faster, generalize better: Stability of stochastic gradient descent
    • Hardt, M., Recht, B., and Singer, Y. Train faster, generalize better: Stability of stochastic gradient descent. ICML, pp. 1-24, 2016.
    • (2016) ICML , pp. 1-24
    • Hardt, M.1    Recht, B.2    Singer, Y.3
  • 21
    • 85015249548
    • On large-batch training for deep learning: Generalization gap and sharp minima
    • Keskar, N. S., Mudigere, D., Nocedal, J., Smelyanskiy, M., and Tang, P. T. P. On large-batch training for deep learning: Generalization gap and sharp minima. In ICLR, 2017.
    • (2017) ICLR
    • Keskar, N.S.1    Mudigere, D.2    Nocedal, J.3    Smelyanskiy, M.4    Tang, P.T.P.5
  • 25
    • 34249873596
    • Efficient backprop in neural networks: Tricks of the trade
    • (Orr, G. and Müller, K., eds.)
    • LeCun, Y., Bottou, L., and Orr, G. Efficient backprop in neural networks: Tricks of the trade (Orr, G. and Müller, K., eds.). Lecture Notes in Computer Science, 1524, 1998a.
    • (1998) Lecture Notes in Computer Science , vol.1524
    • LeCun, Y.1    Bottou, L.2    Orr, G.3
  • 26
    • 0032203257
    • Gradient-based learning applied to document recognition
    • LeCun, Y., Bottou, L., Bengio, Y., and Haffner, P. Gradient-based learning applied to document recognition. Proceedings of the IEEE, 86(11):2278-2324, 1998b.
    • (1998) Proceedings of the IEEE , vol.86 , Issue.11 , pp. 2278-2324
    • LeCun, Y.1    Bottou, L.2    Bengio, Y.3    Haffner, P.4
  • 31
    • 84924051598
    • Human-level control through deep reinforcement learning
    • Mnih, V., Kavukcuoglu, K., Silver, D., et al. Human-level control through deep reinforcement learning. Nature, 518(7540):529-533, 2015.
    • (2015) Nature , vol.518 , Issue.7540 , pp. 529-533
    • Mnih, V.1    Kavukcuoglu, K.2    Silver, D.3
  • 33
    • 85030990177
    • An overview of gradient descent optimization algorithms
    • Ruder, S. An overview of gradient descent optimization algorithms. CoRR, abs/1609.04747, 2016.
    • (2016) CoRR
    • Ruder, S.1
  • 34
    • 84963949906
    • Mastering the game of Go with deep neural networks and tree search
    • Silver, D., Huang, A., Maddison, C. J., et al. Mastering the game of Go with deep neural networks and tree search. Nature, 529(7587):484-489, 2016.
    • (2016) Nature , vol.529 , Issue.7587 , pp. 484-489
    • Silver, D.1    Huang, A.2    Maddison, C.J.3
  • 41
    • 84897550107
    • Regularization of neural networks using DropConnect
    • JMLR.org
    • Wan, L., Zeiler, M., Zhang, S., LeCun, Y., and Fergus, R. Regularization of neural networks using DropConnect. ICML'13, pp. III-1058-III-1066. JMLR.org, 2013.
    • (2013) ICML'13 , pp. III-1058-III-1066
    • Wan, L.1    Zeiler, M.2    Zhang, S.3    LeCun, Y.4    Fergus, R.5
  • 42
    • 85013200323
    • Google's neural machine translation system: Bridging the gap between human and machine translation
    • Wu, Y., Schuster, M., Chen, Z., et al. Google's neural machine translation system: Bridging the gap between human and machine translation. CoRR, abs/1609.08144, 2016.
    • (2016) CoRR
    • Wu, Y.1    Schuster, M.2    Chen, Z.3
  • 44
    • 85047020267
    • Wide residual networks
    • Zagoruyko, S. and Komodakis, N. Wide residual networks. In BMVC, 2016.
    • (2016) BMVC
    • Zagoruyko, S.1    Komodakis, N.2
  • 45
    • 85041447831
    • Understanding deep learning requires rethinking generalization
    • Zhang, C., Bengio, S., Hardt, M., Recht, B., and Vinyals, O. Understanding deep learning requires rethinking generalization. In ICLR, 2017.
    • (2017) ICLR
    • Zhang, C.1    Bengio, S.2    Hardt, M.3    Recht, B.4    Vinyals, O.5
  • 46
    • 84965152276
    • Deep learning with elastic averaging SGD
    • Zhang, S., Choromanska, A. E., and LeCun, Y. Deep learning with elastic averaging SGD. In NIPS, pp. 685-693, 2015.
    • (2015) NIPS , pp. 685-693
    • Zhang, S.1    Choromanska, A.E.2    LeCun, Y.3


* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.