SCOPUS Information Search Platform

5th International Conference on Learning Representations, ICLR 2017 - Conference Track Proceedings

Volume , Issue , 2017, Pages

Understanding deep learning requires rethinking generalization

(5) Zhang, Chiyuan a,c Recht, Benjamin b,c Bengio, Samy c Hardt, Moritz c Vinyals, Oriol d

a MASSACHUSETTS INSTITUTE OF TECHNOLOGY (United States)

b UNIVERSITY OF CALIFORNIA (United States)

c GOOGLE INC (United States)

d DEEPMIND (United Kingdom)

Author keywords

[No Author keywords available]

Indexed keywords

CLASSIFICATION (OF INFORMATION); GRADIENT METHODS; NEURAL NETWORKS; SAMPLING; STOCHASTIC SYSTEMS;

CONVOLUTIONAL NETWORKS; GENERALIZATION ERROR; REGULARIZATION TECHNIQUE; STOCHASTIC GRADIENT METHODS; SYSTEMATIC EXPERIMENT; TEST PERFORMANCE; TRADITIONAL APPROACHES; TRADITIONAL MODELS;

DEEP LEARNING;

EID: 85088231398 PISSN: None EISSN: None Source Type: Conference Proceeding
DOI: None Document Type: Conference Paper

Times cited : (1692)

References (32)

1
- 84958264664
- Martín Abadi, Ashish Agarwal, Paul Barham, Eugene Brevdo, Zhifeng Chen, Craig Citro, Greg S. Corrado, Andy Davis, Jeffrey Dean, Matthieu Devin, Sanjay Ghemawat, Ian Goodfellow, Andrew Harp, Geoffrey Irving, Michael Isard, Yangqing Jia, Rafal Jozefowicz, Lukasz Kaiser, Manjunath Kudlur, Josh Levenberg, Dan Mané, Rajat Monga, Sherry Moore, Derek Murray, Chris Olah, Mike Schuster, Jonathon Shlens, Benoit Steiner, Ilya Sutskever, Kunal Talwar, Paul Tucker, Vincent Vanhoucke, Vijay Vasudevan, Fernanda Viégas, Oriol Vinyals, Pete Warden, Martin Wattenberg, Martin Wicke, Yuan Yu, and Xiaoqiang Zheng. TensorFlow: Large-scale machine learning on heterogeneous systems, 2015. URL http://tensorflow.org/. Software available from tensorflow.org.
- (2015) TensorFlow: Large-Scale Machine Learning on Heterogeneous Systems
- Abadi, M.¹ Agarwal, A.² Barham, P.³ Brevdo, E.⁴ Chen, Z.⁵ Citro, C.⁶ Corrado, G.S.⁷ Davis, A.⁸ Dean, J.⁹ Devin, M.¹⁰ Ghemawat, S.¹¹ Goodfellow, I.¹² Harp, A.¹³ Irving, G.¹⁴ Isard, M.¹⁵ Jia, Y.¹⁶ Jozefowicz, R.¹⁷ Kaiser, L.¹⁸ Kudlur, M.¹⁹ Levenberg, J.²⁰ et al.

2
- 0032028728
- The sample complexity of pattern classification with neural networks - The size of the weights is more important than the size of the network
- Peter L Bartlett. The Sample Complexity of Pattern Classification with Neural Networks - The Size of the Weights is More Important than the Size of the Network. IEEE Trans. Information Theory, 1998.
- (1998) IEEE Trans. Information Theory
- Bartlett, P.L.¹

3
- 0038453192
- Rademacher and Gaussian complexities: Risk bounds and structural results
- March
- Peter L Bartlett and Shahar Mendelson. Rademacher and Gaussian complexities: risk bounds and structural results. Journal of Machine Learning Research, 3:463-482, March 2003.
- (2003) Journal of Machine Learning Research , vol.3 , pp. 463-482
- Bartlett, P.L.¹ Mendelson, S.²

4
- 0038368335
- Stability and generalization
- March
- Olivier Bousquet and André Elisseeff. Stability and generalization. Journal of Machine Learning Research, 2:499-526, March 2002.
- (2002) Journal of Machine Learning Research , vol.2 , pp. 499-526
- Bousquet, O.¹ Elisseeff, A.²

5
- 84965107578
- The loss surfaces of multilayer networks
- Anna Choromanska, Mikael Henaff, Michael Mathieu, Gérard Ben Arous, and Yann LeCun. The loss surfaces of multilayer networks. In AISTATS, 2015.
- (2015) AISTATS
- Choromanska, A.¹ Henaff, M.² Mathieu, M.³ Arous, G.B.⁴ LeCun, Y.⁵

6
- 84887730796
- Learning feature representations with k-means
- Springer
- Adam Coates and Andrew Y. Ng. Learning feature representations with k-means. In Neural Networks: Tricks of the Trade, Reloaded. Springer, 2012.
- (2012) Neural Networks: Tricks of the Trade, Reloaded
- Coates, A.¹ Ng, A.Y.²

7
- 84998891586
- Convolutional rectifier networks as generalized tensor decompositions
- Nadav Cohen and Amnon Shashua. Convolutional Rectifier Networks as Generalized Tensor Decompositions. In ICML, 2016.
- (2016) ICML
- Cohen, N.¹ Shashua, A.²

8
- 0024861871
- Approximation by superposition of sigmoidal functions
- G Cybenko. Approximation by superposition of sigmoidal functions. Mathematics of Control, Signals and Systems, 2(4):303-314, 1989.
- (1989) Mathematics of Control, Signals and Systems , vol.2 , Issue.4 , pp. 303-314
- Cybenko, G.¹

9
- 85162314283
- Shallow vs. Deep sum-product networks
- Olivier Delalleau and Yoshua Bengio. Shallow vs. Deep Sum-Product Networks. In Advances in Neural Information Processing Systems, 2011.
- (2011) Advances in Neural Information Processing Systems
- Delalleau, O.¹ Bengio, Y.²

10
- 0003779024
- Taylor & Francis
- E. Edgington and P. Onghena. Randomization Tests. Statistics: A Series of Textbooks and Monographs. Taylor & Francis, 2007. ISBN 9781584885894.
- (2007) Randomization Tests. Statistics: A Series of Textbooks and Monographs
- Edgington, E.¹ Onghena, P.²

11
- 85022188122
- The power of depth for feedforward neural networks
- Ronen Eldan and Ohad Shamir. The Power of Depth for Feedforward Neural Networks. In COLT, 2016.
- (2016) COLT
- Eldan, R.¹ Shamir, O.²

12
- 84998546946
- Train faster, generalize better: Stability of stochastic gradient descent
- Moritz Hardt, Benjamin Recht, and Yoram Singer. Train faster, generalize better: Stability of stochastic gradient descent. In ICML, 2016.
- (2016) ICML
- Hardt, M.¹ Recht, B.² Singer, Y.³

13
- 84986274465
- Deep residual learning for image recognition
- Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep Residual Learning for Image Recognition. In CVPR, 2016.
- (2016) CVPR
- He, K.¹ Zhang, X.² Ren, S.³ Sun, J.⁴

14
- 84969584486
- Batch normalization: Accelerating deep network training by reducing internal covariate shift
- Sergey Ioffe and Christian Szegedy. Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift. In ICML, 2015.
- (2015) ICML
- Ioffe, S.¹ Szegedy, C.²

15
- 77956002520
- Technical report, Department of Computer Science, University of Toronto
- Alex Krizhevsky and Geoffrey Hinton. Learning multiple layers of features from tiny images. Technical report, Department of Computer Science, University of Toronto, 2009.
- (2009) Learning Multiple Layers of Features from Tiny Images
- Krizhevsky, A.¹ Hinton, G.²

16
- 84876231242
- Imagenet classification with deep convolutional neural networks
- Alex Krizhevsky, Ilya Sutskever, and Geoffrey E Hinton. ImageNet Classification with Deep Convolutional Neural Networks. In Advances in Neural Information Processing Systems, 2012.
- (2012) Advances in Neural Information Processing Systems
- Krizhevsky, A.¹ Sutskever, I.² Hinton, G.E.³

17
- 84998886271
- Generalization properties and implicit regularization for multiple passes SGM
- Junhong Lin, Raffaello Camoriano, and Lorenzo Rosasco. Generalization Properties and Implicit Regularization for Multiple Passes SGM. In ICML, 2016.
- (2016) ICML
- Lin, J.¹ Camoriano, R.² Rosasco, L.³

18
- 84937952927
- On the computational efficiency of training neural networks
- Roi Livni, Shai Shalev-Shwartz, and Ohad Shamir. On the computational efficiency of training neural networks. In Advances in Neural Information Processing Systems, 2014.
- (2014) Advances in Neural Information Processing Systems
- Livni, R.¹ Shalev-Shwartz, S.² Shamir, O.³

19
- 85029782784
- CoRR, abs/1608.03287
- Hrushikesh Mhaskar and Tomaso A. Poggio. Deep vs. shallow networks: An approximation theory perspective. CoRR, abs/1608.03287, 2016. URL http://arxiv.org/abs/1608.03287.
- (2016) Deep Vs. Shallow Networks: An Approximation Theory Perspective
- Mhaskar, H.¹ Poggio, T.A.²

20
- 0006863682
- Approximation properties of a multilayered feedforward artificial neural network
- Hrushikesh Narhar Mhaskar. Approximation properties of a multilayered feedforward artificial neural network. Advances in Computational Mathematics, 1(1):61-80, 1993.
- (1993) Advances in Computational Mathematics , vol.1 , Issue.1 , pp. 61-80
- Mhaskar, H.N.¹

21
- 1842515655
- Statistical learning: Stability is sufficient for generalization and necessary and sufficient for consistency of empirical risk minimization
- Massachusetts Institute of Technology
- Sayan Mukherjee, Partha Niyogi, Tomaso Poggio, and Ryan Rifkin. Statistical learning: Stability is sufficient for generalization and necessary and sufficient for consistency of empirical risk minimization. Technical Report AI Memo 2002-024, Massachusetts Institute of Technology, 2002.
- (2002) Technical Report AI Memo
- Mukherjee, S.¹ Niyogi, P.² Poggio, T.³ Rifkin, R.⁴

22
- 85046993129
- CoRR, abs/1412.6614
- Behnam Neyshabur, Ryota Tomioka, and Nathan Srebro. In search of the real inductive bias: On the role of implicit regularization in deep learning. CoRR, abs/1412.6614, 2014.
- (2014) In Search of the Real Inductive Bias: On the Role of Implicit Regularization in Deep Learning
- Neyshabur, B.¹ Tomioka, R.² Srebro, N.³

23
- 85007187503
- Norm-based capacity control in neural networks
- Behnam Neyshabur, Ryota Tomioka, and Nathan Srebro. Norm-Based Capacity Control in Neural Networks. In COLT, pp. 1376-1401, 2015.
- (2015) COLT , pp. 1376-1401
- Neyshabur, B.¹ Tomioka, R.² Srebro, N.³

24
- 1842420581
- General conditions for predictivity in learning theory
- Tomaso Poggio, Ryan Rifkin, Sayan Mukherjee, and Partha Niyogi. General conditions for predictivity in learning theory. Nature, 428(6981):419-422, 2004.
- (2004) Nature , vol.428 , Issue.6981 , pp. 419-422
- Poggio, T.¹ Rifkin, R.² Mukherjee, S.³ Niyogi, P.⁴

25
- 84947041871
- Imagenet large scale visual recognition challenge
- Olga Russakovsky, Jia Deng, Hao Su, Jonathan Krause, Sanjeev Satheesh, Sean Ma, Zhiheng Huang, Andrej Karpathy, Aditya Khosla, Michael Bernstein, Alexander C. Berg, and Li Fei-Fei. Imagenet large scale visual recognition challenge. International Journal of Computer Vision, 115(3):211-252, 2015. ISSN 1573-1405. doi: 10.1007/s11263-015-0816-y.
- (2015) International Journal of Computer Vision , vol.115 , Issue.3 , pp. 211-252
- Russakovsky, O.¹ Deng, J.² Su, H.³ Krause, J.⁴ Satheesh, S.⁵ Ma, S.⁶ Huang, Z.⁷ Karpathy, A.⁸ Khosla, A.⁹ Bernstein, M.¹⁰ Berg, A.C.¹¹ Fei-Fei, L.¹²

26
- 0037692278
- A generalized representer theorem
- Bernhard Schölkopf, Ralf Herbrich, and Alex J Smola. A generalized representer theorem. In COLT, 2001.
- (2001) COLT
- Schölkopf, B.¹ Herbrich, R.² Smola, A.J.³

27
- 78649409695
- Learnability, stability and uniform convergence
- October
- Shai Shalev-Shwartz, Ohad Shamir, Nathan Srebro, and Karthik Sridharan. Learnability, stability and uniform convergence. Journal of Machine Learning Research, 11:2635-2670, October 2010.
- (2010) Journal of Machine Learning Research , vol.11 , pp. 2635-2670
- Shalev-Shwartz, S.¹ Shamir, O.² Srebro, N.³ Sridharan, K.⁴

28
- 84904163933
- Dropout: A simple way to prevent neural networks from overfitting
- Nitish Srivastava, Geoffrey E Hinton, Alex Krizhevsky, Ilya Sutskever, and Ruslan Salakhutdinov. Dropout: a simple way to prevent neural networks from overfitting. Journal of Machine Learning Research, 15(1):1929-1958, 2014.
- (2014) Journal of Machine Learning Research , vol.15 , Issue.1 , pp. 1929-1958
- Srivastava, N.¹ Hinton, G.E.² Krizhevsky, A.³ Sutskever, I.⁴ Salakhutdinov, R.⁵

29
- 84986296808
- Rethinking the inception architecture for computer vision
- Christian Szegedy, Vincent Vanhoucke, Sergey Ioffe, Jonathon Shlens, and Zbigniew Wojna. Rethinking the inception architecture for computer vision. In CVPR, pp. 2818-2826, 2016. doi: 10.1109/CVPR.2016.308.
- (2016) CVPR , pp. 2818-2826
- Szegedy, C.¹ Vanhoucke, V.² Ioffe, S.³ Shlens, J.⁴ Wojna, Z.⁵

30
- 84998628850
- Benefits of depth in neural networks
- Matus Telgarsky. Benefits of depth in neural networks. In COLT, 2016.
- (2016) COLT
- Telgarsky, M.¹

31
- 0003991806
- Statistical learning theory
- Wiley
- Vladimir N. Vapnik. Statistical Learning Theory. Adaptive and learning systems for signal processing, communications, and control. Wiley, 1998.
- (1998) Adaptive and Learning Systems for Signal Processing, Communications, and Control
- Vapnik, V.N.¹

32
- 34547435898
- On early stopping in gradient descent learning
- Yuan Yao, Lorenzo Rosasco, and Andrea Caponnetto. On early stopping in gradient descent learning. Constructive Approximation, 26(2):289-315, 2007.
- (2007) Constructive Approximation , vol.26 , Issue.2 , pp. 289-315
- Yao, Y.¹ Rosasco, L.² Caponnetto, A.³

* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS DB.