1. Amodei, D., Anubhai, R., Battenberg, E., et al. Deep Speech 2: End-to-end speech recognition in English and Mandarin. arXiv preprint arXiv:1512.02595, 2015.
2. Bottou, L. Large-scale machine learning with stochastic gradient descent. In Proceedings of COMPSTAT'2010, pp. 177-186. Springer, 2010.
3. Bouchaud, J. P. and Georges, A. Anomalous diffusion in disordered media: Statistical mechanisms, models and physical applications. Physics Reports, 195:127-293, 1990.
4. Bouchaud, J. P. and Comtet, A. Anomalous diffusion in random media of any dimensionality. J. Physique, 48:1445-1450, 1987.
5. Bray, A. J. and Dean, D. S. Statistics of critical points of Gaussian fields on large-dimensional spaces. Physical Review Letters, 98(15):1-5, 2007.
6. Choromanska, A., Henaff, M., Mathieu, M., Arous, G. B., and LeCun, Y. The loss surfaces of multilayer networks. In AISTATS, vol. 38, 2015.
7. Das, D., Avancha, S., Mudigere, D., et al. Distributed deep learning using synchronous stochastic gradient descent. arXiv preprint arXiv:1602.06709, 2016.
8. Dauphin, Y., de Vries, H., Chung, J., and Bengio, Y. RMSProp and equilibrated adaptive learning rates for non-convex optimization. CoRR, abs/1502.04390, 2015.
9. Dauphin, Y., Pascanu, R., and Gulcehre, C. Identifying and attacking the saddle point problem in high-dimensional non-convex optimization. In NIPS, pp. 1-9, 2014.
10. Dean, J., Corrado, G., Monga, R., et al. Large scale distributed deep networks. In NIPS, pp. 1223-1231, 2012.
11. Deng, J., Dong, W., Socher, R., et al. ImageNet: A large-scale hierarchical image database. In CVPR, 2009.
12. Dinh, L., Pascanu, R., Bengio, S., and Bengio, Y. Sharp minima can generalize for deep nets. arXiv preprint arXiv:1703.04933, 2017.
13. Duchi, J., Hazan, E., and Singer, Y. Adaptive subgradient methods for online learning and stochastic optimization. Journal of Machine Learning Research, 12(Jul):2121-2159, 2011.
14. Durrett, R. Multidimensional random walks in random environments with subclassical limiting behavior. Communications in Mathematical Physics, 104(1):87-102, 1986.
15. Ge, R., Huang, F., Jin, C., and Yuan, Y. Escaping from saddle points: Online stochastic gradient for tensor decomposition. In COLT, pp. 797-842, 2015.
16. Girosi, F., Jones, M., and Poggio, T. Regularization theory and neural networks architectures. Neural Computation, 7(2):219-269, 1995.
17. Goyal, P., Dollár, P., Girshick, R., et al. Accurate, large minibatch SGD: Training ImageNet in 1 hour. arXiv preprint arXiv:1706.02677, 2017.
18. Hardt, M., Recht, B., and Singer, Y. Train faster, generalize better: Stability of stochastic gradient descent. In ICML, pp. 1-24, 2016.
19. He, K., Zhang, X., Ren, S., and Sun, J. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 770-778, 2016.
21. Keskar, N. S., Mudigere, D., Nocedal, J., Smelyanskiy, M., and Tang, P. T. P. On large-batch training for deep learning: Generalization gap and sharp minima. In ICLR, 2017.
25. LeCun, Y., Bottou, L., and Orr, G. Efficient backprop. In Neural Networks: Tricks of the Trade (Orr, G. and Müller, K., eds.), Lecture Notes in Computer Science, vol. 1524, 1998a.
26. LeCun, Y., Bottou, L., Bengio, Y., and Haffner, P. Gradient-based learning applied to document recognition. Proceedings of the IEEE, 86(11):2278-2324, 1998b.
28. Li, M., Zhang, T., Chen, Y., and Smola, A. J. Efficient mini-batch training for stochastic optimization. In Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 661-670. ACM, 2014.
30. Marinari, E., Parisi, G., Ruelle, D., and Windey, P. Random walk in a random environment and 1/f noise. Physical Review Letters, 50(1):1223-1225, 1983.
31. Mnih, V., Kavukcuoglu, K., Silver, D., et al. Human-level control through deep reinforcement learning. Nature, 518(7540):529-533, 2015.
33. Ruder, S. An overview of gradient descent optimization algorithms. CoRR, abs/1609.04747, 2016.
34. Silver, D., Huang, A., Maddison, C. J., et al. Mastering the game of Go with deep neural networks and tree search. Nature, 529(7587):484-489, 2016.
38. Srivastava, N., Hinton, G. E., Krizhevsky, A., Sutskever, I., and Salakhutdinov, R. Dropout: A simple way to prevent neural networks from overfitting. Journal of Machine Learning Research, 15(1):1929-1958, 2014.
39. Sutskever, I., Martens, J., Dahl, G., and Hinton, G. On the importance of initialization and momentum in deep learning. In International Conference on Machine Learning, pp. 1139-1147, 2013.
40. Szegedy, C., Vanhoucke, V., Ioffe, S., Shlens, J., and Wojna, Z. Rethinking the inception architecture for computer vision. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2818-2826, 2016.
41. Wan, L., Zeiler, M., Zhang, S., LeCun, Y., and Fergus, R. Regularization of neural networks using DropConnect. In ICML'13, pp. III-1058-III-1066. JMLR.org, 2013.
42. Wu, Y., Schuster, M., Chen, Z., et al. Google's neural machine translation system: Bridging the gap between human and machine translation. CoRR, abs/1609.08144, 2016.
44. Zagoruyko, S. and Komodakis, N. Wide residual networks. In BMVC, 2016.
45. Zhang, C., Bengio, S., Hardt, M., Recht, B., and Vinyals, O. Understanding deep learning requires rethinking generalization. In ICLR, 2017.
46. Zhang, S., Choromanska, A. E., and LeCun, Y. Deep learning with elastic averaging SGD. In NIPS, pp. 685-693, 2015.