-
1
-
-
84862277874
-
Understanding the difficulty of training deep feedforward neural networks
-
May
-
Bengio, Yoshua and Glorot, Xavier. Understanding the difficulty of training deep feedforward neural networks. In Proceedings of AISTATS 2010, volume 9, pp. 249-256, May 2010.
-
(2010)
Proceedings of AISTATS 2010
, vol.9
, pp. 249-256
-
-
Bengio, Y.1
Glorot, X.2
-
2
-
-
84877760312
-
Large scale distributed deep networks
-
Dean, Jeffrey, Corrado, Greg S., Monga, Rajat, Chen, Kai, Devin, Matthieu, Le, Quoc V., Mao, Mark Z., Ranzato, Marc'Aurelio, Senior, Andrew, Tucker, Paul, Yang, Ke, and Ng, Andrew Y. Large scale distributed deep networks. In NIPS, 2012.
-
(2012)
NIPS
-
-
Dean, J.1
Corrado, G.S.2
Monga, R.3
Chen, K.4
Devin, M.5
Le, Q.V.6
Mao, M.Z.7
Ranzato, M.8
Senior, A.9
Tucker, P.10
Yang, K.11
Ng, A.Y.12
-
4
-
-
80052250414
-
Adaptive subgradient methods for online learning and stochastic optimization
-
July
-
Duchi, John, Hazan, Elad, and Singer, Yoram. Adaptive subgradient methods for online learning and stochastic optimization. J. Mach. Learn. Res., 12: 2121-2159, July 2011. ISSN 1532-4435.
-
(2011)
J. Mach. Learn. Res.
, vol.12
, pp. 2121-2159
-
-
Duchi, J.1
Hazan, E.2
Singer, Y.3
-
5
-
-
85083951034
-
Knowledge matters: Importance of prior information for optimization
-
abs/1301.4083
-
Gülçehre, Caglar and Bengio, Yoshua. Knowledge matters: Importance of prior information for optimization. CoRR, abs/1301.4083, 2013.
-
(2013)
CoRR
-
-
Gülçehre, C.1
Bengio, Y.2
-
6
-
-
84937472647
-
Delving deep into rectifiers: Surpassing human-level performance on ImageNet classification
-
February
-
He, K., Zhang, X., Ren, S., and Sun, J. Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification. ArXiv e-prints, February 2015.
-
(2015)
ArXiv E-prints
-
-
He, K.1
Zhang, X.2
Ren, S.3
Sun, J.4
-
7
-
-
0042826822
-
Independent component analysis: Algorithms and applications
-
May
-
Hyvarinen, A. and Oja, E. Independent component analysis: Algorithms and applications. Neural Netw., 13 (4-5): 411-430, May 2000.
-
(2000)
Neural Netw
, vol.13
, Issue.4-5
, pp. 411-430
-
-
Hyvarinen, A.1
Oja, E.2
-
9
-
-
0032203257
-
Gradient-based learning applied to document recognition
-
November
-
LeCun, Y., Bottou, L., Bengio, Y., and Haffner, P. Gradient-based learning applied to document recognition. Proceedings of the IEEE, 86 (11): 2278-2324, November 1998a.
-
(1998)
Proceedings of the IEEE
, vol.86
, Issue.11
, pp. 2278-2324
-
-
LeCun, Y.1
Bottou, L.2
Bengio, Y.3
Haffner, P.4
-
10
-
-
0001857994
-
Efficient backprop
-
Orr, G. and Muller, K. (eds.) Springer
-
LeCun, Y., Bottou, L., Orr, G., and Muller, K. Efficient backprop. In Orr, G. and Muller, K. (eds.), Neural Networks: Tricks of the trade. Springer, 1998b.
-
(1998)
Neural Networks: Tricks of the Trade
-
-
LeCun, Y.1
Bottou, L.2
Orr, G.3
Muller, K.4
-
11
-
-
52249097028
-
Nonlinear image representation using divisive normalization
-
IEEE Computer Society, Jun 23-28
-
Lyu, S. and Simoncelli, E. P. Nonlinear image representation using divisive normalization. In Proc. Computer Vision and Pattern Recognition, pp. 1-8. IEEE Computer Society, Jun 23-28 2008. doi: 10.1109/CVPR.2008.4587821.
-
(2008)
Proc. Computer Vision and Pattern Recognition
, pp. 1-8
-
-
Lyu, S.1
Simoncelli, E.P.2
-
12
-
-
77956509090
-
Rectified linear units improve restricted Boltzmann machines
-
Omnipress
-
Nair, Vinod and Hinton, Geoffrey E. Rectified linear units improve restricted Boltzmann machines. In ICML, pp. 807-814. Omnipress, 2010.
-
(2010)
ICML
, pp. 807-814
-
-
Nair, V.1
Hinton, G.E.2
-
13
-
-
84897497795
-
On the difficulty of training recurrent neural networks
-
Pascanu, Razvan, Mikolov, Tomas, and Bengio, Yoshua. On the difficulty of training recurrent neural networks. In Proceedings of the 30th International Conference on Machine Learning, ICML 2013, Atlanta, GA, USA, 16-21 June 2013, pp. 1310-1318, 2013.
-
(2013)
Proceedings of the 30th International Conference on Machine Learning, ICML 2013, Atlanta, GA, USA, 16-21 June 2013
, pp. 1310-1318
-
-
Pascanu, R.1
Mikolov, T.2
Bengio, Y.3
-
14
-
-
84969522474
-
Parallel training of deep neural networks with natural gradient and parameter averaging
-
abs/1410.7455
-
Povey, Daniel, Zhang, Xiaohui, and Khudanpur, Sanjeev. Parallel training of deep neural networks with natural gradient and parameter averaging. CoRR, abs/1410.7455, 2014.
-
(2014)
CoRR
-
-
Povey, D.1
Zhang, X.2
Khudanpur, S.3
-
15
-
-
84893409634
-
Deep learning made easier by linear transformations in perceptrons
-
Raiko, Tapani, Valpola, Harri, and LeCun, Yann. Deep learning made easier by linear transformations in perceptrons. In International Conference on Artificial Intelligence and Statistics (AISTATS), pp. 924-932, 2012.
-
(2012)
International Conference on Artificial Intelligence and Statistics (AISTATS)
, pp. 924-932
-
-
Raiko, T.1
Valpola, H.2
LeCun, Y.3
-
16
-
-
84909978410
-
-
Russakovsky, Olga, Deng, Jia, Su, Hao, Krause, Jonathan, Satheesh, Sanjeev, Ma, Sean, Huang, Zhiheng, Karpathy, Andrej, Khosla, Aditya, Bernstein, Michael, Berg, Alexander C, and Fei-Fei, Li. ImageNet Large Scale Visual Recognition Challenge, 2014.
-
(2014)
ImageNet Large Scale Visual Recognition Challenge
-
-
Russakovsky, O.1
Deng, J.2
Su, H.3
Krause, J.4
Satheesh, S.5
Ma, S.6
Huang, Z.7
Karpathy, A.8
Khosla, A.9
Bernstein, M.10
Berg, A.C.11
Fei-Fei, L.12
-
17
-
-
84969522090
-
Exact solutions to the nonlinear dynamics of learning in deep linear neural networks
-
abs/1312.6120
-
Saxe, Andrew M., McClelland, James L., and Ganguli, Surya. Exact solutions to the nonlinear dynamics of learning in deep linear neural networks. CoRR, abs/1312.6120, 2013.
-
(2013)
CoRR
-
-
Saxe, A.M.1
McClelland, J.L.2
Ganguli, S.3
-
18
-
-
0037527188
-
Improving predictive inference under covariate shift by weighting the log-likelihood function
-
October
-
Shimodaira, Hidetoshi. Improving predictive inference under covariate shift by weighting the log-likelihood function. Journal of Statistical Planning and Inference, 90 (2): 227-244, October 2000.
-
(2000)
Journal of Statistical Planning and Inference
, vol.90
, Issue.2
, pp. 227-244
-
-
Shimodaira, H.1
-
19
-
-
84904163933
-
Dropout: A simple way to prevent neural networks from overfitting
-
January
-
Srivastava, Nitish, Hinton, Geoffrey, Krizhevsky, Alex, Sutskever, Ilya, and Salakhutdinov, Ruslan. Dropout: A simple way to prevent neural networks from overfitting. J. Mach. Learn. Res., 15 (1): 1929-1958, January 2014.
-
(2014)
J. Mach. Learn. Res.
, vol.15
, Issue.1
, pp. 1929-1958
-
-
Srivastava, N.1
Hinton, G.2
Krizhevsky, A.3
Sutskever, I.4
Salakhutdinov, R.5
-
20
-
-
84897510162
-
On the importance of initialization and momentum in deep learning
-
JMLR.org
-
Sutskever, Ilya, Martens, James, Dahl, George E., and Hinton, Geoffrey E. On the importance of initialization and momentum in deep learning. In ICML (3), volume 28 of JMLR Proceedings, pp. 1139-1147. JMLR.org, 2013.
-
(2013)
ICML (3), JMLR Proceedings
, vol.28
, pp. 1139-1147
-
-
Sutskever, I.1
Martens, J.2
Dahl, G.E.3
Hinton, G.E.4
-
21
-
-
84941122549
-
Going deeper with convolutions
-
abs/1409.4842
-
Szegedy, Christian, Liu, Wei, Jia, Yangqing, Sermanet, Pierre, Reed, Scott, Anguelov, Dragomir, Erhan, Dumitru, Vanhoucke, Vincent, and Rabinovich, Andrew. Going deeper with convolutions. CoRR, abs/1409.4842, 2014.
-
(2014)
CoRR
-
-
Szegedy, C.1
Liu, W.2
Jia, Y.3
Sermanet, P.4
Reed, S.5
Anguelov, D.6
Erhan, D.7
Vanhoucke, V.8
Rabinovich, A.9
-
22
-
-
85162533997
-
A convergence analysis of log-linear training
-
Shawe-Taylor, J., Zemel, R.S., Bartlett, P., Pereira, F.C.N., and Weinberger, K.Q. (eds.), Granada, Spain, December
-
Wiesler, Simon and Ney, Hermann. A convergence analysis of log-linear training. In Shawe-Taylor, J., Zemel, R.S., Bartlett, P., Pereira, F.C.N., and Weinberger, K.Q. (eds.), Advances in Neural Information Processing Systems 24, pp. 657-665, Granada, Spain, December 2011.
-
(2011)
Advances in Neural Information Processing Systems
, vol.24
, pp. 657-665
-
-
Wiesler, S.1
Ney, H.2
-
23
-
-
84905233897
-
Mean-normalized stochastic gradient for large-scale deep learning
-
Florence, Italy, May
-
Wiesler, Simon, Richard, Alexander, Schlüter, Ralf, and Ney, Hermann. Mean-normalized stochastic gradient for large-scale deep learning. In IEEE International Conference on Acoustics, Speech, and Signal Processing, pp. 180-184, Florence, Italy, May 2014.
-
(2014)
IEEE International Conference on Acoustics, Speech, and Signal Processing
, pp. 180-184
-
-
Wiesler, S.1
Richard, A.2
Schlüter, R.3
Ney, H.4
-
24
-
-
84930572185
-
-
Wu, Ren, Yan, Shengen, Shan, Yi, Dang, Qingqing, and Sun, Gang. Deep image: Scaling up image recognition, 2015.
-
(2015)
Deep Image: Scaling Up Image Recognition
-
-
Wu, R.1
Yan, S.2
Shan, Y.3
Dang, Q.4
Sun, G.5