SCOPUS information retrieval platform

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

Volume 7700, 2012, Pages 437-478

Practical recommendations for gradient-based training of deep architectures

Bengio, Yoshua (a)

(a) Université de Montréal (Canada)

Author keywords

[No Author keywords available]

Indexed keywords

DEEP LEARNING; DEEP NEURAL NETWORKS; LEARNING ALGORITHMS; NETWORK ARCHITECTURE; NETWORK LAYERS; SIGNALING;

DEEP ARCHITECTURES; GRADIENT BASED; GRADIENT-BASED OPTIMIZATION; HYPER-PARAMETER; HYPERPARAMETERS; PRACTICAL GUIDE; PRACTICAL RECOMMENDATION;

MULTILAYER NEURAL NETWORKS;

EID: 84872577736 PISSN: 03029743 EISSN: 16113349 Source Type: Book Series
DOI: 10.1007/978-3-642-35289-8_26 Document Type: Article

Times cited: 1691

References (120)

1
- 0000396062
- Natural gradient works efficiently in learning
- Amari, S.: Natural gradient works efficiently in learning. Neural Computation 10(2), 251-276 (1998)
- (1998) Neural Computation , vol.10 , Issue.2 , pp. 251-276
- Amari, S.-I.¹

2
- 84872521281
- Non-asymptotic analysis of stochastic approximation algorithms
- Bach, F., Moulines, E.: Non-asymptotic analysis of stochastic approximation algorithms. In: NIPS 2011 (2011)
- (2011) NIPS 2011
- Bach, F.¹ Moulines, E.²

3
- 76749123278
- Differentiable sparse coding
- Bagnell, J.A., Bradley, D.M.: Differentiable sparse coding. In: NIPS 2009, pp. 113-120 (2009)
- (2009) NIPS 2009 , pp. 113-120
- Bagnell, J.A.¹ Bradley, D.M.²

4
- 84991988347
- Learning internal representations
- Baxter, J.: Learning internal representations. In: COLT 1995, pp. 311-320 (1995)
- (1995) COLT 1995 , pp. 311-320
- Baxter, J.¹

5
- 0031187873
- A Bayesian/information theoretic model of learning via multiple task sampling
- Baxter, J.: A Bayesian/information theoretic model of learning via multiple task sampling. Machine Learning 28, 7-40 (1997)
- (1997) Machine Learning , vol.28 , pp. 7-40
- Baxter, J.¹

6
- 79959407847
- Neural net language models
- Bengio, Y.: Neural net language models. Scholarpedia 3(1), 3881 (2008)
- (2008) Scholarpedia , vol.3 , Issue.1 , pp. 3881
- Bengio, Y.¹

7
- 78650904464
- Learning deep architectures for AI
- Bengio, Y.: Learning deep architectures for AI. Now Publishers (2009)
- (2009) Now Publishers
- Bengio, Y.¹

8
- 84872343315
- Deep learning of representations for unsupervised and transfer learning
- Bengio, Y.: Deep learning of representations for unsupervised and transfer learning. In: JMLR W&CP: Proc. Unsupervised and Transfer Learning (2011)
- (2011) JMLR W&CP: Proc. Unsupervised and Transfer Learning
- Bengio, Y.¹

9
- 80054108245
- On the expressive power of deep architectures
- Bengio, Y., Delalleau, O.: On the Expressive Power of Deep Architectures. In: Kivinen, J., Szepesvari, C., Ukkonen, E., Zeugmann, T. (eds.) ALT 2011. LNCS, vol. 6925, pp. 18-36. Springer, Heidelberg (2011)
- (2011) ALT 2011 , vol.6925 , pp. 18-36
- Bengio, Y.¹ Delalleau, O.²

10
- 34547975052
- Scaling learning algorithms towards AI
- Bengio, Y., LeCun, Y.: Scaling learning algorithms towards AI. In: Large Scale Kernel Machines (2007)
- (2007) Large Scale Kernel Machines
- Bengio, Y.¹ LeCun, Y.²

11
- 0142166851
- A neural probabilistic language model
- Bengio, Y., Ducharme, R., Vincent, P., Jauvin, C.: A neural probabilistic language model. JMLR 3, 1137-1155 (2003)
- (2003) JMLR , vol.3 , pp. 1137-1155
- Bengio, Y.¹ Ducharme, R.² Vincent, P.³ Jauvin, C.⁴

12
- 33749245798
- Convex neural networks
- Bengio, Y., Le Roux, N., Vincent, P., Delalleau, O., Marcotte, P.: Convex neural networks. In: NIPS 2005, pp. 123-130 (2006a)
- (2006) NIPS 2005 , pp. 123-130
- Bengio, Y.¹ Le Roux, N.² Vincent, P.³ Delalleau, O.⁴ Marcotte, P.⁵

13
- 77954662106
- The curse of highly variable functions for local kernel machines
- Bengio, Y., Delalleau, O., Le Roux, N.: The curse of highly variable functions for local kernel machines. In: NIPS 2005, pp. 107-114 (2006b)
- (2006) NIPS 2005 , pp. 107-114
- Bengio, Y.¹ Delalleau, O.² Le Roux, N.³

14
- 84864073449
- Greedy layer-wise training of deep networks
- Bengio, Y., Lamblin, P., Popovici, D., Larochelle, H.: Greedy layer-wise training of deep networks. In: NIPS 2006 (2007)
- (2007) NIPS 2006
- Bengio, Y.¹ Lamblin, P.² Popovici, D.³ Larochelle, H.⁴

15
- 71149116544
- Curriculum learning
- Bengio, Y., Louradour, J., Collobert, R., Weston, J.: Curriculum learning. In: ICML 2009 (2009)
- (2009) ICML 2009
- Bengio, Y.¹ Louradour, J.² Collobert, R.³ Weston, J.⁴

16
- 84872509374
- Implicit density estimation by local moment matching to sample from auto-encoders
- arXiv: 1207.0057
- Bengio, Y., Alain, G., Rifai, S.: Implicit density estimation by local moment matching to sample from auto-encoders. Technical report, arXiv:1207.0057 (2012)
- (2012) Technical report
- Bengio, Y.¹ Alain, G.² Rifai, S.³

17
- 84857855190
- Random search for hyper-parameter optimization
- Bergstra, J., Bengio, Y.: Random search for hyper-parameter optimization. J. Machine Learning Res. 13, 281-305 (2012)
- (2012) J. Machine Learning Res. , vol.13 , pp. 281-305
- Bergstra, J.¹ Bengio, Y.²

18
- 84856673205
- Theano: A CPU and GPU math expression compiler
- Bergstra, J., Breuleux, O., Bastien, F., Lamblin, P., Pascanu, R., Desjardins, G., Turian, J., Warde-Farley, D., Bengio, Y.: Theano: A CPU and GPU math expression compiler. In: Proc. Python for Scientific Comp. Conf. (SciPy) (2010)
- (2010) Proc. Python for Scientific Comp. Conf. (SciPy)
- Bergstra, J.¹ Breuleux, O.² Bastien, F.³ Lamblin, P.⁴ Pascanu, R.⁵ Desjardins, G.⁶ Turian, J.⁷ Warde-Farley, D.⁸ Bengio, Y.⁹

19
- 85162384813
- Algorithms for hyper-parameter optimization
- Bergstra, J., Bardenet, R., Bengio, Y., Kegl, B.: Algorithms for hyper-parameter optimization. In: NIPS 2011 (2011)
- (2011) NIPS 2011
- Bergstra, J.¹ Bardenet, R.² Bengio, Y.³ Kegl, B.⁴

20
- 84902137011
- Applying slow feature analysis to image sequences yields a rich repertoire of complex cell properties
- Berkes, P., Wiskott, L.: Applying Slow Feature Analysis to Image Sequences Yields a Rich Repertoire of Complex Cell Properties. In: Dorronsoro, J.R. (ed.) ICANN 2002. LNCS, vol. 2415, pp. 81-86. Springer, Heidelberg (2002)
- (2002) ICANN 2002 , vol.2415 , pp. 81-86
- Berkes, P.¹ Wiskott, L.²

21
- 81155141540
- Incremental gradient, subgradient, and proximal methods for convex optimization: A survey
- Bertsekas, D.P.: Incremental gradient, subgradient, and proximal methods for convex optimization: A survey. Technical Report 2848, LIDS (2010)
- (2010) Technical Report 2848, LIDS
- Bertsekas, D.P.¹

22
- 68949096711
- SGD-QN: Careful quasi-Newton stochastic gradient descent
- Bordes, A., Bottou, L., Gallinari, P.: SGD-QN: Careful quasi-Newton stochastic gradient descent. Journal of Machine Learning Research 10, 1737-1754 (2009)
- (2009) Journal of Machine Learning Research , vol.10 , pp. 1737-1754
- Bordes, A.¹ Bottou, L.² Gallinari, P.³

23
- 85120807674
- Learning structured embeddings of knowledge bases
- Bordes, A., Weston, J., Collobert, R., Bengio, Y.: Learning structured embeddings of knowledge bases. In: AAAI 2011 (2011)
- (2011) AAAI 2011
- Bordes, A.¹ Weston, J.² Collobert, R.³ Bengio, Y.⁴

24
- 84879866425
- Joint learning of words and meaning representations for open-text semantic parsing
- Bordes, A., Glorot, X., Weston, J., Bengio, Y.: Joint learning of words and meaning representations for open-text semantic parsing. In: AISTATS 2012 (2012)
- (2012) AISTATS 2012
- Bordes, A.¹ Glorot, X.² Weston, J.³ Bengio, Y.⁴

25
- 84872505653
- From machine learning to machine reasoning
- Bottou, L.: From machine learning to machine reasoning. Technical report, arXiv.1102 (2011)
- (2011) Technical report, arXiv.1102
- Bottou, L.¹

26
- 84872521733
- Stochastic gradient descent tricks
- Bottou, L.: Stochastic Gradient Descent Tricks. In: Montavon, G., Orr, G.B., Muller, K.-R. (eds.) NN: Tricks of the Trade, 2nd edn. LNCS, vol. 7700, pp. 421-436. Springer, Heidelberg (2012)
- (2012) NN: Tricks of the Trade, 2nd edn. , vol.7700 , pp. 421-436
- Bottou, L.¹

27
- 85162035281
- The tradeoffs of large scale learning
- Bottou, L., Bousquet, O.: The tradeoffs of large scale learning. In: NIPS 2008 (2008)
- (2008) NIPS 2008
- Bottou, L.¹ Bousquet, O.²

28
- 84899022736
- Large-scale on-line learning
- Bottou, L., LeCun, Y.: Large-scale on-line learning. In: NIPS 2003 (2004)
- (2004) NIPS 2003
- Bottou, L.¹ LeCun, Y.²

29
- 0030211964
- Bagging predictors
- Breiman, L.: Bagging predictors. Machine Learning 24(2), 123-140 (1996)
- (1996) Machine Learning , vol.24 , Issue.2 , pp. 123-140
- Breiman, L.¹

30
- 79959650504
- Quickly generating representative samples from an RBM-derived process
- Breuleux, O., Bengio, Y., Vincent, P.: Quickly generating representative samples from an RBM-derived process. Neural Computation 23(8), 2053-2073 (2011)
- (2011) Neural Computation , vol.23 , Issue.8 , pp. 2053-2073
- Breuleux, O.¹ Bengio, Y.² Vincent, P.³

31
- 0042214974
- Multitask connectionist learning
- Caruana, R.: Multitask connectionist learning. In: Proceedings of the 1993 Connectionist Models Summer School, pp. 372-379 (1993)
- (1993) Proceedings of the 1993 Connectionist Models Summer School , pp. 372-379
- Caruana, R.¹

32
- 80053444761
- Enhanced gradient and adaptive learning rate for training restricted Boltzmann machines
- Cho, K., Raiko, T., Ilin, A.: Enhanced gradient and adaptive learning rate for training restricted Boltzmann machines. In: ICML 2011, pp. 105-112 (2011)
- (2011) ICML 2011 , pp. 105-112
- Cho, K.¹ Raiko, T.² Ilin, A.³

33
- 80053442434
- The importance of encoding versus training with sparse coding and vector quantization
- Coates, A., Ng, A.Y.: The importance of encoding versus training with sparse coding and vector quantization. In: ICML 2011 (2011)
- (2011) ICML 2011
- Coates, A.¹ Ng, A.Y.²

34
- 14344264564
- Links between perceptrons, MLPs and SVMs
- Collobert, R., Bengio, S.: Links between perceptrons, MLPs and SVMs. In: ICML 2004 (2004a)
- (2004) ICML 2004
- Collobert, R.¹ Bengio, S.²

35
- 14344264564
- Links between perceptrons, MLPs and SVMs
- Collobert, R., Bengio, S.: Links between perceptrons, MLPs and SVMs. In: International Conference on Machine Learning, ICML (2004b)
- (2004) International Conference on Machine Learning, ICML
- Collobert, R.¹ Bengio, S.²

36
- 80053558787
- Natural language processing (almost) from scratch
- Collobert, R., Weston, J., Bottou, L., Karlen, M., Kavukcuoglu, K., Kuksa, P.: Natural language processing (almost) from scratch. Journal of Machine Learning Research 12, 2493-2537 (2011a)
- (2011) Journal of Machine Learning Research , vol.12 , pp. 2493-2537
- Collobert, R.¹ Weston, J.² Bottou, L.³ Karlen, M.⁴ Kavukcuoglu, K.⁵ Kuksa, P.⁶

37
- 84872570173
- Torch7: A Matlab-like environment for machine learning
- Collobert, R., Kavukcuoglu, K., Farabet, C.: Torch7: A Matlab-like environment for machine learning. In: BigLearn, NIPS Workshop (2011b)
- (2011) BigLearn, NIPS Workshop
- Collobert, R.¹ Kavukcuoglu, K.² Farabet, C.³

38
- 80053436198
- Unsupervised models of images by spike-and-slab RBMs
- Courville, A., Bergstra, J., Bengio, Y.: Unsupervised models of images by spike-and-slab RBMs. In: ICML 2011 (2011)
- (2011) ICML 2011
- Courville, A.¹ Bergstra, J.² Bengio, Y.³

39
- 84872545161
- Sampled reconstruction for large-scale learning of embeddings
- Dauphin, Y., Glorot, X., Bengio, Y.: Sampled reconstruction for large-scale learning of embeddings. In: Proc. ICML 2011 (2011)
- (2011) Proc. ICML 2011
- Dauphin, Y.¹ Glorot, X.² Bengio, Y.³

40
- 84989525001
- Indexing by latent semantic analysis
- Deerwester, S., Dumais, S.T., Furnas, G.W., Landauer, T.K., Harshman, R.: Indexing by latent semantic analysis. J. Am. Soc. Information Science 41(6), 391-407 (1990)
- (1990) J. Am. Soc. Information Science , vol.41 , Issue.6 , pp. 391-407
- Deerwester, S.¹ Dumais, S.T.² Furnas, G.W.³ Landauer, T.K.⁴ Harshman, R.⁵

41
- 80052250414
- Adaptive subgradient methods for online learning and stochastic optimization
- Duchi, J., Hazan, E., Singer, Y.: Adaptive subgradient methods for online learning and stochastic optimization. Journal of Machine Learning Research (2011)
- (2011) Journal of Machine Learning Research
- Duchi, J.¹ Hazan, E.² Singer, Y.³

42
- 0027636611
- Learning and development in neural networks: The importance of starting small
- Elman, J.L.: Learning and development in neural networks: The importance of starting small. Cognition 48, 781-799 (1993)
- (1993) Cognition , vol.48 , pp. 781-799
- Elman, J.L.¹

43
- 84872580046
- Understanding representations learned in deep architectures
- Erhan, D., Courville, A., Bengio, Y.: Understanding representations learned in deep architectures. Technical Report 1355, Universite de Montreal/DIRO (2010a)
- (2010) Technical Report 1355, Universite de Montreal/DIRO
- Erhan, D.¹ Courville, A.² Bengio, Y.³

44
- 77949522811
- Why does unsupervised pre-training help deep learning?
- Erhan, D., Bengio, Y., Courville, A., Manzagol, P.-A., Vincent, P., Bengio, S.: Why does unsupervised pre-training help deep learning? J. Machine Learning Res. 11, 625-660 (2010b)
- (2010) J. Machine Learning Res. , vol.11 , pp. 625-660
- Erhan, D.¹ Bengio, Y.² Courville, A.³ Manzagol, P.-A.⁴ Vincent, P.⁵ Bengio, S.⁶

45
- 0032165969
- A general framework for adaptive processing of data structures
- PII S1045922798061906
- Frasconi, P., Gori, M., Sperduti, A.: A general framework for adaptive processing of data structures. IEEE Transactions on Neural Networks 9(5), 768-786 (1998)
- (1998) IEEE Transactions on Neural Networks , vol.9 , Issue.5 , pp. 768-786
- Frasconi, P.¹ Gori, M.² Sperduti, A.³

46
- 0001942829
- Neural networks and the bias/variance dilemma
- Geman, S., Bienenstock, E., Doursat, R.: Neural networks and the bias/variance dilemma. Neural Computation 4(1), 1-58 (1992)
- (1992) Neural Computation , vol.4 , Issue.1 , pp. 1-58
- Geman, S.¹ Bienenstock, E.² Doursat, R.³

47
- 34547980383
- Introduction to statistical relational learning
- Getoor, L., Taskar, B.: Introduction to Statistical Relational Learning. MIT Press (2006)
- (2006) MIT Press
- Getoor, L.¹ Taskar, B.²

48
- 84862277874
- Understanding the difficulty of training deep feedforward neural networks
- Glorot, X., Bengio, Y.: Understanding the difficulty of training deep feedforward neural networks. In: AISTATS 2010, pp. 249-256 (2010)
- (2010) AISTATS 2010 , pp. 249-256
- Glorot, X.¹ Bengio, Y.²

49
- 84872578524
- Deep sparse rectifier neural networks
- Glorot, X., Bordes, A., Bengio, Y.: Deep sparse rectifier neural networks. In: AISTATS 2011 (2011a)
- (2011) AISTATS 2011
- Glorot, X.¹ Bordes, A.² Bengio, Y.³

50
- 80053443013
- Domain adaptation for large-scale sentiment classification: A deep learning approach
- Glorot, X., Bordes, A., Bengio, Y.: Domain adaptation for large-scale sentiment classification: A deep learning approach. In: ICML 2011 (2011b)
- (2011) ICML 2011
- Glorot, X.¹ Bordes, A.² Bengio, Y.³

51
- 84860644702
- Measuring invariances in deep networks
- Goodfellow, I., Le, Q., Saxe, A., Ng, A.: Measuring invariances in deep networks. In: NIPS 2009, pp. 646-654 (2009)
- (2009) NIPS 2009 , pp. 646-654
- Goodfellow, I.¹ Le, Q.² Saxe, A.³ Ng, A.⁴

52
- 84872570804
- Spike-and-slab sparse coding for unsupervised feature discovery
- Goodfellow, I., Courville, A., Bengio, Y.: Spike-and-slab sparse coding for unsupervised feature discovery. In: NIPS Workshop on Challenges in Learning Hierarchical Models (2011)
- (2011) NIPS Workshop on Challenges in Learning Hierarchical Models
- Goodfellow, I.¹ Courville, A.² Bengio, Y.³

53
- 77956543367
- Web-scale Bayesian click-through rate prediction for sponsored search advertising in Microsoft's Bing search engine
- Graepel, T., Candela, J.Q., Borchert, T., Herbrich, R.: Web-scale Bayesian click-through rate prediction for sponsored search advertising in Microsoft's Bing search engine. In: ICML 2010 (2010)
- (2010) ICML 2010
- Graepel, T.¹ Candela, J.Q.² Borchert, T.³ Herbrich, R.⁴

54
- 0002810605
- Almost optimal lower bounds for small depth circuits
- Hastad, J.: Almost optimal lower bounds for small depth circuits. In: STOC 1986, pp. 6-20 (1986)
- (1986) STOC 1986 , pp. 6-20
- Hastad, J.¹

55
- 0001295178
- On the power of small-depth threshold circuits
- Hastad, J., Goldmann, M.: On the power of small-depth threshold circuits. Computational Complexity 1, 113-129 (1991)
- (1991) Computational Complexity , vol.1 , pp. 113-129
- Hastad, J.¹ Goldmann, M.²

56
- 26444546351
- Relaxation and its role in vision
- Hinton, G.E.: Relaxation and its role in vision. Ph.D. thesis, University of Edinburgh (1978)
- (1978) Ph.D. thesis, University of Edinburgh
- Hinton, G.E.¹

57
- 0002623785
- Learning distributed representations of concepts
- Hinton, G.E.: Learning distributed representations of concepts. In: Proc. 8th Annual Conf. Cog. Sc. Society, pp. 1-12 (1986)
- (1986) Proc. 8th Annual Conf. Cog. Sc. Society , pp. 1-12
- Hinton, G.E.¹

58
- 0024732792
- Connectionist learning procedures
- Hinton, G.E.: Connectionist learning procedures. Artificial Intelligence 40, 185-234 (1989)
- (1989) Artificial Intelligence , vol.40 , pp. 185-234
- Hinton, G.E.¹

59
- 78650474133
- A practical guide to training restricted Boltzmann machines
- University of Toronto
- Hinton, G.E.: A practical guide to training restricted Boltzmann machines. Technical Report UTML TR 2010-003, Department of Computer Science, University of Toronto (2010)
- (2010) Technical Report UTML TR 2010-003, Department of Computer Science, University of Toronto
- Hinton, G.E.¹

60
- 84872506495
- A practical guide to training restricted Boltzmann machines
- Hinton, G.E.: A Practical Guide to Training Restricted Boltzmann Machines. In: Montavon, G., Orr, G.B., Muller, K.-R. (eds.) NN: Tricks of the Trade, 2nd edn. LNCS, vol. 7700, pp. 599-619. Springer, Heidelberg (2012)
- (2012) NN: Tricks of the Trade, 2nd edn. , vol.7700 , pp. 599-619
- Hinton, G.E.¹

61
- 33745805403
- A fast learning algorithm for deep belief nets
- DOI 10.1162/neco.2006.18.7.1527
- Hinton, G.E., Osindero, S., Teh, Y.-W.: A fast learning algorithm for deep belief nets. Neural Computation 18(7), 1527-1554 (2006)
- (2006) Neural Computation , vol.18 , Issue.7 , pp. 1527-1554
- Hinton, G.E.¹ Osindero, S.² Teh, Y.-W.³

62
- 73649133736
- Automated configuration of algorithms for solving hard computational problems
- Hutter, F.: Automated Configuration of Algorithms for Solving Hard Computational Problems. Ph.D. thesis, University of British Columbia (2009)
- (2009) Ph.D. thesis, University of British Columbia
- Hutter, F.¹

63
- 84868554032
- Sequential model-based optimization for general algorithm configuration
- Hutter, F., Hoos, H.H., Leyton-Brown, K.: Sequential model-based optimization for general algorithm configuration. In: Coello Coello, C.A. (ed.) LION 5. LNCS, vol. 6683, pp. 507-523. Springer, Heidelberg (2011)
- (2011) LION 5 , vol.6683 , pp. 507-523
- Hutter, F.¹ Hoos, H.H.² Leyton-Brown, K.³

64
- 77953183471
- What is the best multistage architecture for object recognition?
- Jarrett, K., Kavukcuoglu, K., Ranzato, M., LeCun, Y.: What is the best multistage architecture for object recognition? In: ICCV (2009)
- (2009) ICCV
- Jarrett, K.¹ Kavukcuoglu, K.² Ranzato, M.³ LeCun, Y.⁴

65
- 70450177775
- Learning invariant features through topographic filter maps
- Kavukcuoglu, K., Ranzato, M.-A., Fergus, R., LeCun, Y.: Learning invariant features through topographic filter maps. In: CVPR 2009 (2009)
- (2009) CVPR 2009
- Kavukcuoglu, K.¹ Ranzato, M.-A.² Fergus, R.³ LeCun, Y.⁴

66
- 59649113160
- Flexible shaping: How learning in small steps helps
- Krueger, K.A., Dayan, P.: Flexible shaping: How learning in small steps helps. Cognition 110, 380-394 (2009)
- (2009) Cognition , vol.110 , pp. 380-394
- Krueger, K.A.¹ Dayan, P.²

67
- 84872555515
- Important gains from supervised fine-tuning of deep architectures on large labeled sets
- Lamblin, P., Bengio, Y.: Important gains from supervised fine-tuning of deep architectures on large labeled sets. In: NIPS 2010 Deep Learning and Unsupervised Feature Learning Workshop (2010)
- (2010) NIPS 2010 Deep Learning and Unsupervised Feature Learning Workshop
- Lamblin, P.¹ Bengio, Y.²

68
- 0003966401
- The development of the time-delay neural network architecture for speech recognition
- Lang, K.J., Hinton, G.E.: The development of the time-delay neural network architecture for speech recognition. Technical Report CMU-CS-88-152, Carnegie-Mellon University (1988)
- (1988) Technical Report CMU-CS-88-152, Carnegie-Mellon University
- Lang, K.J.¹ Hinton, G.E.²

69
- 56449110012
- Classification using discriminative restricted Boltzmann machines
- Larochelle, H., Bengio, Y.: Classification using discriminative restricted Boltzmann machines. In: ICML 2008 (2008)
- (2008) ICML 2008
- Larochelle, H.¹ Bengio, Y.²

70
- 59449087310
- Exploring strategies for training deep neural networks
- Larochelle, H., Bengio, Y., Louradour, J., Lamblin, P.: Exploring strategies for training deep neural networks. J. Machine Learning Res. 10, 1-40 (2009)
- (2009) J. Machine Learning Res. , vol.10 , pp. 1-40
- Larochelle, H.¹ Bengio, Y.² Louradour, J.³ Lamblin, P.⁴

71
- 85161972005
- Tiled convolutional neural networks
- Le, Q., Ngiam, J., Chen, Z., Hao Chia, D.J., Koh, P.W., Ng, A.: Tiled convolutional neural networks. In: NIPS 2010 (2010)
- (2010) NIPS 2010
- Le, Q.¹ Ngiam, J.² Chen, Z.³ Hao Chia, D.J.⁴ Koh, P.W.⁵ Ng, A.⁶

72
- 80053437034
- On optimization methods for deep learning
- Le, Q., Ngiam, J., Coates, A., Lahiri, A., Prochnow, B., Ng, A.: On optimization methods for deep learning. In: ICML 2011 (2011)
- (2011) ICML 2011
- Le, Q.¹ Ngiam, J.² Coates, A.³ Lahiri, A.⁴ Prochnow, B.⁵ Ng, A.⁶

73
- 85162000799
- Topmoumoute online natural gradient algorithm
- Le Roux, N., Manzagol, P.-A., Bengio, Y.: Topmoumoute online natural gradient algorithm. In: NIPS 2007 (2008)
- (2008) NIPS 2007
- Le Roux, N.¹ Manzagol, P.-A.² Bengio, Y.³

74
- 85007276343
- Improving first and second-order methods by modeling uncertainty
- MIT Press
- Le Roux, N., Bengio, Y., Fitzgibbon, A.: Improving first and second-order methods by modeling uncertainty. In: Optimization for Machine Learning. MIT Press (2011)
- (2011) Optimization for Machine Learning
- Le Roux, N.¹ Bengio, Y.² Fitzgibbon, A.³

75
- 84872514178
- A stochastic gradient method with an exponential convergence rate for strongly-convex optimization with finite training sets
- arXiv: 1202.6258
- Le Roux, N., Schmidt, M., Bach, F.: A stochastic gradient method with an exponential convergence rate for strongly-convex optimization with finite training sets. Technical report, arXiv:1202.6258 (2012)
- (2012) Technical report
- Le Roux, N.¹ Schmidt, M.² Bach, F.³

76
- 0039165044
- Modeles connexionistes de l'apprentissage
- LeCun, Y.: Modeles connexionistes de l'apprentissage. Ph.D. thesis, Universite de Paris VI (1987)
- (1987) Ph.D. thesis, Universite de Paris VI
- LeCun, Y.¹

77
- 0342898730
- Generalization and network design strategies
- University of Toronto
- LeCun, Y.: Generalization and network design strategies. Technical Report CRG-TR-89-4, University of Toronto (1989)
- (1989) Technical Report CRG-TR-89-4, University of Toronto
- LeCun, Y.¹

78
- 0000359337
- Backpropagation applied to handwritten zip code recognition
- LeCun, Y., Boser, B., Denker, J.S., Henderson, D., Howard, R.E., Hubbard, W., Jackel, L.D.: Backpropagation applied to handwritten zip code recognition. Neural Computation 1(4), 541-551 (1989)
- (1989) Neural Computation , vol.1 , Issue.4 , pp. 541-551
- LeCun, Y.¹ Boser, B.² Denker, J.S.³ Henderson, D.⁴ Howard, R.E.⁵ Hubbard, W.⁶ Jackel, L.D.⁷

79
- 0001857994
- Efficient BackProp
- LeCun, Y.A., Bottou, L., Orr, G.B., Muller, K.-R.: Efficient BackProp. In: Orr, G.B., Muller, K.-R. (eds.) NIPS-WS 1996. LNCS, vol. 1524, pp. 9-50. Springer, Heidelberg (1998a)
- (1998) NIPS-WS 1996 , vol.1524 , pp. 9-50
- LeCun, Y.A.¹ Bottou, L.² Orr, G.B.³ Muller, K.-R.⁴

80
- 0032203257
- Gradient-based learning applied to document recognition
- LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE 86(11), 2278-2324 (1998b)
- (1998) Proceedings of the IEEE , vol.86 , Issue.11 , pp. 2278-2324
- LeCun, Y.¹ Bottou, L.² Bengio, Y.³ Haffner, P.⁴

81
- 85161980001
- Sparse deep belief net model for visual area V2
- Lee, H., Ekanadham, C., Ng, A.: Sparse deep belief net model for visual area V2. In: NIPS 2007 (2008)
- (2008) NIPS 2007
- Lee, H.¹ Ekanadham, C.² Ng, A.³

82
- 71149119164
- Convolutional deep belief networks for scalable unsupervised learning of hierarchical representations
- Lee, H., Grosse, R., Ranganath, R., Ng, A.Y.: Convolutional deep belief networks for scalable unsupervised learning of hierarchical representations. In: ICML 2009 (2009)
- (2009) ICML 2009
- Lee, H.¹ Grosse, R.² Ranganath, R.³ Ng, A.Y.⁴

83
- 77956541496
- Deep learning via Hessian-free optimization
- Martens, J.: Deep learning via Hessian-free optimization. In: ICML 2010, pp. 735-742 (2010)
- (2010) ICML 2010 , pp. 735-742
- Martens, J.¹

84
- 84872561833
- Unsupervised and transfer learning challenge: A deep learning approach
- JMLR W&CP
- Mesnil, G., Dauphin, Y., Glorot, X., Rifai, S., Bengio, Y., Goodfellow, I., Lavoie, E., Muller, X., Desjardins, G., Warde-Farley, D., Vincent, P., Courville, A., Bergstra, J.: Unsupervised and transfer learning challenge: A deep learning approach. In: Proc. Unsupervised and Transfer Learning, JMLR W&CP, vol. 7 (2011)
- (2011) Proc. Unsupervised and Transfer Learning , vol.7
- Mesnil, G.¹ Dauphin, Y.² Glorot, X.³ Rifai, S.⁴ Bengio, Y.⁵ Goodfellow, I.⁶ Lavoie, E.⁷ Muller, X.⁸ Desjardins, G.⁹ Warde-Farley, D.¹⁰ Vincent, P.¹¹ Courville, A.¹² Bergstra, J.¹³

85
- 84872588487
- Deep Boltzmann machines as feedforward hierarchies
- Montavon, G., Braun, M.L., Muller, K.-R.: Deep Boltzmann machines as feedforward hierarchies. In: AISTATS 2012 (2012)
- (2012) AISTATS 2012
- Montavon, G.¹ Braun, M.L.² Muller, K.-R.³

86
- 77956509090
- Rectified linear units improve restricted Boltzmann machines
- Nair, V., Hinton, G.E.: Rectified linear units improve restricted Boltzmann machines. In: ICML 2010 (2010)
- (2010) ICML 2010
- Nair, V.¹ Hinton, G.E.²

87
- 0003692801
- Problem complexity and method efficiency in optimization
- Nemirovski, A., Yudin, D.: Problem complexity and method efficiency in optimization. Wiley (1983)
- (1983) Wiley
- Nemirovski, A.¹ Yudin, D.²

88
- 65249121279
- Primal-dual subgradient methods for convex problems
- Nesterov, Y.: Primal-dual subgradient methods for convex problems. Mathematical Programming 120(1), 221-259 (2009)
- (2009) Mathematical Programming , vol.120 , Issue.1 , pp. 221-259
- Nesterov, Y.¹

89
- 0030779611
- Sparse coding with an overcomplete basis set: A strategy employed by V1?
- DOI 10.1016/S0042-6989(97)00169-7, PII S0042698997001697
- Olshausen, B.A., Field, D.J.: Sparse coding with an overcomplete basis set: A strategy employed by V1? Vision Research 37(23), 3311-3325 (1997)
- (1997) Vision Research , vol.37 , Issue.23 , pp. 3311-3325
- Olshausen, B.A.¹ Field, D.J.²

90
- 0000255539
- Fast exact multiplication by the Hessian
- Pearlmutter, B.: Fast exact multiplication by the Hessian. Neural Computation 6(1), 147-160 (1994)
- (1994) Neural Computation , vol.6 , Issue.1 , pp. 147-160
- Pearlmutter, B.¹

91
- 73449129720
- A high-throughput screening approach to discovering good forms of biologically inspired visual representation
- Pinto, N., Doukhan, D., DiCarlo, J.J., Cox, D.D.: A high-throughput screening approach to discovering good forms of biologically inspired visual representation. PLoS Comput. Biol. 5(11), e1000579 (2009)
- (2009) PLoS Comput. Biol. , vol.5 , Issue.11
- Pinto, N.¹ Doukhan, D.² DiCarlo, J.J.³ Cox, D.D.⁴

92
- 0025519291
- Recursive distributed representations
- Pollack, J.B.: Recursive distributed representations. Artificial Intelligence 46(1), 77-105 (1990)
- (1990) Artificial Intelligence , vol.46 , Issue.1 , pp. 77-105
- Pollack, J.B.¹

93
- 0026899240
- Acceleration of stochastic approximation by averaging
- Polyak, B., Juditsky, A.: Acceleration of stochastic approximation by averaging. SIAM J. Control and Optimization 30(4), 838-855 (1992)
- (1992) SIAM J. Control and Optimization , vol.30 , Issue.4 , pp. 838-855
- Polyak, B.¹ Juditsky, A.²

94
- 84893414160
- Deep learning made easier by linear transformations in perceptrons
- Raiko, T., Valpola, H., LeCun, Y.: Deep learning made easier by linear transformations in perceptrons. In: AISTATS 2012 (2012)
- (2012) AISTATS 2012
- Raiko, T.¹ Valpola, H.² LeCun, Y.³

95
- 84864069017
- Efficient learning of sparse representations with an energy-based model
- Ranzato, M., Poultney, C., Chopra, S., LeCun, Y.: Efficient learning of sparse representations with an energy-based model. In: NIPS 2006 (2007)
- (2007) NIPS 2006
- Ranzato, M.¹ Poultney, C.² Chopra, S.³ LeCun, Y.⁴

96
- 85161966246
- Sparse feature learning for deep belief networks
- Platt, J., Koller, D., Singer, Y., Roweis, S. (eds.) (NIPS 2007) MIT Press, Cambridge
- Ranzato, M., Boureau, Y.-L., LeCun, Y.: Sparse feature learning for deep belief networks. In: Platt, J., Koller, D., Singer, Y., Roweis, S. (eds.) Advances in Neural Information Processing Systems (NIPS 2007), vol. 20, pp. 1185-1192. MIT Press, Cambridge (2008a)
- (2008) Advances in Neural Information Processing Systems , vol.20 , pp. 1185-1192
- Ranzato, M.¹ Boureau, Y.-L.² LeCun, Y.³

97
- 85161966246
- Sparse feature learning for deep belief networks
- Ranzato, M., Boureau, Y., LeCun, Y.: Sparse feature learning for deep belief networks. In: NIPS 2007 (2008b)
- (2008) NIPS 2007
- Ranzato, M.¹ Boureau, Y.² LeCun, Y.³

98
- 32044466073
- Markov logic networks
- DOI 10.1007/s10994-006-5833-1
- Richardson, M., Domingos, P.: Markov logic networks. Machine Learning 62(1-2), 107-136 (2006)
- (2006) Machine Learning , vol.62 , Issue.1-2 , pp. 107-136
- Richardson, M.¹ Domingos, P.²

99
- 80053460450
- Contractive auto-encoders: Explicit invariance during feature extraction
- Rifai, S., Vincent, P., Muller, X., Glorot, X., Bengio, Y.: Contractive auto-encoders: Explicit invariance during feature extraction. In: ICML 2011 (2011a)
- (2011) ICML 2011
- Rifai, S.¹ Vincent, P.² Muller, X.³ Glorot, X.⁴ Bengio, Y.⁵

100
- 85162427692
- The manifold tangent classifier
- Rifai, S., Dauphin, Y., Vincent, P., Bengio, Y., Muller, X.: The manifold tangent classifier. In: NIPS 2011 (2011b)
- (2011) NIPS 2011
- Rifai, S.¹ Dauphin, Y.² Vincent, P.³ Bengio, Y.⁴ Muller, X.⁵

101
- 84867136416
- A generative process for sampling contractive auto-encoders
- Rifai, S., Bengio, Y., Dauphin, Y., Vincent, P.: A generative process for sampling contractive auto-encoders. In: ICML 2012 (2012)
- (2012) ICML 2012
- Rifai, S.¹ Bengio, Y.² Dauphin, Y.³ Vincent, P.⁴

102
- 0000016172
- A stochastic approximation method
- Robbins, H., Monro, S.: A stochastic approximation method. Annals of Mathematical Statistics 22, 400-407 (1951)
- (1951) Annals of Mathematical Statistics , vol.22 , pp. 400-407
- Robbins, H.¹ Monro, S.²

103
- 0022471098
- Learning representations by backpropagating errors
- Rumelhart, D.E., Hinton, G.E., Williams, R.J.: Learning representations by backpropagating errors. Nature 323, 533-536 (1986)
- (1986) Nature , vol.323 , pp. 533-536
- Rumelhart, D.E.¹ Hinton, G.E.² Williams, R.J.³

104
- 84862286946
- Deep Boltzmann machines
- Salakhutdinov, R., Hinton, G.: Deep Boltzmann machines. In: AISTATS 2009 (2009)
- (2009) AISTATS 2009
- Salakhutdinov, R.¹ Hinton, G.²

105
- 80053448548
- On random weights and unsupervised feature learning
- Saxe, A.M., Koh, P.W., Chen, Z., Bhand, M., Suresh, B., Ng, A.: On random weights and unsupervised feature learning. In: ICML 2011 (2011)
- (2011) ICML 2011
- Saxe, A.M.¹ Koh, P.W.² Chen, Z.³ Bhand, M.⁴ Suresh, B.⁵ Ng, A.⁶

106
- 84872539658
- No more pesky learning rates
- Schaul, T., Zhang, S., LeCun, Y.: No More Pesky Learning Rates. Technical report (2012)
- (2012) Technical report
- Schaul, T.¹ Zhang, S.² LeCun, Y.³

107
- 0038231917
- Centering neural network gradient factors
- Schraudolph, N.N.: Centering Neural Network Gradient Factors. In: Orr, G.B., Müller, K.-R. (eds.) Neural Networks: Tricks of the Trade (NIPS-WS 1996). LNCS, vol. 1524, pp. 207-548. Springer, Heidelberg (1998)
- (1998) Neural Networks: Tricks of the Trade, LNCS , vol.1524 , pp. 207-548
- Schraudolph, N.N.¹

108
- 80053438267
- Parsing natural scenes and natural language with recursive neural networks
- Socher, R., Manning, C., Ng, A.Y.: Parsing natural scenes and natural language with recursive neural networks. In: ICML 2011 (2011)
- (2011) ICML 2011
- Socher, R.¹ Manning, C.² Ng, A.Y.³

109
- 79952760123
- Parameter screening and optimisation for ILP using designed experiments
- Srinivasan, A., Ramakrishnan, G.: Parameter screening and optimisation for ILP using designed experiments. Journal of Machine Learning Research 12, 627-662 (2011)
- (2011) Journal of Machine Learning Research , vol.12 , pp. 627-662
- Srinivasan, A.¹ Ramakrishnan, G.²

110
- 77952681438
- A tutorial on stochastic approximation algorithms for training restricted Boltzmann machines and deep belief nets
- Swersky, K., Chen, B., Marlin, B., de Freitas, N.: A tutorial on stochastic approximation algorithms for training restricted Boltzmann machines and deep belief nets. In: Information Theory and Applications Workshop (2010)
- (2010) Information Theory and Applications Workshop
- Swersky, K.¹ Chen, B.² Marlin, B.³ De Freitas, N.⁴

111
- 0034704229
- A global geometric framework for nonlinear dimensionality reduction
- DOI 10.1126/science.290.5500.2319
- Tenenbaum, J., de Silva, V., Langford, J.C.: A global geometric framework for nonlinear dimensionality reduction. Science 290(5500), 2319-2323 (2000) (Pubitemid 32041577)
- (2000) Science , vol.290 , Issue.5500 , pp. 2319-2323
- Tenenbaum, J.B.¹ De Silva, V.² Langford, J.C.³

112
- 71149084943
- Using fast weights to improve persistent contrastive divergence
- Tieleman, T., Hinton, G.: Using fast weights to improve persistent contrastive divergence. In: ICML 2009 (2009)
- (2009) ICML 2009
- Tieleman, T.¹ Hinton, G.²

113
- 57249084011
- Visualizing data using t-SNE
- van der Maaten, L., Hinton, G.E.: Visualizing data using t-SNE. J. Machine Learning Res. 9 (2008)
- (2008) J. Machine Learning Res. , vol.9
- Van Der Maaten, L.¹ Hinton, G.E.²

114
- 79959575293
- A connection between score matching and denoising autoencoders
- Vincent, P.: A connection between score matching and denoising autoencoders. Neural Computation 23(7) (2011)
- (2011) Neural Computation , vol.23 , Issue.7
- Vincent, P.¹

115
- 56449089103
- Extracting and composing robust features with denoising autoencoders
- Vincent, P., Larochelle, H., Bengio, Y., Manzagol, P.-A.: Extracting and composing robust features with denoising autoencoders. In: ICML 2008 (2008)
- (2008) ICML 2008
- Vincent, P.¹ Larochelle, H.² Bengio, Y.³ Manzagol, P.-A.⁴

116
- 79551480483
- Stacked denoising autoencoders: Learning useful representations in a deep network with a local denoising criterion
- Vincent, P., Larochelle, H., Lajoie, I., Bengio, Y., Manzagol, P.-A.: Stacked denoising autoencoders: Learning useful representations in a deep network with a local denoising criterion. J. Machine Learning Res. 11 (2010)
- (2010) J. Machine Learning Res. , vol.11
- Vincent, P.¹ Larochelle, H.² Lajoie, I.³ Bengio, Y.⁴ Manzagol, P.-A.⁵

117
- 56449119888
- Deep learning via semi-supervised embedding
- Weston, J., Ratle, F., Collobert, R.: Deep learning via semi-supervised embedding. In: ICML 2008 (2008)
- (2008) ICML 2008
- Weston, J.¹ Ratle, F.² Collobert, R.³

118
- 84867117593
- Wsabie: Scaling up to large vocabulary image annotation
- IJCAI
- Weston, J., Bengio, S., Usunier, N.: Wsabie: Scaling up to large vocabulary image annotation. In: Proceedings of the International Joint Conference on Artificial Intelligence, IJCAI (2011)
- (2011) Proceedings of the International Joint Conference on Artificial Intelligence
- Weston, J.¹ Bengio, S.² Usunier, N.³

119
- 0036546660
- Slow feature analysis: Unsupervised learning of invariances
- Wiskott, L., Sejnowski, T.J.: Slow feature analysis: Unsupervised learning of invariances. Neural Computation 14(4), 715-770 (2002)
- (2002) Neural Computation , vol.14 , Issue.4 , pp. 715-770
- Wiskott, L.¹ Sejnowski, T.J.²

120
- 84872523448
- Unsupervised learning of visual invariance with temporal coherence
- Zou, W.Y., Ng, A.Y., Yu, K.: Unsupervised learning of visual invariance with temporal coherence. In: NIPS 2011 Workshop on Deep Learning and Unsupervised Feature Learning (2011)
- (2011) NIPS 2011 Workshop on Deep Learning and Unsupervised Feature Learning
- Zou, W.Y.¹ Ng, A.Y.² Yu, K.³

* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS DB.