SCOPUS Information Retrieval Platform

Volume 9, 2010, Pages 249-256

Understanding the difficulty of training deep feedforward neural networks

Author keywords

[No Author keywords available]

Indexed keywords

FASTER CONVERGENCE; GRADIENT DESCENT; HIDDEN LAYERS; JACOBIANS; MEAN VALUES; NON-LINEAR ACTIVATION; NON-LINEARITY; SINGULAR VALUES;

ALGORITHMS; ARTIFICIAL INTELLIGENCE; FEEDFORWARD NEURAL NETWORKS; MULTILAYER NEURAL NETWORKS;

CHEMICAL ACTIVATION;

EID: 84862277874
PISSN: 15324435
EISSN: 15337928
Source Type: Journal
DOI: None
Document Type: Conference Paper

Times cited: 17430

References (20)

3
- 0028392483
- Learning long-term dependencies with gradient descent is difficult
- Bengio, Y., Simard, P., & Frasconi, P. (1994). Learning long-term dependencies with gradient descent is difficult. IEEE Transactions on Neural Networks, 5, 157-166.
- (1994) IEEE Transactions on Neural Networks , vol.5 , pp. 157-166
- Bengio, Y.¹ Simard, P.² Frasconi, P.³

4
- 84862279895
- Département d'Informatique et de Recherche Opérationnelle, Université de Montréal
- Bergstra, J., Desjardins, G., Lamblin, P., & Bengio, Y. (2009). Quadratic polynomials learn better image features (Technical Report 1337). Département d'Informatique et de Recherche Opérationnelle, Université de Montréal.
- (2009) Quadratic Polynomials Learn Better Image Features (Technical Report 1337)
- Bergstra, J.¹ Desjardins, G.² Lamblin, P.³ Bengio, Y.⁴

5
- 77953344311
- Doctoral dissertation, The Robotics Institute, Carnegie Mellon University
- Bradley, D. (2009). Learning in modular systems. Doctoral dissertation, The Robotics Institute, Carnegie Mellon University.
- (2009) Learning in Modular Systems
- Bradley, D.¹

6
- 56449095373
- A unified architecture for natural language processing: Deep neural networks with multitask learning
- Collobert, R., & Weston, J. (2008). A unified architecture for natural language processing: Deep neural networks with multitask learning. ICML 2008.
- (2008) ICML 2008
- Collobert, R.¹ Weston, J.²

7
- 79961226155
- The difficulty of training deep architectures and the effect of unsupervised pre-training
- Erhan, D., Manzagol, P.-A., Bengio, Y., Bengio, S., & Vincent, P. (2009). The difficulty of training deep architectures and the effect of unsupervised pre-training. AISTATS'2009 (pp. 153-160).
- (2009) AISTATS'2009 , pp. 153-160
- Erhan, D.¹ Manzagol, P.-A.² Bengio, Y.³ Bengio, S.⁴ Vincent, P.⁵

8
- 33745805403
- A fast learning algorithm for deep belief nets
- Hinton, G. E., Osindero, S., & Teh, Y. (2006). A fast learning algorithm for deep belief nets. Neural Computation, 18, 1527-1554.
- (2006) Neural Computation , vol.18 , pp. 1527-1554
- Hinton, G.E.¹ Osindero, S.² Teh, Y.³

9
- 77956002520
- University of Toronto
- Krizhevsky, A., & Hinton, G. (2009). Learning multiple layers of features from tiny images (Technical Report). University of Toronto.
- (2009) Learning Multiple Layers of Features from Tiny Images (Technical Report)
- Krizhevsky, A.¹ Hinton, G.²

10
- 59449087310
- Exploring strategies for training deep neural networks
- Larochelle, H., Bengio, Y., Louradour, J., & Lamblin, P. (2009). Exploring strategies for training deep neural networks. The Journal of Machine Learning Research, 10, 1-40.
- (2009) The Journal of Machine Learning Research , vol.10 , pp. 1-40
- Larochelle, H.¹ Bengio, Y.² Louradour, J.³ Lamblin, P.⁴

11
- 50249093806
- An empirical evaluation of deep architectures on problems with many factors of variation
- Larochelle, H., Erhan, D., Courville, A., Bergstra, J., & Bengio, Y. (2007). An empirical evaluation of deep architectures on problems with many factors of variation. ICML 2007.
- (2007) ICML 2007
- Larochelle, H.¹ Erhan, D.² Courville, A.³ Bergstra, J.⁴ Bengio, Y.⁵

12
- 0032203257
- Gradient-based learning applied to document recognition
- LeCun, Y., Bottou, L., Bengio, Y., & Haffner, P. (1998a). Gradient-based learning applied to document recognition. Proceedings of the IEEE, 86, 2278-2324.
- (1998) Proceedings of the IEEE , vol.86 , pp. 2278-2324
- LeCun, Y.¹ Bottou, L.² Bengio, Y.³ Haffner, P.⁴

14
- 84858779990
- A scalable hierarchical distributed language model
- Mnih, A., & Hinton, G. E. (2009). A scalable hierarchical distributed language model. NIPS 21 (pp. 1081-1088).
- (2009) NIPS , vol.21 , pp. 1081-1088
- Mnih, A.¹ Hinton, G.E.²

15
- 84864069017
- Efficient learning of sparse representations with an energy-based model
- Ranzato, M., Poultney, C., Chopra, S., & LeCun, Y. (2007). Efficient learning of sparse representations with an energy-based model. NIPS 19.
- (2007) NIPS , vol.19
- Ranzato, M.¹ Poultney, C.² Chopra, S.³ LeCun, Y.⁴

16
- 0022471098
- Learning representations by back-propagating errors
- Rumelhart, D. E., Hinton, G. E., & Williams, R. J. (1986). Learning representations by back-propagating errors. Nature, 323, 533-536.
- (1986) Nature , vol.323 , pp. 533-536
- Rumelhart, D.E.¹ Hinton, G.E.² Williams, R.J.³

17
- 0001336749
- Accelerated learning in layered neural networks
- Solla, S. A., Levin, E., & Fleisher, M. (1988). Accelerated learning in layered neural networks. Complex Systems, 2, 625-639.
- (1988) Complex Systems , vol.2 , pp. 625-639
- Solla, S.A.¹ Levin, E.² Fleisher, M.³

18
- 56449089103
- Extracting and composing robust features with denoising autoencoders
- Vincent, P., Larochelle, H., Bengio, Y., & Manzagol, P.-A. (2008). Extracting and composing robust features with denoising autoencoders. ICML 2008.
- (2008) ICML 2008
- Vincent, P.¹ Larochelle, H.² Bengio, Y.³ Manzagol, P.-A.⁴

20
- 57149122432
- Unsupervised learning of probabilistic grammar-Markov models for object categories
- Zhu, L., Chen, Y., & Yuille, A. (2009). Unsupervised learning of probabilistic grammar-Markov models for object categories. IEEE Transactions on Pattern Analysis and Machine Intelligence, 31, 114-128.
- (2009) IEEE Transactions on Pattern Analysis and Machine Intelligence , vol.31 , pp. 114-128
- Zhu, L.¹ Chen, Y.² Yuille, A.³

* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.