SCOPUS information search platform

Proceedings of COMPSTAT 2010 - 19th International Conference on Computational Statistics, Keynote, Invited and Contributed Papers

Volume , Issue , 2010, Pages 177-186

Large-scale machine learning with stochastic gradient descent

(1) Bottou, Léon a

a NEC LABORATORIES AMERICA (United States)

Author keywords

Efficiency; Online learning; Stochastic gradient descent

Indexed keywords

EFFICIENCY; GRADIENT METHODS; MACHINE LEARNING; OPTIMIZATION;

ASYMPTOTICALLY EFFICIENT; LARGE-SCALE LEARNING PROBLEM; LARGE-SCALE MACHINE LEARNING; LARGE-SCALE PROBLEM; ONLINE LEARNING; OPTIMIZATION ALGORITHMS; STATISTICAL MACHINE LEARNING; STOCHASTIC GRADIENT DESCENT;

STOCHASTIC SYSTEMS;

EID: 84904136037 PISSN: None EISSN: None Source Type: Conference Proceeding
DOI: 10.1007/978-3-7908-2604-3_16 Document Type: Conference Paper

Times cited : (5711)

References (24)

1
- 68949096711
- SGD-QN: Careful quasi-Newton stochastic gradient descent
- With Erratum (to appear)
- BORDES, A., BOTTOU, L., and GALLINARI, P. (2009): SGD-QN: Careful Quasi-Newton Stochastic Gradient Descent. Journal of Machine Learning Research, 10:1737-1754. With Erratum (to appear).
- (2009) Journal of Machine Learning Research , vol.10 , pp. 1737-1754
- Bordes, A.¹ Bottou, L.² Gallinari, P.³

2
- 85162035281
- The tradeoffs of large scale learning
- BOTTOU, L. and BOUSQUET, O. (2008): The Tradeoffs of Large Scale Learning, In Advances in Neural Information Processing Systems, vol.20, 161-168.
- (2008) Advances in Neural Information Processing Systems , vol.20 , pp. 161-168
- Bottou, L.¹ Bousquet, O.²

3
- 17444425307
- On-line learning for very large datasets
- BOTTOU, L. and LECUN, Y. (2004): On-line Learning for Very Large Datasets. Applied Stochastic Models in Business and Industry, 21(2):137-151.
- (2004) Applied Stochastic Models in Business and Industry , vol.21 , Issue.2 , pp. 137-151
- Bottou, L.¹ LeCun, Y.²

4
- 0142095807
- Thèse de doctorat, Ecole Polytechnique, Palaiseau, France
- BOUSQUET, O. (2002): Concentration Inequalities and Empirical Processes Theory Applied to the Analysis of Learning Algorithms. Thèse de doctorat, Ecole Polytechnique, Palaiseau, France.
- (2002) Concentration Inequalities and Empirical Processes Theory Applied to the Analysis of Learning Algorithms
- Bousquet, O.¹

5
- 34249753618
- Support vector networks
- CORTES, C. and VAPNIK, V. N. (1995): Support Vector Networks, Machine Learning, 20:273-297.
- (1995) Machine Learning , vol.20 , pp. 273-297
- Cortes, C.¹ Vapnik, V.N.²

6
- 0004041275
- Prentice-Hall
- DENNIS, J. E. Jr., and SCHNABEL, R. B. (1983): Numerical Methods For Unconstrained Optimization and Nonlinear Equations. Prentice-Hall.
- (1983) Numerical Methods for Unconstrained Optimization and Nonlinear Equations
- Dennis Jr., J.E.¹ Schnabel, R.B.²

7
- 33749563073
- Training linear SVMs in linear time
- ACM Press
- JOACHIMS, T. (2006): Training Linear SVMs in Linear Time. In Proceedings of the 12th ACM SIGKDD, ACM Press.
- (2006) Proceedings of the 12th ACM SIGKDD
- Joachims, T.¹

8
- 0142192295
- Conditional random fields: Probabilistic models for segmenting and labeling sequence data
- Morgan Kaufmann
- LAFFERTY, J. D., MCCALLUM, A., and PEREIRA, F. (2001): Conditional Random Fields: Probabilistic Models for Segmenting and Labeling Sequence Data. In Proceedings of ICML 2001, 282-289, Morgan Kaufmann.
- (2001) Proceedings of ICML 2001 , pp. 282-289
- Lafferty, J.D.¹ McCallum, A.² Pereira, F.³

9
- 0032166052
- The importance of convexity in learning with squared loss
- LEE, W. S., BARTLETT, P. L., and WILLIAMSON, R. C. (1998): The Importance of Convexity in Learning with Squared Loss. IEEE Transactions on Information Theory, 44(5):1974-1980.
- (1998) IEEE Transactions on Information Theory , vol.44 , Issue.5 , pp. 1974-1980
- Lee, W.S.¹ Bartlett, P.L.² Williamson, R.C.³

10
- 84876811202
- RCV1: A new benchmark collection for text categorization research
- LEWIS, D. D., YANG, Y., ROSE, T. G., and LI, F. (2004): RCV1: A New Benchmark Collection for Text Categorization Research. Journal of Machine Learning Research, 5:361-397.
- (2004) Journal of Machine Learning Research , vol.5 , pp. 361-397
- Lewis, D.D.¹ Yang, Y.² Rose, T.G.³ Li, F.⁴

11
- 34547982357
- Trust region Newton methods for large-scale logistic regression
- ACM Press
- LIN, C. J., WENG, R. C., and KEERTHI, S. S. (2007): Trust region Newton methods for large-scale logistic regression. In Proceedings of ICML 2007, 561-568, ACM Press.
- (2007) Proceedings of ICML 2007 , pp. 561-568
- Lin, C.J.¹ Weng, R.C.² Keerthi, S.S.³

12
- 0001457509
- Some methods for classification and analysis of multivariate observations
- University of California Press
- MACQUEEN, J. (1967): Some Methods for Classification and Analysis of Multivariate Observations. In Fifth Berkeley Symposium on Mathematics, Statistics, and Probabilities, vol.1, 281-297, University of California Press.
- (1967) Fifth Berkeley Symposium on Mathematics, Statistics, and Probabilities , vol.1 , pp. 281-297
- MacQueen, J.¹

13
- 0000595627
- Some applications of concentration inequalities to Statistics
- series 6
- MASSART, P. (2000): Some applications of concentration inequalities to Statistics, Annales de la Faculté des Sciences de Toulouse, series 6, 9(2):245-303.
- (2000) Annales de la Faculté des Sciences de Toulouse , vol.9 , Issue.2 , pp. 245-303
- Massart, P.¹

14
- 0001955526
- A statistical study of on-line learning
- Cambridge University Press
- MURATA, N. (1998): A Statistical Study of On-line Learning. In Online Learning and Neural Networks, Cambridge University Press.
- (1998) Online Learning and Neural Networks
- Murata, N.¹

15
- 0026899240
- Acceleration of stochastic approximation by averaging
- POLYAK, B. T. and JUDITSKY, A. B. (1992): Acceleration of stochastic approximation by averaging. SIAM J. Control and Optimization, 30(4):838-855.
- (1992) SIAM J. Control and Optimization , vol.30 , Issue.4 , pp. 838-855
- Polyak, B.T.¹ Juditsky, A.B.²

16
- 0003846541
- The perceptron: A perceiving and recognizing automaton
- ROSENBLATT, F. (1957): The Perceptron: A perceiving and recognizing automaton. Technical Report 85-460-1, Project PARA, Cornell Aeronautical Lab.
- (1957) Technical Report 85-460-1, Project PARA, Cornell Aeronautical Lab
- Rosenblatt, F.¹

17
- 0000646059
- Learning internal representations by error propagation
- Bradford Books
- RUMELHART, D. E., HINTON, G. E., and WILLIAMS, R. J. (1986): Learning internal representations by error propagation. In Parallel distributed processing: Explorations in the microstructure of cognition, vol.I, 318-362, Bradford Books.
- (1986) Parallel Distributed Processing: Explorations in the Microstructure of Cognition , vol.1 , pp. 318-362
- Rumelhart, D.E.¹ Hinton, G.E.² Williams, R.J.³

18
- 56449110590
- SVM optimization: Inverse dependence on training set size
- ACM
- SHALEV-SHWARTZ, S. and SREBRO, N. (2008): SVM optimization: inverse dependence on training set size. In Proceedings of the ICML 2008, 928-935, ACM.
- (2008) Proceedings of the ICML 2008 , pp. 928-935
- Shalev-Shwartz, S.¹ Srebro, N.²

19
- 0001287271
- Regression shrinkage and selection via the lasso
- Series B
- TIBSHIRANI, R. (1996): Regression shrinkage and selection via the Lasso. Journal of the Royal Statistical Society, Series B, 58(1):267-288.
- (1996) Journal of the Royal Statistical Society , vol.58 , Issue.1 , pp. 267-288
- Tibshirani, R.¹

20
- 85109864082
- Introduction to the CoNLL-2000 shared task: Chunking
- TJONG KIM SANG, E. F. and BUCHHOLZ, S. (2000): Introduction to the CoNLL-2000 Shared Task: Chunking. In Proceedings of CoNLL-2000, 127-132.
- (2000) Proceedings of CoNLL-2000 , pp. 127-132
- Tjong Kim Sang, E.F.¹ Buchholz, S.²

21
- 3142725508
- Optimal aggregation of classifiers in statistical learning
- TSYBAKOV, A. B. (2004): Optimal aggregation of classifiers in statistical learning, Annals of Statistics, 32(1).
- (2004) Annals of Statistics , vol.32 , Issue.1
- Tsybakov, A.B.¹

22
- 0001024505
- On the Uniform Convergence of Relative Frequencies of Events to Their Probabilities
- VAPNIK, V. N. and CHERVONENKIS, A. YA. (1971): On the Uniform Convergence of Relative Frequencies of Events to Their Probabilities. Theory of Probability and its Applications, 16(2):264-280.
- (1971) Theory of Probability and its Applications , vol.16 , Issue.2 , pp. 264-280
- Vapnik, V.N.¹ Chervonenkis, A.Ya.²

23
- 0002278965
- Adaptive switching circuits
- WIDROW, B. and HOFF, M. E. (1960): Adaptive switching circuits. IRE WESCON Conv. Record, Part 4, 96-104.
- (1960) IRE WESCON Conv. Record , Issue.PART 4 , pp. 96-104
- Widrow, B.¹ Hoff, M.E.²

24
- 77956944936
- Towards optimal one pass large scale learning with averaged stochastic gradient descent
- to appear
- XU, W. (2010): Towards Optimal One Pass Large Scale Learning with Averaged Stochastic Gradient Descent. Journal of Machine Learning Research (to appear).
- (2010) Journal of Machine Learning Research
- Xu, W.¹

* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.