Volume 17, 2016

On the influence of momentum acceleration on online learning

Author keywords

Convergence Rate; Heavy-Ball Method; Mean-Square-Error Analysis; Momentum Acceleration; Nesterov's Method; Online Learning; Stochastic Gradient

Indexed keywords

GRADIENT METHODS; MEAN SQUARE ERROR; MOMENTUM; OPTIMIZATION; RISK ASSESSMENT; STOCHASTIC SYSTEMS

EID: 84995388336 | PISSN: 1532-4435 | EISSN: 1533-7928 | Source Type: Journal
DOI: None | Document Type: Article
Times cited: 72
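
As context for the reference list that follows, here is a minimal Python sketch (not taken from the indexed paper) of the two momentum recursions named in the author keywords: Polyak's heavy-ball update and Nesterov's accelerated update, each driven by a generic gradient oracle. The step size mu, momentum parameter beta, iteration count, and the noisy quadratic oracle in the usage example are illustrative assumptions only.

    import numpy as np

    def heavy_ball(grad, w0, mu=0.01, beta=0.9, num_iter=1000):
        # Polyak's heavy-ball recursion:
        #   w_i = w_{i-1} - mu * grad(w_{i-1}) + beta * (w_{i-1} - w_{i-2})
        w_prev, w = w0.copy(), w0.copy()
        for _ in range(num_iter):
            w_next = w - mu * grad(w) + beta * (w - w_prev)
            w_prev, w = w, w_next
        return w

    def nesterov(grad, w0, mu=0.01, beta=0.9, num_iter=1000):
        # Nesterov's recursion: the gradient is evaluated at an
        # extrapolated ("look-ahead") point instead of the current iterate.
        w_prev, w = w0.copy(), w0.copy()
        for _ in range(num_iter):
            y = w + beta * (w - w_prev)   # extrapolation step
            w_next = y - mu * grad(y)
            w_prev, w = w, w_next
        return w

    # Usage example: a noisy quadratic gradient oracle, a stand-in for the
    # stochastic (online) gradients studied in this setting.
    rng = np.random.default_rng(0)
    A = np.diag([3.0, 1.0])
    grad = lambda w: A @ w + 0.01 * rng.standard_normal(2)
    print(heavy_ball(grad, np.ones(2)))
    print(nesterov(grad, np.ones(2)))

The only structural difference between the two recursions is where the gradient is evaluated: at the current iterate (heavy-ball) or at the extrapolated point (Nesterov).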

References (58)
  • 1. N. O. Attoh-Okine. Analysis of learning rate and momentum term in backpropagation neural network algorithm trained to predict pavement performance. Advances in Engineering Software, 30(4):291-302, 1999.
  • 2. A. Beck and M. Teboulle. A fast iterative shrinkage-thresholding algorithm for linear inverse problems. SIAM Journal on Imaging Sciences, 2(1):183-202, 2009.
  • 5. L. Bottou. Large-scale machine learning with stochastic gradient descent. In Proc. International Conference on Computational Statistics, pages 177-186. Springer, Paris, France, 2010.
  • 8. V. Cevher, S. Becker, and M. Schmidt. Convex optimization for big data: Scalable, randomized, and parallel algorithms for big data analytics. IEEE Signal Processing Magazine, 31(5):32-43, 2014.
  • 9. A. d'Aspremont. Smooth optimization with approximate gradient. SIAM Journal on Optimization, 19(3):1171-1183, 2008.
  • 10. A. Defazio, F. Bach, and S. Lacoste-Julien. SAGA: A fast incremental gradient method with support for non-strongly convex composite objectives. In Proc. Advances in Neural Information Processing Systems, pages 1646-1654, Montreal, Canada, 2014.
  • 11. O. Devolder, F. Glineur, and Y. Nesterov. First-order methods of smooth convex optimization with inexact oracle. Mathematical Programming, 146(1-2):37-75, 2014.
  • 13. J. Duchi, E. Hazan, and Y. Singer. Adaptive subgradient methods for online learning and stochastic optimization. Journal of Machine Learning Research, 12(2):2121-2159, 2011.
  • 14. N. Flammarion and F. Bach. From averaging to acceleration, there is only a step-size. Journal of Machine Learning Research, 40(1):1-38, 2015.
  • 16. S. Ghadimi and G. Lan. Optimal stochastic approximation algorithms for strongly convex stochastic composite optimization I: A generic algorithmic framework. SIAM Journal on Optimization, 22(4):1469-1492, 2012.
  • 18. C. Hu, W. Pan, and J. T. Kwok. Accelerated gradient methods for stochastic optimization and online learning. In Proc. Advances in Neural Information Processing Systems, pages 781-789, Vancouver, Canada, 2009.
  • 19. R. Johnson and T. Zhang. Accelerating stochastic gradient descent using predictive variance reduction. In Proc. Advances in Neural Information Processing Systems (NIPS), pages 315-323, Lake Tahoe, Nevada, 2013.
  • 22. G. Lan. An optimal method for stochastic composite optimization. Mathematical Programming, 133(1-2):365-397, 2012.
  • 23. L. Lessard, B. Recht, and A. Packard. Analysis and design of optimization algorithms via integral quadratic constraints. SIAM Journal on Optimization, 26(1):57-95, 2016.
  • 24. A. Nedić and D. P. Bertsekas. Convergence rate of incremental subgradient algorithms. In S. Uryasev and P. M. Pardalos, editors, Stochastic Optimization: Algorithms and Applications, volume 54, pages 223-264. Springer, 2001.
  • 25. Y. Nesterov. A method of solving a convex programming problem with convergence rate O(1/k²). Soviet Mathematics Doklady, 27(2):372-376, 1983.
  • 27. Y. Nesterov. Smooth minimization of non-smooth functions. Mathematical Programming, 103(1):127-152, 2005.
  • 28. A. Nitanda. Stochastic proximal gradient descent with acceleration techniques. In Proc. Advances in Neural Information Processing Systems, pages 1574-1582, Montreal, Canada, 2014.
  • 31. J. G. Proakis. Channel identification for high speed digital communications. IEEE Transactions on Automatic Control, 19(6):916-922, 1974.
  • 32. N. Qian. On the momentum term in gradient descent learning algorithms. Neural Networks, 12(1):145-151, 1999.
  • 33. N. L. Roux, M. Schmidt, and F. R. Bach. A stochastic gradient method with an exponential convergence rate for finite training sets. In Proc. Advances in Neural Information Processing Systems (NIPS), pages 2663-2671, Lake Tahoe, Nevada, 2012.
  • 36. A. H. Sayed. Adaptation, learning, and optimization over networks. Foundations and Trends in Machine Learning, 7(4-5):311-801, Jul. 2014a.
  • 37. A. H. Sayed. Adaptive networks. Proceedings of the IEEE, 102(4):460-497, 2014b.
  • 39. S. Shalev-Shwartz and T. Zhang. Accelerated proximal stochastic dual coordinate ascent for regularized loss minimization. In Proc. International Conference on Machine Learning, pages 64-72, Beijing, China, 2014.
  • 45. L. K. Ting, C. F. N. Cowan, and R. F. Woods. Tracking performance of momentum LMS algorithm for a chirped sinusoidal signal. In Proc. European Signal Processing Conference, pages 1-4, Tampere, Finland, 2000.
  • 46. M. A. Tugay and Y. Tanik. Properties of the momentum LMS algorithm. Signal Processing, 18(2):117-127, 1989.
  • 50. L. Xiao. Dual averaging methods for regularized stochastic learning and online optimization. Journal of Machine Learning Research, 11(Oct):2543-2596, 2010.
  • 52. B. Ying and A. H. Sayed. Performance limits of single-agent and multi-agent sub-gradient stochastic learning. In Proc. International Conference on Acoustics, Speech and Signal Processing, pages 4905-4909, Shanghai, China, March 2016.
  • 55. T. Zhang. Solving large scale linear prediction problems using stochastic gradient descent algorithms. In Proc. International Conference on Machine Learning, page 116, Alberta, Canada, 2004.


* This information was extracted by KISTI through analysis of Elsevier's SCOPUS database.