SCOPUS Information Retrieval Platform

Proceedings of the 19th International Conference on Artificial Intelligence and Statistics, AISTATS 2016

Volume , Issue , 2016, Pages 1051-1060

Bridging the gap between stochastic gradient MCMC and stochastic optimization

(5) Chen, Changyou (a); Carlson, David (b); Gan, Zhe (a); Li, Chunyuan (a); Carin, Lawrence (a)

a Duke University (United States)

b Columbia University ^* (United States)

Author keywords

[No Author keywords available]

Indexed keywords

ARTIFICIAL INTELLIGENCE; DEEP NEURAL NETWORKS; MARKOV PROCESSES; MOMENTUM; MONTE CARLO METHODS; SIMULATED ANNEALING; STOCHASTIC SYSTEMS; TEMPERATURE;

CONVENTIONAL OPTIMIZATION; MARKOV CHAIN MONTE-CARLO; NEURAL NETWORK MODEL; STOCHASTIC GRADIENT; STOCHASTIC OPTIMIZATION ALGORITHM; STOCHASTIC OPTIMIZATION METHODS; STOCHASTIC OPTIMIZATIONS; ZERO TEMPERATURES;

STOCHASTIC MODELS;

EID: 84986265678; PISSN: None; EISSN: None; Source Type: Conference Proceeding
DOI: None; Document Type: Conference Paper

Times cited : (99)

References (39)

1
- 0013141758
- Reversible jump MCMC simulated annealing for neural networks
- C. Andrieu, N. de Freitas, and A. Doucet. Reversible jump MCMC simulated annealing for neural networks. In UAI, 2000.
- (2000) UAI
- Andrieu, C.¹ de Freitas, N.² Doucet, A.³

2
- 84969752808
- Weight uncertainty in neural networks
- C. Blundell, J. Cornebise, K. Kavukcuoglu, and D. Wierstra. Weight uncertainty in neural networks. In ICML, 2015.
- (2015) ICML
- Blundell, C.¹ Cornebise, J.² Kavukcuoglu, K.³ Wierstra, D.⁴

3
- 84904136037
- Large-scale machine learning with stochastic gradient descent
- L. Bottou. Large-scale machine learning with stochastic gradient descent. In Proc. COMPSTAT, 2010.
- (2010) Proc. COMPSTAT
- Bottou, L.¹

4
- 84867129058
- Modeling temporal dependencies in high-dimensional sequences: Application to polyphonic music generation and transcription
- N. Boulanger-Lewandowski, Y. Bengio, and P. Vincent. Modeling temporal dependencies in high-dimensional sequences: Application to polyphonic music generation and transcription. In ICML, 2012.
- (2012) ICML
- Boulanger-Lewandowski, N.¹ Bengio, Y.² Vincent, P.³

5
- 84965148019
- Preconditioned spectral descent for deep learning
- D. E. Carlson, E. Collins, Y.-P. Hsieh, L. Carin, and V. Cevher. Preconditioned spectral descent for deep learning. In Advances in Neural Information Processing Systems, pages 2953–2961, 2015.
- (2015) Advances in Neural Information Processing Systems , pp. 2953-2961
- Carlson, D.E.¹ Collins, E.² Hsieh, Y.-P.³ Carin, L.⁴ Cevher, V.⁵

6
- 84965095225
- On the convergence of stochastic gradient MCMC algorithms with high-order integrators
- C. Chen, N. Ding, and L. Carin. On the convergence of stochastic gradient MCMC algorithms with high-order integrators. In NIPS, 2015.
- (2015) NIPS
- Chen, C.¹ Ding, N.² Carin, L.³

7
- 84919787787
- Stochastic gradient Hamiltonian Monte Carlo
- T. Chen, E. B. Fox, and C. Guestrin. Stochastic gradient Hamiltonian Monte Carlo. In ICML, 2014.
- (2014) ICML
- Chen, T.¹ Fox, E.B.² Guestrin, C.³

8
- 84961291190
- K. Cho, B. Van Merriënboer, C. Gulcehre, D. Bahdanau, F. Bougares, H. Schwenk, and Y. Bengio. Learning phrase representations using RNN encoder-decoder for statistical machine translation. In arXiv:1406.1078, 2014.
- (2014) Learning Phrase Representations Using RNN Encoder-Decoder for Statistical Machine Translation
- Cho, K.¹ Van Merriënboer, B.² Gulcehre, C.³ Bahdanau, D.⁴ Bougares, F.⁵ Schwenk, H.⁶ Bengio, Y.⁷

9
- 84965117097
- Equilibrated adaptive learning rates for non-convex optimization
- Y. N. Dauphin, H. de Vries, and Y. Bengio. Equilibrated adaptive learning rates for non-convex optimization. In NIPS, 2015.
- (2015) NIPS
- Dauphin, Y.N.¹ de Vries, H.² Bengio, Y.³

10
- 84937959155
- Bayesian sampling using stochastic gradient thermostats
- N. Ding, Y. Fang, R. Babbush, C. Chen, R. D. Skeel, and H. Neven. Bayesian sampling using stochastic gradient thermostats. In NIPS, 2014.
- (2014) NIPS
- Ding, N.¹ Fang, Y.² Babbush, R.³ Chen, C.⁴ Skeel, R.D.⁵ Neven, H.⁶

11
- 80052250414
- Adaptive sub-gradient methods for online learning and stochastic optimization
- J. Duchi, E. Hazan, and Y. Singer. Adaptive sub-gradient methods for online learning and stochastic optimization. In JMLR, 2011.
- (2011) JMLR
- Duchi, J.¹ Hazan, E.² Singer, Y.³

12
- 84970024465
- Scalable deep Poisson factor analysis for topic modeling
- Z. Gan, C. Chen, R. Henao, D. Carlson, and L. Carin. Scalable deep Poisson factor analysis for topic modeling. In ICML, 2015.
- (2015) ICML
- Gan, Z.¹ Chen, C.² Henao, R.³ Carlson, D.⁴ Carin, L.⁵

13
- 0021518209
- Stochastic relaxation, Gibbs distributions, and the Bayesian restoration of images
- S. Geman and D. Geman. Stochastic relaxation, Gibbs distributions, and the Bayesian restoration of images. In PAMI, 1984.
- (1984) PAMI
- Geman, S.¹ Geman, D.²

14
- 79952295497
- Riemann manifold Langevin and Hamiltonian Monte Carlo methods
- M. Girolami and B. Calderhead. Riemann manifold Langevin and Hamiltonian Monte Carlo methods. In JRSS, 2011.
- (2011) JRSS
- Girolami, M.¹ Calderhead, B.²

15
- 84897543523
- Maxout networks
- I. Goodfellow, D. Warde-Farley, M. Mirza, A. Courville, and Y. Bengio. Maxout networks. In ICML, 2013.
- (2013) ICML
- Goodfellow, I.¹ Warde-Farley, D.² Mirza, M.³ Courville, A.⁴ Bengio, Y.⁵

16
- 77953183471
- What is the best multi-stage architecture for object recognition?
- K. Jarrett, K. Kavukcuoglu, M. Ranzato, and Y. LeCun. What is the best multi-stage architecture for object recognition? In ICCV, 2009.
- (2009) ICCV
- Jarrett, K.¹ Kavukcuoglu, K.² Ranzato, M.³ LeCun, Y.⁴

17
- 85083951076
- Adam: A method for stochastic optimization
- D. Kingma and J. Ba. Adam: A method for stochastic optimization. In ICLR, 2015.
- (2015) ICLR
- Kingma, D.¹ Ba, J.²

18
- 26444479778
- Optimization by simulated annealing
- S. Kirkpatrick, C. D. Gelatt Jr., and M. P. Vecchi. Optimization by simulated annealing. In Science, 1983.
- (1983) Science
- Kirkpatrick, S.¹ Gelatt, C.D., Jr.² Vecchi, M.P.³

19
- 85007196088
- Preconditioned stochastic gradient Langevin dynamics for deep neural networks
- C. Li, C. Chen, D. Carlson, and L. Carin. Preconditioned stochastic gradient Langevin dynamics for deep neural networks. In AAAI, 2016a.
- (2016) AAAI
- Li, C.¹ Chen, C.² Carlson, D.³ Carin, L.⁴

20
- 85007273869
- High-order stochastic gradient thermostats for Bayesian learning of deep models
- C. Li, C. Chen, K. Fan, and L. Carin. High-order stochastic gradient thermostats for Bayesian learning of deep models. In AAAI, 2016b.
- (2016) AAAI
- Li, C.¹ Chen, C.² Fan, K.³ Carin, L.⁴

21
- 67349202839
- Hybrid parallel tempering and simulated annealing method
- Y. Li, V. A. Protopopescu, N. Arnold, X. Zhang, and A. Gorin. Hybrid parallel tempering and simulated annealing method. In Applied Mathematics and Computation, 2009.
- (2009) Applied Mathematics and Computation
- Li, Y.¹ Protopopescu, V.A.² Arnold, N.³ Zhang, X.⁴ Gorin, A.⁵

22
- 85083953135
- Network in network
- M. Lin, Q. Chen, and S. Yan. Network in network. In ICLR, 2014.
- (2014) ICLR
- Lin, M.¹ Chen, Q.² Yan, S.³

23
- 77950857322
- Construction of numerical time-average and stationary measures via Poisson equations
- J. C. Mattingly, A. M. Stuart, and M. V. Tretyakov. Construction of numerical time-average and stationary measures via Poisson equations. In SIAM J. NUMER. ANAL., 2010.
- (2010) SIAM J. NUMER. ANAL.
- Mattingly, J.C.¹ Stuart, A.M.² Tretyakov, M.V.³

24
- 0000273048
- Annealed importance sampling
- R. M. Neal. Annealed importance sampling. In Statistics and Computing, 2001.
- (2001) Statistics and Computing
- Neal, R.M.¹

25
- 85057196821
- MCMC using Hamiltonian dynamics
- R. M. Neal. MCMC using Hamiltonian dynamics. In Handbook of Markov Chain Monte Carlo, 2011.
- (2011) Handbook of Markov Chain Monte Carlo
- Neal, R.M.¹

26
- 85067545570
- Scaling nonparametric Bayesian inference via subsample-annealing
- F. Obermeyer, J. Glidden, and E. Jonas. Scaling nonparametric Bayesian inference via subsample-annealing. In AISTATS, 2014.
- (2014) AISTATS
- Obermeyer, F.¹ Glidden, J.² Jonas, E.³

27
- 84898939739
- Stochastic gradient Riemannian Langevin dynamics on the probability simplex
- S. Patterson and Y. W. Teh. Stochastic gradient Riemannian Langevin dynamics on the probability simplex. In NIPS, 2013.
- (2013) NIPS
- Patterson, S.¹ Teh, Y.W.²

28
- 0004140926
- The Fokker-Planck equation
- H. Risken. The Fokker-Planck equation. Springer-Verlag, New York, 1989.
- (1989) The Fokker-Planck Equation
- Risken, H.¹

29
- 0022471098
- Learning representations by back-propagating errors
- D. E. Rumelhart, G. E. Hinton, and R. J. Williams. Learning representations by back-propagating errors. In Nature, 1986.
- (1986) Nature
- Rumelhart, D.E.¹ Hinton, G.E.² Williams, R.J.³

30
- 0003572234
- Handbook of Analysis and Its Foundations
- E. Schechter. Handbook of Analysis and Its Foundations. Elsevier, 1997.
- (1997) Handbook of Analysis and Its Foundations
- Schechter, E.¹

31
- 84897510162
- On the importance of initialization and momentum in deep learning
- I. Sutskever, J. Martens, G. Dahl, and G. E. Hinton. On the importance of initialization and momentum in deep learning. In ICML, 2013.
- (2013) ICML
- Sutskever, I.¹ Martens, J.² Dahl, G.³ Hinton, G.E.⁴

32
- 84962855547
- Y. W. Teh, A. H. Thiery, and S. J. Vollmer. Consistency and fluctuations for stochastic gradient Langevin dynamics. In arXiv:1409.0578, 2014.
- (2014) Consistency and Fluctuations for Stochastic Gradient Langevin Dynamics
- Teh, Y.W.¹ Thiery, A.H.² Vollmer, S.J.³

33
- 84893343292
- Lecture 6.5-rmsprop: Divide the gradient by a running average of its recent magnitude
- T. Tieleman and G. E. Hinton. Lecture 6.5-rmsprop: Divide the gradient by a running average of its recent magnitude. In Coursera: Neural Networks for Machine Learning, 2012.
- (2012) Coursera: Neural Networks for Machine Learning
- Tieleman, T.¹ Hinton, G.E.²

34
- 84955478486
- J. W. van de Meent, B. Paige, and F. Wood. Tempering by subsampling. In arXiv:1401.7145, 2014.
- (2014) Tempering by Subsampling
- van de Meent, J.W.¹ Paige, B.² Wood, F.³

35
- 0021819411
- Thermodynamical approach to the traveling salesman problem: An efficient simulation algorithm
- V. Černý. Thermodynamical approach to the traveling salesman problem: An efficient simulation algorithm. In J. Optimization Theory and Applications, 1985.
- (1985) J. Optimization Theory and Applications
- Černý, V.¹

36
- 84962870797
- S. J. Vollmer, K. C. Zygalakis, and Y. W. Teh. (Non-)asymptotic properties of stochastic gradient Langevin dynamics. In arXiv:1501.00438, 2015.
- (2015) (Non-)Asymptotic Properties of Stochastic Gradient Langevin Dynamics
- Vollmer, S.J.¹ Zygalakis, K.C.² Teh, Y.W.³

37
- 80053452150
- Bayesian learning via stochastic gradient Langevin dynamics
- M. Welling and Y. W. Teh. Bayesian learning via stochastic gradient Langevin dynamics. In ICML, 2011.
- (2011) ICML
- Welling, M.¹ Teh, Y.W.²

38
- 85083954484
- Stochastic pooling for regularization of deep convolutional neural networks
- M. Zeiler and R. Fergus. Stochastic pooling for regularization of deep convolutional neural networks. In ICLR, 2013.
- (2013) ICLR
- Zeiler, M.¹ Fergus, R.²

39
- 84969736572
- M. D. Zeiler. Adadelta: An adaptive learning rate method. In arXiv:1212.5701, 2012.
- (2012) Adadelta: An Adaptive Learning Rate Method
- Zeiler, M.D.¹

* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.