SCOPUS Information Retrieval Platform

Advances in Neural Information Processing Systems

Volume 2, 2012, Pages 836-844

Regularized off-policy TD-learning

(3) Liu, Bo (a); Mahadevan, Sridhar (a); Liu, Ji (b)

(a) Biologically Inspired Neural and Dynamical Systems Laboratory (United States)

(b) University of Wisconsin Madison (United States)

Author keywords

[No Author keywords available]

Indexed keywords

ALGORITHMIC FRAMEWORK; COMPUTATIONAL COSTS; CONVEX REGULARIZATIONS; LOW COMPUTATIONAL COMPLEXITY; NON-SMOOTH CONVEX OPTIMIZATIONS; SADDLE-POINT FORMULATIONS; SPARSE REPRESENTATION; THEORETICAL AND EXPERIMENTAL;

CONVEX OPTIMIZATION;

ALGORITHMS;

EID: 84877748309
PISSN: 10495258
EISSN: None
Source Type: Conference Proceeding
DOI: None
Document Type: Conference Paper

Times cited: 59

References (21)

1
- 17444361978
- Non-Euclidean restricted memory level method for large-scale convex optimization
- A. Ben-Tal and A. Nemirovski. Non-Euclidean restricted memory level method for large-scale convex optimization. Mathematical Programming, 102(3):407-456, 2005.
- (2005) Mathematical Programming , vol.102 , Issue.3 , pp. 407-456
- Ben-Tal, A.¹ Nemirovski, A.²

2
- 84877768262
- Linear off-policy actor-critic
- T. Degris, M. White, and R. S. Sutton. Linear off-policy actor-critic. In International Conference on Machine Learning, 2012.
- (2012) International Conference on Machine Learning
- Degris, T.¹ White, M.² Sutton, R.S.³

3
- 84867137477
- A Dantzig selector approach to temporal difference learning
- M. Geist, B. Scherrer, A. Lazaric, and M. Ghavamzadeh. A Dantzig Selector Approach to Temporal Difference Learning. In International Conference on Machine Learning, 2012.
- (2012) International Conference on Machine Learning
- Geist, M.¹ Scherrer, B.² Lazaric, A.³ Ghavamzadeh, M.⁴

4
- 80053440025
- Finite-sample analysis of Lasso-TD
- M. Ghavamzadeh, A. Lazaric, R. Munos, and M. Hoffman. Finite-Sample Analysis of Lasso-TD. In Proceedings of the 28th International Conference on Machine Learning, 2011.
- (2011) Proceedings of the 28th International Conference on Machine Learning
- Ghavamzadeh, M.¹ Lazaric, A.² Munos, R.³ Hoffman, M.⁴

5
- 85162069759
- Linear complementarity for regularized policy evaluation and improvement
- J. Johns, C. Painter-Wakefield, and R. Parr. Linear complementarity for regularized policy evaluation and improvement. In Proceedings of the International Conference on Neural Information Processing Systems, 2010.
- (2010) Proceedings of the International Conference on Neural Information Processing Systems
- Johns, J.¹ Painter-Wakefield, C.² Parr, R.³

6
- 84877770255
- Optimization for machine learning
- chapter, MIT Press
- A. Juditsky and A. Nemirovski. Optimization for Machine Learning, chapter First-Order Methods for Nonsmooth Convex Large-Scale Optimization. MIT Press, 2011.
- (2011) First-Order Methods for Nonsmooth Convex Large-Scale Optimization
- Juditsky, A.¹ Nemirovski, A.²

7
- 71149121683
- Regularization and feature selection in least-squares temporal difference learning
- J. Zico Kolter and A. Y. Ng. Regularization and feature selection in least-squares temporal difference learning. In Proceedings of 27th International Conference on Machine Learning, 2009.
- (2009) Proceedings of 27th International Conference on Machine Learning
- Zico Kolter, J.¹ Ng, A.Y.²

8
- 80055028007
- Value function approximation in reinforcement learning using the Fourier basis
- G. Konidaris, S. Osentoski, and P. S. Thomas. Value function approximation in reinforcement learning using the Fourier basis. In Proceedings of the Twenty-Fifth Conference on Artificial Intelligence, 2011.
- (2011) Proceedings of the Twenty-Fifth Conference on Artificial Intelligence
- Konidaris, G.¹ Osentoski, S.² Thomas, P.S.³

9
- 33747014011
- G. M. Korpelevich. The extragradient method for finding saddle points and other problems. 1976.
- (1976) The Extragradient Method for Finding Saddle Points and Other Problems
- Korpelevich, G.M.¹

10
- 77954101982
- GQ(λ): A general gradient algorithm for temporal-difference prediction learning with eligibility traces
- H.R. Maei and R.S. Sutton. GQ(λ): A general gradient algorithm for temporal-difference prediction learning with eligibility traces. In Proceedings of the Third Conference on Artificial General Intelligence, pages 91-96, 2010.
- (2010) Proceedings of the Third Conference on Artificial General Intelligence , pp. 91-96
- Maei, H.R.¹ Sutton, R.S.²

11
- 84886008156
- Sparse Q-learning with mirror descent
- S. Mahadevan and B. Liu. Sparse Q-learning with Mirror Descent. In Proceedings of the Conference on Uncertainty in AI, 2012.
- (2012) Proceedings of the Conference on Uncertainty in AI
- Mahadevan, S.¹ Liu, B.²

12
- 70349687250
- Subgradient methods for saddle-point problems
- A. Nedić and A. Ozdaglar. Subgradient methods for saddle-point problems. Journal of optimization theory and applications, 142(1):205-228, 2009.
- (2009) Journal of Optimization Theory and Applications , vol.142 , Issue.1 , pp. 205-228
- Nedić, A.¹ Ozdaglar, A.²

13
- 70450197241
- Robust stochastic approximation approach to stochastic programming
- A. Nemirovski, A. Juditsky, G. Lan, and A. Shapiro. Robust stochastic approximation approach to stochastic programming. SIAM Journal on Optimization, 19:1574-1609, 2009.
- (2009) SIAM Journal on Optimization , vol.19 , pp. 1574-1609
- Nemirovski, A.¹ Juditsky, A.² Lan, G.³ Shapiro, A.⁴

14
- 67651063011
- Y. Nesterov. Gradient methods for minimizing composite objective function. In www.optimization-online.org, 2007.
- (2007) Gradient Methods for Minimizing Composite Objective Function
- Nesterov, Y.¹

15
- 84867131813
- Greedy algorithms for sparse reinforcement learning
- C. Painter-Wakefield and R. Parr. Greedy algorithms for sparse reinforcement learning. In International Conference on Machine Learning, 2012.
- (2012) International Conference on Machine Learning
- Painter-Wakefield, C.¹ Parr, R.²

16
- 84877791310
- L1 regularized linear temporal difference learning
- C. Painter-Wakefield and R. Parr. L1 regularized linear temporal difference learning. Technical report, Duke CS Technical Report TR-2012-01, 2012.
- (2012) Technical Report, Duke CS Technical Report TR-2012-01
- Painter-Wakefield, C.¹ Parr, R.²

17
- 77956538796
- Feature selection using regularization in approximate linear programs for Markov decision processes
- M. Petrik, G. Taylor, R. Parr, and S. Zilberstein. Feature selection using regularization in approximate linear programs for Markov decision processes. In Proceedings of the International Conference on Machine learning (ICML), 2010.
- (2010) Proceedings of the International Conference on Machine Learning (ICML)
- Petrik, M.¹ Taylor, G.² Parr, R.³ Zilberstein, S.⁴

18
- 0035273403
- Online learning control by association and reinforcement
- J. Si and Y. Wang. Online learning control by association and reinforcement. IEEE Transactions on Neural Networks, 12:264-276, 2001.
- (2001) IEEE Transactions on Neural Networks , vol.12 , pp. 264-276
- Si, J.¹ Wang, Y.²

19
- 0004102479
- MIT Press
- R. Sutton and A. G. Barto. Reinforcement Learning: An Introduction. MIT Press, 1998.
- (1998) Reinforcement Learning: An Introduction
- Sutton, R.¹ Barto, A.G.²

20
- 71149099079
- Fast gradient-descent methods for temporal-difference learning with linear function approximation
- R.S. Sutton, H.R. Maei, D. Precup, S. Bhatnagar, D. Silver, C. Szepesvári, and E. Wiewiora. Fast gradient-descent methods for temporal-difference learning with linear function approximation. In International Conference on Machine Learning, pages 993-1000, 2009.
- (2009) International Conference on Machine Learning , pp. 993-1000
- Sutton, R.S.¹ Maei, H.R.² Precup, D.³ Bhatnagar, S.⁴ Silver, D.⁵ Szepesvári, C.⁶ Wiewiora, E.⁷

21
- 85162349973
- The fixed points of off-policy TD
- J. Zico Kolter. The Fixed Points of Off-Policy TD. In Advances in Neural Information Processing Systems 24, pages 2169-2177, 2011.
- (2011) Advances in Neural Information Processing Systems , vol.24 , pp. 2169-2177
- Zico Kolter, J.¹

* This information was extracted by KISTI through its analysis of Elsevier's SCOPUS database.