SCOPUS Information Search Platform

Advances in Neural Information Processing Systems 24: 25th Annual Conference on Neural Information Processing Systems 2011, NIPS 2011

Volume , Issue , 2011, Pages

The fixed points of off-policy TD

(1) Kolter, J Zico a

a MASSACHUSETTS INSTITUTE OF TECHNOLOGY (United States)

Author keywords

[No Author keywords available]

Indexed keywords

APPROXIMATION ALGORITHMS; REINFORCEMENT LEARNING;

APPROXIMATORS; FIXED POINTS; FUNCTIONS APPROXIMATIONS; KEY ELEMENTS; LEARN+; POLICY DISTRIBUTION; POLICY LEARNING; REINFORCEMENT LEARNINGS; SOLUTION QUALITY; VALUE FUNCTIONS;

SET THEORY;

EID: 85162349973 PISSN: None EISSN: None Source Type: Conference Proceeding
DOI: None Document Type: Conference Paper

Times cited : (64)

References (19)

1
- 85151728371
- Residual algorithms: Reinforcement learning with function approximation
- L. C. Baird. Residual algorithms: Reinforcement learning with function approximation. In Proceedings of the International Conference on Machine Learning, 1995.
- (1995) Proceedings of the International Conference on Machine Learning
- Baird, L.C.¹

2
- 0004055894
- Cambridge University Press
- S. Boyd and L. Vandenberghe. Convex Optimization. Cambridge University Press, 2004.
- (2004) Convex Optimization
- Boyd, S.¹ Vandenberghe, L.²

3
- 0000430514
- The convergence of TD(λ) for general λ
- P. Dayan. The convergence of TD(λ) for general λ. Machine Learning, 8(3-4), 1992.
- (1992) Machine Learning , vol.8 , Issue.3-4
- Dayan, P.¹

4
- 0000439891
- On the convergence of stochastic iterative dynamic programming algorithms
- T. Jaakkola, M. I. Jordan, and S. P. Singh. On the convergence of stochastic iterative dynamic programming algorithms. Neural Computation, 6(6), 1994.
- (1994) Neural Computation , vol.6 , Issue.6
- Jaakkola, T.¹ Jordan, M.I.² Singh, S.P.³

5
- 77956086700
- Low-rank optimization on the cone of positive semidefinite matrices
- M. Journee, F. Bach, P.A. Absil, and R. Sepulchre. Low-rank optimization on the cone of positive semidefinite matrices. SIAM Journal on Optimization, 20(5):2327-2351, 2010.
- (2010) SIAM Journal on Optimization , vol.20 , Issue.5 , pp. 2327-2351
- Journee, M.¹ Bach, F.² Absil, P.A.³ Sepulchre, R.⁴

6
- 71149121683
- Regularization and feature selection in least-squares temporal difference learning
- J.Z. Kolter and A.Y. Ng. Regularization and feature selection in least-squares temporal difference learning. In Proceedings of the International Conference on Machine Learning, 2009.
- (2009) Proceedings of the International Conference on Machine Learning
- Kolter, J.Z.¹ Ng, A.Y.²

7
- 4644323293
- Least-squares policy iteration
- M. G. Lagoudakis and R. Parr. Least-squares policy iteration. Journal of Machine Learning Research, 4:1107-1149, 2003.
- (2003) Journal of Machine Learning Research , vol.4 , pp. 1107-1149
- Lagoudakis, M.G.¹ Parr, R.²

8
- 77956549349
- Finite-sample analysis of LSTD
- A. Lazaric, M. Ghavamzadeh, and R. Munos. Finite-sample analysis of LSTD. In Proceedings of the International Conference on Machine Learning, 2010.
- (2010) Proceedings of the International Conference on Machine Learning
- Lazaric, A.¹ Ghavamzadeh, M.² Munos, R.³

9
- 77954101982
- GQ(λ): A general gradient algorithm for temporal-difference prediction learning with eligibility traces
- H.R. Maei and R.S. Sutton. GQ(λ): A general gradient algorithm for temporal-difference prediction learning with eligibility traces. In Proceedings of the Third Conference on Artificial General Intelligence, 2010.
- (2010) Proceedings of the Third Conference on Artificial General Intelligence
- Maei, H.R.¹ Sutton, R.S.²

10
- 77956541799
- Toward off-policy learning control with function approximation
- H.R. Maei, Cs. Szepesvari, S. Bhatnagar, and R.S. Sutton. Toward off-policy learning control with function approximation. In Proceedings of the International Conference on Machine Learning, 2010.
- (2010) Proceedings of the International Conference on Machine Learning
- Maei, H.R.¹ Szepesvari, Cs.² Bhatnagar, S.³ Sutton, R.S.⁴

11
- 1942516880
- Error bounds for approximate policy iteration
- R. Munos. Error bounds for approximate policy iteration. In Proceedings of the International Conference on Machine Learning, 2003.
- (2003) Proceedings of the International Conference on Machine Learning
- Munos, R.¹

12
- 0003982971
- Springer
- J. Nocedal and S.J. Wright. Numerical Optimization. Springer, 1999.
- (1999) Numerical Optimization
- Nocedal, J.¹ Wright, S.J.²

13
- 85162310185
- Personal communication
- B. Scherrer. Personal communication, 2011.
- (2011)
- Scherrer, B.¹

14
- 84860607818
- minfunc
- M. Schmidt. minfunc, 2005. Available at http://www.cs.ubc.ca/~schmidtm/Software/minFunc.html.
- (2005) M. Schmidt

15
- 71149099079
- Fast gradient-descent methods for temporal-difference learning with linear function approximation
- R.S. Sutton, H.R. Maei, D. Precup, S. Bhatnagar, D. Silver, Cs. Szepesvari, and E. Wiewiora. Fast gradient-descent methods for temporal-difference learning with linear function approximation. In Proceedings of the International Conference on Machine Learning, 2009.
- (2009) Proceedings of the International Conference on Machine Learning
- Sutton, R.S.¹ Maei, H.R.² Precup, D.³ Bhatnagar, S.⁴ Silver, D.⁵ Szepesvari, Cs.⁶ Wiewiora, E.⁷

16
- 77956513316
- A convergent O(n) algorithm for off-policy temporal-difference learning with linear function approximation
- R.S. Sutton, Cs. Szepesvari, and H.R. Maei. A convergent O(n) algorithm for off-policy temporal-difference learning with linear function approximation. In Advances in Neural Information Processing Systems, 2008.
- (2008) Advances in Neural Information Processing Systems
- Sutton, R.S.¹ Szepesvari, Cs.² Maei, H.R.³

17
- 0031143730
- An analysis of temporal-difference learning with function approximation
- J.N. Tsitsiklis and B. Van Roy. An analysis of temporal-difference learning with function approximation. IEEE Transactions on Automatic Control, 42:674-690, 1997.
- (1997) IEEE Transactions on Automatic Control , vol.42 , pp. 674-690
- Tsitsiklis, J.N.¹ Van Roy, B.²

18
- 0033221519
- Average cost temporal difference learning
- J.N. Tsitsiklis and B. Van Roy. Average cost temporal difference learning. Automatica, 35(11):1799-1808, 1999.
- (1999) Automatica , vol.35 , Issue.11 , pp. 1799-1808
- Tsitsiklis, J.N.¹ Van Roy, B.²

19
- 77953119098
- Error bounds for approximations from projected linear equations
- H. Yu and D. P. Bertsekas. Error bounds for approximations from projected linear equations. Mathematics of Operations Research, 35:306-329, 2010.
- (2010) Mathematics of Operations Research , vol.35 , pp. 306-329
- Yu, H.¹ Bertsekas, D.P.²

* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.