SCOPUS Information Retrieval Platform

ICML 2010 - Proceedings, 27th International Conference on Machine Learning

Volume , Issue , 2010, Pages 1071-1078

Least-Squares λ Policy Iteration: Bias-variance trade-off in control problems

(2) Thiery, Christophe ᵃ; Scherrer, Bruno ᵃ

Author keywords

[No Author keywords available]

Indexed keywords

IN-CONTROL; LARGE SPACES; LEAST SQUARE; PERFORMANCE BOUNDS; POLICY ITERATION; TETRIS GAME; TRAINING SAMPLE; VALUE FUNCTION APPROXIMATION; VALUE ITERATION;

LEARNING SYSTEMS;

EID: 77956525931 PISSN: None EISSN: None Source Type: Conference Proceeding
DOI: None Document Type: Conference Paper

Times cited : (23)

References (16)

1
- 4243567726
- Temporal differences-based policy iteration and applications in neuro-dynamic programming
- Bertsekas, D. and Ioffe, S. Temporal differences-based policy iteration and applications in neuro-dynamic programming. Technical report, MIT, 1996.
- (1996) Technical Report, MIT
- Bertsekas, D.¹ Ioffe, S.²

2
- 0004211236
- Athena Scientific
- Bertsekas, D.P. and Tsitsiklis, J.N. Neurodynamic Programming. Athena Scientific, 1996.
- (1996) Neurodynamic Programming
- Bertsekas, D.P.¹ Tsitsiklis, J.N.²

3
- 0036832950
- Technical update: Least-squares temporal difference learning
- Boyan, J. A. Technical update: Least-squares temporal difference learning. Machine Learning, 49:233-246, 2002.
- (2002) Machine Learning , vol.49 , pp. 233-246
- Boyan, J.A.¹

4
- 0001771345
- Linear least-squares algorithms for temporal difference learning
- Bradtke, S. J. and Barto, A.G. Linear least-squares algorithms for temporal difference learning. Machine Learning, 22:33-57, 1996.
- (1996) Machine Learning , vol.22 , pp. 33-57
- Bradtke, S.J.¹ Barto, A.G.²

5
- 26944457467
- Bias-variance error bounds for temporal difference updates
- Kearns, M. and Singh, S. Bias-variance error bounds for temporal difference updates. In Proceedings of the 13th Annual Conference on Computational Learning Theory, pp. 142-147, 2000.
- (2000) Proceedings of the 13th Annual Conference on Computational Learning Theory , pp. 142-147
- Kearns, M.¹ Singh, S.²

6
- 4644323293
- Least-squares policy iteration
- Lagoudakis, M. G. and Parr, R. Least-squares policy iteration. Journal of Machine Learning Research, 4: 1107-1149, 2003.
- (2003) Journal of Machine Learning Research , vol.4 , pp. 1107-1149
- Lagoudakis, M.G.¹ Parr, R.²

7
- 35048819671
- Least-squares methods in reinforcement learning for control
- Springer-Verlag
- Lagoudakis, Michail G., Parr, Ronald, and Littman, Michael L. Least-squares methods in reinforcement learning for control. In SETN'02: Proceedings of the Second Hellenic Conference on AI, pp. 249-260. Springer-Verlag, 2002.
- (2002) SETN'02: Proceedings of the Second Hellenic Conference on AI , pp. 249-260
- Lagoudakis, M.G.¹ Parr, R.² Littman, M.L.³

8
- 0037288398
- Least squares policy evaluation algorithms with linear function approximation
- Nedić, A. and Bertsekas, D. P. Least squares policy evaluation algorithms with linear function approximation. Discrete Event Dynamic Systems, 13(1-2): 79-110, 2003.
- (2003) Discrete Event Dynamic Systems , vol.13 , Issue.1-2 , pp. 79-110
- Nedić, A.¹ Bertsekas, D.P.²

9
- 0003998452
- Wiley, New York
- Puterman, M. Markov Decision Processes. Wiley, New York, 1994.
- (1994) Markov Decision Processes
- Puterman, M.¹

10
- 1942482175
- Optimality of reinforcement learning algorithms with linear function approximation
- Schoknecht, Ralf. Optimality of reinforcement learning algorithms with linear function approximation. In NIPS, pp. 1555-1562, 2002.
- (2002) NIPS , pp. 1555-1562
- Schoknecht, R.¹

11
- 71149099079
- Fast gradient-descent methods for temporal-difference learning with linear function approximation
- Sutton, R. S., Maei, H. R., Precup, D., Bhatnagar, S., Silver, D., Szepesvári, C., and Wiewiora, E. Fast gradient-descent methods for temporal-difference learning with linear function approximation. In ICML '09: Proceedings of the 26th Annual International Conference on Machine Learning, pp. 993-1000, 2009.
- (2009) ICML '09: Proceedings of the 26th Annual International Conference on Machine Learning , pp. 993-1000
- Sutton, R.S.¹ Maei, H.R.² Precup, D.³ Bhatnagar, S.⁴ Silver, D.⁵ Szepesvári, C.⁶ Wiewiora, E.⁷

12
- 0004102479
- Bradford Book. The MIT Press
- Sutton, R.S. and Barto, A.G. Reinforcement Learning, An Introduction. Bradford Book. The MIT Press, 1998.
- (1998) Reinforcement Learning, an Introduction
- Sutton, R.S.¹ Barto, A.G.²

13
- 31844456754
- Finite time bounds for sampling based fitted value iteration
- ACM
- Szepesvári, Csaba and Munos, Rémi. Finite time bounds for sampling based fitted value iteration. In ICML '05: Proceedings of the 22nd international conference on Machine learning, pp. 880-887. ACM, 2005.
- (2005) ICML '05: Proceedings of the 22nd International Conference on Machine Learning , pp. 880-887
- Szepesvári, C.¹ Munos, R.²

14
- 70350140182
- Building Controllers for Tetris
- Thiery, Christophe and Scherrer, Bruno. Building Controllers for Tetris. International Computer Games Association Journal, 32:3-11, 2009.
- (2009) International Computer Games Association Journal , vol.32 , pp. 3-11
- Thiery, C.¹ Scherrer, B.²

15
- 77956535758
- Technical report, URL
- Thiery, Christophe and Scherrer, Bruno. Performance bound for Approximate Optimistic Policy Iteration. Technical report, 2010. URL http://hal.inria.fr/inria-00480952.
- (2010) Performance Bound for Approximate Optimistic Policy Iteration
- Thiery, C.¹ Scherrer, B.²

16
- 67949109470
- Convergence results for some temporal difference methods based on least squares
- Yu, H. and Bertsekas, D. P. Convergence Results for Some Temporal Difference Methods Based on Least Squares. IEEE Trans. Automatic Control, 54:1515-1531, 2009.
- (2009) IEEE Trans. Automatic Control , vol.54 , pp. 1515-1531
- Yu, H.¹ Bertsekas, D.P.²

* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS DB.