SCOPUS Information Search Platform

Volume WS-06-11, 2006, Pages 50-56

PAC reinforcement learning bounds for RTDP and Rand-RTDP

Authors (3): Strehl, Alexander L.; Li, Lihong; Littman, Michael L.

Author keywords

[No Author keywords available]

Indexed keywords

CONVERGENCE OF NUMERICAL METHODS; DECISION MAKING; LEARNING ALGORITHMS; MARKOV PROCESSES; POLYNOMIAL APPROXIMATION; PROBABILITY DISTRIBUTIONS; REAL TIME SYSTEMS;

PROBABLY APPROXIMATELY CORRECT (PAC); REAL-TIME DYNAMIC PROGRAMMING (RTDP);

DYNAMIC PROGRAMMING;

EID: 33845972675 PISSN: None EISSN: None Source Type: Conference Proceeding
DOI: None Document Type: Conference Paper

Times cited: 7

References (14)

1
- 0029210635
- Learning to act using real-time dynamic programming
- Barto, A. G., Bradtke, S. J., & Singh, S. P. (1995). Learning to act using real-time dynamic programming. Artificial Intelligence, 72, 81-138.
- (1995) Artificial Intelligence , vol.72 , pp. 81-138
- Barto, A.G.¹ Bradtke, S.J.² Singh, S.P.³

2
- 0003487482
- Belmont, MA: Athena Scientific
- Bertsekas, D. P., & Tsitsiklis, J. N. (1996). Neuro-dynamic programming. Belmont, MA: Athena Scientific.
- (1996) Neuro-dynamic programming
- Bertsekas, D.P.¹ Tsitsiklis, J.N.²

3
- 9444233135
- Labeled RTDP: Improving the convergence of real-time dynamic programming
- Trento, Italy: AAAI Press
- Bonet, B., & Geffner, H. (2003). Labeled RTDP: Improving the convergence of real-time dynamic programming. Proc. 13th International Conf. on Automated Planning and Scheduling (pp. 12-21). Trento, Italy: AAAI Press.
- (2003) Proc. 13th International Conf. on Automated Planning and Scheduling , pp. 12-21
- Bonet, B.¹ Geffner, H.²

4
- 0041965975
- R-MAX - a general polynomial time algorithm for near-optimal reinforcement learning
- Brafman, R. I., & Tennenholtz, M. (2002). R-MAX - a general polynomial time algorithm for near-optimal reinforcement learning. Journal of Machine Learning Research, 3, 213-231.
- (2002) Journal of Machine Learning Research , vol.3 , pp. 213-231
- Brafman, R.I.¹ Tennenholtz, M.²

5
- 23244466805
- Doctoral dissertation, Gatsby Computational Neuroscience Unit, University College London
- Kakade, S. M. (2003). On the sample complexity of reinforcement learning. Doctoral dissertation, Gatsby Computational Neuroscience Unit, University College London.
- (2003) On the sample complexity of reinforcement learning
- Kakade, S.M.¹

6
- 84880649215
- A sparse sampling algorithm for near-optimal planning in large Markov decision processes
- Kearns, M., Mansour, Y., & Ng, A. Y. (1999). A sparse sampling algorithm for near-optimal planning in large Markov decision processes. Proceedings of the Sixteenth International Joint Conference on Artificial Intelligence (IJCAI-99) (pp. 1324-1331).
- (1999) Proceedings of the Sixteenth International Joint Conference on Artificial Intelligence (IJCAI-99) , pp. 1324-1331
- Kearns, M.¹ Mansour, Y.² Ng, A.Y.³

7
- 0036832954
- Near-optimal reinforcement learning in polynomial time
- Kearns, M. J., & Singh, S. P. (2002). Near-optimal reinforcement learning in polynomial time. Machine Learning, 49, 209-232.
- (2002) Machine Learning , vol.49 , pp. 209-232
- Kearns, M.J.¹ Singh, S.P.²

8
- 0003657590
- Addison-Wesley
- Knuth, D. (1969). The art of computer programming: Vol 2 / seminumerical algorithms, chapter 3: Random numbers. Addison-Wesley.
- (1969) The art of computer programming: Vol 2 / seminumerical algorithms, chapter 3: Random numbers
- Knuth, D.¹

9
- 0025400088
- Real-time heuristic search
- Korf, R. E. (1990). Real-time heuristic search. Artificial Intelligence, 42, 189-211.
- (1990) Artificial Intelligence , vol.42 , pp. 189-211
- Korf, R.E.¹

10
- 31844446535
- Bounded real-time dynamic programming: RTDP with monotone upper bounds and performance guarantees
- McMahan, H. B., Likhachev, M., & Gordon, G. J. (2005). Bounded real-time dynamic programming: RTDP with monotone upper bounds and performance guarantees. Proceedings of the Twenty-second International Conference on Machine Learning (ICML-05) (pp. 569-576).
- (2005) Proceedings of the Twenty-second International Conference on Machine Learning (ICML-05) , pp. 569-576
- McMahan, H.B.¹ Likhachev, M.² Gordon, G.J.³

11
- 34548745051
- Incremental model-based learners with formal learning-time guarantees
- To appear in
- Strehl, A. L., Li, L., & Littman, M. L. (2006). Incremental model-based learners with formal learning-time guarantees. To appear in Proceedings of the Twenty-second Conference on Uncertainty in Artificial Intelligence (UAI-06).
- (2006) Proceedings of the Twenty-second Conference on Uncertainty in Artificial Intelligence (UAI-06)
- Strehl, A.L.¹ Li, L.² Littman, M.L.³

12
- 0004102479
- The MIT Press
- Sutton, R. S., & Barto, A. G. (1998). Reinforcement learning: An introduction. The MIT Press.
- (1998) Reinforcement learning: An introduction
- Sutton, R.S.¹ Barto, A.G.²

13
- 0021518106
- A theory of the learnable
- Valiant, L. G. (1984). A theory of the learnable. Communications of the ACM, 27, 1134-1142.
- (1984) Communications of the ACM , vol.27 , pp. 1134-1142
- Valiant, L.G.¹

14
- 34249833101
- Q-learning
- Watkins, C. J. C. H., & Dayan, P. (1992). Q-learning. Machine Learning, 8, 279-292.
- (1992) Machine Learning , vol.8 , pp. 279-292
- Watkins, C.J.C.H.¹ Dayan, P.²

* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.