SCOPUS Information Retrieval Platform

21st Annual Conference on Learning Theory, COLT 2008

2008, Pages 323-334

Adaptive aggregation for reinforcement learning with efficient exploration: Deterministic domains

(2) Bernstein, Andrey (a); Shimkin, Nahum (a)

(a) Technion - Israel Institute of Technology (Israel)

Author keywords

[No Author keywords available]

Indexed keywords

ADAPTIVE AGGREGATION; COARSER RESOLUTION; CONTINUOUS STATE SPACE; DETERMINISTIC DOMAINS; EXPLORATION TECHNIQUES; ONLINE LEARNING; STATE AGGREGATION; UNCERTAINTY INTERVALS;

ALGORITHMS;

REINFORCEMENT LEARNING;

EID: 84898060153 PISSN: None EISSN: None Source Type: Conference Proceeding
DOI: None Document Type: Conference Paper

Times cited: 4

References (21)

1
- 74049127928
- Fitted Q-iteration in continuous action-space MDPs
- A. Antos, R. Munos, and C. Szepesvári. Fitted Q-iteration in continuous action-space MDPs. In Proceedings of Neural Information Processing Systems Conference (NIPS), 2007.
- (2007) Proceedings of Neural Information Processing Systems Conference (NIPS)
- Antos, A.; Munos, R.; Szepesvári, C.

2
- 85151789426
- Logarithmic online regret bounds for undiscounted reinforcement learning
- P. Auer and R. Ortner. Logarithmic online regret bounds for undiscounted reinforcement learning. In Proceedings of Neural Information Processing Systems Conference (NIPS), 2006.
- (2006) Proceedings of Neural Information Processing Systems Conference (NIPS)
- Auer, P.; Ortner, R.

3
- 78649714480
- Adaptive state aggregation for reinforcement learning
- A. Bernstein. Adaptive state aggregation for reinforcement learning. Master's thesis, Technion - Israel Institute of Technology, 2007. URL: http://tx.technion.ac.il/~andreyb/MSc-Thesis-final.pdf.
- (2007) Master's thesis, Technion - Israel Institute of Technology
- Bernstein, A.

4
- 0003565783
- Dynamic Programming and Optimal Control, vol. 2
- D. P. Bertsekas. Dynamic Programming and Optimal Control, vol. 2. Athena Scientific, Belmont, MA, third edition, 2007.
- (2007) Athena Scientific, Belmont, MA, third edition
- Bertsekas, D.P.

5
- 84898080178
- LEAP: An adaptive multi-resolution reinforcement learning algorithm
- A. Bonarini, A. Lazaric, and M. Restelli. LEAP: an adaptive multi-resolution reinforcement learning algorithm. To appear.
- To appear
- Bonarini, A.; Lazaric, A.; Restelli, M.

6
- 0346942368
- Decision-theoretic planning: Structural assumptions and computational leverage
- C. Boutilier, T. Dean, and S. Hanks. Decision-theoretic planning: Structural assumptions and computational leverage. Journal of Artificial Intelligence Research, 11: 1-94, 1999.
- (1999) Journal of Artificial Intelligence Research , vol.11 , pp. 1-94
- Boutilier, C.; Dean, T.; Hanks, S.

7
- 0041965975
- R-MAX - A general polynomial time algorithm for near-optimal reinforcement learning
- R. I. Brafman and M. Tennenholtz. R-MAX - a general polynomial time algorithm for near-optimal reinforcement learning. Journal of Machine Learning Research, 3: 213-231, 2002.
- (2002) Journal of Machine Learning Research , vol.3 , pp. 213-231
- Brafman, R.I.; Tennenholtz, M.

8
- 78649712889
- Global confidence bound algorithms for the exploration-exploitation tradeoff in reinforcement learning
- H. Chapman. Global confidence bound algorithms for the exploration-exploitation tradeoff in reinforcement learning. Master's thesis, Technion - Israel Institute of Technology, 2007.
- (2007) Master's thesis, Technion - Israel Institute of Technology
- Chapman, H.

9
- 0026206780
- An optimal one-way multigrid algorithm for discrete-time stochastic control
- C.-S. Chow and J. N. Tsitsiklis. An optimal one-way multigrid algorithm for discrete-time stochastic control. IEEE Transactions on Automatic Control, 36(8): 898-914, 1991.
- (1991) IEEE Transactions on Automatic Control, vol. 36, Issue 8, pp. 898-914
- Chow, C.-S.; Tsitsiklis, J.N.

10
- 34247204877
- A hierarchical approach to efficient reinforcement learning in deterministic domains
- C. Diuk, A. L. Strehl, and M. L. Littman. A hierarchical approach to efficient reinforcement learning in deterministic domains. In Proceedings of the 5th International Joint Conference on Autonomous Agents and Multiagent Systems, pages 313-319, 2006.
- (2006) Proceedings of the 5th International Joint Conference on Autonomous Agents and Multiagent Systems , pp. 313-319
- Diuk, C.; Strehl, A.L.; Littman, M.L.

11
- 0742284358
- Reinforcement learning with function approximation converges to a region
- G. J. Gordon. Reinforcement learning with function approximation converges to a region. In Advances in Neural Information Processing Systems (NIPS) 12, pages 1040-1046, 2000.
- (2000) Advances in Neural Information Processing Systems (NIPS) , vol.12 , pp. 1040-1046
- Gordon, G.J.

12
- 23244466805
- On the sample complexity of reinforcement learning
- S. M. Kakade. On the Sample Complexity of Reinforcement Learning. PhD thesis, Gatsby Computational Neuroscience Unit, University College London, UK, 2003.
- (2003) PhD thesis, Gatsby Computational Neuroscience Unit, University College London, UK
- Kakade, S.M.

13
- 0036832954
- Near-optimal reinforcement learning in polynomial time
- M. Kearns and S. P. Singh. Near-optimal reinforcement learning in polynomial time. Machine Learning, 49: 209-232, 2002.
- (2002) Machine Learning , vol.49 , pp. 209-232
- Kearns, M.; Singh, S.P.

14
- 14344264466
- Q-Cut - Dynamic discovery of sub-goals in reinforcement learning
- I. Menache, S. Mannor, and N. Shimkin. Q-Cut - dynamic discovery of sub-goals in reinforcement learning. In Proceedings of the 13th European Conference on Machine Learning (ECML 2002), pages 187-195, 2002.
- (2002) Proceedings of the 13th European Conference on Machine Learning (ECML 2002), pp. 187-195
- Menache, I.; Mannor, S.; Shimkin, N.

15
- 0029514510
- The parti-game algorithm for variable resolution reinforcement learning in multidimensional state-spaces
- A. W. Moore and C. G. Atkeson. The parti-game algorithm for variable resolution reinforcement learning in multidimensional state-spaces. Machine Learning, 21: 199-233, 1995.
- (1995) Machine Learning , vol.21 , pp. 199-233
- Moore, A.W.; Atkeson, C.G.

16
- 0036832953
- Variable resolution discretization in optimal control
- R. Munos and A. W. Moore. Variable resolution discretization in optimal control. Machine Learning, 49: 291-323, 2002.
- (2002) Machine Learning , vol.49 , pp. 291-323
- Munos, R.; Moore, A.W.

17
- 0003998452
- Markov Decision Processes: Discrete Stochastic Dynamic Programming
- M. L. Puterman. Markov Decision Processes: Discrete Stochastic Dynamic Programming. John Wiley & Sons, Inc., New York, NY, USA, 1994.
- (1994) John Wiley & Sons, Inc., New York, NY, USA
- Puterman, M.L.

18
- 31844432138
- A theoretical analysis of model-based interval estimation
- A. L. Strehl and M. L. Littman. A theoretical analysis of model-based interval estimation. In Proceedings of the 22nd International Conference on Machine Learning, pages 857-864, 2005.
- (2005) Proceedings of the 22nd International Conference on Machine Learning , pp. 857-864
- Strehl, A.L.; Littman, M.L.

19
- 33749255382
- PAC model-free reinforcement learning
- A. L. Strehl, E. Wiewiora, J. Langford, and M. L. Littman. PAC model-free reinforcement learning. In Proceedings of the 23rd International Conference on Machine Learning, pages 881-888, 2006.
- (2006) Proceedings of the 23rd International Conference on Machine Learning, pp. 881-888
- Strehl, A.L.; Wiewiora, E.; Langford, J.; Littman, M.L.

20
- 85042938295
- Optimistic linear programming gives logarithmic regret for irreducible MDPs
- A. Tewari and P. L. Bartlett. Optimistic linear programming gives logarithmic regret for irreducible MDPs. In Proceedings of Neural Information Processing Systems Conference (NIPS), 2007.
- (2007) Proceedings of Neural Information Processing Systems Conference (NIPS)
- Tewari, A.; Bartlett, P.L.

21
- 0017997986
- Approximations of dynamic programs, I
- W. Whitt. Approximations of dynamic programs, I. Mathematics of Operations Research, 3(3): 231-243, 1978.
- (1978) Mathematics of Operations Research , vol.3 , Issue.3 , pp. 231-243
- Whitt, W.

* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.