SCOPUS Information Retrieval Platform

Journal of Machine Learning Research

Volume 10, 2009, Pages 1955-1988

Provably efficient learning with typed parametric models

(5) Brunskill, Emma a; Leffler, Bethany R. a; Li, Hong a; Littman, Michael L. b; Roy, Nicholas b

a MASSACHUSETTS INSTITUTE OF TECHNOLOGY (United States)

b RUTGERS UNIVERSITY (United States)

Author keywords

Provably efficient learning; Reinforcement learning

Indexed keywords

EFFICIENT LEARNING; MARKOV DECISION PROCESSES; PARAMETRIC MODELS; PROBABLY APPROXIMATELY CORRECT; PROVABLY EFFICIENT LEARNING; REAL WORLD DOMAIN; REAL-WORLD; ROBOT NAVIGATION; SAMPLE COMPLEXITY; SAMPLE COMPLEXITY BOUNDS; SMALL ROBOTS; STATE-SPACE; TRAJECTORY DATA;

EDUCATION; MARKOV PROCESSES; REINFORCEMENT; REINFORCEMENT LEARNING;

LEARNING ALGORITHMS;

EID: 70349416596; PISSN: 15324435; EISSN: 15337928; Source Type: Journal
DOI: None; Document Type: Article

Times cited: 23

References (34)

1
- 31844444663
- Exploration and apprenticeship learning in reinforcement learning
- Pieter Abbeel and Andrew Y. Ng. Exploration and apprenticeship learning in reinforcement learning. In Proceedings of the 22nd International Conference on Machine Learning (ICML), pages 1-8, 2005.
- (2005) Proceedings of the 22nd International Conference on Machine Learning (ICML) , pp. 1-8
- Abbeel, P.¹ Ng, A.Y.²

2
- 85153940465
- Generalization in reinforcement learning: Safely approximating the value function
- Justin Boyan and Andrew Moore. Generalization in reinforcement learning: Safely approximating the value function. In Advances in Neural Information Processing Systems (NIPS) 7, pages 369-376, 1995.
- (1995) Advances in Neural Information Processing Systems (NIPS) , vol.7 , pp. 369-376
- Boyan, J.¹ Moore, A.²

3
- 0041965975
- R-MAX - A general polynomial time algorithm for near-optimal reinforcement learning
- Ronen I. Brafman and Moshe Tennenholtz. R-MAX - a general polynomial time algorithm for near-optimal reinforcement learning. Journal of Machine Learning Research, 3:213-231, 2002.
- (2002) Journal of Machine Learning Research , vol.3 , pp. 213-231
- Brafman, R.I.¹ Tennenholtz, M.²

4
- 70049084399
- CORL: A continuous-state offset-dynamics reinforcement learner
- Emma Brunskill, Bethany R. Leffler, Lihong Li, Michael L. Littman, and Nicholas Roy. CORL: A continuous-state offset-dynamics reinforcement learner. In Proceedings of the 24th Conference on Uncertainty in Artificial Intelligence (UAI), pages 53-61, 2008.
- (2008) Proceedings of the 24th Conference on Uncertainty in Artificial Intelligence (UAI) , pp. 53-61
- Brunskill, E.¹ Leffler, B.R.² Li, H.³ Littman, M.L.⁴ Roy, N.⁵

5
- 0003919624
- Prentice Hall, ISBN 9780201808681.
- Jeffrey B. Burl. Linear Optimal Control. Prentice Hall, 1998. ISBN 9780201808681.
- (1998) Linear Optimal Control
- Burl, J.B.¹

6
- 70349431917
- Using linear programming for Bayesian exploration in Markov decision processes
- Pablo Castro and Doina Precup. Using linear programming for Bayesian exploration in Markov decision processes. In Proceedings of the 20th International Joint Conference on Artificial Intelligence (IJCAI), pages 2437-2442, 2007.
- (2007) Proceedings of the 20th International Joint Conference on Artificial Intelligence (IJCAI) , pp. 2437-2442
- Castro, P.¹ Precup, D.²

7
- 38249024662
- The complexity of dynamic programming
- Chee-Seng Chow and John N. Tsitsiklis. The complexity of dynamic programming. Journal of Complexity, 5(4):466-488, 1989.
- (1989) Journal of Complexity , vol.5 , Issue.4 , pp. 466-488
- Chow, C.-S.¹ Tsitsiklis, J.N.²

8
- 0026206780
- An optimal one-way multigrid algorithm for discrete-time stochastic control
- DOI 10.1109/9.133184
- Chee-Seng Chow and John N. Tsitsiklis. An optimal one-way multigrid algorithm for discrete-time stochastic control. IEEE Transactions on Automatic Control, 36(8):898-914, 1991. (Pubitemid 21674882)
- (1991) IEEE Transactions on Automatic Control , vol.36 , Issue.8 , pp. 898-914
- Chow, C.-S.¹ Tsitsiklis, J.N.²

9
- 56449086386
- Reinforcement learning with limited reinforcement: Using Bayes risk for active learning in POMDPs
- Finale Doshi, Joelle Pineau, and Nicholas Roy. Reinforcement learning with limited reinforcement: Using Bayes risk for active learning in POMDPs. In Proceedings of the 25th International Conference on Machine Learning (ICML), pages 256-263, 2008.
- (2008) Proceedings of the 25th International Conference on Machine Learning (ICML) , pp. 256-263
- Doshi, F.¹ Pineau, J.² Roy, N.³

10
- 70349417489
- Reinforcement learning benchmarks and bake-offs II
- Workshop
- Alain Dutech, Timothy Edmunds, Jelle Kok, Michail Lagoudakis, Michael L. Littman, Martin Riedmiller, Bryan Russell, Bruno Scherrer, Richard Sutton, Stephan Timmer, Nikos Vlassis, Adam White, and Shimon Whiteson. Reinforcement learning benchmarks and bake-offs II. In Advances in Neural Information Processing Systems (NIPS) 17 Workshop, 2005.
- (2005) Advances in Neural Information Processing Systems (NIPS) , vol.17
- Dutech, A.¹ Edmunds, T.² Kok, J.³ Lagoudakis, M.⁴ Littman, M.L.⁵ Riedmiller, M.⁶ Russell, B.⁷ Scherrer, B.⁸ Sutton, R.⁹ Timmer, S.¹⁰ Vlassis, N.¹¹ White, A.¹² Whiteson, S.¹³

11
- 60149104098
- Cabernet: A WiFi-Based Vehicular Content Delivery Network
- Jakob Eriksson, Hari Balakrishnan, and Samuel Madden. Cabernet: A WiFi-Based Vehicular Content Delivery Network. In Proceedings of the 14th Conference on Mobile Computing and Networking (MOBICOM), pages 199-210, 2008.
- (2008) Proceedings of the 14th Conference on Mobile Computing and Networking (MOBICOM) , pp. 199-210
- Eriksson, J.¹ Balakrishnan, H.² Madden, S.³

12
- 0004236492
- The Johns Hopkins University Press, 3rd edition, ISBN 0-801-85414-8.
- Gene H. Golub and Charles F. Van Loan. Matrix Computations. The Johns Hopkins University Press, 3rd edition, 1996. ISBN 0-801-85414-8.
- (1996) Matrix Computations
- Golub, G.H.¹ Van Loan, C.F.²

13
- 0004151494
- Cambridge University Press, ISBN 0-521-38632-2.
- Roger A. Horn and Charles R. Johnson. Matrix Analysis. Cambridge University Press, 1986. ISBN 0-521-38632-2.
- (1986) Matrix Analysis
- Horn, R.A.¹ Johnson, C.R.²

14
- 70350579633
- Prediction, expectation, and surprise: Methods, designs, and study of a deployed traffic forecasting service
- Eric Horvitz, Johnson Apacible, Raman Sarin, and Lin Liao. Prediction, expectation, and surprise: Methods, designs, and study of a deployed traffic forecasting service. In Proceedings of the 21st Conference on Uncertainty in Artificial Intelligence (UAI), pages 275-283, 2005.
- (2005) Proceedings of the 21st Conference on Uncertainty in Artificial Intelligence (UAI) , pp. 275-283
- Horvitz, E.¹ Apacible, J.² Sarin, R.³ Liao, L.⁴

15
- 34548083538
- Model-based exploration in continuous state spaces
- Nicholas K. Jong and P. Stone. Model-based exploration in continuous state spaces. In Proceedings of the 7th Symposium on Abstraction, Reformulation, and Approximation (SARA), pages 258-272, 2007.
- (2007) Proceedings of the 7th Symposium on Abstraction, Reformulation, and Approximation (SARA) , pp. 258-272
- Jong, N.K.¹ Stone, P.²

16
- 23244466805
- PhD thesis, University College London
- Sham Kakade. On the sample complexity of reinforcement learning. PhD thesis, University College London, 2003.
- (2003) On the Sample Complexity of Reinforcement Learning
- Kakade, S.¹

17
- 0036832954
- Near-optimal reinforcement learning in polynomial time
- Michael J. Kearns and Satinder P. Singh. Near-optimal reinforcement learning in polynomial time. Machine Learning, 49(2-3):209-232, 2002.
- (2002) Machine Learning , vol.49 , Issue.2-3 , pp. 209-232
- Kearns, M.J.¹ Singh, S.P.²

18
- 33750293964
- Bandit based Monte-Carlo planning
- Levente Kocsis and Csaba Szepesvári. Bandit based Monte-Carlo planning. In Proceedings of the 17th European Conference on Machine Learning (ECML), pages 282-293, 2006.
- (2006) Proceedings of the 17th European Conference on Machine Learning (ECML) , pp. 282-293
- Kocsis, L.¹ Szepesvári, C.²

19
- 84941465845
- A lower bound for discrimination in terms of variation
- January
- Solomon Kullback. A lower bound for discrimination in terms of variation. IEEE Transactions on Information Theory, 13(1):126-127, January 1967.
- (1967) IEEE Transactions on Information Theory , vol.13 , Issue.1 , pp. 126-127
- Kullback, S.¹

20
- 33746054938
- Solving factored MDPs with exponential-family transition models
- Branislav Kveton and Milos Hauskrecht. Solving factored MDPs with exponential-family transition models. In Proceedings of the 16th International Conference on Automated Planning and Scheduling (ICAPS), pages 114-120, 2006.
- (2006) Proceedings of the 16th International Conference on Automated Planning and Scheduling (ICAPS) , pp. 114-120
- Kveton, B.¹ Hauskrecht, M.²

21
- 4644323293
- Least-squares policy iteration
- M.G. Lagoudakis and R. Parr. Least-squares policy iteration. Journal of Machine Learning Research, 4:1107-1149, 2003.
- (2003) Journal of Machine Learning Research , vol.4 , pp. 1107-1149
- Lagoudakis, M.G.¹ Parr, R.²

22
- 36349026477
- Efficient reinforcement learning with relocatable action models
- Bethany R. Leffler, Michael L. Littman, and Timothy Edmunds. Efficient reinforcement learning with relocatable action models. In Proceedings of the 22nd Conference on Artificial Intelligence (AAAI), pages 572-577, 2007.
- (2007) Proceedings of the 22nd Conference on Artificial Intelligence (AAAI) , pp. 572-577
- Leffler, B.R.¹ Littman, M.L.² Edmunds, T.³

23
- 70349428076
- PhD thesis, Rutgers University, New Brunswick, NJ
- Lihong Li. A Unifying Framework for Computational Reinforcement Learning Theory. PhD thesis, Rutgers University, New Brunswick, NJ, 2009.
- (2009) A Unifying Framework for Computational Reinforcement Learning Theory
- Li, L.¹

24
- 56449122733
- Knows what it knows: A framework for self-aware learning.
- Lihong Li, Michael L. Littman, and Thomas J. Walsh. Knows what it knows: A framework for self-aware learning. In Proceedings of the 25th International Conference on Machine Learning (ICML), pages 568-575, 2008.
- (2008) Proceedings of the 25th International Conference on Machine Learning (ICML) , pp. 568-575
- Li, H.¹ Littman, M.L.² Walsh, T.J.³

25
- 57749176370
- Towards faster planning with continuous resources in stochastic domains
- Janusz Marecki and Milind Tambe. Towards faster planning with continuous resources in stochastic domains. In Proceedings of the 23rd Conference on Artificial Intelligence (AAAI), pages 1049-1055, 2008.
- (2008) Proceedings of the 23rd Conference on Artificial Intelligence (AAAI) , pp. 1049-1055
- Marecki, J.¹ Tambe, M.²

26
- 84898980684
- Autonomous helicopter flight via reinforcement learning
- Andrew Ng, H. Jin Kim, Michael Jordan, and Shankar Sastry. Autonomous helicopter flight via reinforcement learning. In Advances in Neural Information Processing Systems (NIPS) 16, pages 799-806, 2004.
- (2004) Advances in Neural Information Processing Systems (NIPS) , vol.16 , pp. 799-806
- Ng, A.¹ Kim, H.J.² Jordan, M.³ Sastry, S.⁴

27
- 33749251297
- An analytic solution to discrete Bayesian reinforcement learning
- Pascal Poupart, Nikos Vlassis, Jesse Hoey, and Kevin Regan. An analytic solution to discrete Bayesian reinforcement learning. In Proceedings of the 23rd International Conference on Machine Learning (ICML), pages 697-704, 2006.
- (2006) Proceedings of the 23rd International Conference on Machine Learning (ICML) , pp. 697-704
- Poupart, P.¹ Vlassis, N.² Hoey, J.³ Regan, K.⁴

28
- 51649091499
- Bayesian reinforcement learning in continuous POMDPs with application to robot navigation
- Stephane Ross, Brahim Chaib-draa, and Joelle Pineau. Bayesian reinforcement learning in continuous POMDPs with application to robot navigation. In Proceedings of the International Conference on Robotics and Automation (ICRA), pages 2845-2851, 2008.
- (2008) Proceedings of the International Conference on Robotics and Automation (ICRA) , pp. 2845-2851
- Ross, S.¹ Chaib-draa, B.² Pineau, J.³

29
- 85162058047
- Online linear regression and its application to model-based reinforcement learning
- Alexander L. Strehl and Michael L. Littman. Online linear regression and its application to model-based reinforcement learning. In Advances in Neural Information Processing Systems (NIPS) 20, pages 1417-1424, 2008.
- (2008) Advances in Neural Information Processing Systems (NIPS) , vol.20 , pp. 1417-1424
- Strehl, A.L.¹ Littman, M.L.²

30
- 34548745051
- Incremental model-based learners with formal learning-time guarantees
- Alexander L. Strehl, Lihong Li, and Michael L. Littman. Incremental model-based learners with formal learning-time guarantees. In Proceedings of the 22nd Conference on Uncertainty in Artificial Intelligence (UAI), pages 485-492, 2006.
- (2006) Proceedings of the 22nd Conference on Uncertainty in Artificial Intelligence (UAI) , pp. 485-492
- Strehl, A.L.¹ Li, H.² Littman, M.L.³

31
- 0004102479
- MIT Press, Cambridge, MA
- Richard S. Sutton and Andrew G. Barto. Reinforcement Learning: An Introduction. MIT Press, Cambridge, MA, 1998.
- (1998) Reinforcement Learning: An Introduction.
- Sutton, R.S.¹ Barto, A.G.²

32
- 0000985504
- TD-Gammon, a self-teaching backgammon program, achieves master-level play
- Gerald J. Tesauro. TD-Gammon, a self-teaching backgammon program, achieves master-level play. Neural Computation, 6(2):215-219, 1994.
- (1994) Neural Computation , vol.6 , Issue.2 , pp. 215-219
- Tesauro, G.J.¹

33
- 0000011340
- Some matrix-inequalities and metrization of matrix-space
- John von Neumann. Some matrix-inequalities and metrization of matrix-space. Tomsk University Review, 1:286-300, 1937.
- (1937) Tomsk University Review , vol.1 , pp. 286-300
- Von Neumann, J.¹

34
- 0004049893
- PhD thesis, King's College, University of Cambridge, United Kingdom
- Christopher J.C.H. Watkins. Learning from delayed rewards. PhD thesis, King's College, University of Cambridge, United Kingdom, 1989.
- (1989) Learning from Delayed Rewards
- Watkins, C.J.C.H.¹

* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.