Volume 6, Issue 4, 2013, Pages 375-454

A tutorial on linear function approximators for dynamic programming and reinforcement learning

Author keywords

[No Author keywords available]

Indexed keywords

APPROXIMATE DYNAMIC PROGRAMMING; COMPLEXITY ANALYSIS; DECISION-MAKING PROBLEM; DYNAMIC PROGRAMMING METHODS; EMPIRICAL EVALUATIONS; MARKOV DECISION PROCESSES; REINFORCEMENT LEARNING METHOD; UNIFIED FRAMEWORK

EID: 84890920160     PISSN: 1935-8237     EISSN: 1935-8245     Source Type: Journal
DOI: 10.1561/2200000042     Document Type: Review
Times cited: 102

References (107)
  • 1
    • RL competition. http://www.rl-competition.org/, 2012. Accessed: 20/08/2012.
  • 4
    • A. Antos, C. Szepesvári, and R. Munos. Learning near-optimal policies with Bellman-residual minimization based fitted policy iteration and a single sample path. Machine Learning, 71(1):89-129, 2008.
  • 6
    • L. C. Baird. Residual algorithms: Reinforcement learning with function approximation. In International Conference on Machine Learning (ICML), pages 30-37, 1995.
  • 7
    • A. d. M. S. Barreto and C. W. Anderson. Restricted gradient-descent algorithm for value-function approximation in reinforcement learning. Artificial Intelligence, 172:454-482, 2008.
  • 8
    • A. Barto and M. Duff. Monte Carlo matrix inversion and reinforcement learning. In Neural Information Processing Systems (NIPS), pages 687-694. Morgan Kaufmann, 1994.
  • 9
    • A. Barto, S. Bradtke, and S. Singh. Learning to act using real-time dynamic programming. Artificial Intelligence, 72:81-138, 1995.
  • 12
    • R. E. Bellman. Dynamic Programming. Princeton University Press, Princeton, NJ, 1957.
  • 15
    • D. P. Bertsekas and J. N. Tsitsiklis. Neuro-Dynamic Programming (Optimization and Neural Computation Series, 3). Athena Scientific, May 1996.
  • 19
    • J. Boyan and A. Moore. Generalization in reinforcement learning: Safely approximating the value function. In G. Tesauro, D. Touretzky, and T. Leen, editors, Neural Information Processing Systems (NIPS), pages 369-376, Cambridge, MA, 1995.
  • 20
    • J. A. Boyan. Least-squares temporal difference learning. In International Conference on Machine Learning (ICML), pages 49-56. Morgan Kaufmann, San Francisco, CA, 1999.
  • 21
    • S. J. Bradtke and A. G. Barto. Linear least-squares algorithms for temporal difference learning. Machine Learning, 22(1-3):33-57, 1996.
  • 24
    • T. G. Dietterich. Hierarchical reinforcement learning with the MAXQ value function decomposition. Journal of Artificial Intelligence Research (JAIR), 13:227-303, November 2000.
  • 26
  • 34
    • S. Girgin and P. Preux. Feature discovery in reinforcement learning using genetic programming. Research Report RR-6358, INRIA, 2007.
  • 36
    • G. Gordon. Stable function approximation in dynamic programming. In International Conference on Machine Learning (ICML), page 261, Tahoe City, California, July 9-12, 1995.
  • 37
    • A. Gosavi. Reinforcement learning: A tutorial survey and recent advances. INFORMS Journal on Computing, 21(2):178-192, April 2009.
  • 44
    • T. Jung and P. Stone. Gaussian processes for sample efficient reinforcement learning with RMAX-like exploration. In European Conference on Machine Learning (ECML), September 2010.
  • 46
    • S. Kalyanakrishnan and P. Stone. Characterizing reinforcement learning methods through parameterized learning problems. Machine Learning, 2011.
  • 47
    • J. Z. Kolter and A. Y. Ng. Regularization and feature selection in least-squares temporal difference learning. In International Conference on Machine Learning (ICML), pages 521-528, New York, NY, USA, 2009.
  • 48
    • R. Kretchmar and C. Anderson. Comparison of CMACs and radial basis functions for local function approximators in reinforcement learning. In International Conference on Neural Networks, volume 2, pages 834-837, 1997.
  • 52
    • L. Li. Sample complexity bounds of exploration. In M. Wiering and M. van Otterlo, editors, Reinforcement Learning: State of the Art. Springer Verlag, 2012.
  • 58
  • 62
    • O. Mihatsch and R. Neuneier. Risk-sensitive reinforcement learning. Machine Learning, 49(2-3):267-290, 2002.
  • 63
    • J. Moody and C. J. Darken. Fast learning in networks of locally-tuned processing units. Neural Computation, 1(2):281-294, June 1989.
  • 64
    • A. W. Moore and C. G. Atkeson. Prioritized sweeping: Reinforcement learning with less data and less time. Machine Learning, 13:103-130, 1993.
  • 65
    • A. Nouri and M. L. Littman. Multi-resolution exploration in continuous spaces. In D. Koller, D. Schuurmans, Y. Bengio, and L. Bottou, editors, Advances in Neural Information Processing Systems (NIPS), pages 1209-1216. MIT Press, 2009.
  • 67
    • R. Parr, L. Li, G. Taylor, C. Painter-Wakefield, and M. L. Littman. An analysis of linear models, linear value-function approximation, and feature selection for reinforcement learning. In International Conference on Machine Learning (ICML), pages 752-759. ACM, New York, NY, USA, 2008.
  • 68
    • J. Peters and S. Schaal. Policy gradient methods for robotics. In IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pages 2219-2225. IEEE, October 2006.
  • 69
    • J. Peters and S. Schaal. Natural actor-critic. Neurocomputing, 71:1180-1190, March 2008.
  • 77
    • B. Scherrer. Should one compute the temporal difference fix point or minimize the Bellman residual? The unified oblique projection view. In International Conference on Machine Learning (ICML), 2010.
  • 78
    • B. Schölkopf, A. Smola, and K.-R. Müller. Nonlinear component analysis as a kernel eigenvalue problem. Neural Computation, 10(5):1299-1319, 1998.
  • 82
    • D. Silver, R. S. Sutton, and M. Müller. Temporal-difference search in computer Go. Machine Learning, 87(2):183-219, 2012.
  • 84
    • S. P. Singh, T. Jaakkola, M. L. Littman, and C. Szepesvári. Convergence results for single-step on-policy reinforcement-learning algorithms. Machine Learning, 38(3):287-308, 2000.
  • 86
    • P. Stone, R. S. Sutton, and G. Kuhlmann. Reinforcement learning for RoboCup soccer keepaway. Adaptive Behavior, 13(3):165-188, September 2005.
  • 88
    • R. S. Sutton. Generalization in reinforcement learning: Successful examples using sparse coarse coding. In Neural Information Processing Systems (NIPS), pages 1038-1044. The MIT Press, 1996.
  • 93
    • I. Szita and C. Szepesvári. Model-based reinforcement learning with nearly tight exploration complexity bounds. In International Conference on Machine Learning (ICML), pages 1031-1038, 2010.
  • 94
    • G. Taylor and R. Parr. Kernelized value function approximation for reinforcement learning. In International Conference on Machine Learning (ICML), pages 1017-1024, New York, NY, USA, 2009.
  • 95
    • J. N. Tsitsiklis and B. Van Roy. An analysis of temporal-difference learning with function approximation. IEEE Transactions on Automatic Control, 42(5):674-690, May 1997.
  • 96
    • J. N. Tsitsiklis and B. Van Roy. Average cost temporal-difference learning. Automatica, 35(11):1799-1808, 1999.
  • 97
    • N. K. Ure, A. Geramifard, G. Chowdhary, and J. P. How. Adaptive planning for Markov decision processes with uncertain transition models via incremental feature dependency discovery. In European Conference on Machine Learning (ECML), 2012.
  • 99
    • C. J. Watkins. Q-learning. Machine Learning, 8(3):279-292, 1992.
  • 102
    • S. Whiteson and M. Littman. Introduction to the special issue on empirical evaluations in reinforcement learning. Machine Learning, pages 1-6, 2011.
  • 104
    • R. J. Williams. Simple statistical gradient-following algorithms for connectionist reinforcement learning. Machine Learning, 8:229-256, 1992.
  • 105
    • T. Winograd. Procedures as a representation for data in a computer program for understanding natural language. Technical Report 235, Massachusetts Institute of Technology, 1971.
  • 106
    • Y. Ye. The simplex and policy-iteration methods are strongly polynomial for the Markov decision problem with a fixed discount rate. Mathematics of Operations Research, 36(4):593-603, 2011.
  • 107
    • H. Yu and D. P. Bertsekas. Error bounds for approximations from projected linear equations. Mathematics of Operations Research, 35(2):306-329, 2010.


* This information was extracted by KISTI through analysis of Elsevier's SCOPUS database.