Volume 6, Issue 4, 2013, Pages 375-454

A tutorial on linear function approximators for dynamic programming and reinforcement learning

Author keywords

[No Author keywords available]

Indexed keywords

APPROXIMATE DYNAMIC PROGRAMMING; COMPLEXITY ANALYSIS; DECISION-MAKING PROBLEM; DYNAMIC PROGRAMMING METHODS; EMPIRICAL EVALUATIONS; MARKOV DECISION PROCESSES; REINFORCEMENT LEARNING METHOD; UNIFIED FRAMEWORK

EID: 84890920160     PISSN: 1935-8237     EISSN: 1935-8245     Source Type: Journal
DOI: 10.1561/2200000042     Document Type: Review
Times cited: 102

References (107)
  • 1
    • RL competition. http://www.rl-competition.org/, 2012. Accessed: 20/08/2012.
  • 4
    • A. Antos, C. Szepesvári, and R. Munos. Learning near-optimal policies with Bellman-residual minimization based fitted policy iteration and a single sample path. Machine Learning, 71(1):89-129, 2008.
  • 6
    • L. C. Baird. Residual algorithms: Reinforcement learning with function approximation. In International Conference on Machine Learning (ICML), pages 30-37, 1995.
  • 7
    • A. d. M. S. Barreto and C. W. Anderson. Restricted gradient-descent algorithm for value-function approximation in reinforcement learning. Artificial Intelligence, 172:454-482, 2008.
  • 8
    • A. Barto and M. Duff. Monte Carlo matrix inversion and reinforcement learning. In Neural Information Processing Systems (NIPS), pages 687-694. Morgan Kaufmann, 1994.
  • 9
    • A. Barto, S. Bradtke, and S. Singh. Learning to act using real-time dynamic programming. Artificial Intelligence, 72:81-138, 1995.
  • 12
    • R. E. Bellman. Dynamic Programming. Princeton University Press, Princeton, NJ, 1957.
  • 15
    • D. P. Bertsekas and J. N. Tsitsiklis. Neuro-Dynamic Programming (Optimization and Neural Computation Series, 3). Athena Scientific, May 1996.
  • 19
    • J. Boyan and A. Moore. Generalization in reinforcement learning: Safely approximating the value function. In G. Tesauro, D. Touretzky, and T. Leen, editors, Neural Information Processing Systems (NIPS), pages 369-376, Cambridge, MA, 1995.
  • 20
    • J. A. Boyan. Least-squares temporal difference learning. In International Conference on Machine Learning (ICML), pages 49-56. Morgan Kaufmann, San Francisco, CA, 1999.
  • 21
    • S. J. Bradtke and A. G. Barto. Linear least-squares algorithms for temporal difference learning. Machine Learning, 22(1-3):33-57, 1996.
  • 24
    • T. G. Dietterich. Hierarchical reinforcement learning with the MAXQ value function decomposition. Journal of Artificial Intelligence Research (JAIR), 13:227-303, November 2000.
  • 26
  • 34
    • S. Girgin and P. Preux. Feature discovery in reinforcement learning using genetic programming. Research Report RR-6358, INRIA, 2007.
  • 36
    • G. Gordon. Stable function approximation in dynamic programming. In International Conference on Machine Learning (ICML), page 261, Tahoe City, California, July 9-12, 1995.
  • 37
    • A. Gosavi. Reinforcement learning: A tutorial survey and recent advances. INFORMS Journal on Computing, 21(2):178-192, April 2009.
  • 44
    • T. Jung and P. Stone. Gaussian processes for sample efficient reinforcement learning with RMAX-like exploration. In European Conference on Machine Learning (ECML), September 2010.
  • 46
    • S. Kalyanakrishnan and P. Stone. Characterizing reinforcement learning methods through parameterized learning problems. Machine Learning, 2011.
  • 47
    • J. Z. Kolter and A. Y. Ng. Regularization and feature selection in least-squares temporal difference learning. In International Conference on Machine Learning (ICML), pages 521-528, New York, NY, USA, 2009.
  • 48
    • R. Kretchmar and C. Anderson. Comparison of CMACs and radial basis functions for local function approximators in reinforcement learning. In International Conference on Neural Networks, volume 2, pages 834-837, 1997.
  • 52
    • L. Li. Sample complexity bounds of exploration. In M. Wiering and M. van Otterlo, editors, Reinforcement Learning: State of the Art. Springer Verlag, 2012.
  • 58
  • 62
    • O. Mihatsch and R. Neuneier. Risk-sensitive reinforcement learning. Machine Learning, 49(2-3):267-290, 2002.
  • 63
    • J. Moody and C. J. Darken. Fast learning in networks of locally-tuned processing units. Neural Computation, 1(2):281-294, June 1989.
  • 64
    • A. W. Moore and C. G. Atkeson. Prioritized sweeping: Reinforcement learning with less data and less time. Machine Learning, 13:103-130, 1993.
  • 65
    • A. Nouri and M. L. Littman. Multi-resolution exploration in continuous spaces. In D. Koller, D. Schuurmans, Y. Bengio, and L. Bottou, editors, Advances in Neural Information Processing Systems (NIPS), pages 1209-1216. MIT Press, 2009.
  • 67
    • R. Parr, L. Li, G. Taylor, C. Painter-Wakefield, and M. L. Littman. An analysis of linear models, linear value-function approximation, and feature selection for reinforcement learning. In International Conference on Machine Learning (ICML), pages 752-759. ACM, New York, NY, USA, 2008.
  • 68
    • J. Peters and S. Schaal. Policy gradient methods for robotics. In IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pages 2219-2225. IEEE, October 2006.
  • 69
    • J. Peters and S. Schaal. Natural actor-critic. Neurocomputing, 71:1180-1190, March 2008.
  • 77
    • B. Scherrer. Should one compute the temporal difference fix point or minimize the Bellman residual? The unified oblique projection view. In International Conference on Machine Learning (ICML), 2010.
  • 78
    • B. Schölkopf, A. Smola, and K.-R. Müller. Nonlinear component analysis as a kernel eigenvalue problem. Neural Computation, 10(5):1299-1319, 1998.
  • 82
    • D. Silver, R. S. Sutton, and M. Müller. Temporal-difference search in computer Go. Machine Learning, 87(2):183-219, 2012.
  • 84
    • S. P. Singh, T. Jaakkola, M. L. Littman, and C. Szepesvári. Convergence results for single-step on-policy reinforcement-learning algorithms. Machine Learning, 38(3):287-308, 2000.
  • 86
    • P. Stone, R. S. Sutton, and G. Kuhlmann. Reinforcement learning for RoboCup soccer keepaway. Adaptive Behavior, 13(3):165-188, September 2005.
  • 88
    • R. S. Sutton. Generalization in reinforcement learning: Successful examples using sparse coarse coding. In Neural Information Processing Systems (NIPS), pages 1038-1044. The MIT Press, 1996.
  • 93
    • I. Szita and C. Szepesvári. Model-based reinforcement learning with nearly tight exploration complexity bounds. In International Conference on Machine Learning (ICML), pages 1031-1038, 2010.
  • 94
    • G. Taylor and R. Parr. Kernelized value function approximation for reinforcement learning. In International Conference on Machine Learning (ICML), pages 1017-1024, New York, NY, USA, 2009.
  • 95
    • J. N. Tsitsiklis and B. Van Roy. An analysis of temporal-difference learning with function approximation. IEEE Transactions on Automatic Control, 42(5):674-690, May 1997.
  • 96
    • J. N. Tsitsiklis and B. Van Roy. Average cost temporal-difference learning. Automatica, 35(11):1799-1808, 1999.
  • 97
    • N. K. Ure, A. Geramifard, G. Chowdhary, and J. P. How. Adaptive planning for Markov decision processes with uncertain transition models via incremental feature dependency discovery. In European Conference on Machine Learning (ECML), 2012.
  • 99
    • C. J. Watkins. Q-learning. Machine Learning, 8(3):279-292, 1992.
  • 102
    • S. Whiteson and M. Littman. Introduction to the special issue on empirical evaluations in reinforcement learning. Machine Learning, pages 1-6, 2011.
  • 104
    • R. J. Williams. Simple statistical gradient-following algorithms for connectionist reinforcement learning. Machine Learning, 8:229-256, 1992.
  • 105
    • T. Winograd. Procedures as a representation for data in a computer program for understanding natural language. Technical Report 235, Massachusetts Institute of Technology, 1971.
  • 106
    • Y. Ye. The simplex and policy-iteration methods are strongly polynomial for the Markov decision problem with a fixed discount rate. Mathematics of Operations Research, 36(4):593-603, 2011.
  • 107
    • H. Yu and D. P. Bertsekas. Error bounds for approximations from projected linear equations. Mathematics of Operations Research, 35(2):306-329, 2010.


* This information was extracted by KISTI through analysis of Elsevier's SCOPUS database.