Volume, Issue, 2011, Pages 1-8

Approximate reinforcement learning: An overview

Author keywords

function approximation; policy iteration; policy search; reinforcement learning; value iteration

Indexed keywords

COMPLEX ENVIRONMENTS; FUNCTION APPROXIMATION; OFFLINE; ON-LINE ALGORITHMS; POLICY GRADIENT METHODS; POLICY ITERATION; POLICY SEARCH; SIMULATION-BASED; VALUE ITERATION;

EID: 80052220856     PISSN: None     EISSN: None     Source Type: Conference Proceeding    
DOI: 10.1109/ADPRL.2011.5967353     Document Type: Conference Paper
Times cited : (54)

References (82)
  • 8
    • J. Peters and S. Schaal, "Reinforcement learning of motor skills with policy gradients," Neural Networks, vol. 21, pp. 682-697, 2008.
  • 9
    • P. Abbeel, A. Coates, M. Quigley, and A. Y. Ng, "An application of reinforcement learning to aerobatic helicopter flight," in Advances in Neural Information Processing Systems 19, B. Schölkopf, J. C. Platt, and T. Hoffman, Eds. MIT Press, 2007, pp. 1-8.
  • 11
    • D. Ernst, G.-B. Stan, J. Gonçalves, and L. Wehenkel, "Clinical data based optimal STI strategies for HIV: A reinforcement learning approach," in Proceedings 45th IEEE Conference on Decision and Control (CDC-06), San Diego, US, 13-15 December 2006, pp. 667-672.
  • 15
    • G. A. Rummery and M. Niranjan, "On-line Q-learning using connectionist systems," Engineering Department, Cambridge University, UK, Tech. Rep. CUED/F-INFENG/TR166, September 1994. Available at http://mi.eng.cam.ac.uk/reports/svr-ftp/rummery tr166.ps.Z.
  • 17
    • M. Riedmiller, "Neural fitted Q-iteration - first experiences with a data efficient neural reinforcement learning method," in Proceedings 16th European Conference on Machine Learning (ECML-05), ser. Lecture Notes in Computer Science, vol. 3720, Porto, Portugal, 3-7 October 2005, pp. 317-328.
  • 18
    • A. Antos, R. Munos, and Cs. Szepesvári, "Fitted Q-iteration in continuous action-space MDPs," in Advances in Neural Information Processing Systems 20, J. C. Platt, D. Koller, Y. Singer, and S. T. Roweis, Eds. MIT Press, 2008, pp. 9-16.
  • 19
    • S. P. Singh, T. Jaakkola, and M. I. Jordan, "Reinforcement learning with soft state aggregation," in Advances in Neural Information Processing Systems 7, G. Tesauro, D. S. Touretzky, and T. K. Leen, Eds. MIT Press, 1995, pp. 361-368.
  • 22
    • L.-J. Lin, "Self-improving reactive agents based on reinforcement learning, planning and teaching," Machine Learning, vol. 8, no. 3-4, pp. 293-321, Aug. 1992, special issue on reinforcement learning.
  • 23
    • G. Neumann and J. Peters, "Fitted Q-iteration by advantage weighted regression," in Advances in Neural Information Processing Systems 21, D. Koller, D. Schuurmans, Y. Bengio, and L. Bottou, Eds. MIT Press, 2009, pp. 1177-1184.
  • 24
    • D. Ormoneit and S. Sen, "Kernel-based reinforcement learning," Machine Learning, vol. 49, no. 2-3, pp. 161-178, 2002. DOI: 10.1023/A:1017928328829.
  • 29
    • A. Antos, Cs. Szepesvári, and R. Munos, "Learning near-optimal policies with Bellman-residual minimization based fitted policy iteration and a single sample path," Machine Learning, vol. 71, no. 1, pp. 89-129, 2008.
  • 32
    • S. J. Bradtke and A. G. Barto, "Linear least-squares algorithms for temporal difference learning," Machine Learning, vol. 22, no. 1-3, pp. 33-57, 1996.
  • 37
    • C. Dimitrakakis and M. Lagoudakis, "Rollout sampling approximate policy iteration," Machine Learning, vol. 72, no. 3, pp. 157-171, 2008.
  • 40
    • R. S. Sutton, "Generalization in reinforcement learning: Successful examples using sparse coarse coding," in Advances in Neural Information Processing Systems 8, D. S. Touretzky, M. C. Mozer, and M. E. Hasselmo, Eds. MIT Press, 1996, pp. 1038-1044.
  • 45
    • H. Yu, "Convergence of least squares temporal difference methods under general conditions," in Proceedings 27th International Conference on Machine Learning (ICML-10), Haifa, Israel, 21-24 June 2010, pp. 1207-1214.
  • 46
    • B. Scherrer, "Should one compute the Temporal Difference fix point or minimize the Bellman Residual? The unified oblique projection view," in Proceedings 27th International Conference on Machine Learning (ICML-10), Haifa, Israel, 21-24 June 2010, pp. 959-966.
  • 48
    • P. Marbach and J. N. Tsitsiklis, "Approximate gradient methods in policy-space optimization of Markov reward processes," Discrete Event Dynamic Systems: Theory and Applications, vol. 13, no. 1-2, pp. 111-148, 2003.
  • 49
  • 50
    • R. S. Sutton, D. A. McAllester, S. P. Singh, and Y. Mansour, "Policy gradient methods for reinforcement learning with function approximation," in Advances in Neural Information Processing Systems 12, S. A. Solla, T. K. Leen, and K.-R. Müller, Eds. MIT Press, 2000, pp. 1057-1063.
  • 52
    • S. Kakade, "A natural policy gradient," in Advances in Neural Information Processing Systems 14, T. G. Dietterich, S. Becker, and Z. Ghahramani, Eds. MIT Press, 2001, pp. 1531-1538.
  • 53
    • J. Peters and S. Schaal, "Natural actor-critic," Neurocomputing, vol. 71, no. 7-9, pp. 1180-1190, 2008.
  • 54
    • S. Bhatnagar, R. Sutton, M. Ghavamzadeh, and M. Lee, "Natural actor-critic algorithms," Automatica, vol. 45, no. 11, pp. 2471-2482, 2009.
  • 59
    • J. N. Tsitsiklis and B. Van Roy, "Feature-based methods for large scale dynamic programming," Machine Learning, vol. 22, no. 1-3, pp. 59-94, 1996.
  • 60
    • S. Whiteson and P. Stone, "Evolutionary function approximation for reinforcement learning," Journal of Machine Learning Research, vol. 7, pp. 877-917, 2006.
  • 61
    • P. W. Keller, S. Mannor, and D. Precup, "Automatic basis function construction for approximate dynamic programming and reinforcement learning," in Proceedings 23rd International Conference on Machine Learning (ICML-06), Pittsburgh, US, 25-29 June 2006, pp. 449-456.
  • 65
    • I. Menache, S. Mannor, and N. Shimkin, "Basis function adaptation in temporal difference reinforcement learning," Annals of Operations Research, vol. 134, no. 1, pp. 215-238, 2005. DOI: 10.1007/s10479-005-5732-z.
  • 66
    • S. Mahadevan and M. Maggioni, "Proto-value functions: A Laplacian framework for learning representation and control in Markov decision processes," Journal of Machine Learning Research, vol. 8, pp. 2169-2231, 2007.
  • 67
    • J. Z. Kolter and A. Ng, "Regularization and feature selection in least-squares temporal difference learning," in Proceedings 26th International Conference on Machine Learning (ICML-09), Montreal, Canada, 14-18 June 2009, pp. 521-528.
  • 71
    • R. S. Sutton, "Integrated architectures for learning, planning, and reacting based on approximating dynamic programming," in Proceedings 7th International Conference on Machine Learning (ICML-90), Austin, US, 21-23 June 1990, pp. 216-224.
  • 74
    • S. Bubeck, R. Munos, G. Stoltz, and Cs. Szepesvári, "Online optimization in X-armed bandits," in Advances in Neural Information Processing Systems 21, D. Koller, D. Schuurmans, Y. Bengio, and L. Bottou, Eds. MIT Press, 2009, pp. 201-208.
  • 76
    • F. Lewis, D. Liu, and G. Lendaris, "Guest editorial - Special issue on adaptive dynamic programming and reinforcement learning in feedback control," IEEE Transactions on Systems, Man, and Cybernetics-Part B: Cybernetics, vol. 38, no. 4, pp. 896-897, 2008.
  • 79
    • D. P. Bertsekas, "Approximate dynamic programming," 20 November 2010, update of Chapter 6 in volume 2 of the book Dynamic Programming and Optimal Control. Available at http://web.mit.edu/dimitrib/www/dpchapter.html.
  • 81
    • D. P. Bertsekas, "Dynamic programming and suboptimal control: A survey from ADP to MPC," European Journal of Control, vol. 11, no. 4-5, pp. 310-334, 2005, special issue for the CDC-ECC-05 in Seville, Spain.
  • 82
    • J. Rust, "Numerical dynamic programming in economics," in Handbook of Computational Economics, H. M. Amman, D. A. Kendrick, and J. Rust, Eds. Elsevier, 1996, vol. 1, ch. 14, pp. 619-729.


* This information was extracted by KISTI through analysis of Elsevier's SCOPUS database.