Neural Networks, Volume 23, Issue 4, 2010, Pages 541-550

Online learning of shaping rewards in reinforcement learning

Author keywords

Learning heuristic; Potential based reward shaping; Reinforcement learning

Indexed keywords

BACKGROUND KNOWLEDGE; CONVERGENCE RATES; DISCRETISATION; FREE SPACE; LEARNING AGENTS; LEARNING HEURISTIC; MODEL FREE; MODEL-BASED; MULTI-GRID; NOVEL ALGORITHM; ONLINE LEARNING; POTENTIAL FUNCTION; REINFORCEMENT LEARNING AGENT; TEMPORAL DIFFERENCE LEARNING;

EID: 77950298151     PISSN: 0893-6080     EISSN: None     Source Type: Journal
DOI: 10.1016/j.neunet.2010.01.001     Document Type: Article
Times cited : (68)
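
The title and keywords refer to potential-based reward shaping. As a point of reference, below is a minimal, hypothetical Python sketch of that technique as formulated by Ng, Harada, & Russell (1999; reference 25 below): the environment reward is augmented with F(s, s') = γΦ(s') - Φ(s) for a potential function Φ, which can speed up learning without changing the optimal policy. The corridor task, the potential Φ, and all names in the sketch are illustrative assumptions, not code or details from the indexed article.

```python
from collections import defaultdict

GAMMA = 0.95   # discount factor
ALPHA = 0.1    # learning rate

def shaping_reward(phi, s, s_next):
    # Potential-based shaping term F(s, s') = GAMMA * Phi(s') - Phi(s);
    # adding it to the environment reward leaves the optimal policy unchanged.
    return GAMMA * phi(s_next) - phi(s)

def q_update(Q, actions, phi, s, a, r, s_next):
    # One tabular Q-learning update on the shaped reward r + F(s, s').
    shaped = r + shaping_reward(phi, s, s_next)
    target = shaped + GAMMA * max(Q[(s_next, b)] for b in actions)
    Q[(s, a)] += ALPHA * (target - Q[(s, a)])

# Illustrative use on a hypothetical 1-D corridor: states 0..9, goal at 9,
# with a distance-to-goal heuristic as the potential function.
Q = defaultdict(float)
actions = [-1, +1]
phi = lambda s: -abs(9 - s)   # higher potential closer to the goal
q_update(Q, actions, phi, s=3, a=+1, r=0.0, s_next=4)
```

The sketch fixes Φ by hand; the indexed article, per its title and keywords, is instead concerned with learning the shaping reward (the potential function) online during reinforcement learning.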

References (40)
  • 2
    • Asmuth, J., Littman, M. L., & Zinkov, R. (2008). Potential-based shaping in model-based reinforcement learning. In Proceedings of the AAAI conference on artificial intelligence.
  • 6
    • Böhm, N., Kókai, G., & Mandl, S. (2005). An evolutionary approach to Tetris. In Proceedings of the sixth metaheuristics international conference.
  • 8
    • Brafman, R. I., & Tennenholtz, M. (2002). R-max - A general polynomial time algorithm for near-optimal reinforcement learning. Journal of Machine Learning Research, 213-231.
  • 9
    • Chow, C. S., & Tsitsiklis, J. N. (1991). An optimal one-way multigrid algorithm for discrete-time stochastic control. IEEE Transactions on Automatic Control, 36(8), 898-914.
  • 11
    • Dayan, P., & Hinton, G. E. (1993). Feudal reinforcement learning. In Proceedings of advances in neural information processing systems.
  • 12
    • Dietterich, T. G. (2000). Hierarchical reinforcement learning with the MAXQ value function decomposition. Journal of Artificial Intelligence Research, 13, 227-303.
  • 13
    • Epshteyn, A., & DeJong, G. (2006). Qualitative reinforcement learning. In Proceedings of the 23rd international conference on machine learning (pp. 305-312).
  • 15
    • Grzes, M., & Kudenko, D. (2008). Multigrid reinforcement learning with reward shaping. In Proceedings of the 18th international conference on artificial neural networks, LNCS. Springer-Verlag.
  • 17
    • Gullapalli, V., & Barto, A. G. (1992). Shaping as a method for accelerating reinforcement learning. In Proceedings of the 1992 IEEE international symposium on intelligent control (pp. 554-559).
  • 18
    • Kaelbling, L. P. (1993). Hierarchical learning in stochastic domains: Preliminary results. In Proceedings of the international conference on machine learning (pp. 167-173).
  • 19
    • Kearns, M., & Singh, S. (2002). Near-optimal reinforcement learning in polynomial time. Machine Learning, 209-232.
  • 20
    • Konidaris, G., & Barto, A. (2006). Autonomous shaping: Knowledge transfer in reinforcement learning. In Proceedings of the 23rd international conference on machine learning.
  • 21
    • Marthi, B. (2007). Automatic shaping and decomposition of reward functions. In Proceedings of the 24th international conference on machine learning (pp. 601-608).
  • 22
    • Mataric, M. J. (1994). Reward functions for accelerated learning. In Proceedings of the 11th international conference on machine learning (pp. 181-189).
  • 24
    • Moore, A., Baird, L., & Kaelbling, L. P. (1999). Multi-value-functions: Efficient automatic action hierarchies for multiple goal MDPs. In Proceedings of the international joint conference on artificial intelligence (pp. 1316-1323).
  • 25
    • Ng, A. Y., Harada, D., & Russell, S. J. (1999). Policy invariance under reward transformations: Theory and application to reward shaping. In Proceedings of the 16th international conference on machine learning (pp. 278-287).
  • 26
    • Ng, A. Y., & Jordan, M. (2000). PEGASUS: A policy search method for large MDPs and POMDPs. In Proceedings of uncertainty in artificial intelligence (pp. 406-415).
  • 27
    • Parr, R., & Russell, S. (1997). Reinforcement learning with hierarchies of machines. In Proceedings of advances in neural information processing systems, Vol. 10.
  • 29
    • Randløv, J., & Alstrom, P. (1998). Learning to drive a bicycle using reinforcement learning and shaping. In Proceedings of the 15th international conference on machine learning (pp. 463-471).
  • 30
    • Rayner, D. C., Davison, K., Bulitko, V., Anderson, K., & Lu, J. (2007). Real-time heuristic search with a priority queue. In Proceedings of the 2007 international joint conference on artificial intelligence (pp. 2372-2377).
  • 32
    • Stone, P., & Veloso, M. (2000). Layered learning. In Proceedings of the 11th European conference on machine learning.
  • 33
    • Strehl, A. L., Li, L., & Littman, M. L. (2006). PAC reinforcement learning bounds for RTDP and Rand-RTDP. In Proceedings of the AAAI workshop on learning for search.
  • 34
    • Sutton, R. S. (1990). Integrated architectures for learning, planning, and reacting based on approximating dynamic programming. In Proceedings of the 7th international conference on machine learning (pp. 216-224).
  • 36
    • Sutton, R. S., Precup, D., & Singh, S. P. (1999). Between MDPs and semi-MDPs: A framework for temporal abstraction in reinforcement learning. Artificial Intelligence, 112(1-2), 181-211.
  • 37
    • Taylor, M. E., & Stone, P. (2005). Behavior transfer for value-function-based reinforcement learning. In Proceedings of the 4th international joint conference on autonomous agents and multiagent systems (pp. 53-59).
  • 38
    • Tesauro, G. J. (1994). TD-Gammon, a self-teaching backgammon program, achieves master-level play. Neural Computation, 6(2), 215-219.
  • 39
    • Wiewiora, E. (2003). Potential-based shaping and Q-value initialization are equivalent. Journal of Artificial Intelligence Research, 19, 205-208.


* This information was extracted by KISTI through analysis of Elsevier's SCOPUS database.