Volume 227, 2007, Pages 601-608

Automatic shaping and decomposition of reward functions

Author keywords

[No Author keywords available]

Indexed keywords

ABSTRACTING; APPROXIMATION ALGORITHMS; DECISION MAKING; LEARNING ALGORITHMS; PROBLEM SOLVING; REINFORCEMENT LEARNING;

EID: 34547964974     PISSN: None     EISSN: None     Source Type: Conference Proceeding    
DOI: 10.1145/1273496.1273572     Document Type: Conference Paper
Times cited: 108

References (28)
  • 1
    • Agogino, A., & Tumer, K. (2006). QUICR-learning for multi-agent coordination. AAAI 2006.
  • 2
    • Randløv, J., & Alstrøm, P. (1998). Learning to drive a bicycle using reinforcement learning and shaping. ICML 1998.
  • 4
    • Bagnell, J., & Ng, A. (2006). On local rewards and scaling distributed reinforcement learning. Neural Information Processing Systems. MIT Press.
  • 6
    • Chang, Y.-H., Ho, T., & Kaelbling, L. P. (2004). All learning is local: Multi-agent learning in global reward games. In S. Thrun, L. Saul, & B. Schölkopf (Eds.), Advances in Neural Information Processing Systems 16. Cambridge, MA: MIT Press.
  • 7
    • Chow, C., & Tsitsiklis, J. (1991). An optimal one-way multigrid algorithm for discrete-time stochastic control. IEEE Transactions on Automatic Control, 36, 898-914.
  • 8
    • Dietterich, T. G. (2000). Hierarchical reinforcement learning with the MAXQ value function decomposition. Journal of Artificial Intelligence Research, 13.
  • 9
  • 12
    • Laud, A., & DeJong, G. (2002). Reinforcement learning and shaping: Encouraging intended behaviors. ICML 2002 (pp. 355-362).
  • 13
    • Laud, A., & DeJong, G. (2003). The influence of reward on the speed of reinforcement learning: An analysis of shaping. ICML 2003.
  • 15
    • Littman, M., & Boyan, J. (1993). A distributed reinforcement learning scheme for network routing (Technical Report). Carnegie Mellon University, Pittsburgh, PA, USA.
  • 18
    • Mataric, M. J. (1994). Reward functions for accelerated learning. ICML 1994.
  • 19
    • Ng, A., Harada, D., & Russell, S. (1999). Policy invariance under reward transformations: Theory and application to reward shaping. ICML 1999.
  • 20
    • Russell, S., & Zimdars, A. (2003). Q-decomposition for reinforcement learning agents. ICML 2003.
  • 22
    • Singh, S. P., & Yee, R. C. (1994). An upper bound on the loss from approximate optimal-value functions. Machine Learning, 16, 227-233.
  • 23
    • Steinkraus, K., & Kaelbling, L. (2004). Combining dynamic abstractions in large MDPs (Technical Report). MIT.
  • 24
    • Stone, P., & Sutton, R. S. (2001). Scaling reinforcement learning toward RoboCup soccer. ICML 2001.
  • 25
    • Sutton, R. S., Precup, D., & Singh, S. P. (1999). Between MDPs and semi-MDPs: A framework for temporal abstraction in reinforcement learning. Artificial Intelligence, 112, 181-211.
  • 26
    • Van Eck, N., & Van Wezel, M. (2005). Reinforcement learning and its application to Othello (Technical Report). Erasmus University.
  • 27
    • Wiewiora, E. (2003). Potential-based shaping and Q-value initialization are equivalent. Journal of Artificial Intelligence Research, 19, 205-208.


* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.