Volume 14, Issue 2, 2011, Pages 279-305

Action discovery for single and multi-agent reinforcement learning

Author keywords

Multi-agent reinforcement learning; multi-agent systems

EID: 79955402817     PISSN: 0219-5259     EISSN: None     Source Type: Journal
DOI: 10.1142/S0219525911002937     Document Type: Conference Paper
Times cited: 6

References (17)
  • 1. Abbeel, P., Coates, A., Quigley, M. and Ng, A. Y., An application of reinforcement learning to aerobatic helicopter flight, NIPS 19 (2007).
  • 4. Banerjee, B. and Peng, J., Generalized multiagent learning with performance bound, Autonomous Agents and Multiagent Systems 15(3) (2007) 281-312.
  • 6. Claus, C. and Boutilier, C., The dynamics of reinforcement learning in cooperative multiagent systems, in Proceedings of the 15th National Conference on Artificial Intelligence, Menlo Park, CA (AAAI Press/MIT Press, 1998), pp. 746-752.
  • 8. Hu, J. and Wellman, M. P., Nash Q-learning for general-sum stochastic games, J. Mach. Learn. Res. 4 (2003) 1039-1069.
  • 9. Lazaric, A., Restelli, M. and Bonarini, A., Reinforcement learning in continuous action spaces through sequential Monte Carlo methods, in Platt, J., Koller, D., Singer, Y. and Roweis, S. (eds.), Advances in Neural Information Processing Systems 20, Cambridge, MA (MIT Press, 2008), pp. 833-840.
  • 10. Littman, M. L., Markov games as a framework for multi-agent reinforcement learning, in Proc. of the 11th Int. Conf. on Machine Learning, San Mateo, CA (Morgan Kaufmann, 1994), pp. 157-163.
  • 11. Ng, A. Y., Harada, D. and Russell, S., Policy invariance under reward transformations: Theory and application to reward shaping, in Proc. 16th International Conf. on Machine Learning (Morgan Kaufmann, 1999), pp. 278-287.
  • 12. Perkins, T. J. and Barto, A. G., Lyapunov-constrained action sets for reinforcement learning, in Proceedings of the ICML (2001), pp. 409-416.
  • 15. Sutton, R. S., Precup, D. and Singh, S., Between MDPs and semi-MDPs: A framework for temporal abstraction in reinforcement learning, Artif. Intel. 112(1) (1999) 181-211. DOI: 10.1016/S0004-3702(99)00052-1.
  • 16. Tesauro, G., Temporal difference learning and TD-Gammon, Commun. ACM 38(3) (1995) 58-68.
  • 17. Wiewiora, E., Potential-based shaping and Q-value initialization are equivalent, J. Artif. Intel. Res. 19 (2003) 205-208.


* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.