Volume , Issue , 2010, Pages 30-37

Reinforcement learning with action discovery

Author keywords

Multi-agent learning; Reinforcement learning

Indexed keywords

COORDINATION POLICIES; COORDINATION PROBLEMS; MULTI-AGENT COORDINATIONS; MULTI-AGENT LEARNING; MULTI-AGENT REINFORCEMENT LEARNING; REINFORCEMENT LEARNING SOLUTION; SAMPLE COMPLEXITY; SOLUTION QUALITY

EID: 79955460528    PISSN: None    EISSN: None    Source Type: Conference Proceeding
DOI: None    Document Type: Conference Paper
Times cited: 8

References (15)
  • 1
    • P. Abbeel, A. Coates, M. Quigley, and A. Y. Ng. An application of reinforcement learning to aerobatic helicopter flight. In NIPS 19, 2007.
  • 4
    • C. Claus and C. Boutilier. The dynamics of reinforcement learning in cooperative multiagent systems. In Proceedings of the 15th National Conference on Artificial Intelligence, pages 746-752, Menlo Park, CA, 1998. AAAI Press/MIT Press.
  • 6
    • J. Hu and M. P. Wellman. Nash Q-learning for general-sum stochastic games. Journal of Machine Learning Research, 4:1039-1069, 2003.
  • 7
    • A. Lazaric, M. Restelli, and A. Bonarini. Reinforcement learning in continuous action spaces through sequential Monte Carlo methods. In J. Platt, D. Koller, Y. Singer, and S. Roweis, editors, Advances in Neural Information Processing Systems 20, pages 833-840, Cambridge, MA, 2008. MIT Press.
  • 8
    • M. L. Littman. Markov games as a framework for multi-agent reinforcement learning. In Proc. of the 11th Int. Conf. on Machine Learning, pages 157-163, San Mateo, CA, 1994. Morgan Kaufmann.
  • 9
    • A. Y. Ng, D. Harada, and S. Russell. Policy invariance under reward transformations: Theory and application to reward shaping. In Proc. 16th International Conf. on Machine Learning, pages 278-287. Morgan Kaufmann, 1999.
  • 10
    • T. J. Perkins and A. G. Barto. Lyapunov-constrained action sets for reinforcement learning. In Proceedings of the ICML, pages 409-416, 2001.
  • 13
    • R. Sutton, D. Precup, and S. Singh. Between MDPs and semi-MDPs: A framework for temporal abstraction in reinforcement learning. Artificial Intelligence, 112:181-211, 1999.
  • 14
    • G. Tesauro. Temporal difference learning and TD-Gammon. Communications of the ACM, 38(3):58-68, 1995.
  • 15
    • E. Wiewiora. Potential-based shaping and Q-value initialization are equivalent. Journal of Artificial Intelligence Research, pages 205-208, 2003.


* This information was extracted by KISTI through analysis of Elsevier's SCOPUS database.