Volume 3, 2012, Pages 1749-1755

Action selection for MDPs: Anytime AO* versus UCT

Author keywords

[No Author keywords available]

Indexed keywords

ACTION SELECTION; AND/OR GRAPHS; DYNAMIC PROGRAMMING ALGORITHM; EXPLICIT GRAPHS; HEURISTIC SEARCH; INFINITE HORIZONS; OPTIMAL ALGORITHM; OPTIMAL POLICIES; OPTIMAL VARIANTS;

EID: 84868269234     PISSN: None     EISSN: None     Source Type: Conference Proceeding    
DOI: None     Document Type: Conference Paper
Times cited: 28

References (24)
  • 1
    • Auer, P.; Cesa-Bianchi, N.; and Fischer, P. 2002. Finite-time analysis of the multiarmed bandit problem. Machine Learning 47(2):235-256.
  • 2
    • Balla, R., and Fern, A. 2009. UCT for tactical assault planning in real-time strategy games. In Proc. IJCAI-09, 40-45.
  • 3
    • Barto, A.; Bradtke, S.; and Singh, S. 1995. Learning to act using real-time dynamic programming. Artificial Intelligence 72:81-138.
  • 4
    • Bertsekas, D.; Tsitsiklis, J.; and Wu, C. 1997. Rollout algorithms for combinatorial optimization. J. of Heuristics 3(3):245-262.
  • 5
    • Bonet, B., and Geffner, H. 2003. Labeled RTDP: Improving the convergence of real-time dynamic programming. In Proc. ICAPS, 12-31.
  • 7
    • Eyerich, P.; Keller, T.; and Helmert, M. 2010. High-quality policies for the Canadian traveler's problem. In Proc. AAAI.
  • 8
    • Finnsson, H., and Björnsson, Y. 2008. Simulation-based approach to general game playing. In Proc. AAAI, 259-264.
  • 9
    • Gelly, S., and Silver, D. 2007. Combining online and offline knowledge in UCT. In Proc. ICML, 273-280.
  • 12
    • Kearns, M.; Mansour, Y.; and Ng, A. 1999. A sparse sampling algorithm for near-optimal planning in large MDPs. In Proc. IJCAI-99, 1324-1331.
  • 13
    • Kocsis, L., and Szepesvári, C. 2006. Bandit based Monte-Carlo planning. In Proc. ECML-2006, 282-293.
  • 14
    • Koenig, S., and Sun, X. 2009. Comparing real-time and incremental heuristic search for real-time situated agents. Autonomous Agents and Multi-Agent Systems 18(3):313-341.
  • 15
    • Likhachev, M.; Gordon, G.; and Thrun, S. 2003. ARA*: Anytime A* with provable bounds on sub-optimality. In Proc. NIPS.
  • 16
    • Munos, R., and Coquelin, P. 2007. Bandit algorithms for tree search. In Proc. UAI, 67-74.
  • 19
    • Ramanujan, R.; Sabharwal, A.; and Selman, B. 2010. On adversarial search spaces and sampling-based planning. In Proc. ICAPS, 242-245.
  • 20
    • Silver, D., and Veness, J. 2010. Monte-Carlo planning in large POMDPs. In Proc. NIPS, 2164-2172.
  • 22
    • Thayer, J., and Ruml, W. 2010. Anytime heuristic search: Frameworks and algorithms. In Proc. SOCS.
  • 23
    • Walsh, T.; Goschin, S.; and Littman, M. 2010. Integrating sample-based planning and model-based reinforcement learning. In Proc. AAAI.


* This information was extracted by KISTI through analysis of Elsevier's SCOPUS database.