Volume, Issue, 2004, Pages

Approximate policy iteration with a policy language bias

Author keywords

[No Author keywords available]

Indexed keywords

MARKOV PROCESSES;

EID: 22944468731     PISSN: 10495258     EISSN: None     Source Type: Conference Proceeding    
DOI: None     Document Type: Conference Paper
Times cited: 48

References (28)
  • 1: Ricardo Aler, Daniel Borrajo, and Pedro Isasi. Using genetic programming to learn and improve control knowledge. AIJ, 141(1-2):29-56, 2002.
  • 2: Fahiem Bacchus. The AIPS '00 planning competition. AI Magazine, 22(3):57-62, 2001.
  • 3: Fahiem Bacchus and Froduald Kabanza. Using temporal logics to express search control knowledge for planning. AIJ, 116(1-2):123-191, 2000.
  • 6: Craig Boutilier and Richard Dearden. Approximating value trees in structured dynamic programming. In Lorenza Saitta, editor, ICML, 1996.
  • 7: Craig Boutilier, Richard Dearden, and Moises Goldszmidt. Stochastic dynamic programming with factored representations. AIJ, 121(1-2):49-107, 2000.
  • 8: Craig Boutilier, Raymond Reiter, and Bob Price. Symbolic dynamic programming for first-order MDPs. In IJCAI, 2001.
  • 9: S. Dzeroski, L. De Raedt, and K. Driessens. Relational reinforcement learning. MLJ, 43:7-52, 2001.
  • 10: Tara A. Estlin and Raymond J. Mooney. Multi-strategy learning of search control for partial-order planning. In AAAI, 1996.
  • 11: Robert Givan, Thomas Dean, and Matt Greig. Equivalence notions and model minimization in Markov decision processes. AIJ, 147(1-2):163-223, 2003.
  • 12: Carlos Guestrin, Daphne Koller, and Ronald Parr. Max-norm projections for factored MDPs. In IJCAI, pages 673-680, 2001.
  • 13: Jörg Hoffmann and Bernhard Nebel. The FF planning system: Fast plan generation through heuristic search. JAIR, 14:263-302, 2001.
  • 15: Yi-Cheng Huang, Bart Selman, and Henry Kautz. Learning declarative control rules for constraint-based planning. In ICML, pages 415-422, 2000.
  • 16: Michael J. Kearns, Yishay Mansour, and Andrew Y. Ng. A sparse sampling algorithm for near-optimal planning in large Markov decision processes. MLJ, 49(2-3):193-208, 2002.
  • 17: Roni Khardon. Learning action strategies for planning domains. AIJ, 113(1-2):125-148, 1999.
  • 18: M. Lagoudakis and R. Parr. Reinforcement learning as classification: Leveraging modern classifiers. In ICML, 2003.
  • 19: Mario Martin and Hector Geffner. Learning generalized policies in planning domains using concept languages. In KRR, 2000.
  • 20: D. McAllester and R. Givan. Taxonomic syntax for first-order inference. JACM, 40:246-283, 1993.
  • 21: S. Minton. Quantitative results on the utility of explanation-based learning. In AAAI, 1988.
  • 24: G. Tesauro. Practical issues in temporal difference learning. MLJ, 8:257-277, 1992.
  • 25: G. Tesauro and G. Galperin. On-line policy improvement using Monte-Carlo search. In NIPS, 1996.
  • 26: J. Tsitsiklis and B. Van Roy. Feature-based methods for large scale DP. MLJ, 22:59-94, 1996.
  • 28: S. Yoon, A. Fern, and R. Givan. Inductive policy selection for first-order MDPs. In UAI, 2002.


* This information was extracted and analyzed by KISTI from Elsevier's SCOPUS database.