Volume 3194, 2004, Pages 180-197

Logical Markov decision programs and the convergence of logical TD(λ)

Author keywords

[No Author keywords available]

Indexed keywords

DECISION THEORY; LOGIC PROGRAMMING; MARKOV PROCESSES; PROBABILISTIC LOGICS; REGRESSION ANALYSIS; TREES (MATHEMATICS); REINFORCEMENT LEARNING;

EID: 22944490192     PISSN: 0302-9743     EISSN: None     Source Type: Conference Proceeding
DOI: 10.1007/978-3-540-30109-7_16     Document Type: Conference Paper
Times cited: 11

References (35)
  • 2
    • E. B. Baum. Towards a model of intelligence as an economy of agents. Machine Learning, 35(2):155-185, 1999.
  • 3
    • C. Boutilier, T. Dean, and S. Hanks. Decision-theoretic planning: Structural assumptions and computational leverage. Journal of Artificial Intelligence Research, 11:1-94, 1999.
  • 5
    • J. A. Boyan and A. W. Moore. Generalization in reinforcement learning: Safely approximating the value function. In Advances in Neural Information Processing Systems, volume 7, 1995.
  • 7
    • R. Dearden and C. Boutilier. Abstraction and approximate decision theoretic planning. Artificial Intelligence, 89(1):219-283, 1997.
  • 8
    • T. G. Dietterich. Hierarchical reinforcement learning with the MAXQ value function decomposition. Journal of Artificial Intelligence Research, 13:227-303, 2000.
  • 16
    • R. Givan, T. Dean, and M. Greig. Equivalence notions and model minimization in Markov decision processes. Artificial Intelligence, 147:163-224, 2003.
  • 17
  • 22
    • K.-E. Kim and T. Dean. Solving factored MDPs using non-homogeneous partitions. Artificial Intelligence, 147:225-251, 2003.
  • 24
    • S. Muggleton and L. De Raedt. Inductive logic programming: Theory and methods. Journal of Logic Programming, 19-20:629-679, 1994.
  • 25
    • R. Munos and A. Moore. Influence and variance of a Markov chain: Application to adaptive discretization in optimal control. In Proceedings of the IEEE Conference on Decision and Control, 1999.
  • 26
    • D. Poole. The independent choice logic for modelling multiple agents under uncertainty. Artificial Intelligence, 94(1-2):7-56, 1997.
  • 31
    • R. S. Sutton, D. Precup, and S. Singh. Between MDPs and semi-MDPs: A framework for temporal abstraction in reinforcement learning. Artificial Intelligence, 112:181-211, 1999.
  • 32
    • J. N. Tsitsiklis and B. Van Roy. An analysis of temporal-difference learning with function approximation. IEEE Transactions on Automatic Control, 42:674-690, 1997.
  • 34
    • S. D. Whitehead and D. H. Ballard. Learning to perceive and act by trial and error. Machine Learning, 7(1):45-83, 1991.


* This information was extracted by KISTI through analysis of Elsevier's SCOPUS database.