Volume 38, 2010, Pages 687-755

Automatic induction of Bellman-error features for probabilistic planning

Author keywords

[No Author keywords available]

Indexed keywords

AUTOMATIC INDUCTION; BELLMAN EQUATIONS; BELLMAN ERROR; DECISION-THEORETIC; DOMAIN SPECIFIC; ERROR FEATURE; FEATURE LANGUAGE; FEATURE SETS; FIRST DOMAIN; HUMAN EXPERT; HUMAN INTERVENTION; HYPOTHESIS SPACE; LEARNING METHODS; MACHINE-LEARNING; PROBABILISTIC PLANNING; PROBLEM STRUCTURE; REAL WORLD DOMAIN; RELATIONAL FEATURE SPACES; STATE FEATURE; STATE-SPACE; TRAINING SETS; VALUE ITERATION;
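The indexed keywords center on the Bellman error and value iteration. Purely as a point of reference for those terms (this is not the paper's feature-induction algorithm, and every name below is illustrative), a minimal Python sketch of the per-state Bellman error in a small tabular MDP:

```python
import numpy as np

def bellman_error(V, P, R, gamma):
    """Per-state Bellman error of a value estimate V in a tabular MDP.

    V: (S,) current value estimate
    P: (A, S, S) transition probabilities, P[a, s, t] = Pr(t | s, a)
    R: (A, S, S) rewards for each (action, state, next state)
    gamma: discount factor in [0, 1)
    """
    # One-step greedy backup: Q[a, s] = sum_t P[a, s, t] * (R[a, s, t] + gamma * V[t])
    Q = np.einsum('ast,ast->as', P, R + gamma * V[None, None, :])
    # The Bellman error is the gap between the backed-up value and the current estimate.
    return Q.max(axis=0) - V

# Tiny usage example on a random 2-action, 3-state MDP (illustrative values only).
rng = np.random.default_rng(0)
P = rng.random((2, 3, 3))
P /= P.sum(axis=2, keepdims=True)        # normalize rows into distributions
R = rng.random((2, 3, 3))
V = np.zeros(3)
for _ in range(50):
    V = V + bellman_error(V, P, R, 0.9)  # adding the error back is one value-iteration sweep
print(bellman_error(V, P, R, 0.9))       # residual shrinks toward zero as V converges
```

The article itself, per its title and keywords, is about automatically inducing relational state features that capture this residual; the sketch only illustrates the quantity the features target.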

EID: 77957872063     PISSN: None     EISSN: 1076-9757     Source Type: Journal
DOI: 10.1613/jair.3021     Document Type: Article
Times cited: 9

References (63)
  • 1
    • Bacchus, F., & Kabanza, F. (2000). Using temporal logics to express search control knowledge for planning. Artificial Intelligence, 116, 123-191.
  • 4
    • Blockeel, H., & De Raedt, L. (1998). Top-down induction of first-order logical decision trees. Artificial Intelligence, 101, 285-297.
  • 9
    • Driessens, K., & Džeroski, S. (2004). Integrating guidance into relational reinforcement learning. Machine Learning, 57, 271-304.
  • 10
    • Driessens, K., Ramon, J., & Gärtner, T. (2006). Graph kernels and Gaussian processes for relational reinforcement learning. Machine Learning, 64, 91-119.
  • 11
    • Džeroski, S., De Raedt, L., & Driessens, K. (2001). Relational reinforcement learning. Machine Learning, 43(1-2), 7-52. doi:10.1023/A:1007694015589
  • 16
    • Fawcett, T. (1996). Knowledge-based feature discovery for evaluation functions. Computational Intelligence, 12(1), 42-64.
  • 18
    • Fern, A., Yoon, S., & Givan, R. (2006). Approximate policy iteration with a policy language bias: Solving relational Markov decision processes. Journal of Artificial Intelligence Research, 25, 75-118.
  • 27
    • Kambhampati, S., Katukam, S., & Qu, Y. (1996). Failure driven dynamic search control for partial order planners: An explanation based approach. Artificial Intelligence, 88(1-2), 253-315.
  • 28
    • Karalic, A., & Bratko, I. (1997). First order regression. Machine Learning, 26, 147-176.
  • 32
    • Khardon, R. (1999). Learning action strategies for planning domains. Artificial Intelligence, 113(1-2), 125-148.
  • 34
    • Mahadevan, S., & Maggioni, M. (2007). Proto-value functions: A Laplacian framework for learning representation and control in Markov decision processes. Journal of Machine Learning Research, 8, 2169-2231.
  • 35
    • Martin, M., & Geffner, H. (2004). Learning generalized policies from planning examples using concept languages. Applied Intelligence, 20, 9-19.
  • 37
    • Muggleton, S. (1991). Inductive logic programming. New Generation Computing, 8(4), 295-318.
  • 46
    • Sanner, S., & Boutilier, C. (2009). Practical solution techniques for first-order MDPs. Artificial Intelligence, 173(5-6), 748-788.
  • 47
    • Singh, S., Jaakkola, T., Littman, M., & Szepesvari, C. (2000). Convergence results for single-step on-policy reinforcement-learning algorithms. Machine Learning, 38(3), 287-308.
  • 48
    • Sutton, R. S. (1988). Learning to predict by the methods of temporal differences. Machine Learning, 3, 9-44.
  • 50
    • Szita, I., & Lorincz, A. (2006). Learning Tetris using the noisy cross-entropy method. Neural Computation, 18, 2936-2941.
  • 51
    • Tesauro, G. (1995). Temporal difference learning and TD-Gammon. Communications of the ACM, 38(3), 58-68.
  • 52
    • Tsitsiklis, J., & Van Roy, B. (1997). An analysis of temporal-difference learning with function approximation. IEEE Transactions on Automatic Control, 42(5), 674-690.
  • 53
    • Utgoff, P. E., & Precup, D. (1997). Relative value function approximation. Tech. rep., University of Massachusetts, Department of Computer Science.
  • 57
    • Williams, R. J., & Baird, L. C. (1993). Tight performance bounds on greedy policies based on imperfect value functions. Tech. rep., Northeastern University.


* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.