Volume 6911 LNAI, Issue PART 1, 2011, Pages 312-327

Preference-based policy iteration: Leveraging preference learning for reinforcement learning

Author keywords

[No Author keywords available]

Indexed keywords

CLASSIFICATION METHODS; NOVEL METHODS; POLICY ITERATION; POLICY MODEL; PREFERENCE LEARNING; PREFERENCE-BASED; QUALITATIVE FEEDBACK; RANKING FUNCTIONS; ROLL-OUTS; SUBFIELDS; LABEL RANKINGS; POLICY LEARNING

EID: 80052414142     PISSN: 03029743     EISSN: 16113349     Source Type: Book Series    
DOI: 10.1007/978-3-642-23780-5_30     Document Type: Conference Paper
Times cited: 33

References (17)
  • 3
    • 48349140736
    • Rollout sampling approximate policy iteration
    • Dimitrakakis, C., Lagoudakis, M.G.: Rollout sampling approximate policy iteration. Machine Learning 72(3), 157-171 (2008)
    • (2008) Machine Learning , vol.72 , Issue.3 , pp. 157-171
    • Dimitrakakis, C.1    Lagoudakis, M.G.2
  • 4
    • 33744466799
    • Approximate policy iteration with a policy language bias: Solving relational Markov decision processes
    • Fern, A., Yoon, S.W., Givan, R.: Approximate policy iteration with a policy language bias: Solving relational Markov decision processes. Journal of Artificial Intelligence Research 25, 75-118 (2006)
    • (2006) Journal of Artificial Intelligence Research , vol.25 , pp. 75-118
    • Fern, A.1    Yoon, S.W.2    Givan, R.3
  • 9
    • 56449088242
    • Non-parametric policy gradients: A unified treatment of propositional and relational domains
    • Cohen, W.W., McCallum, A., Roweis, S.T. (eds.) ACM, Helsinki
    • Kersting, K., Driessens, K.: Non-parametric policy gradients: a unified treatment of propositional and relational domains. In: Cohen, W.W., McCallum, A., Roweis, S.T. (eds.) Proceedings of the 25th International Conference on Machine Learning (ICML 2008), pp. 456-463. ACM, Helsinki (2008)
    • (2008) Proceedings of the 25th International Conference on Machine Learning (ICML 2008) , pp. 456-463
    • Kersting, K.1    Driessens, K.2
  • 11
    • 1942420814
    • Reinforcement learning as classification: Leveraging modern classifiers
    • Fawcett, T.E., Mishra, N. (eds.) AAAI Press, Washington, DC
    • Lagoudakis, M.G., Parr, R.: Reinforcement learning as classification: Leveraging modern classifiers. In: Fawcett, T.E., Mishra, N. (eds.) Proceedings of the 20th International Conference on Machine Learning (ICML 2003), pp. 424-431. AAAI Press, Washington, DC (2003)
    • (2003) Proceedings of the 20th International Conference on Machine Learning (ICML 2003) , pp. 424-431
    • Lagoudakis, M.G.1    Parr, R.2
  • 12
    • 33847202724
    • Learning to predict by the methods of temporal differences
    • Sutton, R.S.: Learning to predict by the methods of temporal differences. Machine Learning 3, 9-44 (1988)
    • (1988) Machine Learning , vol.3 , pp. 9-44
    • Sutton, R.S.1
  • 13
    • 84898939480
    • Policy gradient methods for reinforcement learning with function approximation
    • Solla, S.A., Leen, T.K., Müller, K.-R. (eds.) MIT Press, Denver
    • Sutton, R.S., McAllester, D.A., Singh, S.P., Mansour, Y.: Policy gradient methods for reinforcement learning with function approximation. In: Solla, S.A., Leen, T.K., Müller, K.-R. (eds.) Advances in Neural Information Processing Systems 12 (NIPS-1999), pp. 1057-1063. MIT Press, Denver (1999)
    • (1999) Advances in Neural Information Processing Systems 12 (NIPS-1999) , pp. 1057-1063
    • Sutton, R.S.1    McAllester, D.A.2    Singh, S.P.3    Mansour, Y.4
  • 14
    • 84890217212
    • Label ranking algorithms: A survey
    • Fürnkranz and Hüllermeier
    • Vembu, S., Gärtner, T.: Label ranking algorithms: A survey. In: Fürnkranz and Hüllermeier [5], pp. 45-64.
    • Preference Learning , pp. 45-64
    • Vembu, S.1    Gärtner, T.2
  • 16
    • 0000337576
    • Simple statistical gradient-following algorithms for connectionist reinforcement learning
    • Williams, R.J.: Simple statistical gradient-following algorithms for connectionist reinforcement learning. Machine Learning 8, 229-256 (1992)
    • (1992) Machine Learning , vol.8 , pp. 229-256
    • Williams, R.J.1
  • 17
    • 70449449564
    • Reinforcement learning design for cancer clinical trials
    • Zhao, Y., Kosorok, M., Zeng, D.: Reinforcement learning design for cancer clinical trials. Statistics in Medicine 28, 3295-3315 (2009)
    • (2009) Statistics in Medicine , vol.28 , pp. 3295-3315
    • Zhao, Y.1    Kosorok, M.2    Zeng, D.3


* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.