Volume , Issue , 2007, Pages 2437-2442

Using linear programming for Bayesian exploration in Markov decision processes

Author keywords

[No Author keywords available]

Indexed keywords

BANDIT PROBLEMS; EXPLORATION ALGORITHMS; EXPLORATION METHODS; MARKOV DECISION PROCESSES; OPTIMIZATION PROBLEMS; ORIGINAL SYSTEMS; SPARSE SAMPLING; VALUE FUNCTIONS;

EID: 70349431917     PISSN: 1045-0823     EISSN: None     Source Type: Conference Proceeding
DOI: None     Document Type: Conference Paper
Times cited: 28
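
The title and several of the references below (Schweitzer & Seidmann; de Farias & Van Roy) build on the classical linear-programming formulation of MDP value functions. For context, here is a minimal sketch of that standard exact LP, solved with SciPy on a hypothetical two-state, two-action MDP; this is background only, not the paper's Bayesian exploration algorithm, whose details this record does not include.

```python
# Standard exact-LP formulation for a discounted MDP's optimal value
# function (background sketch; the MDP below is a made-up example).
import numpy as np
from scipy.optimize import linprog

n_states, n_actions, gamma = 2, 2, 0.95

# P[s, a, s'] = transition probability, R[s, a] = expected reward.
P = np.array([[[0.9, 0.1], [0.2, 0.8]],
              [[0.5, 0.5], [0.1, 0.9]]])
R = np.array([[1.0, 0.0],
              [0.0, 2.0]])

# Minimize sum_s V(s) subject to
#   V(s) >= R[s, a] + gamma * sum_{s'} P[s, a, s'] * V(s')  for all (s, a),
# rewritten in linprog's A_ub @ V <= b_ub form as
#   -(I[s] - gamma * P[s, a]) @ V <= -R[s, a].
c = np.ones(n_states)
I = np.eye(n_states)
A_ub, b_ub = [], []
for s in range(n_states):
    for a in range(n_actions):
        A_ub.append(-(I[s] - gamma * P[s, a]))
        b_ub.append(-R[s, a])

res = linprog(c, A_ub=np.array(A_ub), b_ub=np.array(b_ub),
              bounds=[(None, None)] * n_states, method="highs")
V_star = res.x  # optimal value function V*
print("V* =", V_star)
```

At the LP's optimum every state's constraint is tight for at least one action, so V* coincides with the fixed point of the Bellman optimality equation; approximate-DP variants (refs. 3, 4, 15) restrict V to a low-dimensional basis and sample the constraints.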

References (20)
  • 1. R. Bellman and R. Kalaba. On adaptive control processes. IRE Trans., 4:1-9, 1959.
  • 2. R. I. Brafman and M. Tennenholtz. R-MAX - a general polynomial time algorithm for near-optimal reinforcement learning. In IJCAI, pages 953-958, 2001.
  • 3. D. P. de Farias and B. Van Roy. The linear programming approach to approximate dynamic programming. Operations Research, 51(6):850-865, 2003.
  • 4. D. P. de Farias and B. Van Roy. On constraint sampling for the linear programming approach to approximate dynamic programming. Mathematics of Operations Research, 29(3):462-478, 2004.
  • 9
  • 10. M. Kearns and S. Singh. Near-optimal reinforcement learning in polynomial time. In Proc. 15th ICML, pages 260-268, 1998.
  • 11. M. Kearns, Y. Mansour, and A. Y. Ng. A sparse sampling algorithm for near-optimal planning in large Markov decision processes. In IJCAI, pages 1324-1331, 1999.
  • 13. N. Meuleau and P. Bourgine. Exploration of multi-state environments: Local measures and back-propagation of uncertainty. Machine Learning, 35(2):117-154, 1999.
  • 15. P. Schweitzer and A. Seidmann. Generalized polynomial approximations in Markovian decision processes. J. Math. Anal. Appl., 110:568-582, 1985.
  • 16. A. L. Strehl and M. L. Littman. A theoretical analysis of model-based interval estimation. In Proc. 22nd ICML, 2005.
  • 19. S. Thrun. Efficient exploration in reinforcement learning. Technical Report CMU-CS-92-102, Carnegie Mellon University, School of Computer Science, 1992.


* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.