



Volume 53, Issue 1, 2005, Pages 126-139

An adaptive sampling algorithm for solving Markov decision processes

Author keywords

Dynamic programming/optimal control: Markov finite state

Indexed keywords

ADAPTIVE SAMPLING; DYNAMIC PROGRAMMING/OPTIMAL CONTROL; MARKOV DECISION PROCESSES; MARKOV FINITE STATE

EID: 14644444172     PISSN: 0030364X     EISSN: None     Source Type: Journal    
DOI: 10.1287/opre.1040.0145     Document Type: Article
Times cited: 120

References (17)
  • 1
    • 0000616723
    • Sample mean based index policies with O(log n) regret for the multiarmed bandit problem
    • Agrawal, R. 1995. Sample mean based index policies with O(log n) regret for the multiarmed bandit problem. Advances Appl. Probab. 27 1054-1078.
    • (1995) Advances Appl. Probab. , vol.27 , pp. 1054-1078
    • Agrawal, R.1
  • 2
    • 0024886640
    • Asymptotically efficient adaptive allocation schemes for controlled Markov chains: Finite parameter space
    • Agrawal, R., D. Teneketzis, V. Anantharam. 1989. Asymptotically efficient adaptive allocation schemes for controlled Markov chains: Finite parameter space. IEEE Trans. Automat. Control 34 1249-1259.
    • (1989) IEEE Trans. Automat. Control , vol.34 , pp. 1249-1259
    • Agrawal, R.1    Teneketzis, D.2    Anantharam, V.3
  • 3
    • 0036568025
    • Finite-time analysis of the multiarmed bandit problem
    • Auer, P., N. Cesa-Bianchi, P. Fischer. 2002. Finite-time analysis of the multiarmed bandit problem. Machine Learning 47 235-256.
    • (2002) Machine Learning , vol.47 , pp. 235-256
    • Auer, P.1    Cesa-Bianchi, N.2    Fischer, P.3
  • 6
    • 0034264701
    • A survey of computational complexity results in systems and control
    • Blondel, V. D., J. Tsitsiklis. 2000. A survey of computational complexity results in systems and control. Automatica 36 1249-1274.
    • (2000) Automatica , vol.36 , pp. 1249-1274
    • Blondel, V.D.1    Tsitsiklis, J.2
  • 7
    • 0031590025
    • Pricing American-style securities using simulation
    • Broadie, M., P. Glasserman. 1997. Pricing American-style securities using simulation. J. Econom. Dynamics Control 21 1323-1352.
    • (1997) J. Econom. Dynamics Control , vol.21 , pp. 1323-1352
    • Broadie, M.1    Glasserman, P.2
  • 8
    • 0007163041
    • Finite-time regret bounds for the multiarmed bandit problem
    • Morgan Kaufmann Publishers, San Francisco, CA
    • Cesa-Bianchi, N., P. Fischer. 1998. Finite-time regret bounds for the multiarmed bandit problem. Proc. 15th Int. Conf. Machine Learning. Morgan Kaufmann Publishers, San Francisco, CA, 101-108.
    • (1998) Proc. 15th Int. Conf. Machine Learning , pp. 101-108
    • Cesa-Bianchi, N.1    Fischer, P.2
  • 10
    • 0031145551
    • Asymptotically efficient adaptive choice of control laws in controlled Markov chains
    • Graves, T. L., T. L. Lai. 1997. Asymptotically efficient adaptive choice of control laws in controlled Markov chains. SIAM J. Control Optim. 35 715-743.
    • (1997) SIAM J. Control Optim. , vol.35 , pp. 715-743
    • Graves, T.L.1    Lai, T.L.2
  • 12
    • 0025502594
    • Error bounds for rolling horizon policies in discrete-time Markov control processes
    • Hernández-Lerma, O., J. B. Lasserre. 1990. Error bounds for rolling horizon policies in discrete-time Markov control processes. IEEE Trans. Automat. Control 35 1118-1124.
    • (1990) IEEE Trans. Automat. Control , vol.35 , pp. 1118-1124
    • Hernández-Lerma, O.1    Lasserre, J.B.2
  • 13
    • 84947403595
    • Probability inequalities for sums of bounded random variables
    • Hoeffding, W. 1963. Probability inequalities for sums of bounded random variables. J. Amer. Statist. Assoc. 58 13-30.
    • (1963) J. Amer. Statist. Assoc. , vol.58 , pp. 13-30
    • Hoeffding, W.1
  • 14
    • 0036832951
    • A sparse sampling algorithm for near-optimal planning in large Markov decision processes
    • Kearns, M., Y. Mansour, A. Y. Ng. 2001. A sparse sampling algorithm for near-optimal planning in large Markov decision processes. Machine Learning 49 193-208.
    • (2001) Machine Learning , vol.49 , pp. 193-208
    • Kearns, M.1    Mansour, Y.2    Ng, A.Y.3
  • 15
    • 0002899547
    • Asymptotically efficient adaptive allocation rules
    • Lai, T., H. Robbins. 1985. Asymptotically efficient adaptive allocation rules. Advances Appl. Math. 6 4-22.
    • (1985) Advances Appl. Math. , vol.6 , pp. 4-22
    • Lai, T.1    Robbins, H.2


* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.