Volume 7, 2006, Pages 1079-1105

Action elimination and stopping conditions for the multi-armed bandit and reinforcement learning problems

Author keywords

[No Author keywords available]

Indexed keywords

MANNOR; MULTI-ARMED BANDIT; REINFORCEMENT LEARNING ALGORITHMS; REINFORCEMENT LEARNING PROBLEMS; TSITSIKLIS;

EID: 33745295134     PISSN: 15337928     EISSN: 15337928     Source Type: Journal    
DOI: None     Document Type: Article
Times cited: 668

References (30)
  • 3
    • 0018454769
    • Fast probabilistic algorithms for Hamiltonian circuits and matchings
    • D. Angluin and L. G. Valiant. Fast probabilistic algorithms for Hamiltonian circuits and matchings. Journal of Computer and System Sciences, 18:155-193, 1979.
    • (1979) Journal of Computer and System Sciences , vol.18 , pp. 155-193
    • Angluin, D.1    Valiant, L.G.2
  • 9
    • 0033876515
    • The O.D.E. method for convergence of stochastic approximation and reinforcement learning
    • V. S. Borkar and S. P. Meyn. The O.D.E. method for convergence of stochastic approximation and reinforcement learning. SIAM J. Control Optim., 38(2):447-469, 2000.
    • (2000) SIAM J. Control Optim. , vol.38 , Issue.2 , pp. 447-469
    • Borkar, V.S.1    Meyn, S.P.2
  • 12
    • 84947403595
    • Probability inequalities for sums of bounded random variables
    • W. Hoeffding. Probability inequalities for sums of bounded random variables. Journal of the American Statistical Association, 58(301):13-30, 1963.
    • (1963) Journal of the American Statistical Association , vol.58 , Issue.301 , pp. 13-30
    • Hoeffding, W.1
  • 15
    • 0036832954
    • Near-optimal reinforcement learning in polynomial time
    • M. Kearns and S. Singh. Near-optimal reinforcement learning in polynomial time. Machine Learning, 49(2-3):209-232, 2002.
    • (2002) Machine Learning , vol.49 , Issue.2-3 , pp. 209-232
    • Kearns, M.1    Singh, S.2
  • 17
    • 84899026236
    • Finite-sample convergence rates for Q-learning and indirect algorithms
    • M. Kearns and S. P. Singh. Finite-sample convergence rates for Q-learning and indirect algorithms. In Neural Information Processing Systems 10, pages 996-1002, 1998.
    • (1998) Neural Information Processing Systems , vol.10 , pp. 996-1002
    • Kearns, M.1    Singh, S.P.2
  • 20
    • 0002899547
    • Asymptotically efficient adaptive allocation rules
    • T. L. Lai and H. Robbins. Asymptotically efficient adaptive allocation rules. Advances in Applied Mathematics, 6:4-22, 1985.
    • (1985) Advances in Applied Mathematics , vol.6 , pp. 4-22
    • Lai, T.L.1    Robbins, H.2
  • 21
    • 0006193487
    • A modified dynamic programming method for Markov decision problems
    • J. MacQueen. A modified dynamic programming method for Markov decision problems. J. Math. Anal. Appl., 14:38-43, 1966.
    • (1966) J. Math. Anal. Appl. , vol.14 , pp. 38-43
    • MacQueen, J.1
  • 22
    • 30044441333
    • The sample complexity of exploration in the multi-armed bandit problem
    • S. Mannor and J. N. Tsitsiklis. The sample complexity of exploration in the multi-armed bandit problem. Journal of Machine Learning Research, 5:623-648, 2004.
    • (2004) Journal of Machine Learning Research , vol.5 , pp. 623-648
    • Mannor, S.1    Tsitsiklis, J.N.2
  • 24
    • 0036832956
    • Kernel-based reinforcement learning
    • D. Ormoneit and S. Sen. Kernel-based reinforcement learning. Machine Learning, 49(2-3):161-178, 2002.
    • (2002) Machine Learning , vol.49 , Issue.2-3 , pp. 161-178
    • Ormoneit, D.1    Sen, S.2
  • 26
    • 84966203785
    • Some aspects of sequential design of experiments
    • H. Robbins. Some aspects of sequential design of experiments. Bull. Amer. Math. Soc., 58:527-535, 1952.
    • (1952) Bull. Amer. Math. Soc. , vol.58 , pp. 527-535
    • Robbins, H.1
  • 27
    • 0028497385
    • An upper bound on the loss from approximate optimal-value functions
    • S. P. Singh and R. C. Yee. An upper bound on the loss from approximate optimal-value functions. Machine Learning, 16(3):227-233, 1994.
    • (1994) Machine Learning , vol.16 , Issue.3 , pp. 227-233
    • Singh, S.P.1    Yee, R.C.2


* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.