2009, Pages 2946-2953

Arbitrarily modulated Markov decision processes

Author keywords

[No Author keywords available]

Indexed keywords

BEHAVIORAL RESEARCH; COMPUTATIONAL EFFICIENCY; DECISION MAKING; MARKOV PROCESSES; REINFORCEMENT LEARNING;

EID: 77950787050     PISSN: 0743-1546     EISSN: 2576-2370     Source Type: Conference Proceeding
DOI: 10.1109/CDC.2009.5400662     Document Type: Conference Paper
Times cited: 51

References (24)
  • 1
    • L. Shapley, "Stochastic games," PNAS, vol.39, no.10, pp. 1095-1100, 1953.
  • 2
    • E. Even-Dar, S. Kakade, and Y. Mansour, "Experts in a Markov decision process," in NIPS, 2004, pp. 401-408.
  • 3
    • S. Mannor and N. Shimkin, "The empirical Bayes envelope and regret minimization in competitive Markov decision processes," Mathematics of Operations Research, vol.28, no.2, pp. 327-345, 2003.
  • 6
    • A. Nilim and L. E. Ghaoui, "Robust control of Markov decision processes with uncertain transition matrices," Operations Research, vol.53, no.5, pp. 780-798, 2005.
  • 8
    • R. I. Brafman and M. Tennenholtz, "R-max - a general polynomial time algorithm for near-optimal reinforcement learning," Journal of Machine Learning Research, vol.3, pp. 213-231, 2003.
  • 9
    • J. Y. Yu, S. Mannor, and N. Shimkin, "Markov decision processes with arbitrary rewards," Math. Oper. Res., 2009, to appear.
  • 10
    • J. Y. Yu and S. Mannor, "Online learning in Markov decision processes with arbitrarily changing rewards and transitions," in GameNets, 2009.
  • 13
    • D. L. Donoho and P. B. Stark, "Uncertainty principles and signal recovery," SIAM J. Appl. Math., vol.49, no.3, pp. 906-931, 1989.
  • 14
    • C. D. Fuh, "Asymptotic operating characteristics of an optimal change point detection in hidden Markov models," Ann. Statist., pp. 2305-2339, 2004.
  • 16
    • H. Xu and S. Mannor, "The robustness-performance tradeoff in Markov decision processes," in NIPS, 2006, pp. 1537-1544.
  • 17
    • J. Hannan, "Approximation to Bayes risk in repeated play," in Contributions to the Theory of Games, vol.3. Princeton University Press, 1957, pp. 97-139.
  • 18
  • 21
    • A. Kalai and S. Vempala, "Efficient algorithms for online decision problems," Journal of Computer and System Sciences, vol.71, no.3, pp. 291-307, 2005.
  • 23
    • V. S. Borkar and S. P. Meyn, "The O.D.E. method for convergence of stochastic approximation and reinforcement learning," SIAM J. Control Optim., vol.38, no.2, pp. 447-469, 2000.
  • 24
    • P. J. Schweitzer, "Perturbation theory and finite Markov chains," Journal of Applied Probability, vol.5, pp. 401-413, 1968.
* This information was extracted by KISTI through analysis of Elsevier's SCOPUS DB.