2009, Pages 314-322

Online learning in Markov decision processes with arbitrarily changing rewards and transitions

Author keywords

[No Author keywords available]

Indexed keywords

DECISION MAKERS; DECISION-MAKING PROBLEM; MARKOV DECISION PROCESSES; NONSTATIONARY; ONLINE LEARNING; TRANSITION PROBABILITIES;

EID: 70349986740     PISSN: None     EISSN: None     Source Type: Conference Proceeding    
DOI: 10.1109/GAMENETS.2009.5137416     Document Type: Conference Paper
Times cited: 50

References (21)
  • 2
    • L. Shapley, "Stochastic games," PNAS, vol. 39, no. 10, pp. 1095-1100, 1953.
  • 3
    • E. Even-Dar, S. Kakade, and Y. Mansour, "Experts in a Markov decision process," in NIPS, 2004, pp. 401-408.
  • 4
    • S. Mannor and N. Shimkin, "The empirical Bayes envelope and regret minimization in competitive Markov decision processes," Mathematics of Operations Research, vol. 28, no. 2, pp. 327-345, 2003.
  • 6
    • A. Nilim and L. El Ghaoui, "Robust control of Markov decision processes with uncertain transition matrices," Operations Research, vol. 53, no. 5, pp. 780-798, 2005.
  • 8
    • R. I. Brafman and M. Tennenholtz, "R-max - a general polynomial time algorithm for near-optimal reinforcement learning," Journal of Machine Learning Research, vol. 3, pp. 213-231, 2003.
  • 9
    • J. Y. Yu, S. Mannor, and N. Shimkin, "Markov decision processes with arbitrary reward processes," in Lecture Notes in Computer Science, vol. 5323, 2009, http://www.cim.mcgill.ca/~jiayuan/mdp.pdf.
  • 11
    • J. Hannan, "Approximation to Bayes risk in repeated play," in Contributions to the Theory of Games. Princeton University Press, 1957, vol. 3, pp. 97-139.
  • 12
    • E. Altman, "Applications of dynamic games in queues," Advances in Dynamic Games, vol. 7, pp. 309-342, 2005.
  • 13
    • A. Lapidoth and P. Narayan, "Reliable communication under channel uncertainty," IEEE Trans. Inf. Theory, vol. 44, no. 6, pp. 2148-2177, 1998.
  • 14
    • N. Littlestone and M. Warmuth, "The weighted majority algorithm," Information and Computation, vol. 108, no. 2, pp. 212-261, 1994.
  • 15
    • H. Xu and S. Mannor, "The robustness-performance tradeoff in Markov decision processes," in NIPS, 2006, pp. 1537-1544.
  • 17
    • A. Kalai and S. Vempala, "Efficient algorithms for online decision problems," Journal of Computer and System Sciences, vol. 71, no. 3, pp. 291-307, 2005.
  • 19
    • P. J. Schweitzer, "Perturbation theory and finite Markov chains," Journal of Applied Probability, vol. 5, pp. 401-413, 1968.
  • 20
    • E. Even-Dar, S. Kakade, and Y. Mansour, "On-line Markov decision processes," preprint.


* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.