SCOPUS Information Retrieval Platform - View Paper

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

Volume 5323 LNAI, 2008, Pages 268-281

Markov decision processes with arbitrary reward processes

Authors (3): Yu, Jia Yuan (a); Mannor, Shie (a); Shimkin, Nahum (b)

(a) McGill University (Canada)

(b) Technion - Israel Institute of Technology (Israel)

Author keywords

[No Author keywords available]

Indexed keywords

CONTROL PROBLEMS; DECISION MAKER (DM); MARKOV DECISION PROCESS (MDP); MARKOV DECISION PROCESSES (MDPS); PERFORMANCE LOSSES; REWARD FUNCTIONS; REINFORCEMENT; REINFORCEMENT LEARNING; LEARNING ALGORITHMS

EID: 58449132310 PISSN: 03029743 EISSN: 16113349 Source Type: Book Series
DOI: 10.1007/978-3-540-89722-4_21 Document Type: Conference Paper

Times cited: 3

References (19)

1. Auer, P., Cesa-Bianchi, N., Freund, Y., Schapire, R.E.: The nonstochastic multiarmed bandit problem. SIAM J. Computing 32(1), 48-77 (2002). Scopus EID: 0037709910

2. Aumann, R.J.: Markets with a continuum of traders. Econometrica 32, 39-50 (1964). Scopus EID: 0002056057

3. Bertsekas, D.P.: Dynamic programming and optimal control, 2nd edn., vol. 2. Athena Scientific (2001). Scopus EID: 0003565783

4. Bertsekas, D.P., Tsitsiklis, J.N.: Neuro-dynamic programming. Athena Scientific (1996). Scopus EID: 0003487482

5. Bobkov, S.G., Tetali, P.: Modified logarithmic Sobolev inequalities in discrete settings. Journal of Theoretical Probability 19(2), 289-336 (2006). Scopus EID: 33750501028

6. Borkar, V.S., Meyn, S.P.: The O.D.E. method for convergence of stochastic approximation and reinforcement learning. SIAM J. Control and Optimization 38(2), 447-469 (2000). Scopus EID: 0033876515

7. Brafman, R.I., Tennenholtz, M.: R-max - a general polynomial time algorithm for near-optimal reinforcement learning. Journal of Machine Learning Research 3, 213-231 (2003). Scopus EID: 0041965975

8. Cesa-Bianchi, N., Lugosi, G.: Prediction, learning, and games. Cambridge University Press, Cambridge (2006). Scopus EID: 84926078662

9. Crites, R.H., Barto, A.G.: An actor/critic algorithm that is equivalent to Q-learning. In: Advances in Neural Information Processing Systems, pp. 401-408 (1995). Scopus EID: 85153927127

10. Duffield, N.G., Massey, W.A., Whitt, W.: A nonstationary offered-load model for packet networks. Telecommunication Systems 16(3-4), 271-296 (2001). Scopus EID: 23044525872

11. Even-Dar, E., Kakade, S., Mansour, Y.: Experts in a Markov decision process. In: NIPS, pp. 401-408 (2004). Scopus EID: 41649111187

12. Fudenberg, D., Kreps, D.M.: Learning mixed equilibria. Games and Economic Behavior 5(3), 320-367 (1993). Scopus EID: 0000466473

13. Hannan, J.: Approximation to Bayes risk in repeated play. In: Contributions to the Theory of Games, vol. 3, pp. 97-139. Princeton University Press, Princeton (1957). Scopus EID: 0001976283

14. Herbster, M., Warmuth, M.K.: Tracking the best expert. Machine Learning 32(2), 151-178 (1998). Scopus EID: 0032137328

15. Kalai, A., Vempala, S.: Efficient algorithms for online decision problems. Journal of Computer and System Sciences 71(3), 291-307 (2005). Scopus EID: 24644463787

16. Mannor, S., Shimkin, N.: The empirical Bayes envelope and regret minimization in competitive Markov decision processes. Mathematics of Operations Research 28(2), 327-345 (2003). Scopus EID: 0038386340

17. Merhav, N., Ordentlich, E., Seroussi, G., Weinberger, M.J.: On sequential strategies for loss functions with memory. IEEE Trans. Inf. Theory 48(7), 1947-1958 (2002). Scopus EID: 0036649565

18. Shapley, L.: Stochastic games. PNAS 39(10), 1095-1100 (1953). Scopus EID: 0000392613

19. Yu, J.Y., Mannor, S., Shimkin, N.: Markov decision processes with arbitrarily varying rewards. Preprint (2008), http://www.cim.mcgill.ca/~jiayuan/mdp.pdf. Scopus EID: 84868886330

* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.