SCOPUS Information Search Platform

Uncertainty in Artificial Intelligence - Proceedings of the 28th Conference, UAI 2012

2012, Pages 93-101

Deterministic MDPs with adversarial rewards and bandit feedback

(3) Arora, Raman (a); Dekel, Ofer (b); Tewari, Ambuj (c)

a TTI-C (United States)

b MICROSOFT RESEARCH (United States)

c UNIVERSITY OF MICHIGAN (United States)

Author keywords

[No Author keywords available]

Indexed keywords

BANDIT FEEDBACKS; DECISION MAKERS; MARKOV DECISION PROCESSES; ON-LINE DECISION MAKINGS; STATE TRANSITION DYNAMICS; TRANSITION DYNAMICS;

ARTIFICIAL INTELLIGENCE; DECISION MAKING; MARKOV PROCESSES;

FEEDBACK;

EID: 84886067084; PISSN: None; EISSN: None; Source Type: Conference Proceeding
DOI: None; Document Type: Conference Paper

Times cited : (28)

References (21)

1
- 84898063697
- Competing in the dark: An efficient algorithm for bandit linear optimization
- J. Abernethy, E. Hazan, and A. Rakhlin. Competing in the dark: An efficient algorithm for bandit linear optimization. In Proceedings of the 21st Annual Conference on Learning Theory, pages 263-274, 2008.
- (2008) Proceedings of the 21st Annual Conference on Learning Theory , pp. 263-274
- Abernethy, J.¹ Hazan, E.² Rakhlin, A.³

2
- 84867129684
- Online bandit learning against an adaptive adversary: From regret to policy regret
- R. Arora, O. Dekel, and A. Tewari. Online bandit learning against an adaptive adversary: from regret to policy regret. In Proceedings of the 29th International Conference on Machine Learning, 2012.
- (2012) Proceedings of the 29th International Conference on Machine Learning
- Arora, R.¹ Dekel, O.² Tewari, A.³

3
- 0037709910
- The nonstochastic multiarmed bandit problem
- P. Auer, N. Cesa-Bianchi, Y. Freund, and R. Schapire. The nonstochastic multiarmed bandit problem. SIAM Journal on Computing, 32(1):48-77, 2002.
- (2002) SIAM Journal on Computing , vol.32 , Issue.1 , pp. 48-77
- Auer, P.¹ Cesa-Bianchi, N.² Freund, Y.³ Schapire, R.⁴

4
- 4544345025
- Adaptive routing with end-to-end feedback: Distributed learning and geometric approaches
- B. Awerbuch and R. D. Kleinberg. Adaptive routing with end-to-end feedback: Distributed learning and geometric approaches. In Proceedings of the 36th Annual ACM Symposium on the Theory of Computing, pages 45-53, 2004.
- (2004) Proceedings of the 36th Annual ACM Symposium on the Theory of Computing , pp. 45-53
- Awerbuch, B.¹ Kleinberg, R.D.²

5
- 0003511743
- Linear Programming and Network Flows, Third Edition
- John Wiley and Sons
- M. S. Bazaraa, J. J. Jarvis, and H. D. Sherali. Linear Programming and Network Flows, Third Edition. John Wiley and Sons, 2010.
- (2010) Linear Programming and Network Flows, Third Edition
- Bazaraa, M.S.¹ Jarvis, J.J.² Sherali, H.D.³

6
- 0003565783
- Dynamic Programming and Optimal Control, volume 1
- Athena Scientific, Third edition
- D. P. Bertsekas. Dynamic Programming and Optimal Control, volume 1. Athena Scientific, Third edition, 2005.
- (2005) Dynamic Programming and Optimal Control , vol.1
- Bertsekas, D.P.¹

7
- 65749318481
- Über den Variabilitätsbereich der Fourierschen Konstanten von positiven harmonischen Funktionen
- C. Carathéodory. Über den Variabilitätsbereich der Fourierschen Konstanten von positiven harmonischen Funktionen. Rendiconti del Circolo Matematico di Palermo, 32:193-217, 1911.
- (1911) Rendiconti del Circolo Matematico di Palermo , vol.32 , pp. 193-217
- Carathéodory, C.¹

8
- 33845302015
- Combining expert advice in reactive environments
- D. P. de Farias and N. Megiddo. Combining expert advice in reactive environments. Journal of the ACM, 53(5):762-799, 2006.
- (2006) Journal of the ACM , vol.53 , Issue.5 , pp. 762-799
- De Farias, D.P.¹ Megiddo, N.²

9
- 70349277420
- Online Markov decision processes
- E. Even-Dar, S. M. Kakade, and Y. Mansour. Online Markov decision processes. Mathematics of Operations Research, 34(3):726-736, 2009.
- (2009) Mathematics of Operations Research , vol.34 , Issue.3 , pp. 726-736
- Even-Dar, E.¹ Kakade, S.M.² Mansour, Y.³

10
- 77951573287
- Universal reinforcement learning
- V. F. Farias, C. C. Moallemi, B. Van Roy, and T. Weissman. Universal reinforcement learning. IEEE Transactions on Information Theory, 56(5):2441-2454, 2010.
- (2010) IEEE Transactions on Information Theory , vol.56 , Issue.5 , pp. 2441-2454
- Farias, V.F.¹ Moallemi, C.C.² Van Roy, B.³ Weissman, T.⁴

11
- 50249167647
- On polynomial cases of the unichain classification problem for Markov decision processes
- E. A. Feinberg and F. Yang. On polynomial cases of the unichain classification problem for Markov decision processes. Operations Research Letters, 36(5):527-530, 2008.
- (2008) Operations Research Letters , vol.36 , Issue.5 , pp. 527-530
- Feinberg, E.A.¹ Yang, F.²

12
- 20744454447
- Online convex optimization in the bandit setting: Gradient descent without a gradient
- A. D. Flaxman, A. Tauman Kalai, and H. B. McMahan. Online convex optimization in the bandit setting: Gradient descent without a gradient. In Proceedings of the 16th Annual ACM-SIAM Symposium on Discrete Algorithms, pages 385-394, 2005.
- (2005) Proceedings of the 16th Annual ACM-SIAM Symposium on Discrete Algorithms , pp. 385-394
- Flaxman, A.D.¹ Tauman Kalai, A.² McMahan, H.B.³

13
- 35948943542
- The on-line shortest path problem under partial monitoring
- A. György, T. Linder, G. Lugosi, and G. Ottucsák. The on-line shortest path problem under partial monitoring. Journal of Machine Learning Research, 8:2369-2403, 2007.
- (2007) Journal of Machine Learning Research , vol.8 , pp. 2369-2403
- György, A.¹ Linder, T.² Lugosi, G.³ Ottucsák, G.⁴

14
- 84862277771
- Adaptive bandits: Towards the best history-dependent strategy
- JMLR W&CP
- O. Maillard and R. Munos. Adaptive bandits: Towards the best history-dependent strategy. In Proceedings of the Fourteenth International Conference on Artificial Intelligence and Statistics, volume 15 of JMLR W&CP, pages 570-578, 2011.
- (2011) Proceedings of the Fourteenth International Conference on Artificial Intelligence and Statistics , vol.15 , pp. 570-578
- Maillard, O.¹ Munos, R.²

15
- 9444257628
- Online geometric optimization in the bandit setting against an adaptive adversary
- H. B. McMahan and A. Blum. Online geometric optimization in the bandit setting against an adaptive adversary. In Proceedings of the 17th Annual Conference on Learning Theory, pages 109-123, 2004.
- (2004) Proceedings of the 17th Annual Conference on Learning Theory , pp. 109-123
- McMahan, H.B.¹ Blum, A.²

16
- 85162052729
- Online Markov decision processes under bandit feedback
- MIT Press
- G. Neu, A. György, C. Szepesvári, and A. Antos. Online Markov decision processes under bandit feedback. In Advances in Neural Information Processing Systems 23, pages 1804-1812. MIT Press, 2010.
- (2010) Advances in Neural Information Processing Systems , vol.23 , pp. 1804-1812
- Neu, G.¹ György, A.² Szepesvári, C.³ Antos, A.⁴

17
- 77953539718
- Online regret bounds for Markov decision processes with deterministic transitions
- R. Ortner. Online regret bounds for Markov decision processes with deterministic transitions. Theoretical Computer Science, 411(29-30):2684-2695, 2010.
- (2010) Theoretical Computer Science , vol.411 , Issue.29-30 , pp. 2684-2695
- Ortner, R.¹

18
- 77949509398
- On the possibility of learning in reactive environments with arbitrary dependence
- D. Ryabko and M. Hutter. On the possibility of learning in reactive environments with arbitrary dependence. Theoretical Computer Science, 405(3):274-284, 2008.
- (2008) Theoretical Computer Science , vol.405 , Issue.3 , pp. 274-284
- Ryabko, D.¹ Hutter, M.²

19
- 77955790905
- Algorithms for reinforcement learning
- C. Szepesvári. Algorithms for reinforcement learning. Synthesis Lectures on Artificial Intelligence and Machine Learning, 4(1), 2010.
- (2010) Synthesis Lectures on Artificial Intelligence and Machine Learning , vol.4 , Issue.1
- Szepesvári, C.¹

20
- 77950787050
- Arbitrarily modulated Markov decision processes
- J. Y. Yu and S. Mannor. Arbitrarily modulated Markov decision processes. In Proceedings of the IEEE Conference on Decision and Control, 2009.
- (2009) Proceedings of the IEEE Conference on Decision and Control
- Yu, J.Y.¹ Mannor, S.²

21
- 70349280578
- Markov decision processes with arbitrary reward processes
- J. Y. Yu, S. Mannor, and N. Shimkin. Markov decision processes with arbitrary reward processes. Mathematics of Operations Research, 34(3):737-757, 2009.
- (2009) Mathematics of Operations Research , vol.34 , Issue.3 , pp. 737-757
- Yu, J.Y.¹ Mannor, S.² Shimkin, N.³

* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.