[1] J. Asmuth, L. Li, M. Littman, A. Nouri, and D. Wingate. A Bayesian sampling approach to exploration in reinforcement learning. In Conference on Uncertainty in Artificial Intelligence (UAI), 2009.
[2] P. Auer and R. Ortner. Logarithmic online regret bounds for undiscounted reinforcement learning. In Neural Information Processing Systems (NIPS), volume 19, pages 49-56, 2006.
[6] R. I. Brafman and M. Tennenholtz. R-max - a general polynomial time algorithm for near-optimal reinforcement learning. Journal of Machine Learning Research (JMLR), 3:213-231, 2003.
[12] F. Doshi, J. Pineau, and N. Roy. Reinforcement learning with limited reinforcement: Using Bayes risk for active learning in POMDPs. In International Conference on Machine Learning (ICML), pages 256-263. ACM, 2008.
[15] M. Duff. Monte-Carlo algorithms for the improvement of finite-state stochastic controllers: Application to Bayes-adaptive Markov decision processes. In International Workshop on Artificial Intelligence and Statistics (AISTATS), 2001.
[19] A. A. Feldbaum. Dual control theory, Parts I and II. Automation and Remote Control, 21:874-880 and 1033-1039, 1961.
[20] N. M. Filatov and H. Unbehauen. Survey of adaptive dual control methods. IEE Proceedings: Control Theory and Applications, 147(1):118-128, 2000.
[23] A. Greenfield and A. Brockwell. Adaptive control of nonlinear stochastic systems by particle filtering. In International Conference on Control and Automation (ICCA), pages 887-890, 2003.
[24] D. Heckerman, D. Geiger, and D. M. Chickering. Learning Bayesian networks: The combination of knowledge and statistical data. Machine Learning, 20(3):197-243, 1995.
[29] L. P. Kaelbling, M. L. Littman, and A. R. Cassandra. Planning and acting in partially observable stochastic domains. Artificial Intelligence, 101(1-2):99-134, 1998.
[33] M. L. Littman, R. S. Sutton, and S. Singh. Predictive representations of state. In Neural Information Processing Systems (NIPS), volume 14, pages 1555-1561, 2002.
[38] P. Poupart, N. Vlassis, J. Hoey, and K. Regan. An analytic solution to discrete Bayesian reinforcement learning. In International Conference on Machine Learning (ICML), pages 697-704, 2006.
[40] S. Ross, B. Chaib-draa, and J. Pineau. Bayes-adaptive POMDPs. In Neural Information Processing Systems (NIPS), volume 20, pages 1225-1232, 2008a.
[42] S. Ross, J. Pineau, S. Paquet, and B. Chaib-draa. Online planning algorithms for POMDPs. Journal of Artificial Intelligence Research (JAIR), 32:663-704, 2008c.
[45] R. D. Smallwood and E. J. Sondik. The optimal control of partially observable Markov processes over a finite horizon. Operations Research, 21(5):1071-1088, Sep/Oct 1973.
[48] M. T. J. Spaan and N. Vlassis. Perseus: Randomized point-based value iteration for POMDPs. Journal of Artificial Intelligence Research (JAIR), 24:195-220, 2005.
[53] A. Tewari and P. Bartlett. Optimistic linear programming gives logarithmic regret for irreducible MDPs. In Neural Information Processing Systems (NIPS), volume 20, pages 1505-1512, 2008.
[54] J. Veness, K. S. Ng, M. Hutter, W. Uther, and D. Silver. A Monte-Carlo AIXI approximation. Journal of Artificial Intelligence Research (JAIR), 2011.
[55] T. Wang, D. Lizotte, M. Bowling, and D. Schuurmans. Bayesian sparse sampling for on-line reward optimization. In International Conference on Machine Learning (ICML), pages 956-963, 2005.
[56] O. Zane. Discrete-time Bayesian adaptive control problems with complete information. In IEEE Conference on Decision and Control, pages 2748-2749, 1992.