5. Boyan, J., & Moore, A. (1996). Learning evaluation functions for large acyclic domains. Proceedings ICML.
6. Brafman, R., & Tennenholtz, M. (2001). R-MAX - a general polynomial time algorithm for near-optimal reinforcement learning. Proceedings IJCAI.
9. Engel, Y., Mannor, S., & Meir, R. (2003). Bayes meets Bellman: The Gaussian process approach to temporal difference learning. Proceedings ICML.
12. Kaelbling, L. P. (1994). Associative reinforcement learning: Functions in k-DNF. Machine Learning, 15, 279-298.
13. Kearns, M., Mansour, Y., & Ng, A. (2001). A sparse sampling algorithm for near-optimal planning in large Markov decision processes. JMLR, 1324-1331.
14. Kearns, M., & Singh, S. (1998). Near-optimal reinforcement learning in polynomial time. Proceedings ICML.
15. Lusena, C., Goldsmith, J., & Mundhenk, M. (2001). Nonapproximability results for partially observable Markov decision processes. JAIR, 14, 83-103.
17. Mundhenk, M., Goldsmith, J., Lusena, C., & Allender, E. (2000). Complexity of finite-horizon Markov decision processes. JACM, 47, 681-720.
19. Ng, A., & Jordan, M. (2000). Pegasus: A policy search method for large MDPs and POMDPs. Proceedings UAI.
20. Péret, L., & Garcia, F. (2004). On-line search for solving Markov decision processes via heuristic sampling. Proceedings ECAI.
21. Salganicoff, M., & Ungar, L. (1995). Active exploration and learning in real-valued spaces using multi-armed bandit allocation indices. Proceedings ICML.
22. Strens, M. (2000). A Bayesian framework for reinforcement learning. Proceedings ICML.
23. Strens, M., & Moore, A. (2002). Policy search using paired comparisons. JMLR, 3, 921-950.
25. Thompson, W. R. (1933). On the likelihood that one unknown probability exceeds another in view of the evidence of two samples. Biometrika, 25, 285-294.
26. Watkins, C. (1989). Learning from delayed rewards. Doctoral dissertation, King's College, Cambridge.
28. Williams, C. (1999). Prediction with Gaussian processes. In Learning in graphical models. MIT Press.
29. Wyatt, J. (2001). Exploration control in reinforcement learning using optimistic model selection. Proceedings ICML.