1. Aharony, N., Zehavi, T., Engel, Y.: Learning wireless network association control with Gaussian process temporal difference methods. In: Proceedings of OPNETWORK (2005)
2. Asmuth, J., Li, L., Littman, M.L., Nouri, A., Wingate, D.: A Bayesian sampling approach to exploration in reinforcement learning. In: Proceedings of the Twenty-Fifth Conference on Uncertainty in Artificial Intelligence, UAI 2009, pp. 19–26. AUAI Press (2009)
4. Barto, A., Sutton, R., Anderson, C.: Neuronlike adaptive elements that can solve difficult learning control problems. IEEE Transactions on Systems, Man, and Cybernetics 13, 835–846 (1983)
7. Bellman, R.: A problem in the sequential design of experiments. Sankhya 16, 221–229 (1956)
8. Bellman, R.: Adaptive Control Processes: A Guided Tour. Princeton University Press (1961)
9. Bellman, R., Kalaba, R.: On adaptive control processes. IRE Transactions on Automatic Control 4(2), 1–9 (1959)
10. Bhatnagar, S., Sutton, R., Ghavamzadeh, M., Lee, M.: Incremental natural actor-critic algorithms. In: Proceedings of Advances in Neural Information Processing Systems, vol. 20, pp. 105–112. MIT Press (2007)
11. Bhatnagar, S., Sutton, R., Ghavamzadeh, M., Lee, M.: Natural actor-critic algorithms. Automatica 45(11), 2471–2482 (2009)
12. Brafman, R., Tennenholtz, M.: R-max - a general polynomial time algorithm for near-optimal reinforcement learning. JMLR 3, 213–231 (2002)
13. Caruana, R.: Multitask learning. Machine Learning 28(1), 41–75 (1997)
17. Cozzolino, J., Gonzales-Zubieta, R., Miller, R.L.: Markovian decision processes with uncertain transition probabilities. Technical Report No. 11, Research in the Control of Complex Systems, Operations Research Center, Massachusetts Institute of Technology (1965)
18. Cozzolino, J.M.: Optimal sequential decision making under uncertainty. Master's thesis, Massachusetts Institute of Technology (1964)
19. Dearden, R., Friedman, N., Russell, S.: Bayesian Q-learning. In: Proceedings of the Fifteenth National Conference on Artificial Intelligence, pp. 761–768 (1998)
20. Dearden, R., Friedman, N., Andre, D.: Model based Bayesian exploration. In: UAI, pp. 150–159 (1999)
22. Delage, E., Mannor, S.: Percentile optimization for Markov decision processes with parameter uncertainty. Operations Research 58(1), 203–213 (2010)
23. Dimitrakakis, C.: Complexity of stochastic branch and bound methods for belief tree search in Bayesian reinforcement learning. In: ICAART (1), pp. 259–264 (2010)
25. Doshi-Velez, F., Wingate, D., Roy, N., Tenenbaum, J.: Nonparametric Bayesian policy priors for reinforcement learning. In: NIPS (2010)
27. Duff, M.: Design for an optimal probe. In: ICML, pp. 131–138 (2003)
29. Engel, Y., Mannor, S., Meir, R.: Sparse Online Greedy Support Vector Regression. In: Elomaa, T., Mannila, H., Toivonen, H. (eds.) ECML 2002. LNCS (LNAI), vol. 2430, pp. 84–96. Springer, Heidelberg (2002)
30. Engel, Y., Mannor, S., Meir, R.: Bayes meets Bellman: The Gaussian process approach to temporal difference learning. In: Proceedings of the Twentieth International Conference on Machine Learning, pp. 154–161 (2003)
31. Engel, Y., Mannor, S., Meir, R.: Reinforcement learning with Gaussian processes. In: Proceedings of the Twenty-Second International Conference on Machine Learning, pp. 201–208 (2005a)
32. Engel, Y., Szabo, P., Volkinshtein, D.: Learning to control an octopus arm with Gaussian process temporal difference methods. In: Proceedings of Advances in Neural Information Processing Systems, vol. 18, pp. 347–354. MIT Press (2005b)
33. Fard, M.M., Pineau, J.: PAC-Bayesian model selection for reinforcement learning. In: Lafferty, J., Williams, C.K.I., Shawe-Taylor, J., Zemel, R., Culotta, A. (eds.) Advances in Neural Information Processing Systems, vol. 23, pp. 1624–1632 (2010)
37. Greensmith, E., Bartlett, P., Baxter, J.: Variance reduction techniques for gradient estimates in reinforcement learning. Journal of Machine Learning Research 5, 1471–1530 (2004)
42. Kearns, M., Mansour, Y., Ng, A.: A sparse sampling algorithm for near-optimal planning in large Markov decision processes. In: Proc. IJCAI (1999)
43. Kolter, J.Z., Ng, A.Y.: Near-Bayesian exploration in polynomial time. In: Proceedings of the 26th Annual International Conference on Machine Learning, ICML 2009, pp. 513–520. ACM, New York (2009)
46. Lazaric, A., Restelli, M., Bonarini, A.: Transfer of samples in batch reinforcement learning. In: Proceedings of ICML, vol. 25, pp. 544–551 (2008)
47. Mannor, S., Simester, D., Sun, P., Tsitsiklis, J.N.: Bias and variance approximation in value function estimates. Management Science 53(2), 308–322 (2007)
50. Mehta, N., Natarajan, S., Tadepalli, P., Fern, A.: Transfer in variable-reward hierarchical reinforcement learning. Machine Learning 73(3), 289–312 (2008)
51. Meuleau, N., Bourgine, P.: Exploration of multi-state environments: Local measures and back-propagation of uncertainty. Machine Learning 35, 117–154 (1999)
52. Nilim, A., El Ghaoui, L.: Robust control of Markov decision processes with uncertain transition matrices. Operations Research 53(5), 780–798 (2005)
53. O'Hagan, A.: Monte Carlo is fundamentally unsound. The Statistician 36, 247–249 (1987)
56. Peters, J., Schaal, S.: Reinforcement learning of motor skills with policy gradients. Neural Networks 21(4), 682–697 (2008)
58. Peters, J., Vijayakumar, S., Schaal, S.: Natural Actor-Critic. In: Gama, J., Camacho, R., Brazdil, P.B., Jorge, A.M., Torgo, L. (eds.) ECML 2005. LNCS (LNAI), vol. 3720, pp. 280–291. Springer, Heidelberg (2005)
61. Poupart, P., Vlassis, N., Hoey, J., Regan, K.: An analytic solution to discrete Bayesian reinforcement learning. In: Proc. Int. Conf. on Machine Learning, Pittsburgh, USA (2006)
64. Reisinger, J., Stone, P., Miikkulainen, R.: Online kernel selection for Bayesian reinforcement learning. In: Proceedings of the Twenty-Fifth Conference on Machine Learning, pp. 816–823 (2008)
66. Ross, S., Chaib-Draa, B., Pineau, J.: Bayes-adaptive POMDPs. In: Advances in Neural Information Processing Systems, NIPS (2007)
67. Ross, S., Chaib-Draa, B., Pineau, J.: Bayesian reinforcement learning in continuous POMDPs with application to robot navigation. In: IEEE International Conference on Robotics and Automation (ICRA), pp. 2845–2851 (2008)
69. Silver, E.A.: Markov decision processes with uncertain transition probabilities or rewards. Technical Report No. 1, Research in the Control of Complex Systems, Operations Research Center, Massachusetts Institute of Technology (1963)
71. Strehl, A.L., Li, L., Littman, M.L.: Incremental model-based learners with formal learning-time guarantees. In: UAI (2006)
72. Strens, M.: A Bayesian framework for reinforcement learning. In: ICML (2000)
74. Sutton, R.: Learning to predict by the methods of temporal differences. Machine Learning 3, 9–44 (1988)
75. Sutton, R., McAllester, D., Singh, S., Mansour, Y.: Policy gradient methods for reinforcement learning with function approximation. In: Proceedings of Advances in Neural Information Processing Systems, vol. 12, pp. 1057–1063 (2000)
76. Taylor, M., Stone, P., Liu, Y.: Transfer learning via inter-task mappings for temporal difference learning. JMLR 8, 2125–2167 (2007)
77. Thompson, W.R.: On the likelihood that one unknown probability exceeds another in view of the evidence of two samples. Biometrika 25, 285–294 (1933)
78. Veness, J., Ng, K.S., Hutter, M., Silver, D.: Reinforcement learning via AIXI approximation. In: AAAI (2010)
79. Wang, T., Lizotte, D., Bowling, M., Schuurmans, D.: Bayesian sparse sampling for on-line reward optimization. In: ICML (2005)
82. Williams, R.: Simple statistical gradient-following algorithms for connectionist reinforcement learning. Machine Learning 8, 229–256 (1992)
83. Wilson, A., Fern, A., Ray, S., Tadepalli, P.: Multi-task reinforcement learning: A hierarchical Bayesian approach. In: Proceedings of ICML, vol. 24, pp. 1015–1022 (2007)