1. Abbeel, P., Coates, A., Quigley, M., & Ng, A. Y. (2007). An application of reinforcement learning to aerobatic helicopter flight. In B. Schölkopf, J. Platt and T. Hoffman (Eds.), Advances in neural information processing systems 19 (pp. 1-8). Cambridge, MA: MIT Press.
3. Celeux, G., & Diebolt, J. (1985). The SEM algorithm: A probabilistic teacher algorithm derived from the EM algorithm for the mixture problem. Comp. Statis. Quarterly, 2, 73-82.
4. Cooper, G. F. (1988). A method for using belief networks as influence diagrams. Proc. 4th Workshop on Uncertainty in Artificial Intelligence (pp. 55-63). Minneapolis, Minnesota, USA.
5. Dayan, P., & Hinton, G. E. (1997). Using Expectation-Maximization for reinforcement learning. Neural Computation, 9, 271-278.
6. Dearden, R., Friedman, N., & Russell, S. (1998). Bayesian Q-learning. Proc. 15th National Conf. on Artificial Intelligence (pp. 761-768). Madison, Wisconsin, USA.
7. Delyon, B., Lavielle, M., & Moulines, E. (1999). Convergence of a stochastic approximation version of the EM algorithm. The Annals of Statistics, 27, 94-128.
8. Gordon, G. (1996). Chattering in Sarsa(λ) (Technical Report). CMU Learning Lab internal report.
10. Hoffman, M., Doucet, A., De Freitas, N., & Jasra, A. (2008). Bayesian policy learning with trans-dimensional MCMC. In J. Platt, D. Koller, Y. Singer and S. Roweis (Eds.), Advances in neural information processing systems 20 (pp. 665-672). Cambridge, MA: MIT Press.
11. Jaakkola, T., Singh, S. P., & Jordan, M. I. (1995). Reinforcement learning algorithm for partially observable Markov decision problems. In Advances in neural information processing systems 7 (pp. 345-352). MIT Press.
12. Kober, J., & Peters, J. (2009). Policy search for motor primitives in robotics. In D. Koller, D. Schuurmans, Y. Bengio and L. Bottou (Eds.), Advances in neural information processing systems 21 (pp. 849-856).
14. Loch, J., & Singh, S. P. (1998). Using eligibility traces to find the best memoryless policy in partially observable Markov decision processes. Proc. 15th Int. Conf. on Machine Learning (pp. 323-331). Madison, Wisconsin, USA.
15. Melo, F. S., Meyn, S. P., & Ribeiro, M. I. (2008). An analysis of reinforcement learning with function approximation. Proc. 25th Int. Conf. on Machine Learning (pp. 664-671). Helsinki, Finland.
16. Neal, R. M., & Hinton, G. E. (1998). A view of the EM algorithm that justifies incremental, sparse, and other variants. In M. I. Jordan (Ed.), Learning in graphical models (pp. 355-368). Kluwer Academic Publishers.
17. Perkins, T. J., & Pendrith, M. D. (2002). On the existence of fixed points for Q-learning and Sarsa in partially observable domains. Proc. 19th Int. Conf. on Machine Learning (pp. 490-497).
18. Perkins, T. J., & Precup, D. (2003). A convergent form of approximate policy iteration. In S. Becker, S. Thrun and K. Obermayer (Eds.), Advances in neural information processing systems 15 (pp. 1595-1602). Cambridge, MA: MIT Press.
23. Toussaint, M., & Storkey, A. (2006). Probabilistic inference for solving discrete and continuous state Markov decision processes. Proc. 23rd Int. Conf. on Machine Learning (pp. 945-952). Pittsburgh, Pennsylvania, USA.
24. Tsitsiklis, J. N. (2002). On the convergence of optimistic policy iteration. Journal of Machine Learning Research, 3, 59-72.
25. Wei, G., & Tanner, M. (1990). A Monte Carlo implementation of the EM algorithm and the poor man's data augmentation algorithm. J. Amer. Statist. Association, 85, 699-704.