[1] D. Bertsekas and S. Ioffe. Temporal differences-based policy iteration and applications in neuro-dynamic programming. Technical report, MIT, 1996.
[3] H. Burgiel. How to Lose at Tetris. Mathematical Gazette, 81:194-200, 1997.
[4] E. Demaine, S. Hohenberger, and D. Liben-Nowell. Tetris is hard, even to approximate. In Proceedings of the Ninth International Computing and Combinatorics Conference, pages 351-363, 2003.
[7] A. Fern, S. Yoon, and R. Givan. Approximate Policy Iteration with a Policy Language Bias: Solving Relational Markov Decision Processes. Journal of Artificial Intelligence Research, 25:75-118, 2006.
[9] V. Gabillon, A. Lazaric, M. Ghavamzadeh, and B. Scherrer. Classification-based policy iteration with a critic. In Proceedings of ICML, pages 1049-1056, 2011.
[10] N. Hansen and A. Ostermeier. Completely derandomized self-adaptation in evolution strategies. Evolutionary Computation, 9:159-195, 2001.
[12] M. Lagoudakis and R. Parr. Reinforcement Learning as Classification: Leveraging Modern Classifiers. In Proceedings of ICML, pages 424-431, 2003.
[13] A. Lazaric, M. Ghavamzadeh, and R. Munos. Analysis of a Classification-based Policy Iteration Algorithm. In Proceedings of ICML, pages 607-614, 2010.
[14] M. Puterman and M. Shin. Modified policy iteration algorithms for discounted Markov decision problems. Management Science, 24(11), 1978.
[15] R. Rubinstein and D. Kroese. The cross-entropy method: A unified approach to combinatorial optimization, Monte-Carlo simulation, and machine learning. Springer-Verlag, 2004.
[16] B. Scherrer. Performance Bounds for λ-Policy Iteration and Application to the Game of Tetris. Journal of Machine Learning Research, 14:1175-1221, 2013.
[17] B. Scherrer, M. Ghavamzadeh, V. Gabillon, and M. Geist. Approximate modified policy iteration. In Proceedings of ICML, pages 1207-1214, 2012.
[18] I. Szita and A. Lorincz. Learning Tetris Using the Noisy Cross-Entropy Method. Neural Computation, 18(12):2936-2941, 2006.
[22] J. Tsitsiklis and B. Van Roy. Feature-based methods for large scale dynamic programming. Machine Learning, 22:59-94, 1996.