-
1
-
-
0141988716
-
Recent advances in hierarchical reinforcement learning
-
Special Issue on Reinforcement Learning
-
[Barto and Mahadevan, 2003] Andrew Barto and Sridhar Mahadevan. Recent advances in hierarchical reinforcement learning. Discrete Event Dynamic Systems, 13, Special Issue on Reinforcement Learning, pages 41-77, 2003.
-
(2003)
Discrete Event Dynamic Systems
, vol.13
, pp. 41-77
-
-
Barto, A.1
Mahadevan, S.2
-
3
-
-
0041965975
-
R-MAX - A general polynomial time algorithm for near-optimal reinforcement learning
-
[Brafman and Tennenholtz, 2002] Ronen I. Brafman and Moshe Tennenholtz. R-MAX - A general polynomial time algorithm for near-optimal reinforcement learning. Journal of Machine Learning Research, 3:213-231, 2002.
-
(2002)
Journal of Machine Learning Research
, vol.3
, pp. 213-231
-
-
Brafman, R.I.1
Tennenholtz, M.2
-
4
-
-
0001909869
-
Incremental Pruning: A simple, fast, exact method for partially observable Markov decision processes
-
San Francisco, CA: Morgan Kaufmann Publishers
-
[Cassandra et al., 1997] Anthony Cassandra, Michael L. Littman, and Nevin L. Zhang. Incremental Pruning: A simple, fast, exact method for partially observable Markov decision processes. In Proceedings of the Thirteenth Annual Conference on Uncertainty in Artificial Intelligence (UAI-97), pages 54-61, San Francisco, CA, 1997. Morgan Kaufmann Publishers.
-
(1997)
Proceedings of the Thirteenth Annual Conference on Uncertainty in Artificial Intelligence (UAI-97)
, pp. 54-61
-
-
Cassandra, A.1
Littman, M.L.2
Zhang, N.L.3
-
6
-
-
0002629270
-
Maximum likelihood from incomplete data via the EM algorithm
-
[Dempster et al., 1977] A. P. Dempster, N. M. Laird, and D. B. Rubin. Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society, 39(1):1-38, 1977.
-
(1977)
Journal of the Royal Statistical Society
, vol.39
, Issue.1
, pp. 1-38
-
-
Dempster, A.P.1
Laird, N.M.2
Rubin, D.B.3
-
11
-
-
84891584370
-
-
Wiley-Interscience series in systems and optimization. Wiley, Chichester, NY
-
[Gittins, 1989] J. C. Gittins. Multi-Armed Bandit Allocation Indices. Wiley-Interscience series in systems and optimization. Wiley, Chichester, NY, 1989.
-
(1989)
Multi-Armed Bandit Allocation Indices
-
-
Gittins, J.C.1
-
14
-
-
0036832954
-
Near-optimal reinforcement learning in polynomial time
-
[Kearns and Singh, 2002] Michael J. Kearns and Satinder P. Singh. Near-optimal reinforcement learning in polynomial time. Machine Learning, 49(2-3):209-232, 2002.
-
(2002)
Machine Learning
, vol.49
, Issue.2-3
, pp. 209-232
-
-
Kearns, M.J.1
Singh, S.P.2
-
15
-
-
30044441333
-
The sample complexity of exploration in the multi-armed bandit problem
-
[Mannor and Tsitsiklis, 2004] Shie Mannor and John N. Tsitsiklis. The sample complexity of exploration in the multi-armed bandit problem. Journal of Machine Learning Research, 5:623-648, 2004.
-
(2004)
Journal of Machine Learning Research
, vol.5
, pp. 623-648
-
-
Mannor, S.1
Tsitsiklis, J.N.2
-
20
-
-
0002210775
-
The role of exploration in learning control
-
In David A. White and Donald A. Sofge, editors. Van Nostrand Reinhold, New York, NY
-
[Thrun, 1992] Sebastian B. Thrun. The role of exploration in learning control. In David A. White and Donald A. Sofge, editors, Handbook of Intelligent Control: Neural, Fuzzy, and Adaptive Approaches, pages 527-559. Van Nostrand Reinhold, New York, NY, 1992.
-
(1992)
Handbook of Intelligent Control: Neural, Fuzzy, and Adaptive Approaches
, pp. 527-559
-
-
Thrun, S.B.1
-
21
-
-
16244368573
-
-
Technical Report HPL-2003-97R1, Hewlett-Packard Labs
-
[Weissman et al., 2003] Tsachy Weissman, Erik Ordentlich, Gadiel Seroussi, Sergio Verdu, and Marcelo J. Weinberger. Inequalities for the L1 deviation of the empirical distribution. Technical Report HPL-2003-97R1, Hewlett-Packard Labs, 2003.
-
(2003)
Inequalities for the L1 Deviation of the Empirical Distribution
-
-
Weissman, T.1
Ordentlich, E.2
Seroussi, G.3
Verdu, S.4
Weinberger, M.J.5