1. Aberdeen, D., & Baxter, J. (2002). Scalable internal-state policy-gradient methods for POMDPs. ICML-2002 (pp. 3-10). Morgan Kaufmann.
2. Baxter, J., & Bartlett, P. (2000). Reinforcement learning in POMDPs via direct gradient ascent. ICML-2000 (pp. 41-48). Cambridge, MA: The MIT Press.
3. Boutilier, C., Dearden, R., & Goldszmidt, M. (1995). Exploiting structure in policy construction. IJCAI-1995 (pp. 1104-1111). San Francisco: Morgan Kaufmann.
4. Dietterich, T. G., & Wang, X. (2002). Batch value function approximation via support vectors. NIPS-2001 (pp. 1491-1498). Cambridge, MA: The MIT Press.
5. Greensmith, E., Bartlett, P., & Baxter, J. (2002). Variance reduction techniques for gradient estimates in reinforcement learning. NIPS-2001 (pp. 1507-1514). Cambridge, MA: The MIT Press.
6. Guestrin, C., Koller, D., & Parr, R. (2001). Max-norm projections for factored MDPs. IJCAI-2001 (pp. 673-682).
7. Harvey, W. D., & Ginsberg, M. L. (1995). Limited discrepancy search. IJCAI-1995 (pp. 607-615). Montréal, Québec, Canada: Morgan Kaufmann.
8. Kearns, M., Mansour, Y., & Ng, A. Y. (2000). Approximate learning in large POMDPs via reusable trajectories. NIPS-1999 (pp. 1001-1007). Cambridge, MA: The MIT Press.
9. Kim, K.-E., Dean, T., & Meuleau, N. (2000). Approximate solutions to factored Markov decision processes via greedy search in the space of finite state controllers. Artificial Intelligence Planning Systems (pp. 323-330).
10. Konda, V. R., & Tsitsiklis, J. N. (2000). Actor-critic algorithms. NIPS-1999. Cambridge, MA: The MIT Press.
11. Munos, R., & Moore, A. (2002). Variable resolution discretization in optimal control. Machine Learning, 49, 291-323.
12. Ng, A. Y., Parr, R., & Koller, D. (2000). Policy search via density estimation. NIPS-1999. Cambridge, MA: The MIT Press.
13. Peshkin, L., Meuleau, N., & Kaelbling, L. P. (1999). Learning policies with external memory. ICML-1999 (pp. 307-314). Cambridge, MA: The MIT Press.
15. Shelton, C. R. (2001). Policy improvement for POMDPs using normalized importance sampling. UAI-2001 (pp. 496-503).
17. Sutton, R. S., McAllester, D., Singh, S., & Mansour, Y. (2000). Policy gradient methods for reinforcement learning with function approximation. NIPS-1999 (pp. 1057-1063). Cambridge, MA: The MIT Press.
18. Williams, R. J. (1992). Simple statistical gradient-following algorithms for connectionist reinforcement learning. Machine Learning, 8, 229-256.
19. Zhang, W., & Dietterich, T. G. (1995). A reinforcement learning approach to job-shop scheduling. IJCAI-1995 (pp. 1114-1120).