Volume 2, 2003, Pages 776-783

Model-based Policy Gradient Reinforcement Learning

Author keywords

[No Author keywords available]

Indexed keywords

POLICY GRADIENT METHODS; REINFORCEMENT LEARNING; RESOURCE-CONSTRAINED SCHEDULING; STOCHASTIC POLICIES

EID: 1942451973     PISSN: None     EISSN: None     Source Type: Conference Proceeding    
DOI: None     Document Type: Conference Paper
Times cited: 19

References (19)
  • 1. Aberdeen, D., & Baxter, J. (2002). Scalable internal-state policy-gradient methods for POMDPs. ICML-2002 (pp. 3-10). Morgan Kaufmann.
  • 2. Baxter, J., & Bartlett, P. (2000). Reinforcement learning in POMDPs via direct gradient ascent. ICML-2000 (pp. 41-48). Cambridge, MA: The MIT Press.
  • 3. Boutilier, C., Dearden, R., & Goldszmidt, M. (1995). Exploiting structure in policy construction. IJCAI-1995 (pp. 1104-1111). San Francisco: Morgan Kaufmann.
  • 4. Dietterich, T. G., & Wang, X. (2002). Batch value function approximation via support vectors. NIPS-2001 (pp. 1491-1498). Cambridge, MA: The MIT Press.
  • 5. Greensmith, E., Bartlett, P., & Baxter, J. (2002). Variance reduction techniques for gradient estimates in reinforcement learning. NIPS-2001 (pp. 1507-1514). Cambridge, MA: The MIT Press.
  • 6. Guestrin, C., Koller, D., & Parr, R. (2001). Max-norm projections for factored MDPs. IJCAI-2001 (pp. 673-682).
  • 7. Harvey, W. D., & Ginsberg, M. L. (1995). Limited discrepancy search. IJCAI-1995 (pp. 607-615). Montréal, Québec, Canada: Morgan Kaufmann.
  • 8. Kearns, M., Mansour, Y., & Ng, A. Y. (2000). Approximate learning in large POMDPs via reusable trajectories. NIPS-1999 (pp. 1001-1007). Cambridge, MA: The MIT Press.
  • 9. Kim, K.-E., Dean, T., & Meuleau, N. (2000). Approximate solutions to factored Markov decision processes via greedy search in the space of finite state controllers. Artificial Intelligence Planning Systems (pp. 323-330).
  • 10. Konda, V. R., & Tsitsiklis, J. N. (2000). Actor-critic algorithms. NIPS-1999. Cambridge, MA: The MIT Press.
  • 11. Munos, R., & Moore, A. (2002). Variable resolution discretization in optimal control. Machine Learning, 49, 291-323.
  • 12. Ng, A. Y., Parr, R., & Koller, D. (2000). Policy search via density estimation. NIPS-1999. Cambridge, MA: The MIT Press.
  • 13. Peshkin, L., Meuleau, N., & Kaelbling, L. P. (1999). Learning policies with external memory. ICML-1999 (pp. 307-314). Cambridge, MA: The MIT Press.
  • 15. Shelton, C. R. (2001). Policy improvement for POMDPs using normalized importance sampling. UAI-2001 (pp. 496-503).
  • 17. Sutton, R. S., McAllester, D., Singh, S., & Mansour, Y. (2000). Policy gradient methods for reinforcement learning with function approximation. NIPS-1999 (pp. 1057-1063). Cambridge, MA: The MIT Press.
  • 18. Williams, R. J. (1992). Simple statistical gradient-following algorithms for connectionist reinforcement learning. Machine Learning, 8, 229-256.
  • 19. Zhang, W., & Dietterich, T. G. (1995). A reinforcement learning approach to job-shop scheduling. IJCAI-1995 (pp. 1114-1120).


* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.