Volume 2, 2003, Pages 776-783

Model-based Policy Gradient Reinforcement Learning

Author keywords

[No Author keywords available]

Indexed keywords

POLICY GRADIENT METHODS; REINFORCEMENT LEARNING; RESOURCE-CONSTRAINED SCHEDULING; STOCHASTIC POLICIES

EID: 1942451973     PISSN: None     EISSN: None     Source Type: Conference Proceeding    
DOI: None     Document Type: Conference Paper
Times cited: 19

References (19)
  • 1. Aberdeen, D., & Baxter, J. (2002). Scalable internal-state policy-gradient methods for POMDPs. ICML-2002 (pp. 3-10). Morgan Kaufmann.
  • 2. Baxter, J., & Bartlett, P. (2000). Reinforcement learning in POMDPs via direct gradient ascent. ICML-2000 (pp. 41-48). Cambridge, MA: The MIT Press.
  • 3. Boutilier, C., Dearden, R., & Goldszmidt, M. (1995). Exploiting structure in policy construction. IJCAI-1995 (pp. 1104-1111). San Francisco: Morgan Kaufmann.
  • 4. Dietterich, T. G., & Wang, X. (2002). Batch value function approximation via support vectors. NIPS-2001 (pp. 1491-1498). Cambridge, MA: The MIT Press.
  • 5. Greensmith, E., Bartlett, P., & Baxter, J. (2002). Variance reduction techniques for gradient estimates in reinforcement learning. NIPS-2001 (pp. 1507-1514). Cambridge, MA: The MIT Press.
  • 6. Guestrin, C., Koller, D., & Parr, R. (2001). Max-norm projections for factored MDPs. IJCAI-2001 (pp. 673-682).
  • 7. Harvey, W. D., & Ginsberg, M. L. (1995). Limited discrepancy search. IJCAI-1995 (pp. 607-615). Montréal, Québec, Canada: Morgan Kaufmann.
  • 8. Kearns, M., Mansour, Y., & Ng, A. Y. (2000). Approximate learning in large POMDPs via reusable trajectories. NIPS-1999 (pp. 1001-1007). Cambridge, MA: The MIT Press.
  • 9. Kim, K.-E., Dean, T., & Meuleau, N. (2000). Approximate solutions to factored Markov decision processes via greedy search in the space of finite state controllers. Artificial Intelligence Planning Systems (pp. 323-330).
  • 10. Konda, V. R., & Tsitsiklis, J. N. (2000). Actor-critic algorithms. NIPS-1999. Cambridge, MA: The MIT Press.
  • 11. Munos, R., & Moore, A. (2002). Variable resolution discretization in optimal control. Machine Learning, 49, 291-323.
  • 12. Ng, A. Y., Parr, R., & Koller, D. (2000). Policy search via density estimation. NIPS-1999. Cambridge, MA: The MIT Press.
  • 13. Peshkin, L., Meuleau, N., & Kaelbling, L. P. (1999). Learning policies with external memory. ICML-1999 (pp. 307-314). Cambridge, MA: The MIT Press.
  • 15. Shelton, C. R. (2001). Policy improvement for POMDPs using normalized importance sampling. UAI-2001 (pp. 496-503).
  • 17. Sutton, R. S., McAllester, D., Singh, S., & Mansour, Y. (2000). Policy gradient methods for reinforcement learning with function approximation. NIPS-1999 (pp. 1057-1063). Cambridge, MA: The MIT Press.
  • 18. Williams, R. J. (1992). Simple statistical gradient-following algorithms for connectionist reinforcement learning. Machine Learning, 8, 229-256.
  • 19. Zhang, W., & Dietterich, T. G. (1995). A reinforcement learning approach to job-shop scheduling. IJCAI-1995 (pp. 1114-1120).


* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.