SCOPUS Information Search Platform

Volume 15, 2011, Pages 119-127

Dynamic policy programming with function approximation

(3) Azar, Mohammad Gheshlaghi (a); Gómez, Vicenç (a); Kappen, Hilbert J. (a)

(a) RADBOUD UNIVERSITY NIJMEGEN (Netherlands)

Author keywords

[No Author keywords available]

Indexed keywords

APPROXIMATE DYNAMIC PROGRAMMING; ASYMPTOTIC BOUNDS; DYNAMIC POLICY; FUNCTION APPROXIMATION; FUNCTION APPROXIMATION TECHNIQUES; INFINITE HORIZONS; LOSS BOUNDS; MARKOV DECISION PROBLEM; OPTIMAL POLICIES;

ARTIFICIAL INTELLIGENCE; REINFORCEMENT LEARNING;

DYNAMIC PROGRAMMING;

EID: 84862300689; PISSN: 15324435; EISSN: 15337928; Source Type: Journal
DOI: None; Document Type: Conference Paper

Times cited : (33)

References (22)

1
- 85161978146
- Fitted q-iteration in continuous action-space mdps
- Antos, A., Munos, R., and Szepesvári, C. (2008). Fitted q-iteration in continuous action-space mdps. In Proceedings of the 21st Annual Conference on Neural Information Processing Systems.
- (2008) Proceedings of the 21st Annual Conference on Neural Information Processing Systems
- Antos, A.¹ Munos, R.² Szepesvári, C.³

2
- 35248866146
- An introduction to reinforcement learning theory: Value function methods
- Bartlett, P. L. (2003). An introduction to reinforcement learning theory: Value function methods. Lecture Notes in Artificial Intelligence, 2600/2003:184-202.
- (2003) Lecture Notes in Artificial Intelligence , vol.2600 , Issue.2003 , pp. 184-202
- Bartlett, P.L.¹

3
- 0013535965
- Infinite-horizon policy-gradient estimation
- Baxter, J. and Bartlett, P. L. (2001). Infinite-horizon policy-gradient estimation. Journal of Artificial Intelligence Research, 15:319-350.
- (2001) Journal of Artificial Intelligence Research , vol.15 , pp. 319-350
- Baxter, J.¹ Bartlett, P.L.²

4
- 0003565783
- Athena Scientific, Belmont, Massachusetts, third edition
- Bertsekas, D. P. (2007). Dynamic Programming and Optimal Control, volume II. Athena Scientific, Belmont, Massachusetts, third edition.
- (2007) Dynamic Programming and Optimal Control , vol.2
- Bertsekas, D.P.¹

5
- 0003487482
- Athena Scientific, Belmont, Massachusetts
- Bertsekas, D. P. and Tsitsiklis, J. N. (1996). Neuro-Dynamic Programming. Athena Scientific, Belmont, Massachusetts.
- (1996) Neuro-Dynamic Programming
- Bertsekas, D.P.¹ Tsitsiklis, J.N.²

6
- 70349984547
- Natural actor-critic algorithms
- Bhatnagar, S., Sutton, R. S., Ghavamzadeh, M., and Lee, M. (2009). Natural actor-critic algorithms. Automatica, 45(11):2471-2482.
- (2009) Automatica , vol.45 , Issue.11 , pp. 2471-2482
- Bhatnagar, S.¹ Sutton, R.S.² Ghavamzadeh, M.³ Lee, M.⁴

7
- 85153940465
- Generalization in reinforcement learning: Safely approximating the value function
- Boyan, J. A. and Moore, A. W. (1995). Generalization in reinforcement learning: Safely approximating the value function. In Advances in Neural Information Processing Systems, pages 369-376.
- (1995) Advances in Neural Information Processing Systems , pp. 369-376
- Boyan, J.A.¹ Moore, A.W.²

8
- 0034342516
- On the existence of fixed points for approximate value iteration and temporal-difference learning
- Farias, D. P. and Roy, B. V. (2000). On the existence of fixed points for approximate value iteration and temporal-difference learning. Journal of Optimization Theory and Applications, 105(3):589-608.
- (2000) Journal of Optimization Theory and Applications , vol.105 , Issue.3 , pp. 589-608
- Farias, D.P.¹ Roy, B.V.²

9
- 33646243319
- Natural policy gradient
- Vancouver, British Columbia, Canada
- Kakade, S. (2001). Natural policy gradient. In Advances in Neural Information Processing Systems 14, pages 1531-1538, Vancouver, British Columbia, Canada.
- (2001) Advances in Neural Information Processing Systems , vol.14 , pp. 1531-1538
- Kakade, S.¹

10
- 29044440299
- Path integrals and symmetry breaking for optimal control theory
- Kappen, H. J. (2005). Path integrals and symmetry breaking for optimal control theory. Journal of Statistical Mechanics: Theory and Experiment, 2005(11):P11011.
- (2005) Journal of Statistical Mechanics: Theory and Experiment , vol.2005 , Issue.11 , pp. P11011
- Kappen, H.J.¹

11
- 0004272772
- Cambridge University Press, Cambridge, United Kingdom, first edition
- MacKay, D. J. C. (2003). Information Theory, Inference, and Learning Algorithms. Cambridge University Press, Cambridge, United Kingdom, first edition.
- (2003) Information Theory, Inference, and Learning Algorithms
- MacKay, D.J.C.¹

12
- 29344453913
- Error bounds for approximate value iteration
- Pittsburgh, Pennsylvania
- Munos, R. (2005). Error bounds for approximate value iteration. In Proceedings of the 20th National Conference on Artificial Intelligence, volume II, pages 1006-1011, Pittsburgh, Pennsylvania.
- (2005) Proceedings of the 20th National Conference on Artificial Intelligence , vol.2 , pp. 1006-1011
- Munos, R.¹

13
- 44649189852
- Finite-time bounds for fitted value iteration
- Munos, R. and Szepesvári, C. (2008). Finite-time bounds for fitted value iteration. Journal of Machine Learning Research, 9:815-857.
- (2008) Journal of Machine Learning Research , vol.9 , pp. 815-857
- Munos, R.¹ Szepesvári, C.²

14
- 77958569725
- Relative entropy policy search
- Peters, J., Mülling, K., and Altun, Y. (2010). Relative entropy policy search. In Proceedings of the Twenty-Fourth AAAI Conference on Artificial Intelligence, Atlanta, Georgia, USA.
- (2010) Proceedings of the Twenty-Fourth AAAI Conference on Artificial Intelligence, Atlanta, Georgia, USA
- Peters, J.¹ Mülling, K.² Altun, Y.³

15
- 40649106649
- Natural actor-critic
- Peters, J. and Schaal, S. (2008). Natural actor-critic. Neurocomputing, 71(7-9):1180-1190.
- (2008) Neurocomputing , vol.71 , Issue.7-9 , pp. 1180-1190
- Peters, J.¹ Schaal, S.²

16
- 85156221438
- Generalization in reinforcement learning: Successful examples using sparse coarse coding
- Sutton, R. S. (1996). Generalization in reinforcement learning: Successful examples using sparse coarse coding. In Advances in Neural Information Processing Systems 9, pages 1038-1044.
- (1996) Advances in Neural Information Processing Systems , vol.9 , pp. 1038-1044
- Sutton, R.S.¹

17
- 0004102479
- MIT Press, Cambridge, Massachusetts
- Sutton, R. S. and Barto, A. G. (1998). Reinforcement Learning: An Introduction. MIT Press, Cambridge, Massachusetts.
- (1998) Reinforcement Learning: An Introduction
- Sutton, R.S.¹ Barto, A.G.²

18
- 84898939480
- Policy gradient methods for reinforcement learning with function approximation
- Denver, Colorado, USA
- Sutton, R. S., McAllester, D., Singh, S., and Mansour, Y. (1999). Policy gradient methods for reinforcement learning with function approximation. In Advances in Neural Information Processing Systems 12, pages 1057-1063, Denver, Colorado, USA.
- (1999) Advances in Neural Information Processing Systems , vol.12 , pp. 1057-1063
- Sutton, R.S.¹ McAllester, D.² Singh, S.³ Mansour, Y.⁴

19
- 77956531256
- Reinforcement learning algorithms for mdps - A survey
- University of Alberta, Edmonton, Alberta, Canada
- Szepesvári, C. (2009). Reinforcement learning algorithms for mdps - a survey. Technical Report TR09-13, Department of Computing Science, University of Alberta, Edmonton, Alberta, Canada.
- (2009) Technical Report TR09-13, Department of Computing Science
- Szepesvári, C.¹

20
- 77956525931
- Least-squares policy iteration: Bias-variance trade-off in control problems
- Thiery, C. and Scherrer, B. (2010). Least-squares policy iteration: Bias-variance trade-off in control problems. In Proceedings of the 27th Annual International Conference on Machine Learning.
- (2010) Proceedings of the 27th Annual International Conference on Machine Learning
- Thiery, C.¹ Scherrer, B.²

21
- 84864055301
- Linearly-solvable markov decision problems
- Vancouver, British Columbia, Canada
- Todorov, E. (2006). Linearly-solvable markov decision problems. In Proceedings of the 20th Annual Conference on Neural Information Processing Systems, pages 1369-1376, Vancouver, British Columbia, Canada.
- (2006) Proceedings of the 20th Annual Conference on Neural Information Processing Systems , pp. 1369-1376
- Todorov, E.¹

22
- 85161971158
- Stable dual dynamic programming
- Vancouver, British Columbia, Canada
- Wang, T., Lizotte, D., Bowling, M., and Schuurmans, D. (2007). Stable dual dynamic programming. In Proceedings of the 21st Annual Conference on Neural Information Processing Systems, Vancouver, British Columbia, Canada.
- (2007) Proceedings of the 21st Annual Conference on Neural Information Processing Systems
- Wang, T.¹ Lizotte, D.² Bowling, M.³ Schuurmans, D.⁴

* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.