[2] M. Gheshlaghi Azar, R. Munos, M. Ghavamzadeh, and H. J. Kappen. Speedy Q-learning. In Advances in Neural Information Processing Systems 24, pages 2411-2419. MIT Press, 2012.
[5] A. G. Barto, R. S. Sutton, and C. W. Anderson. Neuronlike adaptive elements that can solve difficult learning control problems. IEEE Transactions on Systems, Man, and Cybernetics, SMC-13(5):834-846, 1983.
[9] S. Bhatnagar, R. S. Sutton, M. Ghavamzadeh, and M. Lee. Natural actor-critic algorithms. Automatica, 45(11):2471-2482, 2009.
[12] D. P. de Farias and B. Van Roy. On the existence of fixed points for approximate value iteration and temporal-difference learning. Journal of Optimization Theory and Applications, 105(3):589-608, 2000.
[15] A. Farahmand, M. Ghavamzadeh, Cs. Szepesvári, and S. Mannor. Regularized fitted Q-iteration: Application to planning. In European Workshop on Reinforcement Learning, Lecture Notes in Computer Science, pages 55-68. Springer, 2008.
[16] A. Farahmand, M. Ghavamzadeh, Cs. Szepesvári, and S. Mannor. Regularized policy iteration. In Advances in Neural Information Processing Systems 21, pages 441-448. Curran Associates, Inc., 2009.
[18] J. Hoffmann-Jørgensen and G. Pisier. The law of large numbers and the central limit theorem in Banach spaces. The Annals of Probability, 4(4):587-599, 1976.
[19] T. Jaakkola, M. I. Jordan, and S. Singh. On the convergence of stochastic iterative dynamic programming. Neural Computation, 6(6):1185-1201, 1994.
[22] H. J. Kappen. Path integrals and symmetry breaking for optimal control theory. Journal of Statistical Mechanics: Theory and Experiment, 2005(11):P11011, 2005.
[23] M. Kearns and S. Singh. Finite-sample convergence rates for Q-learning and indirect algorithms. In Advances in Neural Information Processing Systems 12, pages 996-1002. MIT Press, 1999.
[29] H. Maei, Cs. Szepesvári, S. Bhatnagar, and R. S. Sutton. Toward off-policy learning control with function approximation. In Proceedings of the 27th International Conference on Machine Learning, pages 719-726. Omnipress, 2010.
[31] R. Munos. Error bounds for approximate value iteration. In Proceedings of the 20th National Conference on Artificial Intelligence, volume 2, pages 1006-1011. AAAI Press, 2005.
[34] J. Peters and S. Schaal. Natural actor-critic. Neurocomputing, 71(7-9):1180-1190, 2008.
[36] S. Singh, T. Jaakkola, M. L. Littman, and Cs. Szepesvári. Convergence results for single-step on-policy reinforcement-learning algorithms. Machine Learning, 38(3):287-308, 2000.
[37] S. Still and D. Precup. An information-theoretic approach to curiosity-driven reinforcement learning. Theory in Biosciences, 131(3):139-148, 2012.
[40] R. S. Sutton, D. McAllester, S. Singh, and Y. Mansour. Policy gradient methods for reinforcement learning with function approximation. In Advances in Neural Information Processing Systems 12, pages 1057-1063. MIT Press, 2000.
[46] E. Todorov. Linearly-solvable Markov decision problems. In Advances in Neural Information Processing Systems 19, pages 1369-1376. MIT Press, 2007.
[48] T. Wang, M. Bowling, and D. Schuurmans. Dual representations for dynamic programming and reinforcement learning. In Proceedings of the 2007 IEEE Symposium on Approximate Dynamic Programming and Reinforcement Learning (ADPRL), pages 44-51. IEEE Press, 2007.
[49] T. Wang, D. Lizotte, M. Bowling, and D. Schuurmans. Stable dual dynamic programming. In Advances in Neural Information Processing Systems 20, pages 1569-1576. MIT Press, 2008.