SCOPUS Information Search Platform

NIPS 2002: Proceedings of the 15th International Conference on Neural Information Processing Systems

Volume , Issue , 2002, Pages 1595-1602

A Convergent Form of Approximate Policy Iteration

(2) Perkins, Theodore J. (a); Precup, Doina (b)

a University of Massachusetts Amherst (United States)

b McGill University (Canada)

Author keywords

[No Author keywords available]

Indexed keywords

COMPUTATIONAL RESOURCES; CONVERGENCE RESULTS; FREEFORMS; LIPSCHITZ CONTINUOUS; MODEL FREE; POLICY EVALUATION; POLICY ITERATION; POLICY ITERATION ALGORITHMS; POLICY-BASED; VALUE FUNCTION APPROXIMATION;

EID: 22944468429 PISSN: None EISSN: None Source Type: Conference Proceeding
DOI: None Document Type: Conference Paper

Times cited : (38)

References (17)

1
- 85151728371
- Residual algorithms: Reinforcement learning with function approximation
- Morgan Kaufmann
- L. C. Baird. Residual algorithms: Reinforcement learning with function approximation. In Proceedings of the Twelfth International Conference on Machine Learning, pages 30-37. Morgan Kaufmann, 1995.
- (1995) Proceedings of the Twelfth International Conference on Machine Learning , pp. 30-37
- Baird, L. C.¹

2
- 0029210635
- Learning to act using real-time dynamic programming
- A. G. Barto, S. J. Bradtke, and S. P. Singh. Learning to act using real-time dynamic programming. Artificial Intelligence, 72(1):81-138, 1995.
- (1995) Artificial Intelligence , vol.72 , Issue.1 , pp. 81-138
- Barto, A. G.¹ Bradtke, S. J.² Singh, S. P.³

3
- 0003565783
- Athena Scientific
- D. P. Bertsekas. Dynamic Programming and Optimal Control, Volumes 1 and 2. Athena Scientific, 2001.
- (2001) Dynamic Programming and Optimal Control, Volumes 1 and 2
- Bertsekas, D. P.¹

4
- 0003487482
- Athena Scientific
- D. P. Bertsekas and J. N. Tsitsiklis. Neuro-Dynamic Programming. Athena Scientific, 1996.
- (1996) Neuro-Dynamic Programming
- Bertsekas, D. P.¹ Tsitsiklis, J. N.²

5
- 0034342516
- On the existence of fixed points for approximate value iteration and temporal-difference learning
- D. P. De Farias and B. Van Roy. On the existence of fixed points for approximate value iteration and temporal-difference learning. Journal of Opt. Theory and Applications, 105(3), 2000.
- (2000) Journal of Opt. Theory and Applications , vol.105 , Issue.3
- De Farias, D. P.¹ Van Roy, B.²

6
- 57649089060
- Chattering in Sarsa(λ)
- G. Gordon. Chattering in Sarsa(λ). CMU Learning Lab Internal Report. Available at www.cs.cmu.edu/~ggordon, 1996.
- (1996) CMU Learning Lab Internal Report
- Gordon, G.¹

7
- 0003989207
- PhD thesis, Carnegie Mellon University
- G. Gordon. Approximate Solutions to Markov Decision Processes. PhD thesis, Carnegie Mellon University, 1999.
- (1999) Approximate Solutions to Markov Decision Processes
- Gordon, G.¹

8
- 84898995808
- Reinforcement learning with function approximation converges to a region
- MIT Press
- G. J. Gordon. Reinforcement learning with function approximation converges to a region. Advances in Neural Information Processing Systems 13, pages 1040-1046. MIT Press, 2001.
- (2001) Advances in Neural Information Processing Systems , vol.13 , pp. 1040-1046
- Gordon, G. J.¹

9
- 0003736354
- SIAM
- C. D. Meyer. Matrix Analysis and Applied Linear Algebra. SIAM, 2000.
- (2000) Matrix Analysis and Applied Linear Algebra
- Meyer, C. D.¹

10
- 56449099734
- On the existence of fixed points for Q-learning and Sarsa in partially observable domains
- T. J. Perkins and M. D. Pendrith. On the existence of fixed points for Q-learning and Sarsa in partially observable domains. In Proceedings of the Nineteenth International Conference on Machine Learning, 2002.
- (2002) Proceedings of the Nineteenth International Conference on Machine Learning
- Perkins, T. J.¹ Pendrith, M. D.²

11
- 0003998452
- John Wiley & Sons, Inc, New York
- M. L. Puterman. Markov Decision Processes: Discrete Stochastic Dynamic Programming. John Wiley & Sons, Inc, New York, 1994.
- (1994) Markov Decision Processes: Discrete Stochastic Dynamic Programming
- Puterman, M. L.¹

12
- 0037886159
- Sensitivity analysis, ergodicity coefficients, and rank-one updates for finite Markov chains
- W. J. Stewart, editor, Dekker, NY
- E. Seneta. Sensitivity analysis, ergodicity coefficients, and rank-one updates for finite Markov chains. In W. J. Stewart, editor, Numerical Solutions of Markov Chains. Dekker, NY, 1991.
- (1991) Numerical Solutions of Markov Chains
- Seneta, E.¹

13
- 0033901602
- Convergence results for single-step on-policy reinforcement-learning algorithms
- S. Singh, T. Jaakkola, M. L. Littman, and C. Szepesvari. Convergence results for single-step on-policy reinforcement-learning algorithms. Machine Learning, 38(3):287-308, 2000.
- (2000) Machine Learning , vol.38 , Issue.3 , pp. 287-308
- Singh, S.¹ Jaakkola, T.² Littman, M. L.³ Szepesvari, C.⁴

14
- 0004102479
- MIT Press/Bradford Books, Cambridge, Massachusetts
- R. S. Sutton and A. G. Barto. Reinforcement Learning: An Introduction. MIT Press/Bradford Books, Cambridge, Massachusetts, 1998.
- (1998) Reinforcement Learning: An Introduction
- Sutton, R. S.¹ Barto, A. G.²

15
- 0000985504
- TD-Gammon, a self-teaching backgammon program, achieves master-level play
- G. J. Tesauro. TD-Gammon, a self-teaching backgammon program, achieves master-level play. Neural Computation, 6(2):215-219, 1994.
- (1994) Neural Computation , vol.6 , Issue.2 , pp. 215-219
- Tesauro, G. J.¹

16
- 0033351917
- Optimal stopping of Markov processes: Hilbert space theory, approximation algorithms, and an application to pricing high-dimensional financial derivatives
- J. N. Tsitsiklis and B. Van Roy. Optimal stopping of Markov processes: Hilbert space theory, approximation algorithms, and an application to pricing high-dimensional financial derivatives. IEEE Transactions on Automatic Control, 44(10):1840-1851, 1999.
- (1999) IEEE Transactions on Automatic Control , vol.44 , Issue.10 , pp. 1840-1851
- Tsitsiklis, J. N.¹ Van Roy, B.²

17
- 0031143730
- An analysis of temporal-difference learning with function approximation
- J. N. Tsitsiklis and B. Van Roy. An analysis of temporal-difference learning with function approximation. IEEE Transactions on Automatic Control, 42(5):674-690, 1997.
- (1997) IEEE Transactions on Automatic Control , vol.42 , Issue.5 , pp. 674-690
- Tsitsiklis, J. N.¹ Van Roy, B.²

* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS DB.