SCOPUS Information Search Platform

Proceedings of the 25th International Conference on Machine Learning

Volume , Issue , 2008, Pages 664-671

An analysis of reinforcement learning with function approximation

Authors (3): Melo, Francisco S. (a); Meyn, Sean P. (b); Ribeiro, M. Isabel (c)

a CARNEGIE MELLON UNIVERSITY (United States)

b UNIVERSITY OF ILLINOIS AT URBANA CHAMPAIGN (United States)

c INSTITUTE FOR SYSTEMS AND ROBOTICS (Portugal)

Author keywords

[No Author keywords available]

Indexed keywords

LEARNING ALGORITHMS; MACHINE LEARNING; STOCHASTIC SYSTEMS; FUNCTIONS; LEARNING SYSTEMS; PROBABILITY DENSITY FUNCTION; REINFORCEMENT; REINFORCEMENT LEARNING; ROBOT LEARNING;

APPROXIMATE METHODS; CONVERGENCE PROPERTIES; FUNCTION APPROXIMATION; INFINITE STATE SPACE; MARKOV DECISION PROBLEM; REINFORCEMENT LEARNING WITH FUNCTION APPROXIMATIONS; RELATED WORKS; STOCHASTIC CONTROL;

REINFORCEMENT LEARNING; EDUCATION;

APPROXIMATE METHODS; CONVERGENCE PROPERTIES; FUNCTION APPROXIMATIONS; INFINITE STATE; MARKOV DECISION PROBLEMS; Q FUNCTIONS; Q-LEARNING; REINFORCEMENT LEARNING WITH FUNCTION APPROXIMATIONS; STOCHASTIC CONTROLS; TD-LEARNING;

EID: 56449091120 PISSN: None EISSN: None Source Type: Conference Proceeding
DOI: 10.1145/1390156.1390240 Document Type: Conference Paper

Times cited: 229

References (24)

1
- 85151728371
- Residual algorithms: Reinforcement learning with function approximation
- Baird, L. (1995). Residual algorithms: Reinforcement learning with function approximation. Proc. 12th Int. Conf. Machine Learning (pp. 30-37).
- (1995) Proc. 12th Int. Conf. Machine Learning , pp. 30-37
- Baird, L.¹

2
- 0003778897
- Springer-Verlag
- Benveniste, A., Métivier, M., & Priouret, P. (1990). Adaptive algorithms and stochastic approximations, vol. 22. Springer-Verlag.
- (1990) Adaptive algorithms and stochastic approximations , vol.22
- Benveniste, A.¹ Métivier, M.² Priouret, P.³

3
- 0003487482
- Athena Scientific
- Bertsekas, D., & Tsitsiklis, J. (1996). Neuro-dynamic programming. Athena Scientific.
- (1996) Neuro-dynamic programming
- Bertsekas, D.¹ Tsitsiklis, J.²

4
- 0031076413
- Stochastic approximation with two time scales
- Borkar, V. (1997). Stochastic approximation with two time scales. Systems & Control Letters, 29, 291-294.
- (1997) Systems & Control Letters , vol.29 , pp. 291-294
- Borkar, V.¹

5
- 0034550848
- A learning algorithm for discrete-time stochastic control
- Borkar, V. (2000). A learning algorithm for discrete-time stochastic control. Probability in the Engineering and Informational Sciences, 14, 243-258.
- (2000) Probability in the Engineering and Informational Sciences , vol.14 , pp. 243-258
- Borkar, V.¹

6
- 0034342516
- On the existence of fixed points for approximate value iteration and temporal-difference learning
- de Farias, D., & Van Roy, B. (2000). On the existence of fixed points for approximate value iteration and temporal-difference learning. Journal of Optimization Theory and Applications, 105, 589-608.
- (2000) Journal of Optimization Theory and Applications , vol.105 , pp. 589-608
- de Farias, D.¹ Van Roy, B.²

7
- 0030487036
- Logarithmic Sobolev inequalities for finite Markov chains
- Diaconis, P., & Saloff-Coste, L. (1996). Logarithmic Sobolev inequalities for finite Markov chains. Annals of Applied Probability, 6, 695-750.
- (1996) Annals of Applied Probability , vol.6 , pp. 695-750
- Diaconis, P.¹ Saloff-Coste, L.²

8
- 0038595393
- Stable function approximation in dynamic programming
- CMU-CS-95-103, School of Computer Science, Carnegie Mellon University
- Gordon, G. (1995). Stable function approximation in dynamic programming (Technical Report CMU-CS-95-103). School of Computer Science, Carnegie Mellon University.
- (1995) Technical Report
- Gordon, G.¹

9
- 57649089060
- Chattering in SARSA(λ)
- Technical Report, CMU Learning Lab Internal Report
- Gordon, G. (1996). Chattering in SARSA(λ). (Technical Report). CMU Learning Lab Internal Report.
- (1996) Chattering in SARSA(λ)
- Gordon, G.¹

10
- 0003637131
- Springer-Verlag
- Meyn, S., & Tweedie, R. (1993). Markov chains and stochastic stability. Springer-Verlag.
- (1993) Markov chains and stochastic stability
- Meyn, S.¹ Tweedie, R.²

11
- 0000566364
- Computable bounds for geometric convergence rates of Markov chains
- Meyn, S., & Tweedie, R. (1994). Computable bounds for geometric convergence rates of Markov chains. Annals of Applied Probability, 4, 981-1011.
- (1994) Annals of Applied Probability , vol.4 , pp. 981-1011
- Meyn, S.¹ Tweedie, R.²

12
- 0036832956
- Kernel-based reinforcement learning
- Ormoneit, D., & Sen, S. (2002). Kernel-based reinforcement learning. Machine Learning, 49, 161-178.
- (2002) Machine Learning , vol.49 , pp. 161-178
- Ormoneit, D.¹ Sen, S.²

13
- 56449099734
- On the existence of fixed-points for Q-learning and SARSA in partially observable domains
- Perkins, T., & Pendrith, M. (2002). On the existence of fixed-points for Q-learning and SARSA in partially observable domains. Proc. 19th Int. Conf. Machine Learning (pp. 490-497).
- (2002) Proc. 19th Int. Conf. Machine Learning , pp. 490-497
- Perkins, T.¹ Pendrith, M.²

14
- 84898960655
- A convergent form of approximate policy iteration
- Perkins, T., & Precup, D. (2003). A convergent form of approximate policy iteration. Adv. Neural Information Proc. Systems (pp. 1595-1602).
- (2003) Adv. Neural Information Proc. Systems , pp. 1595-1602
- Perkins, T.¹ Precup, D.²

15
- 4644328593
- Off-policy temporal-difference learning with function approximation
- Precup, D., Sutton, R., & Dasgupta, S. (2001). Off-policy temporal-difference learning with function approximation. Proc. 18th Int. Conf. Machine Learning (pp. 417-424).
- (2001) Proc. 18th Int. Conf. Machine Learning , pp. 417-424
- Precup, D.¹ Sutton, R.² Dasgupta, S.³

16
- 56449114755
- Q-learning combined with spreading: Convergence and results
- Ribeiro, C., & Szepesvári, C. (1996). Q-learning combined with spreading: Convergence and results. Proc. ISRF-IEE Int. Conf. Intelligent and Cognitive Systems (pp. 32-36).
- (1996) Proc. ISRF-IEE Int. Conf. Intelligent and Cognitive Systems , pp. 32-36
- Ribeiro, C.¹ Szepesvári, C.²

17
- 3042638629
- Quantitative convergence rates of Markov chains: A simple account
- Rosenthal, J. (2002). Quantitative convergence rates of Markov chains: A simple account. Electronic Communications in Probability, 7, 123-128.
- (2002) Electronic Communications in Probability , vol.7 , pp. 123-128
- Rosenthal, J.¹

18
- 85153965130
- Reinforcement learning with soft state aggregation
- Singh, S., Jaakkola, T., & Jordan, M. (1994). Reinforcement learning with soft state aggregation. Adv. Neural Information Proc. Systems (pp. 361-368).
- (1994) Adv. Neural Information Proc. Systems , pp. 361-368
- Singh, S.¹ Jaakkola, T.² Jordan, M.³

19
- 84947807317
- Open theoretical questions in reinforcement learning
- Sutton, R. (1999). Open theoretical questions in reinforcement learning. Lecture Notes in Computer Science, 1572, 11-17.
- (1999) Lecture Notes in Computer Science , vol.1572 , pp. 11-17
- Sutton, R.¹

20
- 14344263882
- Interpolation-based Q-learning
- Szepesvári, C., & Smart, W. (2004). Interpolation-based Q-learning. Proc. 21st Int. Conf. Machine learning (pp. 100-107).
- (2004) Proc. 21st Int. Conf. Machine learning , pp. 100-107
- Szepesvári, C.¹ Smart, W.²

21
- 0035283402
- On the convergence of temporal-difference learning with linear function approximation
- Tadić, V. (2001). On the convergence of temporal-difference learning with linear function approximation. Machine Learning, 42, 241-267.
- (2001) Machine Learning , vol.42 , pp. 241-267
- Tadić, V.¹

22
- 0031143730
- An analysis of temporal-difference learning with function approximation
- Tsitsiklis, J., & Van Roy, B. (1996a). An analysis of temporal-difference learning with function approximation. IEEE Trans. Automatic Control, 42, 674-690.
- (1996) IEEE Trans. Automatic Control , vol.42 , pp. 674-690
- Tsitsiklis, J.¹ Van Roy, B.²

23
- 0029752470
- Feature-based methods for large scale dynamic programming
- Tsitsiklis, J., & Van Roy, B. (1996b). Feature-based methods for large scale dynamic programming. Machine Learning, 22, 59-94.
- (1996) Machine Learning , vol.22 , pp. 59-94
- Tsitsiklis, J.¹ Van Roy, B.²

24
- 0004049893
- Doctoral dissertation, King's College, University of Cambridge
- Watkins, C. (1989). Learning from delayed rewards. Doctoral dissertation, King's College, University of Cambridge.
- (1989) Learning from delayed rewards
- Watkins, C.¹

* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.