SCOPUS Information Retrieval Platform

Volume , Issue , 2007, Pages 272-279

Reinforcement learning in continuous action spaces

Author keywords

[No Author keywords available]

Indexed keywords

ALGORITHMS; COMPUTATIONAL METHODS; ENGINEERING RESEARCH; CONTINUOUS ACTOR CRITIC LEARNING AUTOMATON (CACLA); GAUSSIAN EXPLORATION; REINFORCEMENT LEARNING

EID: 34548807200 PISSN: None EISSN: None Source Type: Conference Proceeding
DOI: 10.1109/ADPRL.2007.368199 Document Type: Conference Paper

Times cited: 200

References (12)

1
- 0004102479
- The MIT Press, Cambridge MA, A Bradford Book
- R. S. Sutton and A. G. Barto, Reinforcement Learning: An Introduction. The MIT Press, Cambridge MA, A Bradford Book, 1998.
- (1998) Reinforcement Learning: An Introduction
- Sutton, R.S.¹ Barto, A.G.²

2
- 33646413135
- J. Peters, S. Vijayakumar, and S. Schaal, "Natural actor-critic," in 16th European Conference on Machine Learning, Porto, Portugal, October 3-7, 2005, Proceedings, ser. Lecture Notes in Computer Science, J. Gama, R. Camacho, P. Brazdil, A. Jorge, and L. Torgo, Eds., vol. 3720. Springer, 2005, pp. 280-291.

3
- 33646398129
- M. Riedmiller, "Neural fitted Q iteration - first experiences with a data efficient neural reinforcement learning method," in 16th European Conference on Machine Learning, Porto, Portugal, October 3-7, 2005, Proceedings, ser. Lecture Notes in Computer Science, J. Gama, R. Camacho, P. Brazdil, A. Jorge, and L. Torgo, Eds., vol. 3720. Springer, 2005, pp. 317-328.

4
- 0003787146
- Princeton University Press
- R. E. Bellman, Dynamic Programming. Princeton University Press, 1957.
- (1957) Dynamic Programming
- Bellman, R.E.¹

5
- 33847202724
- Learning to predict by the methods of temporal differences
- R. S. Sutton, "Learning to predict by the methods of temporal differences," Machine Learning, vol. 3, pp. 9-44, 1988.
- (1988) Machine Learning , vol.3 , pp. 9-44
- Sutton, R.S.¹

6
- 0000430514
- The convergence of TD(λ) for general lambda
- P. Dayan, "The convergence of TD(λ) for general lambda," Machine Learning, vol. 8, pp. 341-362, 1992.
- (1992) Machine Learning , vol.8 , pp. 341-362
- Dayan, P.¹

8
- 0003636089
- Cambridge University, UK, Tech. Rep. CUED/F-INFENG-TR 166
- G. Rummery and M. Niranjan, "On-line Q-learning using connectionist systems," Cambridge University, UK, Tech. Rep. CUED/F-INFENG-TR 166, 1994.
- (1994) On-line Q-learning using connectionist systems
- Rummery, G.¹ Niranjan, M.²

10
- 0003487482
- Athena Scientific, Belmont, MA
- D. P. Bertsekas and J. N. Tsitsiklis, Neuro-dynamic Programming. Athena Scientific, Belmont, MA, 1996.
- (1996) Neuro-dynamic Programming
- Bertsekas, D.P.¹ Tsitsiklis, J.N.²

11
- 0003477315
- Wright-Patterson Air Force Base Ohio: Wright Laboratory, Tech. Rep. WL-TR-93-1147
- L. C. Baird and A. H. Klopf, "Reinforcement learning with high-dimensional, continuous actions," Wright-Patterson Air Force Base Ohio: Wright Laboratory, Tech. Rep. WL-TR-93-1147, 1993. [Online]. Available: http://leemon.com/papers/index.html#b93b
- (1993) Reinforcement learning with high-dimensional, continuous actions
- Baird, L.C.¹ Klopf, A.H.²

* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS DB.