



2007, Pages 272-279

Reinforcement learning in continuous action spaces

Author keywords

[No Author keywords available]

Indexed keywords

ALGORITHMS; COMPUTATIONAL METHODS; ENGINEERING RESEARCH

EID: 34548807200     PISSN: None     EISSN: None     Source Type: Conference Proceeding    
DOI: 10.1109/ADPRL.2007.368199     Document Type: Conference Paper
Times cited : (200)

References (12)
  • 2
    • 33646413135
    • J. Peters, S. Vijayakumar, and S. Schaal, "Natural actor-critic," in 16th European Conference on Machine Learning, Porto, Portugal, October 3-7, 2005, Proceedings, ser. Lecture Notes in Computer Science, J. Gama, R. Camacho, P. Brazdil, A. Jorge, and L. Torgo, Eds., vol. 3720. Springer, 2005, pp. 280-291.
  • 3
    • 33646398129
    • M. Riedmiller, "Neural fitted Q iteration - first experiences with a data efficient neural reinforcement learning method," in 16th European Conference on Machine Learning, Porto, Portugal, October 3-7, 2005, Proceedings, ser. Lecture Notes in Computer Science, J. Gama, R. Camacho, P. Brazdil, A. Jorge, and L. Torgo, Eds., vol. 3720. Springer, 2005, pp. 317-328.
  • 5
    • 33847202724
    • R. S. Sutton, "Learning to predict by the methods of temporal differences," Machine Learning, vol. 3, pp. 9-44, 1988.
  • 6
    • 0000430514
    • P. Dayan, "The convergence of TD(λ) for general lambda," Machine Learning, vol. 8, pp. 341-362, 1992.
  • 7
    • 0004049893
    • C. J. C. H. Watkins, "Learning from delayed rewards," Ph.D. dissertation, King's College, Cambridge, England, 1989.
  • 9
    • 85156221438
    • R. S. Sutton, "Generalization in reinforcement learning: Successful examples using sparse coarse coding," in Advances in Neural Information Processing Systems 8, D. S. Touretzky, M. C. Mozer, and M. E. Hasselmo, Eds. MIT Press, Cambridge MA, 1996, pp. 1038-1045.
  • 11
    • 0003477315
    • L. C. Baird and A. H. Klopf, "Reinforcement learning with high-dimensional, continuous actions," Wright-Patterson Air Force Base Ohio: Wright Laboratory, Tech. Rep. WL-TR-93-1147, 1993. [Online]. Available: http://leemon.com/papers/index.html#b93b
  • 12
    • 0031236002
    • D. V. Prokhorov and D. C. Wunsch II, "Adaptive critic designs," IEEE Transactions on Neural Networks, vol. 8, no. 5, pp. 997-1007, September 1997. [Online]. Available: citeseer.csail.mit.edu/prokhorov97adaptive.html


* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.