Volume 2017-December, 2017, Pages 5049-5059

Hindsight experience replay

Author keywords

[No Author keywords available]

Indexed keywords

EFFICIENT LEARNING; EXPERIENCE REPLAY; NOVEL TECHNIQUES; OFF POLICIES; PHYSICAL ROBOTS; PHYSICS SIMULATION; PICK AND PLACE

EID: 85047009130     PISSN: 10495258     EISSN: None     Source Type: Conference Proceeding    
DOI: None     Document Type: Conference Paper
Times cited: 2083

References (45)
  • 2
    • 33845876447
    • Hierarchical reinforcement learning based on subgoal discovery and subpolicy specialization
    • Bakker, B. and Schmidhuber, J. (2004). Hierarchical reinforcement learning based on subgoal discovery and subpolicy specialization. In Proc. of the 8-th Conf. on Intelligent Autonomous Systems, pages 438-445.
    • (2004) Proc. of the 8-th Conf. on Intelligent Autonomous Systems , pp. 438-445
    • Bakker, B.1    Schmidhuber, J.2
  • 5
    • 1942470793
    • Multitask learning
    • Springer
    • Caruana, R. (1998). Multitask learning. In Learning to Learn, pages 95-133. Springer.
    • (1998) Learning to Learn , pp. 95-133
    • Caruana, R.1
  • 9
    • 0027636611
    • Learning and development in neural networks: The importance of starting small
    • Elman, J. L. (1993). Learning and development in neural networks: The importance of starting small. Cognition, 48(1): 71-99.
    • (1993) Cognition , vol.48 , Issue.1 , pp. 71-99
    • Elman, J.L.1
  • 10
    • 0036832959
    • Structure in the space of value functions
    • Foster, D. and Dayan, P. (2002). Structure in the space of value functions. Machine Learning, 49(2): 325-346.
    • (2002) Machine Learning , vol.49 , Issue.2 , pp. 325-346
    • Foster, D.1    Dayan, P.2
  • 17
    • 84868358933
    • Reinforcement learning to adjust parametrized motor primitives to new situations
    • Kober, J., Wilhelm, A., Oztop, E., and Peters, J. (2012). Reinforcement learning to adjust parametrized motor primitives to new situations. Autonomous Robots, 33(4): 361-379.
    • (2012) Autonomous Robots , vol.33 , Issue.4 , pp. 361-379
    • Kober, J.1    Wilhelm, A.2    Oztop, E.3    Peters, J.4
  • 21
    • 0000123778
    • Self-improving reactive agents based on reinforcement learning, planning and teaching
    • Lin, L.-J. (1992). Self-improving reactive agents based on reinforcement learning, planning and teaching. Machine Learning, 8(3-4): 293-321.
    • (1992) Machine Learning , vol.8 , Issue.3-4 , pp. 293-321
    • Lin, L.-J.1
  • 25
    • 0141596576
    • Policy invariance under reward transformations: Theory and application to reward shaping
    • Ng, A. Y., Harada, D., and Russell, S. (1999). Policy invariance under reward transformations: Theory and application to reward shaping. In ICML, Volume 99, pages 278-287.
    • (1999) ICML , vol.99 , pp. 278-287
    • Ng, A.Y.1    Harada, D.2    Russell, S.3
  • 28
    • 44949241322
    • Reinforcement learning of motor skills with policy gradients
    • Peters, J. and Schaal, S. (2008). Reinforcement learning of motor skills with policy gradients. Neural Networks, 21(4): 682-697.
    • (2008) Neural Networks , vol.21 , Issue.4 , pp. 682-697
    • Peters, J.1    Schaal, S.2
  • 33
    • 1642328943
    • Optimal ordered problem solver
    • Schmidhuber, J. (2004). Optimal ordered problem solver. Machine Learning, 54(3): 211-254.
    • (2004) Machine Learning , vol.54 , Issue.3 , pp. 211-254
    • Schmidhuber, J.1
  • 34
    • 84906338085
    • Powerplay: Training an increasingly general problem solver by continually searching for the simplest still unsolvable problem
    • Schmidhuber, J. (2013). Powerplay: Training an increasingly general problem solver by continually searching for the simplest still unsolvable problem. Frontiers in Psychology, 4.
    • (2013) Frontiers in Psychology , vol.4
    • Schmidhuber, J.1
  • 40
    • 84899464022
    • Horde: A scalable real-time architecture for learning knowledge from unsupervised sensorimotor interaction
    • International Foundation for Autonomous Agents and Multiagent Systems
    • Sutton, R. S., Modayil, J., Delp, M., Degris, T., Pilarski, P. M., White, A., and Precup, D. (2011). Horde: A scalable real-time architecture for learning knowledge from unsupervised sensorimotor interaction. In The 10th International Conference on Autonomous Agents and Multiagent Systems-Volume 2, pages 761-768. International Foundation for Autonomous Agents and Multiagent Systems.
    • (2011) The 10th International Conference on Autonomous Agents and Multiagent Systems , vol.2 , pp. 761-768
    • Sutton, R.S.1    Modayil, J.2    Delp, M.3    Degris, T.4    Pilarski, P.M.5    White, A.6    Precup, D.7


* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS DB.