Year: 2017

Reinforcement learning with unsupervised auxiliary tasks

Author keywords

[No Author keywords available]

Indexed keywords

DEEP LEARNING; INTELLIGENT AGENTS; MACHINE LEARNING

EID: 85088229768     PISSN: None     EISSN: None     Source Type: Conference Proceeding    
DOI: None     Document Type: Conference Paper
Times cited: 530

References (34)
  • [3] OpenArena contributors. The OpenArena manual. 2005. URL http://openarena.wikia.com/wiki/Manual. Scopus EID: 85070993309.
  • [4] Peter Dayan. Improving generalization for temporal difference learning: The successor representation. Neural Computation, 5(4):613-624, 1993. Scopus EID: 0001158047.
  • [5] Felix A. Gers, Jürgen Schmidhuber, and Fred Cummins. Learning to forget: Continual prediction with LSTM. Neural Computation, 12(10):2451-2471, 2000. Scopus EID: 0034293152.
  • [6] id Software. Quake3. 1999. URL https://github.com/id-Software/Quake-III-Arena. Scopus EID: 85071026926.
  • [8] George Konidaris and Andre S. Barreto. Skill discovery in continuous reinforcement learning domains using skill chaining. In Advances in Neural Information Processing Systems, pp. 1015-1023, 2009. Scopus EID: 80055032021.
  • [19] H. Freyja Olafsdottir, Caswell Barry, Aman B. Saleem, Demis Hassabis, and Hugo J. Spiers. Hippocampal place cells construct reward related sequences through unexplored space. eLife, 4:e06063, 2015. Scopus EID: 84937060789.
  • [20] Jing Peng and Ronald J. Williams. Incremental multi-step Q-learning. Machine Learning, 22(1-3):283-290, 1996. Scopus EID: 0000955979.
  • [24] Jürgen Schmidhuber. Formal theory of creativity, fun, and intrinsic motivation (1990-2010). IEEE Transactions on Autonomous Mental Development, 2(3):230-247, 2010. Scopus EID: 77956578648.
  • [27] Richard S. Sutton, David A. McAllester, Satinder P. Singh, Yishay Mansour, et al. Policy gradient methods for reinforcement learning with function approximation. In NIPS, volume 99, pp. 1057-1063, 1999a. Scopus EID: 84898939480.
  • [28] Richard S. Sutton, Doina Precup, and Satinder Singh. Between MDPs and semi-MDPs: A framework for temporal abstraction in reinforcement learning. Artificial Intelligence, 1999b. Scopus EID: 0033170372.
  • [29] Richard S. Sutton, Joseph Modayil, Michael Delp, Thomas Degris, Patrick M. Pilarski, Adam White, and Doina Precup. Horde: A scalable real-time architecture for learning knowledge from unsupervised sensorimotor interaction. In The 10th International Conference on Autonomous Agents and Multiagent Systems, Volume 2, pp. 761-768. International Foundation for Autonomous Agents and Multiagent Systems, 2011. Scopus EID: 84899464022.


* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.