



Volume 2017-December, 2017, Pages 3847-3856

Interpolated policy gradient: Merging on-policy and off-policy gradient estimation for deep reinforcement learning

Author keywords

[No Author keywords available]

Indexed keywords

MERGING; REINFORCEMENT LEARNING;

EID: 85047014445     PISSN: 10495258     EISSN: None     Source Type: Conference Proceeding    
DOI: None     Document Type: Conference Paper
Times cited : (134)

References (31)
  • 5
    • EID: 85041942380
    • Gu, Shixiang, Lillicrap, Timothy, Ghahramani, Zoubin, Turner, Richard E., and Levine, Sergey. Q-prop: Sample-efficient policy gradient with an off-policy critic. ICLR, 2017.
  • 7
    • EID: 85046992284
    • Jiang, Nan and Li, Lihong. Doubly robust off-policy value evaluation for reinforcement learning. In International Conference on Machine Learning, pp. 652-661, 2016.
  • 8
    • EID: 85161982655
    • Jie, Tang and Abbeel, Pieter. On a connection between importance sampling and the likelihood ratio policy gradient. In Advances in Neural Information Processing Systems, pp. 1000-1008, 2010.
  • 16
    • EID: 85088228567
    • O'Donoghue, Brendan, Munos, Remi, Kavukcuoglu, Koray, and Mnih, Volodymyr. PGQ: Combining policy gradient and Q-learning. ICLR, 2017.
  • 20
    • EID: 33646398129
    • Riedmiller, Martin. Neural fitted Q iteration - first experiences with a data efficient neural reinforcement learning method. In European Conference on Machine Learning, pp. 317-328. Springer, 2005.
  • 21
    • EID: 0004020933
    • Ross, Sheldon M. Simulation. Burlington, MA: Elsevier, 2006.
  • 27
    • EID: 85035116867
    • Thomas, Philip. Bias in natural actor-critic algorithms. In ICML, pp. 441-448, 2014.
  • 28
    • EID: 85018438849
    • Thomas, Philip and Brunskill, Emma. Data-efficient off-policy policy evaluation for reinforcement learning. In International Conference on Machine Learning, pp. 2139-2148, 2016.
  • 31
    • EID: 0000337576
    • Williams, Ronald J. Simple statistical gradient-following algorithms for connectionist reinforcement learning. Machine Learning, 8(3-4):229-256, 1992.


* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.