



Volume , Issue , 2018, Pages

The mirage of action-dependent baselines in reinforcement learning

Author keywords

[No Author keywords available]

Indexed keywords

DECISION MAKING; GRADIENT METHODS; MACHINE LEARNING; OPEN SOURCE SOFTWARE; OPEN SYSTEMS

EID: 85083951605     PISSN: None     EISSN: None     Source Type: Conference Proceeding    
DOI: None     Document Type: Conference Paper
Times cited: 128

References (25)
  • 6
    • 85161982655
    • On a connection between importance sampling and the likelihood ratio policy gradient
    • Tang Jie and Pieter Abbeel. On a connection between importance sampling and the likelihood ratio policy gradient. In Advances in Neural Information Processing Systems, pp. 1000–1008, 2010.
    • (2010) Advances in Neural Information Processing Systems , pp. 1000-1008
    • Jie, T.1    Abbeel, P.2
  • 24
    • 84941874233
    • Simple statistical gradient-following algorithms for connectionist reinforcement learning
    • Springer
    • Ronald J Williams. Simple statistical gradient-following algorithms for connectionist reinforcement learning. In Reinforcement Learning, pp. 5–32. Springer, 1992.
    • (1992) Reinforcement Learning , pp. 5-32
    • Williams, R.J.1


* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS DB.