Volume , Issue , 2017, Pages

Q-Prop: Sample-efficient policy gradient with an off-policy critic

Author keywords

[No Author keywords available]

Indexed keywords

EFFICIENCY; GRADIENT METHODS; LEARNING ALGORITHMS; MONTE CARLO METHODS; REINFORCEMENT LEARNING;

EID: 85041942380     PISSN: None     EISSN: None     Source Type: Conference Proceeding    
DOI: None     Document Type: Conference Paper
Times cited: 175

References (38)
  • 7
    • 84897694817
    • Variance reduction techniques for gradient estimates in reinforcement learning
    • Evan Greensmith, Peter L Bartlett, and Jonathan Baxter. Variance reduction techniques for gradient estimates in reinforcement learning. Journal of Machine Learning Research, 5(Nov):1471-1530, 2004.
    • (2004) Journal of Machine Learning Research , vol.5 , pp. 1471-1530
    • Greensmith, E.1    Bartlett, P.L.2    Baxter, J.3
  • 11
    • 33646243319
    • A natural policy gradient
    • Sham Kakade. A natural policy gradient. In NIPS, volume 14, pp. 1531-1538, 2001.
    • (2001) NIPS , vol.14 , pp. 1531-1538
    • Kakade, S.1
  • 23
    • 85167411371
    • Relative entropy policy search
    • Jan Peters, Katharina Mülling, and Yasemin Altun. Relative entropy policy search. In AAAI. Atlanta, 2010.
    • (2010) AAAI
    • Peters, J.1    Mülling, K.2    Altun, Y.3
  • 25
    • 0004020933
    • Sheldon M Ross. Simulation. Burlington, MA: Elsevier, 2006.
    • (2006) Simulation
    • Ross, S.M.1
  • 30
    • 85132026293
    • Integrated architectures for learning, planning, and reacting based on approximating dynamic programming
    • Richard S Sutton. Integrated architectures for learning, planning, and reacting based on approximating dynamic programming. In International Conference on Machine Learning (ICML), pp. 216-224, 1990.
    • (1990) International Conference on Machine Learning (ICML) , pp. 216-224
    • Sutton, R.S.1
  • 34
    • 85035116867
    • Bias in natural actor-critic algorithms
    • Philip Thomas. Bias in natural actor-critic algorithms. In ICML, pp. 441-448, 2014.
    • (2014) ICML , pp. 441-448
    • Thomas, P.1
  • 38
    • 0000337576
    • Simple statistical gradient-following algorithms for connectionist reinforcement learning
    • Ronald J Williams. Simple statistical gradient-following algorithms for connectionist reinforcement learning. Machine learning, 8(3-4):229-256, 1992.
    • (1992) Machine Learning , vol.8 , Issue.3-4 , pp. 229-256
    • Williams, R.J.1


* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.