2017

Combining policy gradient and Q-learning

Author keywords

[No Author keywords available]

Indexed keywords

REINFORCEMENT LEARNING;

EID: 85088228567 · PISSN: None · EISSN: None · Source Type: Conference Proceeding
DOI: None · Document Type: Conference Paper
Times cited: 90

References (46)
  • 1. Shun-Ichi Amari. Natural gradient works efficiently in learning. Neural Computation, 10(2):251-276, 1998.
  • 4. Leemon C. Baird III. Advantage updating. Technical Report WL-TR-93-1146, Wright-Patterson Air Force Base, Ohio: Wright Laboratory, 1993.
  • 12. Nicolas Heess, David Silver, and Yee Whye Teh. Actor-critic reinforcement learning with energy-based policies. In JMLR: Workshop and Conference Proceedings 24, pp. 43-57, 2012.
  • 19. Long-Ji Lin. Reinforcement learning for robots using neural networks. Technical report, DTIC Document, 1993.
  • 26. Jing Peng and Ronald J. Williams. Incremental multi-step Q-learning. Machine Learning, 22(1-3):283-290, 1996.
  • 27. Jan Peters, Katharina Mülling, and Yasemin Altun. Relative entropy policy search. In AAAI, Atlanta, 2010.
  • 28. Martin Riedmiller. Neural fitted Q iteration: first experiences with a data efficient neural reinforcement learning method. In Machine Learning: ECML 2005, pp. 317-328. Springer Berlin Heidelberg, 2005.
  • 30. Brian Sallans and Geoffrey E. Hinton. Reinforcement learning with factored states and actions. Journal of Machine Learning Research, 5(Aug):1063-1088, 2004.
  • 36. Richard S. Sutton. Learning to predict by the methods of temporal differences. Machine Learning, 3(1):9-44, 1988.
  • 38. Gerald Tesauro. Temporal difference learning and TD-Gammon. Communications of the ACM, 38(3):58-68, 1995.
  • 45. Ronald J. Williams. Simple statistical gradient-following algorithms for connectionist reinforcement learning. Machine Learning, 8(3-4):229-256, 1992.
  • 46. Ronald J. Williams and Jing Peng. Function optimization using connectionist reinforcement learning algorithms. Connection Science, 3(3):241-268, 1991.


* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.