Volume , Issue , 2000, Pages 1057-1063

Policy gradient methods for reinforcement learning with function approximation

Author keywords

[No Author keywords available]

Indexed keywords

GRADIENT METHODS;

EID: 84898939480     PISSN: 10495258     EISSN: None     Source Type: Conference Proceeding    
DOI: None     Document Type: Conference Paper
Times cited: 5466

References (20)
  • 2
    • EID: 85151728371
    • Baird, L. C. (1995). Residual algorithms: Reinforcement learning with function approximation. Proc. of the Twelfth Int. Conf. on Machine Learning, pp. 30-37. Morgan Kaufmann.
  • 3
    • EID: 84898958374
    • Baird, L. C., Moore, A. W. (1999). Gradient descent for general reinforcement learning. NIPS 11. MIT Press.
  • 7
    • EID: 0031258478
    • Cao, X.-R., Chen, H.-F. (1997). Perturbation realization, potentials, and sensitivity analysis of Markov processes. IEEE Trans. on Automatic Control 42(10):1382-1393.
  • 8
    • EID: 0346554800
    • Dayan, P. (1991). Reinforcement comparison. In D. S. Touretzky, J. L. Elman, T. J. Sejnowski, and G. E. Hinton (eds.), Connectionist Models: Proceedings of the 1990 Summer School, pp. 45-51. Morgan Kaufmann.
  • 11
    • EID: 85153938292
    • Jaakkola, T., Singh, S. P., Jordan, M. I. (1995). Reinforcement learning algorithms for partially observable Markov decision problems. NIPS 7, pp. 345-352. Morgan Kaufmann.
  • 12
    • EID: 0008336447
    • Kimura, H., Kobayashi, S. (1998). An analysis of actor/critic algorithms using eligibility traces: Reinforcement learning with imperfect value functions. Proc. ICML-98, pp. 278-286.
  • 14
    • EID: 0009011171
    • Marbach, P., Tsitsiklis, J. N. (1998). Simulation-based optimization of Markov reward processes. Technical Report LIDS-P-2411, Massachusetts Institute of Technology.
  • 15
    • EID: 2142812536
    • Singh, S. P., Jaakkola, T., Jordan, M. I. (1994). Learning without state-estimation in partially observable Markovian decision problems. Proc. ICML-94, pp. 284-292.
  • 18
    • EID: 0029752470
    • Tsitsiklis, J. N., Van Roy, B. (1996). Feature-based methods for large scale dynamic programming. Machine Learning 22:59-94.
  • 19
    • EID: 0003890455
    • Williams, R. J. (1988). Toward a theory of reinforcement-learning connectionist systems. Technical Report NU-CCS-88-3, Northeastern University, College of Computer Science.
  • 20
    • EID: 0000337576
    • Williams, R. J. (1992). Simple statistical gradient-following algorithms for connectionist reinforcement learning. Machine Learning 8:229-256.


* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS DB.