



Volume , Issue , 2002, Pages

Variance reduction techniques for gradient estimates in reinforcement learning

Author keywords

[No Author keywords available]

Indexed keywords

ESTIMATION; OPTIMIZATION;

EID: 84898983933     PISSN: 10495258     EISSN: None     Source Type: Conference Proceeding    
DOI: None     Document Type: Conference Paper
Times cited : (7)

References (15)
  • 1
  • 2
    • P. L. Bartlett and J. Baxter. Estimation and approximation bounds for gradient-based reinforcement learning. Journal of Computer and Systems Sciences, 2002. To appear.
  • 5
    • J. Baxter, P. L. Bartlett, and L. Weaver. Infinite-horizon gradient-based policy search: II. Gradient ascent algorithms and experiments. Journal of Artificial Intelligence Research, 15:351-381, 2001.
  • 7
    • P. W. Glynn. Likelihood ratio gradient estimation for stochastic systems. Communications of the ACM, 33:75-84, 1990.
  • 8
    • E. Greensmith, P. L. Bartlett, and J. Baxter. Variance reduction techniques for gradient estimates in reinforcement learning. Technical report, ANU, 2002.
  • 11
    • P. Marbach and J. N. Tsitsiklis. Simulation-based optimization of Markov reward processes. Technical report, MIT, 1998.
  • 12
    • R. Y. Rubinstein. How to optimize complex stochastic systems from a single sample path by the score function method. Ann. Oper. Res., 27:175-211, 1991.
  • 15
    • R. J. Williams. Simple statistical gradient-following algorithms for connectionist reinforcement learning. Machine Learning, 8:229-256, 1992.


* This information was extracted by KISTI through analysis of Elsevier's SCOPUS database.