



Volume 7, 2006, Pages 413-427

Geometric variance reduction in Markov chains: Application to value function and gradient estimation

Author keywords

[No Author keywords available]

Indexed keywords

ALGORITHMS; APPROXIMATION THEORY; ERROR ANALYSIS; ITERATIVE METHODS; MONTE CARLO METHODS; OPTIMIZATION; PARAMETER ESTIMATION; SEQUENTIAL CIRCUITS; TRAJECTORIES; VALUE ENGINEERING;

EID: 33646384929     PISSN: 1533-7928     EISSN: 1533-7928     Source Type: Journal
DOI: None     Document Type: Article
Times cited: 14

References (18)
  • 3. P. W. Glynn. Likelihood ratio gradient estimation: an overview. In A. Thesen, H. Grant, and W. D. Kelton, editors, Proceedings of the 1987 Winter Simulation Conference, pages 366-375, 1987.
  • 4. E. Gobet and S. Maire. Sequential control variates for functionals of Markov processes. SIAM Journal on Numerical Analysis, 43(3):1256-1275, 2005.
  • 5. E. Greensmith, P. L. Bartlett, and J. Baxter. Variance reduction techniques for gradient estimates in reinforcement learning. Journal of Machine Learning Research, 5:1471-1530, 2005.
  • 6. J. H. Halton. A retrospective and prospective survey of the Monte-Carlo method. SIAM Review, 12(1):1-63, 1970.
  • 7. J. H. Halton. Sequential Monte-Carlo techniques for the solution of linear systems. Journal of Scientific Computing, 9:213-257, 1994.
  • 11. V. R. Konda and V. S. Borkar. Actor-critic-type learning algorithms for Markov decision processes. SIAM Journal on Control and Optimization, 38(1):94-123, 1999.
  • 12. S. Maire. An iterative computation of approximations on Korobov-like spaces. Journal of Computational and Applied Mathematics, 54(6):261-281, 2003.
  • 13. P. Marbach and J. N. Tsitsiklis. Approximate gradient methods in policy-space optimization of Markov reward processes. Journal of Discrete Event Dynamical Systems, 13:111-148, 2003.
  • 14. M. I. Reiman and A. Weiss. Sensitivity analysis via likelihood ratios. In J. Wilson, J. Henriksen, and S. Roberts, editors, Proceedings of the 1986 Winter Simulation Conference, pages 285-289, 1986.
  • 15. R. S. Sutton, D. McAllester, S. Singh, and Y. Mansour. Policy gradient methods for reinforcement learning with function approximation. In Neural Information Processing Systems, pages 1057-1063. MIT Press, 2000.
  • 17. V. Vapnik, S. E. Golowich, and A. Smola. Support vector method for function approximation, regression estimation and signal processing. In Advances in Neural Information Processing Systems, pages 281-281, 1997.
  • 18. R. J. Williams. Simple statistical gradient-following algorithms for connectionist reinforcement learning. Machine Learning, 8:229-256, 1992.


* This information was extracted and analyzed by KISTI from Elsevier's SCOPUS database.