Volume 7, 2006, Pages 771-791

Policy gradient in continuous time

Author keywords

Gradient estimate; Likelihood ratio method; Optimization; Pathwise derivation

Indexed keywords

APPROXIMATION THEORY; DECISION MAKING; OPTIMIZATION; PARAMETER ESTIMATION; PROBLEM SOLVING; SEARCH ENGINES;

EID: 33646399442     PISSN: 1533-7928     EISSN: 1533-7928     Source Type: Journal
DOI: None     Document Type: Article
Times cited: 75

References (19)
  • 2. A. Bensoussan. Perturbation Methods in Optimal Control. Wiley/Gauthier-Villars Series in Modern Applied Mathematics. John Wiley & Sons Ltd., Chichester, 1988. Translated from the French by C. Tomson.
  • 4. P. W. Glynn. Likelihood ratio gradient estimation: an overview. In A. Thesen, H. Grant, and W. D. Kelton, editors, Proceedings of the 1987 Winter Simulation Conference, pages 366-375, 1987.
  • 5. E. Gobet and R. Munos. Sensitivity analysis using Itô-Malliavin calculus and martingales, application to stochastic optimal control. SIAM Journal on Control and Optimization, 43(5):1676-1713, 2005.
  • 12. P. Marbach and J. N. Tsitsiklis. Approximate gradient methods in policy-space optimization of Markov reward processes. Journal of Discrete Event Dynamical Systems, 13:111-148, 2003.
  • 14. M. I. Reiman and A. Weiss. Sensitivity analysis via likelihood ratios. In J. Wilson, J. Henriksen, and S. Roberts, editors, Proceedings of the 1986 Winter Simulation Conference, pages 285-289, 1986.
  • 15. R. S. Sutton and A. G. Barto. Reinforcement Learning: An Introduction. Bradford Book, 1998.
  • 17. M. Talagrand. A new look at independence. Annals of Probability, 24:1-34, 1996.
  • 18. R. J. Williams. Simple statistical gradient-following algorithms for connectionist reinforcement learning. Machine Learning, 8:229-256, 1992.
  • 19. J. Yang and H. J. Kushner. A Monte Carlo method for sensitivity analysis and parametric optimization of nonlinear stochastic systems. SIAM Journal on Control and Optimization, 29(5):1216-1249, 1991.


* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.