Volume , Issue , 2008, Pages 525-531

On step sizes, stochastic shortest paths, and survival probabilities in reinforcement learning

Author keywords

[No Author keywords available]

Indexed keywords

AVERAGE REWARDS; DOWNSIDE RISKS; EMPIRICAL STUDIES; FUNCTION APPROXIMATIONS; LEARNING RULES; MARKOV DECISION PROCESS; NUMBER OF STATE; SIMULATION-BASED; STEP SIZES; STOCHASTIC SHORTEST PATHS; SURVIVAL PROBABILITIES; TRANSITION PROBABILITIES;

EID: 60749124483     PISSN: 08917736     EISSN: None     Source Type: Conference Proceeding    
DOI: 10.1109/WSC.2008.4736109     Document Type: Conference Paper
Times cited : (12)

References (20)
  • 2. Borkar, V. 2002. Q-learning for risk-sensitive control. Mathematics of Operations Research 27(2):294-311.
  • 4. Borkar, V. S., and S. Meyn. 2000. The ODE method for convergence of stochastic approximation and reinforcement learning. SIAM Journal of Control and Optimization 38(2):447-469.
  • 7. Geibel, P., and F. Wysotzki. 2005. Risk-sensitive reinforcement learning applied to control under constraints. Journal of Artificial Intelligence Research 24:81-108.
  • 8. Gosavi, A. 2004a. A reinforcement learning algorithm based on policy iteration for average reward: Empirical results with yield management and convergence analysis. Machine Learning 55(1):5-29.
  • 9. Gosavi, A. 2004b. Reinforcement learning for long-run average cost. European Journal of Operational Research 155:654-674.
  • 10. Gosavi, A. 2006. A risk-sensitive approach to total productive maintenance. Automatica 42:1321-1330.
  • 11. Gosavi, A. 2008. Markov decision processes subject to semivariance risk. Working paper, University at Buffalo, SUNY, ISE Department.
  • 12. Gosavi, A., and S. Meyn. 2008. The variance-penalized Bellman equation. Working paper, SUNY Buffalo and University of Illinois at Urbana-Champaign.
  • 15. Markowitz, H. 1952. Portfolio selection. Journal of Finance 7(1):77-91.
  • 20. Watkins, C. 1989. Learning from delayed rewards. Ph.D. thesis, King's College, Cambridge, England.

* This information was extracted and analyzed by KISTI from Elsevier's SCOPUS database.