Volume 14, Issue 2, 2000, Pages 243-258

A learning algorithm for discrete-time stochastic control

Author keywords

[No Author keywords available]

Indexed keywords

REINFORCEMENT LEARNING; STOCHASTIC CONTROL SYSTEMS; STOCHASTIC SYSTEMS;

EID: 0034550848     PISSN: 0269-9648     EISSN: None     Source Type: Journal    
DOI: 10.1017/s0269964800142081     Document Type: Article
Times cited : (13)

References (23)
  • 1
    • Abounady, J., Bertsekas, D., & Borkar, V.S. (1998). Stochastic approximation for nonexpansive maps: Application to Q-learning algorithms. Technical Report LIDS-P-2433, Laboratory for Information and Decision Systems, MIT, Cambridge, MA.
  • 2
    • Abounady, J., Bertsekas, D., & Borkar, V.S. (1998). Learning algorithms for Markov decision processes with average cost. Technical Report LIDS-P-2434, Laboratory for Information and Decision Systems, MIT, Cambridge, MA.
  • 3
  • 9
    • Borkar, V.S. (1997). Stochastic approximation with two time scales. Systems & Control Letters 29: 291-294.
  • 10
    • Borkar, V.S. & Meyn, S.P. (1998). The O.D.E. method for convergence of stochastic approximation and reinforcement learning. SIAM Journal of Control and Optimization (to appear).
  • 13
    • Hirsch, M.W. (1987). Convergent activation dynamics in continuous time networks. Neural Networks 2: 331-349.
  • 14
    • Jaakola, T., Jordan, M., & Singh, S. (1994). On the convergence of stochastic iterative dynamic programming algorithms. Neural Computation 6: 1185-1201.
  • 15
  • 16
    • Ljung, L. (1977). Analysis of recursive stochastic algorithms. IEEE Transactions on Automatic Control AC-22: 551-575.
  • 20
    • Sutton, R.S., Precup, D., & Singh, S.P. (1998). Between MDPs and semi-MDPs: Learning, planning and representing knowledge as multiple temporal scales. Journal of A.I. Research 1: 1-39.
  • 21
    • Tsitsiklis, J.N. (1994). Asynchronous stochastic approximation and Q-learning. Machine Learning 16: 185-202.
  • 22
    • Watkins, C. (1989). Learning from delayed rewards. Unpublished Ph.D. thesis, Cambridge University, Cambridge, UK.


* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.