Automatica, Volume 44, Issue 4, 2008, Pages 1111-1119

New algorithms of the Q-learning type

Author keywords

Markov decision processes; Q-learning; Reinforcement learning; SPSA; Two-timescale stochastic approximation

Indexed keywords

APPROXIMATION ALGORITHMS; LEARNING ALGORITHMS; MARKOV PROCESSES; ROUTING ALGORITHMS; TELECOMMUNICATION NETWORKS

EID: 41049095293
PISSN: 0005-1098
EISSN: None
Source Type: Journal
DOI: 10.1016/j.automatica.2007.09.009
Document Type: Article
Times cited: 24

References (10)
  • 2. Bhatnagar S., Fu M.C., Marcus S.I., and Wang I.-J. Two-timescale simultaneous perturbation stochastic approximation using deterministic perturbation sequences. ACM Transactions on Modeling and Computer Simulation 13 2 (2003) 180-209
  • 3. Borkar V.S. Stochastic approximation with two time scales. Systems & Control Letters 29 (1997) 291-294
  • 4. Boyan J.A., and Littman M.L. Packet routing in dynamically changing networks: A reinforcement learning approach. Advances in Neural Information Processing Systems 6 (1994) 671-678
  • 5. Konda V.R., and Borkar V.S. Actor-critic like learning algorithms for Markov decision processes. SIAM Journal on Control and Optimization 38 1 (1999) 94-123
  • 6. Marbach P., Mihatsch O., and Tsitsiklis J.N. Call admission control and routing in integrated services networks using neuro-dynamic programming. IEEE Journal on Selected Areas in Communications 18 (2000) 197-208
  • 7. Spall J.C. Multivariate stochastic approximation using a simultaneous perturbation gradient approximation. IEEE Transactions on Automatic Control 37 (1992) 332-341
  • 8. Spall J.C. A one-measurement form of simultaneous perturbation stochastic approximation. Automatica 33 (1997) 109-112
  • 9. Tsitsiklis J.N. Asynchronous stochastic approximation and Q-learning. Machine Learning 16 (1994) 185-202


* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.