SCOPUS Information Retrieval Platform

Volume 44, Issue 4, 2008, Pages 1111-1119

New algorithms of the Q-learning type

Author keywords

Markov decision processes; Q learning; Reinforcement learning; SPSA; Two timescale stochastic approximation

Indexed keywords

APPROXIMATION ALGORITHMS; LEARNING ALGORITHMS; MARKOV PROCESSES; ROUTING ALGORITHMS; TELECOMMUNICATION NETWORKS;

MARKOV DECISION PROCESSES; Q-LEARNING; TWO TIMESCALE STOCHASTIC APPROXIMATION;

REINFORCEMENT LEARNING;

EID: 41049095293 PISSN: 00051098 EISSN: None Source Type: Journal
DOI: 10.1016/j.automatica.2007.09.009 Document Type: Article

Times cited: 24

References (10)

1
- 0003487482
- Athena Scientific, Belmont, MA
- Bertsekas D.P., and Tsitsiklis J.N. Neuro-dynamic programming (1996), Athena Scientific, Belmont, MA
- (1996) Neuro-dynamic programming
- Bertsekas, D.P.; Tsitsiklis, J.N.

2
- 0346902105
- Two-timescale simultaneous perturbation stochastic approximation using deterministic perturbation sequences
- Bhatnagar S., Fu M.C., Marcus S.I., and Wang I.-J. Two-timescale simultaneous perturbation stochastic approximation using deterministic perturbation sequences. ACM Transactions on Modeling and Computer Simulation 13 2 (2003) 180-209
- (2003) ACM Transactions on Modeling and Computer Simulation , vol.13 , Issue.2 , pp. 180-209
- Bhatnagar, S.; Fu, M.C.; Marcus, S.I.; Wang, I.-J.

3
- 0031076413
- Stochastic approximation with two time scales
- Borkar V.S. Stochastic approximation with two time scales. Systems & Control Letters 29 (1997) 291-294
- (1997) Systems & Control Letters , vol.29 , pp. 291-294
- Borkar, V.S.

4
- 0000719863
- Packet routing in dynamically changing networks: A reinforcement learning approach
- Boyan J.A., and Littman M.L. Packet routing in dynamically changing networks: A reinforcement learning approach. Advances in Neural Information Processing Systems 6 (1994) 671-678
- (1994) Advances in Neural Information Processing Systems , vol.6 , pp. 671-678
- Boyan, J.A.; Littman, M.L.

5
- 0343893613
- Actor-critic like learning algorithms for Markov decision processes
- Konda V.R., and Borkar V.S. Actor-critic like learning algorithms for Markov decision processes. SIAM Journal on Control and Optimization 38 1 (1999) 94-123
- (1999) SIAM Journal on Control and Optimization , vol.38 , Issue.1 , pp. 94-123
- Konda, V.R.; Borkar, V.S.

6
- 0033876565
- Call admission control and routing in integrated services networks using neuro-dynamic programming
- Marbach P., Mihatsch O., and Tsitsiklis J.N. Call admission control and routing in integrated services networks using neuro-dynamic programming. IEEE Journal on Selected Areas in Communications 18 (2000) 197-208
- (2000) IEEE Journal on Selected Areas in Communications , vol.18 , pp. 197-208
- Marbach, P.; Mihatsch, O.; Tsitsiklis, J.N.

7
- 0026839090
- Multivariate stochastic approximation using a simultaneous perturbation gradient approximation
- Spall J.C. Multivariate stochastic approximation using a simultaneous perturbation gradient approximation. IEEE Transactions on Automatic Control 37 (1992) 332-341
- (1992) IEEE Transactions on Automatic Control , vol.37 , pp. 332-341
- Spall, J.C.

8
- 0030737152
- A one-measurement form of simultaneous perturbation stochastic approximation
- Spall J.C. A one-measurement form of simultaneous perturbation stochastic approximation. Automatica 33 (1997) 109-112
- (1997) Automatica , vol.33 , pp. 109-112
- Spall, J.C.

9
- 0028497630
- Asynchronous stochastic approximation and Q-learning
- Tsitsiklis J.N. Asynchronous stochastic approximation and Q-learning. Machine Learning 16 (1994) 185-202
- (1994) Machine Learning , vol.16 , pp. 185-202
- Tsitsiklis, J.N.

10
- 34249833101
- Q-learning
- Watkins C.J.C.H., and Dayan P. Q-learning. Machine Learning 8 (1992) 279-292
- (1992) Machine Learning , vol.8 , pp. 279-292
- Watkins, C.J.C.H.; Dayan, P.

* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.