
Volume 27, Issue 2, 2002, Pages 294-311

Q-learning for risk-sensitive control

Author keywords

Dynamic programming; Markov decision processes; Q learning; Reinforcement learning; Risk sensitive control; Stochastic approximation

Indexed keywords

BOUNDARY CONDITIONS; COMPUTER SIMULATION; CONVERGENCE OF NUMERICAL METHODS; DECISION THEORY; DYNAMIC PROGRAMMING; LEARNING ALGORITHMS; LEARNING SYSTEMS; MATRIX ALGEBRA; ORDINARY DIFFERENTIAL EQUATIONS; RISK ASSESSMENT; THEOREM PROVING;

EID: 0036577013     PISSN: 0364765X     EISSN: None     Source Type: Journal    
DOI: 10.1287/moor.27.2.294.324     Document Type: Article
Times cited : (156)

References (35)
  • 31. Van Roy, B. (1998). Learning and value function approximation in complex decision processes. LIDS-TH 2420, Ph.D. thesis, Lab. for Information and Decision Systems, M.I.T., Cambridge, MA.
  • 32. Van Roy, B. (2000). Neuro-dynamic programming: Overview and recent trends. In A. Shwartz, E. A. Feinberg, eds., Markov Decision Processes. Kluwer Academic Publishers, Boston. Forthcoming.
  • 33. Watkins, C. (1989). Learning from delayed rewards. Ph.D. thesis, Cambridge University, Cambridge, U.K.


* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.