Volume 14, Issue 2, 2000, Pages 243-258

A learning algorithm for discrete-time stochastic control

Author keywords

[No Author keywords available]

Indexed keywords

REINFORCEMENT LEARNING; STOCHASTIC CONTROL SYSTEMS; STOCHASTIC SYSTEMS;

EID: 0034550848     PISSN: 0269-9648     EISSN: None     Source Type: Journal    
DOI: 10.1017/s0269964800142081     Document Type: Article
Times cited : (13)

References (23)
  • 1
    • Abounady, J., Bertsekas, D., & Borkar, V.S. (1998). Stochastic approximation for nonexpansive maps: Application to Q-learning algorithms. Technical Report LIDS-P-2433, Laboratory for Information and Decision Systems, MIT, Cambridge, MA.
  • 2
    • Abounady, J., Bertsekas, D., & Borkar, V.S. (1998). Learning algorithms for Markov decision processes with average cost. Technical Report LIDS-P-2434, Laboratory for Information and Decision Systems, MIT, Cambridge, MA.
  • 3
  • 9
    • Borkar, V.S. (1997). Stochastic approximation with two time scales. Systems & Control Letters 29: 291-294.
  • 10
    • Borkar, V.S. & Meyn, S.P. (1998). The O.D.E. method for convergence of stochastic approximation and reinforcement learning. SIAM Journal of Control and Optimization (to appear).
  • 13
    • Hirsch, M.W. (1987). Convergent activation dynamics in continuous time networks. Neural Networks 2: 331-349.
  • 14
    • Jaakola, T., Jordan, M., & Singh, S. (1994). On the convergence of stochastic iterative dynamic programming algorithms. Neural Computation 6: 1185-1201.
  • 15
  • 16
    • Ljung, L. (1977). Analysis of recursive stochastic algorithms. IEEE Transactions on Automatic Control AC-22: 551-575.
  • 20
    • Sutton, R.S., Precup, D., & Singh, S.P. (1998). Between MDPs and semi-MDPs: Learning, planning and representing knowledge as multiple temporal scales. Journal of A.I. Research 1: 1-39.
  • 21
    • Tsitsiklis, J.N. (1994). Asynchronous stochastic approximation and Q-learning. Machine Learning 16: 185-202.
  • 22
    • Watkins, C. (1989). Learning from delayed rewards. Unpublished Ph.D. thesis, Cambridge University, Cambridge, UK.


* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.