SCOPUS Information Retrieval Platform

European Journal of Operational Research

Volume 155, Issue 3, 2004, Pages 654-674

Reinforcement learning for long-run average cost

Gosavi, Abhijit (a)

(a) University at Buffalo (United States)

Author keywords

Reinforcement learning; Stochastic processes; Two time scales

Indexed keywords

COMPUTER SIMULATION; COSTS; DYNAMIC PROGRAMMING; LEARNING SYSTEMS; MARKOV PROCESSES; PROBABILITY; QUALITY CONTROL; RANDOM PROCESSES; REINFORCEMENT; REINFORCEMENT LEARNING; TWO TIME SCALES; DECISION MAKING

EID: 0742319170 PISSN: 03772217 EISSN: None Source Type: Journal
DOI: 10.1016/S0377-2217(02)00874-3 Document Type: Conference Paper

Times cited: 115

References (35)

1
- 85034481533
- ODE analysis for Q-learning algorithms
- MIT, Cambridge, MA
- J. Abounadi, D. Bertsekas, V. Borkar, ODE analysis for Q-learning algorithms, LIDS Report, MIT, Cambridge, MA, 1996.
- (1996) LIDS Report
- Abounadi, J.; Bertsekas, D.; Borkar, V.

2
- 84966211467
- The theory of dynamic programming
- Bellman R. The theory of dynamic programming. Bulletin of the American Mathematical Society. 60:1954;503-516.
- (1954) Bulletin of the American Mathematical Society, vol. 60, pp. 503-516
- Bellman, R.

3
- 0003565783
- Belmont, MA: Athena Scientific
- Bertsekas D.P. Dynamic Programming and Optimal Control. 1995;Athena Scientific, Belmont, MA.
- (1995) Dynamic Programming and Optimal Control
- Bertsekas, D.P.

4
- 0003487482
- Belmont, MA: Athena Scientific
- Bertsekas D.P., Tsitsiklis J.N. Neuro-Dynamic Programming. 1996;Athena Scientific, Belmont, MA.
- (1996) Neuro-dynamic Programming
- Bertsekas, D.P.; Tsitsiklis, J.N.

5
- 0031076413
- Stochastic approximation with two-time scales
- Borkar V.S. Stochastic approximation with two-time scales. Systems and Control Letters. 29:1997;291-294.
- (1997) Systems and Control Letters, vol. 29, pp. 291-294
- Borkar, V.S.

6
- 0032075427
- Asynchronous stochastic approximation
- Borkar V.S. Asynchronous stochastic approximation. SIAM Journal on Control and Optimization. 36(3):1998;840-851.
- (1998) SIAM Journal on Control and Optimization, vol. 36, issue 3, pp. 840-851
- Borkar, V.S.

7
- 0742326372
- The ODE method for convergence of stochastic approximation and reinforcement learning
- V.S. Borkar, S.P. Meyn, The ODE method for convergence of stochastic approximation and reinforcement learning, Working paper.
- Working Paper
- Borkar, V.S.; Meyn, S.P.

8
- 0031123471
- An analog scheme for fixed point computation, Part I: Theory
- Borkar V.S., Soumyanath K. An analog scheme for fixed point computation, Part I: Theory. IEEE Transactions on Circuits and Systems I: Fundamental Theory and Applications. 44:1997;351-354.
- (1997) IEEE Transactions on Circuits and Systems I: Fundamental Theory and Applications, vol. 44, pp. 351-354
- Borkar, V.S.; Soumyanath, K.

9
- 0026945583
- Optimal inspection policies for a manufacturing station
- Cassandras C.G., Han Y. Optimal inspection policies for a manufacturing station. European Journal of Operational Research. 63:1992;35-53.
- (1992) European Journal of Operational Research, vol. 63, pp. 35-53
- Cassandras, C.G.; Han, Y.

10
- 0032643313
- Solving semi-Markov decision problems using average reward reinforcement learning
- Das T.K., Gosavi A., Mahadevan S., Marchalleck N. Solving semi-Markov decision problems using average reward reinforcement learning. Management Science. 45(4):1999;560-574.
- (1999) Management Science, vol. 45, issue 4, pp. 560-574
- Das, T.K.; Gosavi, A.; Mahadevan, S.; Marchalleck, N.

11
- 0032683886
- Optimal preventive maintenance in a production inventory system
- Das T.K., Sarkar S. Optimal preventive maintenance in a production inventory system. IIE Transactions on Quality and Reliability. 31:1999;537-551.
- (1999) IIE Transactions on Quality and Reliability, vol. 31, pp. 537-551
- Das, T.K.; Sarkar, S.

12
- 0742326370
- Optimal and near-optimal control of a two-part stochastic manufacturing system with dynamic setups
- at the Department of Industrial Engineering, University of Florida, Gainesville
- M. Elhafsi, S. Bai, Optimal and near-optimal control of a two-part stochastic manufacturing system with dynamic setups, Research Report 95-10, at the Department of Industrial Engineering, University of Florida, Gainesville, 1997.
- (1997) Research Report 95-10
- Elhafsi, M.; Bai, S.

13
- 0003653971
- Unpublished Ph.D. Dissertation, Department of Industrial and Management Systems Engineering, University of South Florida, Tampa, FL, May
- A. Gosavi, An algorithm for solving semi-Markov decision problems using reinforcement learning: Convergence analysis and numerical results, Unpublished Ph.D. Dissertation, Department of Industrial and Management Systems Engineering, University of South Florida, Tampa, FL, May 1999.
- (1999) An Algorithm for Solving Semi-markov Decision Problems Using Reinforcement Learning: Convergence Analysis and Numerical Results
- Gosavi, A.

14
- 0742308756
- On the convergence of some reinforcement learning algorithms
- Department of Engineering, University of Southern Colorado, Pueblo
- A. Gosavi, On the convergence of some reinforcement learning algorithms. Working paper, Department of Engineering, University of Southern Colorado, Pueblo, 2000.
- (2000) Working Paper
- Gosavi, A.

15
- 0036722536
- Airline seat allocation among multiple fare classes with overbooking
- Gosavi A., Bandla N., Das T.K. Airline seat allocation among multiple fare classes with overbooking. IIE Transactions. 34(9):2002;729-742.
- (2002) IIE Transactions, vol. 34, issue 9, pp. 729-742
- Gosavi, A.; Bandla, N.; Das, T.K.

16
- 84995317030
- A simulation-based learning automata framework for solving semi-Markov decision problems under long-run average reward
- in press
- A. Gosavi, T.K. Das, S. Sarkar, A simulation-based learning automata framework for solving semi-Markov decision problems under long-run average reward. IIE Transactions (in press).
- IIE Transactions
- Gosavi, A.; Das, T.K.; Sarkar, S.

17
- 0742291458
- Actor-critic type learning algorithms for Markov decision processes
- Indian Institute of Science, Bangalore, India
- V.R. Konda, V.S. Borkar, Actor-critic type learning algorithms for Markov decision processes, Working paper, Indian Institute of Science, Bangalore, India.
- Working Paper
- Konda, V.R.; Borkar, V.S.

18
- 0003452601
- Berlin: Springer-Verlag
- Kushner H.J., Clark D.S. Stochastic Approximation Methods for Constrained and Unconstrained Systems. 1978;Springer-Verlag, Berlin.
- (1978) Stochastic Approximation Methods for Constrained and Unconstrained Systems
- Kushner, H.J.; Clark, D.S.

19
- 0003735267
- New York: John Wiley and Sons
- Lewis E.E. Introduction to Reliability Engineering. 1994;John Wiley and Sons, New York.
- (1994) Introduction to Reliability Engineering
- Lewis, E.E.

20
- 0017526570
- Analysis of recursive stochastic algorithms
- Ljung L. Analysis of recursive stochastic algorithms. IEEE Transactions on Automatic Control. 22:1977;551-575.
- (1977) IEEE Transactions on Automatic Control, vol. 22, pp. 551-575
- Ljung, L.

21
- 0029752592
- Average reward reinforcement learning: Foundations, algorithms, and empirical results
- Mahadevan S. Average reward reinforcement learning: Foundations, algorithms, and empirical results. Machine Learning. 22(1):1996;159-195.
- (1996) Machine Learning, vol. 22, issue 1, pp. 159-195
- Mahadevan, S.

22
- 0003861655
- Unpublished Ph.D. Thesis, Brown University, Providence, RI
- M.L. Littman, Algorithms for sequential decision-making, Unpublished Ph.D. Thesis, Brown University, Providence, RI, 1996.
- (1996) Algorithms for Sequential Decision-making
- Littman, M.L.

23
- 0003891507
- Englewood Cliffs, NJ: Prentice Hall
- Narendra K., Thathachar M.A.L. Learning Automata: An Introduction. 1989;Prentice Hall, Englewood Cliffs, NJ.
- (1989) Learning Automata: An Introduction
- Narendra, K.; Thathachar, M.A.L.

24
- 0003998452
- New York: Wiley Interscience
- Puterman M.L. Markov Decision Processes. 1994;Wiley Interscience, New York.
- (1994) Markov Decision Processes
- Puterman, M.L.

25
- 0000016172
- A stochastic approximation method
- Robbins H., Monro S. A stochastic approximation method. Annals of Mathematical Statistics. 22:1951;400-407.
- (1951) Annals of Mathematical Statistics, vol. 22, pp. 400-407
- Robbins, H.; Monro, S.

26
- 85152626183
- A reinforcement learning method for maximizing undiscounted rewards
- A. Schwartz. A reinforcement learning method for maximizing undiscounted rewards, in: Proceedings of the Tenth Annual Conference on Machine Learning, 1993, pp. 298-305.
- (1993) Proceedings of the Tenth Annual Conference on Machine Learning, pp. 298-305
- Schwartz, A.

27
- 0021595914
- Part selection policy for a flexible manufacturing cell feeding several production lines
- Seidmann A., Schweitzer P.J. Part selection policy for a flexible manufacturing cell feeding several production lines. IIE Transactions. 16(4):1984;355-362.
- (1984) IIE Transactions, vol. 16, issue 4, pp. 355-362
- Seidmann, A.; Schweitzer, P.J.

28
- 85060257643
- New York: John Wiley and Sons
- Sennott L. Stochastic Dynamic Programming and the Control of Queueing Systems. 1999;John Wiley and Sons, New York.
- (1999) Stochastic Dynamic Programming and the Control of Queueing Systems
- Sennott, L.

29
- 0026168810
- Optimal control of a queuing network system with two types of customers
- Shioyama T. Optimal control of a queuing network system with two types of customers. European Journal of Operational Research. 52:1991;367-372.
- (1991) European Journal of Operational Research, vol. 52, pp. 367-372
- Shioyama, T.

30
- 0028574683
- Reinforcement learning algorithms for average-payoff Markovian decision processes
- Cambridge, MA: MIT Press
- Singh S. Reinforcement learning algorithms for average-payoff Markovian decision processes. Proceedings of the 12th AAAI. 1994;MIT Press, Cambridge, MA.
- (1994) Proceedings of the 12th AAAI
- Singh, S.

31
- 0004102479
- Reinforcement Learning
- special issue
- Sutton R. Reinforcement Learning. Machine Learning Journal. 8(3):1992;5. (special issue).
- (1992) Machine Learning Journal, vol. 8, issue 3, p. 5
- Sutton, R.

32
- 0004007508
- Cambridge, MA: The MIT Press
- Sutton R., Barto A.G. Reinforcement Learning. 1998;The MIT Press, Cambridge, MA.
- (1998) Reinforcement Learning
- Sutton, R.; Barto, A.G.

33
- 0002313852
- Scaling up average reward reinforcement learning by approximating the domain models and the value function
- New York: Morgan Kaufmann
- Tadepalli P., Ok D. Scaling up average reward reinforcement learning by approximating the domain models and the value function. Proceedings of the Thirteenth International Machine Learning Conference. 1996;471-479 Morgan Kaufmann, New York.
- (1996) Proceedings of the Thirteenth International Machine Learning Conference, pp. 471-479
- Tadepalli, P.; Ok, D.

34
- 0028497630
- Asynchronous stochastic approximation and Q-learning
- Tsitsiklis J. Asynchronous stochastic approximation and Q-learning. Machine Learning. 16:1994;185-202.
- (1994) Machine Learning, vol. 16, pp. 185-202
- Tsitsiklis, J.

35
- 0004049893
- Ph.D. Thesis, King's College, Cambridge, England, May
- C.J. Watkins, Learning from Delayed Rewards, Ph.D. Thesis, King's College, Cambridge, England, May 1989.
- (1989) Learning from Delayed Rewards
- Watkins, C.J.

* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.