[1] J. Abounadi, D. Bertsekas, V. Borkar, ODE analysis for Q-learning algorithms, LIDS Report, MIT, Cambridge, MA, 1996.
[5] V.S. Borkar, Stochastic approximation with two time scales, Systems and Control Letters 29 (1997) 291-294.
[7] V.S. Borkar, S.P. Meyn, The ODE method for convergence of stochastic approximation and reinforcement learning, Working paper.
[10] T.K. Das, A. Gosavi, S. Mahadevan, N. Marchalleck, Solving semi-Markov decision problems using average reward reinforcement learning, Management Science 45 (4) (1999) 560-574.
[12] M. Elhafsi, S. Bai, Optimal and near-optimal control of a two-part stochastic manufacturing system with dynamic setups, Research Report 95-10, Department of Industrial Engineering, University of Florida, Gainesville, 1997.
[13] A. Gosavi, An algorithm for solving semi-Markov decision problems using reinforcement learning: Convergence analysis and numerical results, Unpublished Ph.D. Dissertation, Department of Industrial and Management Systems Engineering, University of South Florida, Tampa, FL, May 1999.
[14] A. Gosavi, On the convergence of some reinforcement learning algorithms, Working paper, Department of Engineering, University of Southern Colorado, Pueblo, 2000.
[15] A. Gosavi, N. Bandla, T.K. Das, Airline seat allocation among multiple fare classes with overbooking, IIE Transactions 34 (9) (2002) 729-742.
[16] A. Gosavi, T.K. Das, S. Sarkar, A simulation-based learning automata framework for solving semi-Markov decision problems under long-run average reward, IIE Transactions (in press).
[17] V.R. Konda, V.S. Borkar, Actor-critic type learning algorithms for Markov decision processes, Working paper, Indian Institute of Science, Bangalore, India.
[20] L. Ljung, Analysis of recursive stochastic algorithms, IEEE Transactions on Automatic Control 22 (1977) 551-575.
[21] S. Mahadevan, Average reward reinforcement learning: Foundations, algorithms, and empirical results, Machine Learning 22 (1) (1996) 159-195.
[22] M.L. Littman, Algorithms for sequential decision-making, Unpublished Ph.D. Thesis, Brown University, Providence, RI, 1996.
[27] A. Seidmann, P.J. Schweitzer, Part selection policy for a flexible manufacturing cell feeding several production lines, IIE Transactions 16 (4) (1984) 355-362.
[29] T. Shioyama, Optimal control of a queuing network system with two types of customers, European Journal of Operational Research 52 (1991) 367-372.
[30] S. Singh, Reinforcement learning algorithms for average-payoff Markovian decision processes, in: Proceedings of the 12th AAAI, MIT Press, Cambridge, MA, 1994.
[31] R. Sutton, Reinforcement Learning (special issue), Machine Learning 8 (3) (1992) 5.
[33] P. Tadepalli, D. Ok, Scaling up average reward reinforcement learning by approximating the domain models and the value function, in: Proceedings of the Thirteenth International Machine Learning Conference, Morgan Kaufmann, New York, 1996, pp. 471-479.
[34] J. Tsitsiklis, Asynchronous stochastic approximation and Q-learning, Machine Learning 16 (1994) 185-202.
[35] C.J. Watkins, Learning from Delayed Rewards, Ph.D. Thesis, King's College, Cambridge, England, May 1989.