-
3
-
-
0000985504
-
TD-Gammon, a self-teaching backgammon program, achieves master-level play
-
G. Tesauro. TD-Gammon, a self-teaching backgammon program, achieves master-level play. Neural Computation, 1994, 6(2): 215-219.
-
(1994)
Neural Computation
, vol.6
, Issue.2
, pp. 215-219
-
-
Tesauro, G.1
-
4
-
-
35349027192
-
Application of reinforcement learning to the game of Othello
-
N. J. van Eck, M. van Wezel. Application of reinforcement learning to the game of Othello. Computers and Operations Research, 2008, 35(6): 1999-2017.
-
(2008)
Computers and Operations Research
, vol.35
, Issue.6
, pp. 1999-2017
-
-
van Eck, N.J.1
van Wezel, M.2
-
5
-
-
0003787146
-
-
Princeton: Princeton University Press
-
R. E. Bellman. Dynamic Programming. Princeton: Princeton University Press, 1957.
-
(1957)
Dynamic Programming
-
-
Bellman, R.E.1
-
13
-
-
0023169119
-
Building and understanding adaptive systems: a statistical/numerical approach to factory automation and brain research
-
P. J. Werbos. Building and understanding adaptive systems: a statistical/numerical approach to factory automation and brain research. IEEE Transactions on Systems, Man, and Cybernetics, 1987, 17(1): 7-20.
-
(1987)
IEEE Transactions on Systems, Man, and Cybernetics
, vol.17
, Issue.1
, pp. 7-20
-
-
Werbos, P.J.1
-
14
-
-
0036565019
-
Comparison of heuristic dynamic programming and dual heuristic programming adaptive critics for neuro-control of a turbogenerator
-
G. Venayagamoorthy, R. Harley, D. Wunsch. Comparison of heuristic dynamic programming and dual heuristic programming adaptive critics for neuro-control of a turbogenerator. IEEE Transactions on Neural Networks, 2002, 13(3): 764-773.
-
(2002)
IEEE Transactions on Neural Networks
, vol.13
, Issue.3
, pp. 764-773
-
-
Venayagamoorthy, G.1
Harley, R.2
Wunsch, D.3
-
15
-
-
67650170605
-
Neuronlike adaptive elements that can solve difficult learning control problems
-
Piscataway: New York
-
A. G. Barto, R. S. Sutton, C. W. Anderson. Neuronlike adaptive elements that can solve difficult learning control problems. Artificial Neural Networks, Piscataway: New York, 1990: 81-93.
-
(1990)
Artificial Neural Networks
, pp. 81-93
-
-
Barto, A.G.1
Sutton, R.S.2
Anderson, C.W.3
-
16
-
-
0343893613
-
Actor-critic type learning algorithms for Markov decision processes
-
V. R. Konda, V. S. Borkar. Actor-critic type learning algorithms for Markov decision processes. SIAM Journal on Control and Optimization, 1999, 38(1): 94-123.
-
(1999)
SIAM Journal on Control and Optimization
, vol.38
, Issue.1
, pp. 94-123
-
-
Konda, V.R.1
Borkar, V.S.2
-
19
-
-
70349116541
-
Reinforcement learning and adaptive dynamic programming for feedback control
-
F. L. Lewis, D. Vrabie. Reinforcement learning and adaptive dynamic programming for feedback control. IEEE Circuits and Systems Magazine, 2009, 9(3): 32-50.
-
(2009)
IEEE Circuits and Systems Magazine
, vol.9
, Issue.3
, pp. 32-50
-
-
Lewis, F.L.1
Vrabie, D.2
-
20
-
-
49049111594
-
Issues on stability of ADP feedback controllers for dynamical systems
-
S. N. Balakrishnan, J. Ding, F. L. Lewis. Issues on stability of ADP feedback controllers for dynamical systems. IEEE Transactions on Systems, Man and Cybernetics (Special Issue on ADP/RL), 2008, 38(4): 913-917.
-
(2008)
IEEE Transactions on Systems, Man and Cybernetics (Special Issue on ADP/RL)
, vol.38
, Issue.4
, pp. 913-917
-
-
Balakrishnan, S.N.1
Ding, J.2
Lewis, F.L.3
-
22
-
-
67649964731
-
Reinforcement learning: a tutorial survey and recent advances
-
A. Gosavi. Reinforcement learning: a tutorial survey and recent advances. INFORMS Journal on Computing, 2009, 21(2): 178-192.
-
(2009)
INFORMS Journal on Computing
, vol.21
, Issue.2
, pp. 178-192
-
-
Gosavi, A.1
-
26
-
-
33751077547
-
A policy-gradient method for semi-Markov decision processes with application to call admission control
-
S. Singh, V. Tadic, A. Doucet. A policy-gradient method for semi-Markov decision processes with application to call admission control. European Journal of Operational Research, 2007, 178(3): 808-818.
-
(2007)
European Journal of Operational Research
, vol.178
, Issue.3
, pp. 808-818
-
-
Singh, S.1
Tadic, V.2
Doucet, A.3
-
28
-
-
0031076413
-
Stochastic approximation with two-time scales
-
V. S. Borkar. Stochastic approximation with two-time scales. Systems & Control Letters, 1997, 29(5): 291-294.
-
(1997)
Systems & Control Letters
, vol.29
, Issue.5
, pp. 291-294
-
-
Borkar, V.S.1
-
29
-
-
0036287773
-
Learning algorithms for Markov decision processes with average cost
-
J. Abounadi, D. P. Bertsekas, V. Borkar. Learning algorithms for Markov decision processes with average cost. SIAM Journal on Control and Optimization, 2001, 40(3): 681-698.
-
(2001)
SIAM Journal on Control and Optimization
, vol.40
, Issue.3
, pp. 681-698
-
-
Abounadi, J.1
Bertsekas, D.P.2
Borkar, V.3
-
30
-
-
64049092199
-
Forecasting and control of passenger bookings
-
K. Littlewood. Forecasting and control of passenger bookings. Journal of Revenue and Pricing Management, 2005, 4(2): 111-123.
-
(2005)
Journal of Revenue and Pricing Management
, vol.4
, Issue.2
, pp. 111-123
-
-
Littlewood, K.1
-
31
-
-
0024629453
-
Application of a probabilistic decision model to airline seat inventory control
-
P. P. Belobaba. Application of a probabilistic decision model to airline seat inventory control. Operations Research, 1989, 37(2): 183-197.
-
(1989)
Operations Research
, vol.37
, Issue.2
, pp. 183-197
-
-
Belobaba, P.P.1
-
32
-
-
0032642848
-
Revenue management: Research overview and prospects
-
J. I. McGill, G. J. van Ryzin. Revenue management: Research overview and prospects. Transportation Science, 1999, 33(2): 233-256.
-
(1999)
Transportation Science
, vol.33
, Issue.2
, pp. 233-256
-
-
McGill, J.I.1
van Ryzin, G.J.2
-
33
-
-
41549145624
-
An overview of research on revenue management: current issues and future research
-
W. C. Chiang, J. C. H. Chen, X. Xu. An overview of research on revenue management: current issues and future research. International Journal of Revenue Management, 2007, 1(1): 97-128.
-
(2007)
International Journal of Revenue Management
, vol.1
, Issue.1
, pp. 97-128
-
-
Chiang, W.C.1
Chen, J.C.H.2
Xu, X.3
-
36
-
-
0036722536
-
A reinforcement learning approach to a single leg airline revenue management problem with multiple fare classes and overbooking
-
A. Gosavi, N. Bandla, T. K. Das. A reinforcement learning approach to a single leg airline revenue management problem with multiple fare classes and overbooking. IIE Transactions, 2002, 34(9): 729-742.
-
(2002)
IIE Transactions
, vol.34
, Issue.9
, pp. 729-742
-
-
Gosavi, A.1
Bandla, N.2
Das, T.K.3
-
37
-
-
2342446663
-
A reinforcement learning algorithm based on policy iteration for average reward: empirical results with yield management and convergence analysis
-
A. Gosavi. A reinforcement learning algorithm based on policy iteration for average reward: empirical results with yield management and convergence analysis. Machine Learning, 2004, 55(1): 5-29.
-
(2004)
Machine Learning
, vol.55
, Issue.1
, pp. 5-29
-
-
Gosavi, A.1