-
1
-
-
0036682894
-
A reinforcement learning approach to automatic generation control
-
T. P. I. Ahamed, P. S. N. Rao, and P. S. Sastry, "A reinforcement learning approach to automatic generation control," Electric Power Systems Research, vol. 63, no. 1, pp. 9-26, 2002.
-
(2002)
Electric Power Systems Research
, vol.63
, Issue.1
, pp. 9-26
-
-
Ahamed, T.P.I.1
Rao, P.S.N.2
Sastry, P.S.3
-
3
-
-
0016556021
-
A new approach to manipulator control: The cerebellar model articulation controller (CMAC)
-
J. S. Albus, "A new approach to manipulator control: The cerebellar model articulation controller (CMAC)," Journal of Dynamic Systems, Measurement and Control, pp. 220-227, 1975.
-
(1975)
Journal of Dynamic Systems, Measurement and Control
, pp. 220-227
-
-
Albus, J.S.1
-
4
-
-
0024646143
-
Learning to control an inverted pendulum using neural networks
-
C. W. Anderson, "Learning to control an inverted pendulum using neural networks," IEEE Control Systems Magazine, vol. 9, no. 3, pp. 31-37, 1989.
-
(1989)
IEEE Control Systems Magazine
, vol.9
, Issue.3
, pp. 31-37
-
-
Anderson, C.W.1
-
5
-
-
0031259122
-
Synthesis of reinforcement learning, neural networks and PI control applied to a simulated heating coil
-
C. W. Anderson, D. C. Hittle, A. D. Katz, and R. M. Kretchmar, "Synthesis of reinforcement learning, neural networks and PI control applied to a simulated heating coil," Artificial Intelligence in Engineering, vol. 11, no. 4, pp. 421-429, 1997.
-
(1997)
Artificial Intelligence in Engineering
, vol.11
, Issue.4
, pp. 421-429
-
-
Anderson, C.W.1
Hittle, D.C.2
Katz, A.D.3
Kretchmar, R.M.4
-
6
-
-
0030149709
-
Purposive behavior acquisition for a real robot by vision-based reinforcement learning
-
M. Asada, S. Noda, S. Tawaratsumida, and K. Hosoda, "Purposive behavior acquisition for a real robot by vision-based reinforcement learning," Machine Learning, vol. 23, pp. 279-303, 1996.
-
(1996)
Machine Learning
, vol.23
, pp. 279-303
-
-
Asada, M.1
Noda, S.2
Tawaratsumida, S.3
Hosoda, K.4
-
7
-
-
0022740002
-
Dual control of an integrator with unknown gain
-
K. J. Åström and A. Helmersson, "Dual control of an integrator with unknown gain," Comp. & Maths. with Appls., vol. 12A, pp. 653-662, 1986.
-
(1986)
Comp. & Maths. with Appls.
, vol.12A
, pp. 653-662
-
-
Åström, K.J.1
Helmersson, A.2
-
8
-
-
0002130986
-
Robot learning from demonstration
-
San Francisco, CA
-
C. G. Atkeson and S. Schaal, "Robot learning from demonstration," Proc. of the Fourteenth International Conference on Machine Learning, pp. 12-20, San Francisco, CA, 1997.
-
(1997)
Proc. of the Fourteenth International Conference on Machine Learning
, pp. 12-20
-
-
Atkeson, C.G.1
Schaal, S.2
-
10
-
-
0029210635
-
Learning to act using real-time dynamic programming
-
A. G. Barto, S. J. Bradtke, and S. P. Singh, "Learning to act using real-time dynamic programming," Artificial Intelligence, vol. 72, no. 1, pp. 81-138, 1995.
-
(1995)
Artificial Intelligence
, vol.72
, Issue.1
, pp. 81-138
-
-
Barto, A.G.1
Bradtke, S.J.2
Singh, S.P.3
-
11
-
-
0020970738
-
Neuronlike adaptive elements that can solve difficult learning control problems
-
A. G. Barto, R. S. Sutton, and C. W. Anderson, "Neuronlike adaptive elements that can solve difficult learning control problems," IEEE Trans. on Systems, Man, and Cybernetics, vol. 13, no. 5, pp. 834-846, 1983.
-
(1983)
IEEE Trans. on Systems, Man, and Cybernetics
, vol.13
, Issue.5
, pp. 834-846
-
-
Barto, A.G.1
Sutton, R.S.2
Anderson, C.W.3
-
12
-
-
85012688561
-
-
Princeton University Press, New Jersey
-
R. E. Bellman, Dynamic Programming, Princeton University Press, New Jersey, 1957.
-
(1957)
Dynamic Programming
-
-
Bellman, R.E.1
-
14
-
-
18144371937
-
Neuro-dynamic programming: An overview
-
J. B. Rawlings, B. A. Ogunnaike, and J. W. Eaton, editors
-
D. P. Bertsekas, "Neuro-dynamic programming: An overview," In J. B. Rawlings, B. A. Ogunnaike, and J. W. Eaton, editors, Proc. of Sixth International Conference on Chemical Process Control, 2001.
-
(2001)
Proc. of Sixth International Conference on Chemical Process Control
-
-
Bertsekas, D.P.1
-
15
-
-
0024680419
-
Adaptive aggregation for infinite horizon dynamic programming
-
D. P. Bertsekas and D. A. Castanon, "Adaptive aggregation for infinite horizon dynamic programming," IEEE Trans. on Automatic Control, vol. 34, no. 6, pp. 589-598, 1989.
-
(1989)
IEEE Trans. on Automatic Control
, vol.34
, Issue.6
, pp. 589-598
-
-
Bertsekas, D.P.1
Castanon, D.A.2
-
16
-
-
0003794137
-
-
Prentice-Hall, Englewood Cliffs, NJ, 2nd edition
-
D. P. Bertsekas and R. G. Gallager, Data Networks, Prentice-Hall, Englewood Cliffs, NJ, 2nd edition, 1992.
-
(1992)
Data Networks
-
-
Bertsekas, D.P.1
Gallager, R.G.2
-
19
-
-
0343709784
-
A convex analytic approach to Markov decision processes
-
V. Borkar, "A convex analytic approach to Markov decision processes," Probability Theory and Related Fields, vol. 78, pp. 583-602, 1988.
-
(1988)
Probability Theory and Related Fields
, vol.78
, pp. 583-602
-
-
Borkar, V.1
-
20
-
-
0001133021
-
Generalization in reinforcement learning: Safely approximating the value function
-
G. Tesauro and D. Touretzky, editors, Morgan Kaufmann
-
J. A. Boyan and A. W. Moore, "Generalization in reinforcement learning: Safely approximating the value function," In G. Tesauro and D. Touretzky, editors, Advances in Neural Information Processing Systems, vol. 7, Morgan Kaufmann, 1995.
-
(1995)
Advances in Neural Information Processing Systems
, vol.7
-
-
Boyan, J.A.1
Moore, A.W.2
-
21
-
-
0000859970
-
Reinforcement learning applied to linear quadratic regulation
-
S. J. Hanson, J. Cowan, and C. L. Giles, editors, Morgan Kaufmann
-
S. J. Bradtke, "Reinforcement learning applied to linear quadratic regulation," In S. J. Hanson, J. Cowan, and C. L. Giles, editors, Advances in Neural Information Processing Systems, vol. 5, Morgan Kaufmann, 1993.
-
(1993)
Advances in Neural Information Processing Systems
, vol.5
-
-
Bradtke, S.J.1
-
22
-
-
0003259931
-
Improving elevator performance using reinforcement learning
-
D. S. Touretzky, M. C. Mozer, and M. E. Hasselmo, editors, MIT Press, San Francisco, CA
-
R. Crites and A. G. Barto, "Improving elevator performance using reinforcement learning," In D. S. Touretzky, M. C. Mozer, and M. E. Hasselmo, editors, Advances in Neural Information Processing Systems, vol. 8, MIT Press, San Francisco, CA, 1996.
-
(1996)
Advances in Neural Information Processing Systems
, vol.8
-
-
Crites, R.1
Barto, A.G.2
-
23
-
-
0032208335
-
Elevator group control using multiple reinforcement learning agents
-
R. Crites and A. G. Barto, "Elevator group control using multiple reinforcement learning agents," Machine Learning, vol. 33, pp. 235-262, 1998.
-
(1998)
Machine Learning
, vol.33
, pp. 235-262
-
-
Crites, R.1
Barto, A.G.2
-
24
-
-
0000430514
-
The convergence of TD(λ) for general λ
-
P. Dayan, "The convergence of TD(λ) for general λ," Machine Learning, vol. 8, pp. 341-362, 1992.
-
(1992)
Machine Learning
, vol.8
, pp. 341-362
-
-
Dayan, P.1
-
25
-
-
0348090400
-
The linear programming approach to approximate dynamic programming
-
D. P. de Farias and B. Van Roy, "The linear programming approach to approximate dynamic programming," Operations Research, vol. 51, no. 6, pp. 850-865, 2003.
-
(2003)
Operations Research
, vol.51
, Issue.6
, pp. 850-865
-
-
De Farias, D.P.1
Van Roy, B.2
-
26
-
-
0001554538
-
On linear programming in a Markov decision problem
-
E. V. Denardo, "On linear programming in a Markov decision problem," Management Science, vol. 16, pp. 282-288, 1970.
-
(1970)
Management Science
, vol.16
, pp. 282-288
-
-
Denardo, E.V.1
-
29
-
-
0003684449
-
-
Springer-Verlag, New York, NY
-
T. Hastie, R. Tibshirani, and J. Friedman, The Elements of Statistical Learning: Data Mining, Inference, and Prediction, Springer-Verlag, New York, NY, 2001.
-
(2001)
The Elements of Statistical Learning: Data Mining, Inference, and Prediction
-
-
Hastie, T.1
Tibshirani, R.2
Friedman, J.3
-
30
-
-
0018455841
-
Linear programming and Markov decision chains
-
A. Hordijk and L. C. M. Kallenberg, "Linear programming and Markov decision chains," Management Science, vol. 25, pp. 352-362, 1979.
-
(1979)
Management Science
, vol.25
, pp. 352-362
-
-
Hordijk, A.1
Kallenberg, L.C.M.2
-
31
-
-
0026849113
-
Process control via artificial neural networks and reinforcement learning
-
J. C. Hoskins and D. M. Himmelblau, "Process control via artificial neural networks and reinforcement learning," Computers & Chemical Engineering, vol. 16, no. 4, pp. 241-251, 1992.
-
(1992)
Computers & Chemical Engineering
, vol.16
, Issue.4
, pp. 241-251
-
-
Hoskins, J.C.1
Himmelblau, D.M.2
-
33
-
-
0000439891
-
On the convergence of stochastic iterative dynamic programming algorithms
-
T. Jaakkola, M. I. Jordan, and S. P. Singh, "On the convergence of stochastic iterative dynamic programming algorithms," Neural Computation, vol. 6, no. 6, pp. 1185-1201, 1994.
-
(1994)
Neural Computation
, vol.6
, Issue.6
, pp. 1185-1201
-
-
Jaakkola, T.1
Jordan, M.I.2
Singh, S.P.3
-
34
-
-
0029679044
-
Reinforcement learning: A survey
-
L. P. Kaelbling, M. L. Littman, and A. W. Moore, "Reinforcement learning: A survey," Journal of Artificial Intelligence Research, vol. 4, pp. 237-285, 1996.
-
(1996)
Journal of Artificial Intelligence Research
, vol.4
, pp. 237-285
-
-
Kaelbling, L.P.1
Littman, M.L.2
Moore, A.W.3
-
35
-
-
0037349318
-
Simulation based strategy for nonlinear optimal control: Application to a microbial cell reactor
-
N. S. Kaisare, J. M. Lee, and J. H. Lee, "Simulation based strategy for nonlinear optimal control: Application to a microbial cell reactor," International Journal of Robust and Nonlinear Control, vol. 13, no. 3-4, pp. 347-363, 2002.
-
(2002)
International Journal of Robust and Nonlinear Control
, vol.13
, Issue.3-4
, pp. 347-363
-
-
Kaisare, N.S.1
Lee, J.M.2
Lee, J.H.3
-
36
-
-
0027704767
-
Complexity analysis of real-time reinforcement learning
-
Menlo Park, CA
-
S. Koenig and R. G. Simmons, "Complexity analysis of real-time reinforcement learning," Proc. of the Eleventh National Conference on Artificial Intelligence, Menlo Park, CA, pp. 99-105, 1993.
-
(1993)
Proc. of the Eleventh National Conference on Artificial Intelligence
, pp. 99-105
-
-
Koenig, S.1
Simmons, R.G.2
-
37
-
-
84898938510
-
Actor-critic algorithms
-
S. A. Solla, T. K. Leen, and K.-R. Müller, editors
-
V. R. Konda and J. N. Tsitsiklis, "Actor-critic algorithms," In S. A. Solla, T. K. Leen, and K.-R. Müller, editors, Advances in Neural Information Processing Systems, vol. 12, 2000.
-
(2000)
Advances in Neural Information Processing Systems
, vol.12
-
-
Konda, V.R.1
Tsitsiklis, J.N.2
-
39
-
-
0003691637
-
-
Prentice Hall, Englewood Cliffs, NJ
-
P. R. Kumar and P. P. Varaiya, Stochastic Systems: Estimation, Identification, and Adaptive Control, Prentice Hall, Englewood Cliffs, NJ, 1986.
-
(1986)
Stochastic Systems: Estimation, Identification, and Adaptive Control
-
-
Kumar, P.R.1
Varaiya, P.P.2
-
40
-
-
4544228767
-
Simulation-based dynamic programming strategy for improvement of control policies
-
San Francisco, CA, paper 43 8c
-
J. M. Lee, N. S. Kaisare, and J. H. Lee, "Simulation-based dynamic programming strategy for improvement of control policies," AIChE Annual Meeting, San Francisco, CA, paper 43 8c, 2003.
-
(2003)
AIChE Annual Meeting
-
-
Lee, J.M.1
Kaisare, N.S.2
Lee, J.H.3
-
41
-
-
2942665140
-
Neuro-dynamic programming approach to dual control problem
-
Reno, NV, paper 276e
-
J. M. Lee and J. H. Lee, "Neuro-dynamic programming approach to dual control problem," AIChE Annual Meeting, Reno, NV, paper 276e, 2001.
-
(2001)
AIChE Annual Meeting
-
-
Lee, J.M.1
Lee, J.H.2
-
42
-
-
4544285548
-
Approximate dynamic programming based approaches for input-output data-driven control of nonlinear processes
-
Submitted
-
J. M. Lee and J. H. Lee, "Approximate dynamic programming based approaches for input-output data-driven control of nonlinear processes," Automatica, 2004. Submitted.
-
(2004)
Automatica
-
-
Lee, J.M.1
Lee, J.H.2
-
43
-
-
2942655578
-
Simulation-based learning of cost-to-go for control of nonlinear processes
-
J. M. Lee and J. H. Lee, "Simulation-based learning of cost-to-go for control of nonlinear processes," Korean J. Chem. Eng., vol. 21, no. 2, pp. 338-344, 2004.
-
(2004)
Korean J. Chem. Eng.
, vol.21
, Issue.2
, pp. 338-344
-
-
Lee, J.M.1
Lee, J.H.2
-
44
-
-
0026923530
-
A neural network architecture that computes its own reliability
-
J. A. Leonard, M. A. Kramer, and L. H. Ungar, "A neural network architecture that computes its own reliability," Computers & Chemical Engineering, vol. 16, pp. 819-835, 1992.
-
(1992)
Computers & Chemical Engineering
, vol.16
, pp. 819-835
-
-
Leonard, J.A.1
Kramer, M.A.2
Ungar, L.H.3
-
45
-
-
0000123778
-
Self-improving reactive agents based on reinforcement learning, planning and teaching
-
L.-J. Lin, "Self-improving reactive agents based on reinforcement learning, planning and teaching," Machine Learning, vol. 8, pp. 293-321, 1992.
-
(1992)
Machine Learning
, vol.8
, pp. 293-321
-
-
Lin, L.-J.1
-
46
-
-
0026880130
-
Automatic programming of behavior-based robots using reinforcement learning
-
S. Mahadevan and J. Connell, "Automatic programming of behavior-based robots using reinforcement learning," Machine Learning, vol. 55, no. 2-3, pp. 311-365, 1992.
-
(1992)
Machine Learning
, vol.55
, Issue.2-3
, pp. 311-365
-
-
Mahadevan, S.1
Connell, J.2
-
48
-
-
0001257766
-
Linear programming and sequential decisions
-
A. S. Manne, "Linear programming and sequential decisions," Management Science, vol. 6, no. 3, pp. 259-267, 1960.
-
(1960)
Management Science
, vol.6
, Issue.3
, pp. 259-267
-
-
Manne, A.S.1
-
49
-
-
0035249254
-
Simulation-based optimization of Markov reward processes
-
P. Marbach and J. N. Tsitsiklis, "Simulation-based optimization of Markov reward processes," IEEE Trans. on Automatic Control, vol. 46, no. 2, pp. 191-209, 2001.
-
(2001)
IEEE Trans. on Automatic Control
, vol.46
, Issue.2
, pp. 191-209
-
-
Marbach, P.1
Tsitsiklis, J.N.2
-
50
-
-
0034662259
-
Batch process modeling for optimization using reinforcement learning
-
E. C. Martinez, "Batch process modeling for optimization using reinforcement learning," Computers & Chemical Engineering, vol. 24, pp. 1187-1193, 2000.
-
(2000)
Computers & Chemical Engineering
, vol.24
, pp. 1187-1193
-
-
Martinez, E.C.1
-
51
-
-
0345494056
-
Temporal difference learning: A chemical process control application
-
A. F. Murray, editor, Kluwer, Norwell, MA
-
S. Miller and R. J. Williams, "Temporal difference learning: A chemical process control application," In A. F. Murray, editor, Applications of Artificial Neural Networks, Kluwer, Norwell, MA, 1995.
-
(1995)
Applications of Artificial Neural Networks
-
-
Miller, S.1
Williams, R.J.2
-
52
-
-
0029514510
-
The parti-game algorithm for variable resolution reinforcement learning in multidimensional state spaces
-
A. Moore and C. Atkeson, "The parti-game algorithm for variable resolution reinforcement learning in multidimensional state spaces," Machine Learning, vol. 21, no. 3, pp. 199-233, 1995.
-
(1995)
Machine Learning
, vol.21
, Issue.3
, pp. 199-233
-
-
Moore, A.1
Atkeson, C.2
-
54
-
-
0027684215
-
Prioritized sweeping: Reinforcement learning with less data and less time
-
A. W. Moore and C. G. Atkeson, "Prioritized sweeping: Reinforcement learning with less data and less time," Machine Learning, vol. 13, pp. 103-130, 1993.
-
(1993)
Machine Learning
, vol.13
, pp. 103-130
-
-
Moore, A.W.1
Atkeson, C.G.2
-
55
-
-
0033135677
-
Model predictive control: Past, present and future
-
M. Morari and J. H. Lee, "Model predictive control: Past, present and future," Computers & Chemical Engineering, vol. 23, pp. 667-682, 1999.
-
(1999)
Computers & Chemical Engineering
, vol.23
, pp. 667-682
-
-
Morari, M.1
Lee, J.H.2
-
56
-
-
0039225090
-
A convergent reinforcement learning algorithm in the continuous case based on a finite difference method
-
R. Munos, "A convergent reinforcement learning algorithm in the continuous case based on a finite difference method," Proc. of the International Joint Conference on Artificial Intelligence, 1997.
-
(1997)
Proc. of the International Joint Conference on Artificial Intelligence
-
-
Munos, R.1
-
57
-
-
0034274415
-
A study of reinforcement learning in the continuous case by means of viscosity solutions
-
R. Munos, "A study of reinforcement learning in the continuous case by means of viscosity solutions," Machine Learning Journal, vol. 40, pp. 265-299, 2000.
-
(2000)
Machine Learning Journal
, vol.40
, pp. 265-299
-
-
Munos, R.1
-
58
-
-
85046267451
-
Enhancing Q-learning for optimal asset allocation
-
M. Jordan, M. Kearns, and S. Solla, editors
-
R. Neuneier, "Enhancing Q-learning for optimal asset allocation," In M. Jordan, M. Kearns, and S. Solla, editors, Advances in Neural Information Processing Systems, vol. 10, 1997.
-
(1997)
Advances in Neural Information Processing Systems
, vol.10
-
-
Neuneier, R.1
-
59
-
-
0036804005
-
Kernel-based reinforcement learning in average-cost problems
-
D. Ormoneit and P. W. Glynn, "Kernel-based reinforcement learning in average-cost problems," IEEE Trans. on Automatic Control, vol. 47, no. 10, pp. 1624-1636, 2002.
-
(2002)
IEEE Trans. on Automatic Control
, vol.47
, Issue.10
, pp. 1624-1636
-
-
Ormoneit, D.1
Glynn, P.W.2
-
60
-
-
0036832956
-
Kernel-based reinforcement learning
-
D. Ormoneit and S. Sen, "Kernel-based reinforcement learning," Machine Learning, vol. 49, pp. 161-178, 2002.
-
(2002)
Machine Learning
, vol.49
, pp. 161-178
-
-
Ormoneit, D.1
Sen, S.2
-
61
-
-
0001473437
-
On estimation of a probability density function and mode
-
E. Parzen, "On estimation of a probability density function and mode," Ann. Math. Statist., vol. 33, pp. 1065-1076, 1962.
-
(1962)
Ann. Math. Statist.
, vol.33
, pp. 1065-1076
-
-
Parzen, E.1
-
62
-
-
0010932382
-
-
PhD thesis, Northeastern University, Boston, MA
-
J. Peng, Efficient Dynamic Programming-Based Learning for Control, PhD thesis, Northeastern University, Boston, MA, 1993.
-
(1993)
Efficient Dynamic Programming-Based Learning for Control
-
-
Peng, J.1
-
63
-
-
84977063352
-
Efficient learning and planning within the Dyna framework
-
J. Peng and R. J. Williams, "Efficient learning and planning within the Dyna framework," Adaptive Behavior, vol. 1, no. 4, pp. 437-454, 1993.
-
(1993)
Adaptive Behavior
, vol.1
, Issue.4
, pp. 437-454
-
-
Peng, J.1
Williams, R.J.2
-
64
-
-
0031236002
-
Adaptive critic designs
-
September
-
D. V. Prokhorov and D. C. Wunsch II, "Adaptive critic designs," IEEE Trans. on Neural Networks, vol. 8, no. 5, pp. 997-1007, September 1997.
-
(1997)
IEEE Trans. on Neural Networks
, vol.8
, Issue.5
, pp. 997-1007
-
-
Prokhorov, D.V.1
Wunsch II, D.C.2
-
66
-
-
0041802770
-
A survey of industrial model predictive control technology
-
S. J. Qin and T. A. Badgwell, "A survey of industrial model predictive control technology," Control Engineering Practice, vol. 11, no. 7, pp. 733-764, 2003.
-
(2003)
Control Engineering Practice
, vol.11
, Issue.7
, pp. 733-764
-
-
Qin, S.J.1
Badgwell, T.A.2
-
70
-
-
0001201756
-
Some studies in machine learning using the game of checkers
-
A. L. Samuel, "Some studies in machine learning using the game of checkers," IBM J. Res. Develop., pp. 210-229, 1959.
-
(1959)
IBM J. Res. Develop.
, pp. 210-229
-
-
Samuel, A.L.1
-
71
-
-
0001201757
-
Some studies in machine learning using the game of checkers II - Recent progress
-
A. L. Samuel, "Some studies in machine learning using the game of checkers II - recent progress," IBM J. Res. Develop., pp. 601-617, 1967.
-
(1967)
IBM J. Res. Develop.
, pp. 601-617
-
-
Samuel, A.L.1
-
72
-
-
0031231885
-
Experiments with reinforcement learning in problems with continuous state and action spaces
-
J. C. Santamaria, R. S. Sutton, and A. Ram, "Experiments with reinforcement learning in problems with continuous state and action spaces," Adaptive Behavior, vol. 6, no. 2, pp. 163-217, 1997.
-
(1997)
Adaptive Behavior
, vol.6
, Issue.2
, pp. 163-217
-
-
Santamaria, J.C.1
Sutton, R.S.2
Ram, A.3
-
73
-
-
84898995067
-
Learning from demonstration
-
M. C. Mozer, M. Jordan, and T. Petsche, editors
-
S. Schaal, "Learning from demonstration," In M. C. Mozer, M. Jordan, and T. Petsche, editors, Advances in Neural Information Processing Systems, vol. 9, pp. 1040-1046, 1997.
-
(1997)
Advances in Neural Information Processing Systems
, vol.9
, pp. 1040-1046
-
-
Schaal, S.1
-
74
-
-
0028374275
-
Robot juggling: An implementation of memory-based learning
-
S. Schaal and C. Atkeson, "Robot juggling: An implementation of memory-based learning," IEEE Control Systems, vol. 14, no. 1, pp. 57-71, 1994.
-
(1994)
IEEE Control Systems
, vol.14
, Issue.1
, pp. 57-71
-
-
Schaal, S.1
Atkeson, C.2
-
75
-
-
0000433333
-
Temporal difference learning of position evaluation in the game of Go
-
J. D. Cowan, G. Tesauro, and J. Alspector, editors
-
N. N. Schraudolph, P. Dayan, and T. J. Sejnowski, "Temporal difference learning of position evaluation in the game of Go," In J. D. Cowan, G. Tesauro, and J. Alspector, editors, Advances in Neural Information Processing Systems, vol. 6, pp. 817-824, 1994.
-
(1994)
Advances in Neural Information Processing Systems
, vol.6
, pp. 817-824
-
-
Schraudolph, N.N.1
Dayan, P.2
Sejnowski, T.J.3
-
76
-
-
84898972974
-
Reinforcement learning for dynamic channel allocation in cellular telephone systems
-
M. C. Mozer, M. I. Jordan, and T. Petsche, editors
-
S. Singh and D. Bertsekas, "Reinforcement learning for dynamic channel allocation in cellular telephone systems," In M. C. Mozer, M. I. Jordan, and T. Petsche, editors, Advances in Neural Information Processing Systems, vol. 9, pp. 974-980, 1997.
-
(1997)
Advances in Neural Information Processing Systems
, vol.9
, pp. 974-980
-
-
Singh, S.1
Bertsekas, D.2
-
77
-
-
0029753630
-
Reinforcement learning with replacing eligibility traces
-
S. P. Singh and R. S. Sutton, "Reinforcement learning with replacing eligibility traces," Machine Learning, vol. 22, pp. 123-158, 1996.
-
(1996)
Machine Learning
, vol.22
, pp. 123-158
-
-
Singh, S.P.1
Sutton, R.S.2
-
79
-
-
84898939480
-
Policy gradient methods for reinforcement learning with function approximation
-
S. A. Solla, T. K. Leen, and K.-R. Müller, editors
-
R. Sutton, D. McAllester, S. Singh, and Y. Mansour, "Policy gradient methods for reinforcement learning with function approximation," In S. A. Solla, T. K. Leen, and K.-R. Müller, editors, Advances in Neural Information Processing Systems, vol. 12, pp. 1057-1063, 2000.
-
(2000)
Advances in Neural Information Processing Systems
, vol.12
, pp. 1057-1063
-
-
Sutton, R.1
McAllester, D.2
Singh, S.3
Mansour, Y.4
-
80
-
-
0003617454
-
-
PhD thesis, University of Massachusetts, Amherst, MA
-
R. S. Sutton, Temporal Credit Assignment in Reinforcement Learning, PhD thesis, University of Massachusetts, Amherst, MA, 1984.
-
(1984)
Temporal Credit Assignment in Reinforcement Learning
-
-
Sutton, R.S.1
-
81
-
-
33847202724
-
Learning to predict by the method of temporal differences
-
R. S. Sutton, "Learning to predict by the method of temporal differences," Machine Learning, vol. 3, no. 1, pp. 9-44, 1988.
-
(1988)
Machine Learning
, vol.3
, Issue.1
, pp. 9-44
-
-
Sutton, R.S.1
-
82
-
-
85132026293
-
Integrated architectures for learning, planning, and reacting based on approximating dynamic programming
-
Austin, TX
-
R. S. Sutton, "Integrated architectures for learning, planning, and reacting based on approximating dynamic programming," Proc. of the Seventh International Conference on Machine Learning, Austin, TX, 1990.
-
(1990)
Proc. of the Seventh International Conference on Machine Learning
-
-
Sutton, R.S.1
-
84
-
-
85156221438
-
Generalization in reinforcement learning: Successful examples using sparse coarse coding
-
D. S. Touretzky, M. C. Mozer, and M. E. Hasselmo, editors
-
R. S. Sutton, "Generalization in reinforcement learning: Successful examples using sparse coarse coding," In D. S. Touretzky, M. C. Mozer, and M. E. Hasselmo, editors, Advances in Neural Information Processing Systems, vol. 8, pp. 1038-1044, 1996.
-
(1996)
Advances in Neural Information Processing Systems
, vol.8
, pp. 1038-1044
-
-
Sutton, R.S.1
-
85
-
-
0019537951
-
Toward a modern theory of adaptive networks: Expectation and prediction
-
R. S. Sutton and A. G. Barto, "Toward a modern theory of adaptive networks: Expectation and prediction," Psychol. Rev., vol. 88, no. 2, pp. 135-170, 1981.
-
(1981)
Psychol. Rev.
, vol.88
, Issue.2
, pp. 135-170
-
-
Sutton, R.S.1
Barto, A.G.2
-
87
-
-
0034499835
-
Enhanced continuous valued Q-learning for real autonomous robots
-
M. Takeda, T. Nakamura, M. Imai, T. Ogasawara, and M. Asada, "Enhanced continuous valued Q-learning for real autonomous robots," Advanced Robotics, vol. 14, no. 5, pp. 439-442, 2000.
-
(2000)
Advanced Robotics
, vol.14
, Issue.5
, pp. 439-442
-
-
Takeda, M.1
Nakamura, T.2
Imai, M.3
Ogasawara, T.4
Asada, M.5
-
88
-
-
0001046225
-
Practical issues in temporal difference learning
-
G. Tesauro, "Practical issues in temporal difference learning," Machine Learning, vol. 8, pp. 257-277, 1992.
-
(1992)
Machine Learning
, vol.8
, pp. 257-277
-
-
Tesauro, G.1
-
89
-
-
0000985504
-
TD-Gammon, a self-teaching backgammon program, achieves master-level play
-
G. Tesauro, "TD-Gammon, a self-teaching backgammon program, achieves master-level play," Neural Computation, vol. 6, no. 2, pp. 215-219, 1994.
-
(1994)
Neural Computation
, vol.6
, Issue.2
, pp. 215-219
-
-
Tesauro, G.1
-
90
-
-
0029276036
-
Temporal difference learning and TD-Gammon
-
G. Tesauro, "Temporal difference learning and TD-Gammon," Communications of the ACM, vol. 38, no. 3, pp. 58-67, 1995.
-
(1995)
Communications of the ACM
, vol.38
, Issue.3
, pp. 58-67
-
-
Tesauro, G.1
-
91
-
-
0003215153
-
Learning to play the game of chess
-
G. Tesauro, D. S. Touretzky, and T. K. Leen, editors
-
S. Thrun, "Learning to play the game of chess," In G. Tesauro, D. S. Touretzky, and T. K. Leen, editors, Advances in Neural Information Processing Systems, vol. 7, 1995.
-
(1995)
Advances in Neural Information Processing Systems
, vol.7
-
-
Thrun, S.1
-
93
-
-
0028497630
-
Asynchronous stochastic approximation and Q-learning
-
J. N. Tsitsiklis, "Asynchronous stochastic approximation and Q-learning," Machine Learning, vol. 16, pp. 185-202, 1994.
-
(1994)
Machine Learning
, vol.16
, pp. 185-202
-
-
Tsitsiklis, J.N.1
-
94
-
-
0031143730
-
An analysis of temporal-difference learning with function approximation
-
J. N. Tsitsiklis and B. Van Roy, "An analysis of temporal-difference learning with function approximation," IEEE Trans. on Automatic Control, vol. 42, no. 5, pp. 674-690, 1997.
-
(1997)
IEEE Trans. on Automatic Control
, vol.42
, Issue.5
, pp. 674-690
-
-
Tsitsiklis, J.N.1
Van Roy, B.2
-
95
-
-
4544283129
-
Neuro-dynamic programming: Overview and recent trends
-
E. Feinberg and A. Shwartz, editors, Kluwer, Boston, MA
-
B. Van Roy, "Neuro-dynamic programming: Overview and recent trends," In E. Feinberg and A. Shwartz, editors, Handbook of Markov Decision Processes: Methods and Applications, Kluwer, Boston, MA, 2001.
-
(2001)
Handbook of Markov Decision Processes: Methods and Applications
-
-
Van Roy, B.1
-
98
-
-
0002557583
-
Advanced forecasting methods for global crisis warning and models of intelligence
-
P. J. Werbos, "Advanced forecasting methods for global crisis warning and models of intelligence," General Systems Yearbook, vol. 22, pp. 25-38, 1977.
-
(1977)
General Systems Yearbook
, vol.22
, pp. 25-38
-
-
Werbos, P.J.1
-
99
-
-
0002031779
-
Approximate dynamic programming for real-time control and neural modeling
-
D. A. White and D. A. Sofge, editors, Van Nostrand Reinhold, New York
-
P. J. Werbos, "Approximate dynamic programming for real-time control and neural modeling," In D. A. White and D. A. Sofge, editors, Handbook of Intelligent Control: Neural, Fuzzy, and Adaptive Approaches, Van Nostrand Reinhold, New York, pp. 493-525, 1992.
-
(1992)
Handbook of Intelligent Control: Neural, Fuzzy, and Adaptive Approaches
, pp. 493-525
-
-
Werbos, P.J.1
-
101
-
-
0039967456
-
Analysis of some incremental variants of policy iteration: First steps toward understanding actor-critic learning systems
-
Northeastern University, College of Computer Science, Boston, MA
-
R. J. Williams and L. C. Baird III, "Analysis of some incremental variants of policy iteration: First steps toward understanding actor-critic learning systems," Technical Report NU-CCS-93-14, Northeastern University, College of Computer Science, Boston, MA, 1993.
-
(1993)
Technical Report
, vol.NU-CCS-93-14
-
-
Williams, R.J.1
Baird III, L.C.2
-
102
-
-
0008612661
-
Neuro-fuzzy modeling and control of a batch process involving simultaneous reaction and distillation
-
J. A. Wilson and E. C. Martinez, "Neuro-fuzzy modeling and control of a batch process involving simultaneous reaction and distillation," Computers & Chemical Engineering, vol. 21S, pp. S1233-S1238, 1997.
-
(1997)
Computers & Chemical Engineering
, vol.21S
-
-
Wilson, J.A.1
Martinez, E.C.2
-
103
-
-
4544314315
-
Stochastic control problems
-
B. Friedland, editor, ASME, New York
-
M. Wonham, "Stochastic control problems," In B. Friedland, editor, Stochastic Problems in Control, ASME, New York, 1968.
-
(1968)
Stochastic Problems in Control
-
-
Wonham, M.1
-
105
-
-
0005914572
-
-
W. Zhang, Reinforcement Learning for Job-Shop Scheduling, PhD thesis, Oregon State University, 1996. Also available as Technical Report CS-96-30-1.
-
Technical Report
, vol.CS-96-30-1
-
-
-
106
-
-
84918834208
-
A reinforcement learning approach to job-shop scheduling
-
San Francisco, CA
-
W. Zhang and T. G. Dietterich, "A reinforcement learning approach to job-shop scheduling," Proc. of the Twelfth International Conference on Machine Learning, San Francisco, CA, pp. 1114-1120, 1995.
-
(1995)
Proc. of the Twelfth International Conference on Machine Learning
, pp. 1114-1120
-
-
Zhang, W.1
Dietterich, T.G.2
-
107
-
-
0001648572
-
High-performance job-shop scheduling with a time-delay TD(λ) network
-
D. S. Touretzky, M. C. Mozer, and M. E. Hasselmo, editors
-
W. Zhang and T. G. Dietterich, "High-performance job-shop scheduling with a time-delay TD(λ) network," In D. S. Touretzky, M. C. Mozer, and M. E. Hasselmo, editors, Advances in Neural Information Processing Systems, vol. 8, 1996.
-
(1996)
Advances in Neural Information Processing Systems
, vol.8
-
-
Zhang, W.1
Dietterich, T.G.2