-
1
-
-
85151728371
-
Residual algorithms: Reinforcement learning with function approximation
-
Morgan Kaufmann, San Francisco
-
Baird, L. C. (1995). Residual algorithms: Reinforcement learning with function approximation. In Proceedings of the Twelfth International Conference on Machine Learning, pp. 30-37. Morgan Kaufmann, San Francisco.
-
(1995)
Proceedings of the Twelfth International Conference on Machine Learning
, pp. 30-37
-
-
Baird, L.C.1
-
3
-
-
85156187730
-
Improving elevator performance using reinforcement learning
-
MIT Press, Cambridge, MA
-
Crites, R. H., and Barto, A. G. (1996). Improving elevator performance using reinforcement learning. In Advances in Neural Information Processing Systems: Proceedings of the 1995 Conference, pp. 1017-1023. MIT Press, Cambridge, MA.
-
(1996)
Advances in Neural Information Processing Systems: Proceedings of the 1995 Conference
, pp. 1017-1023
-
-
Crites, R.H.1
Barto, A.G.2
-
5
-
-
0012327484
-
Using eligibility traces to find the best memoryless policy in partially observable Markov decision processes
-
Morgan Kaufmann, San Francisco
-
Loch, J., and Singh, S. (1998). Using eligibility traces to find the best memoryless policy in partially observable Markov decision processes. In Proceedings of the Fifteenth International Conference on Machine Learning. Morgan Kaufmann, San Francisco.
-
(1998)
Proceedings of the Fifteenth International Conference on Machine Learning
-
-
Loch, J.1
Singh, S.2
-
6
-
-
0029752592
-
Average reward reinforcement learning: Foundations, algorithms, and empirical results
-
Mahadevan, S. (1996). Average reward reinforcement learning: Foundations, algorithms, and empirical results. Machine Learning, 22: 159-196.
-
(1996)
Machine Learning
, vol.22
, pp. 159-196
-
-
Mahadevan, S.1
-
7
-
-
0027684215
-
Prioritized sweeping: Reinforcement learning with less data and less real time
-
Moore, A. W., and Atkeson, C. G. (1993). Prioritized sweeping: Reinforcement learning with less data and less real time. Machine Learning, 13: 103-130.
-
(1993)
Machine Learning
, vol.13
, pp. 103-130
-
-
Moore, A.W.1
Atkeson, C.G.2
-
8
-
-
0003824303
-
-
Ph.D. thesis, University of Massachusetts, Amherst. Appeared as CMPSCI Technical Report 93-77
-
Singh, S. P. (1993). Learning to Solve Markovian Decision Processes. Ph.D. thesis, University of Massachusetts, Amherst. Appeared as CMPSCI Technical Report 93-77.
-
(1993)
Learning to Solve Markovian Decision Processes
-
-
Singh, S.P.1
-
9
-
-
84898972974
-
Reinforcement learning for dynamic channel allocation in cellular telephone systems
-
MIT Press, Cambridge, MA
-
Singh, S. P., and Bertsekas, D. (1997). Reinforcement learning for dynamic channel allocation in cellular telephone systems. In Advances in Neural Information Processing Systems: Proceedings of the 1996 Conference, pp. 974-980. MIT Press, Cambridge, MA.
-
(1997)
Advances in Neural Information Processing Systems: Proceedings of the 1996 Conference
, pp. 974-980
-
-
Singh, S.P.1
Bertsekas, D.2
-
10
-
-
0032114627
-
Analytical mean squared error curves for temporal difference learning
-
Singh, S., and Dayan, P. (1998). Analytical mean squared error curves for temporal difference learning. Machine Learning.
-
(1998)
Machine Learning
-
-
Singh, S.1
Dayan, P.2
-
11
-
-
0029753630
-
Reinforcement learning with replacing eligibility traces
-
Singh, S. P., and Sutton, R. S. (1996). Reinforcement learning with replacing eligibility traces. Machine Learning, 22: 123-158.
-
(1996)
Machine Learning
, vol.22
, pp. 123-158
-
-
Singh, S.P.1
Sutton, R.S.2
-
13
-
-
85156221438
-
Generalization in reinforcement learning: Successful examples using sparse coarse coding
-
MIT Press, Cambridge, MA
-
Sutton, R. S. (1996). Generalization in reinforcement learning: Successful examples using sparse coarse coding. In Advances in Neural Information Processing Systems: Proceedings of the 1995 Conference, pp. 1038-1044. MIT Press, Cambridge, MA.
-
(1996)
Advances in Neural Information Processing Systems: Proceedings of the 1995 Conference
, pp. 1038-1044
-
-
Sutton, R.S.1
-
15
-
-
0029276036
-
Temporal difference learning and TD-Gammon
-
Tesauro, G. J. (1995). Temporal difference learning and TD-Gammon. Communications of the ACM, 38: 58-68.
-
(1995)
Communications of the ACM
, vol.38
, pp. 58-68
-
-
Tesauro, G.J.1
-
16
-
-
0031143730
-
An analysis of temporal-difference learning with function approximation
-
Tsitsiklis, J. N., and Van Roy, B. (1997). An analysis of temporal-difference learning with function approximation. IEEE Transactions on Automatic Control, 42: 674-690.
-
(1997)
IEEE Transactions on Automatic Control
, vol.42
, pp. 674-690
-
-
Tsitsiklis, J.N.1
Van Roy, B.2