[4] D. Bertsekas, A new value-iteration method for the average cost dynamic programming problem, Technical Report LIDS-P-2307, MIT, Cambridge, MA, 1995.
[6] D.P. Bertsekas, Dynamic Programming and Optimal Control, Athena Scientific, Belmont, MA, 1995.
[7] C. Boutilier, R. Dearden, M. Goldszmidt, Exploiting structure in policy construction, in: Proceedings 14th International Joint Conference on Artificial Intelligence (IJCAI-95), Montreal, Que., 1995.
[9] L. Breiman, J.H. Friedman, R.A. Olshen, C.J. Stone, Classification and Regression Trees, Wadsworth International Group, Belmont, MA, 1984.
[10] G.C. Canavos, Applied Probability and Statistical Methods, Little, Brown and Company, Boston, MA, 1984.
[11] R.H. Crites, A.G. Barto, Improving elevator performance using reinforcement learning, in: Advances in Neural Information Processing Systems, Vol. 8, MIT Press, Cambridge, MA, 1996.
[12] T. Dean, K. Kanazawa, A model for reasoning about persistence and causation, Computational Intelligence 5 (3) (1989) 142-150.
[13] G.W. Gates, The reduced nearest neighbor rule, IEEE Trans. Inform. Theory 18 (1972) 431-433.
[14] P.E. Hart, The condensed nearest neighbor rule, IEEE Trans. Inform. Theory 14 (1968) 515-516.
[20] S. Koenig, R.G. Simmons, The effect of representation and knowledge on goal-directed exploration with reinforcement-learning algorithms, Machine Learning 22 (1996) 227-250.
[21] L.-J. Lin, Self-improving reactive agents based on reinforcement learning, planning, and teaching, Machine Learning 8 (1992) 293-321.
[22] M.L. Littman, A. Cassandra, L.P. Kaelbling, Learning policies for partially observable environments: scaling up, in: Proceedings of International Machine Learning Conference, San Francisco, CA, 1995, pp. 362-370.
[24] S. Mahadevan, Average reward reinforcement learning: foundations, algorithms, and empirical results, Machine Learning 22 (1996) 159-195.
[25] S. Mahadevan, Sensitive discount optimality: unifying discounted and average reward reinforcement learning, in: Proceedings International Machine Learning Conference, Bari, Italy, 1996.
[26] S. Mahadevan, J. Connell, Automatic programming of behavior-based robots using reinforcement learning, Artificial Intelligence 55 (1992) 311-365.
[28] A.W. Moore, C.G. Atkeson, Prioritized sweeping: reinforcement learning with less data and less time, Machine Learning 13 (1993) 103-130.
[30] A.E. Nicholson, J.M. Brady, The data association problem when monitoring robot vehicles using dynamic belief networks, in: ECAI 92: 10th European Conference on Artificial Intelligence Proceedings, Vienna, Austria, Wiley, New York, 1992, pp. 689-693.
[31] D. Ok, A study of model-based average reward reinforcement learning, Ph.D. Thesis, Technical Report 96-30-2, Department of Computer Science, Oregon State University, Corvallis, OR, 1996.
[33] R. Parr, S. Russell, Approximating optimal policies for partially observable stochastic domains, in: Proceedings National Conference on Artificial Intelligence (AAAI-94), Seattle, WA, 1994, pp. 1088-1093.
[35] S. Russell, P. Norvig, Artificial Intelligence: A Modern Approach, Prentice-Hall, Englewood Cliffs, NJ, 1995.
[36] S. Schaal, C. Atkeson, Robot juggling: an implementation of memory-based learning, IEEE Control Systems 14 (1994) 57-71.
[37] A. Schwartz, A reinforcement learning method for maximizing undiscounted rewards, in: Proceedings 10th International Conference on Machine Learning, Morgan Kaufmann, San Mateo, CA, 1993.
[38] S.P. Singh, Reinforcement learning algorithms for average-payoff Markovian decision processes, in: Proceedings National Conference on Artificial Intelligence (AAAI-94), Seattle, WA, MIT Press, Cambridge, MA, 1994.
[39] R.S. Sutton, Learning to predict by the methods of temporal differences, Machine Learning 3 (1988) 9-44.
[40] R.S. Sutton, Integrating architectures for learning, planning and reacting based on approximating dynamic programming, in: Proceedings Seventh International Conference on Machine Learning, Austin, TX, 1990.
[41] P. Tadepalli, D. Ok, H-learning: a reinforcement learning method for optimizing undiscounted average reward, Technical Report 94-30-1, Department of Computer Science, Oregon State University, 1994.
[42] P. Tadepalli, D. Ok, Scaling up average reward reinforcement learning by approximating the domain models and the value function, in: Proceedings 13th International Conference on Machine Learning, 1996.
[43] M. Tan, Multi-agent reinforcement learning: independent vs. cooperative agents, in: Proceedings 10th International Conference on Machine Learning, Morgan Kaufmann, San Mateo, CA, 1993.
[44] G. Tesauro, Practical issues in temporal difference learning, Machine Learning 8 (3-4) (1992) 257-277.
[45] S. Thrun, The role of exploration in learning control, in: Handbook of Intelligent Control: Neural, Fuzzy, and Adaptive Approaches, Van Nostrand Reinhold, New York, 1994.