-
2
-
-
84880690842
-
Bounding the suboptimality of reusing subproblems
-
Stockholm, Sweden. Morgan Kaufmann
-
Bowling, M., & Veloso, M. (1999). Bounding the suboptimality of reusing subproblems. In Proceedings of the Sixteenth International Joint Conference on Artificial Intelligence, pp. 1340-1345, Stockholm, Sweden. Morgan Kaufmann.
-
(1999)
Proceedings of the Sixteenth International Joint Conference on Artificial Intelligence
, pp. 1340-1345
-
-
Bowling, M.1
Veloso, M.2
-
4
-
-
0036531878
-
Multiagent learning using a variable learning rate
-
Bowling, M., & Veloso, M. (2002). Multiagent learning using a variable learning rate. Artificial Intelligence, 136, 215-250.
-
(2002)
Artificial Intelligence
, vol.136
, pp. 215-250
-
-
Bowling, M.1
Veloso, M.2
-
7
-
-
0031630561
-
The dynamics of reinforcement learning in cooperative multiagent systems
-
Menlo Park, CA. AAAI Press
-
Claus, C., & Boutilier, C. (1998). The dynamics of reinforcement learning in cooperative multiagent systems. In Proceedings of the Fifteenth National Conference on Artificial Intelligence, pp. 746-752, Menlo Park, CA. AAAI Press.
-
(1998)
Proceedings of the Fifteenth National Conference on Artificial Intelligence
, pp. 746-752
-
-
Claus, C.1
Boutilier, C.2
-
11
-
-
38249006045
-
Bounded versus unbounded rationality: The tyranny of the weak
-
Gilboa, I., & Samet, D. (1989). Bounded versus unbounded rationality: The tyranny of the weak. Games and Economic Behavior, 213-221.
-
(1989)
Games and Economic Behavior
, pp. 213-221
-
-
Gilboa, I.1
Samet, D.2
-
14
-
-
0006419533
-
Hierarchical solution of Markov decision processes using macro-actions
-
Hauskrecht, M., Meuleau, N., Kaelbling, L. P., Dean, T., & Boutilier, C. (1998). Hierarchical solution of Markov decision processes using macro-actions. In Proceedings of the Fourteenth Annual Conference on Uncertainty in Artificial Intelligence (UAI-98).
-
(1998)
Proceedings of the Fourteenth Annual Conference on Uncertainty in Artificial Intelligence (UAI-98)
-
-
Hauskrecht, M.1
Meuleau, N.2
Kaelbling, L.P.3
Dean, T.4
Boutilier, C.5
-
18
-
-
0000619048
-
Extensive games and the problem of information
-
Kuhn, H. W., & Tucker, A. W. (Eds.), Princeton University Press. Reprinted in (Kuhn, 1997)
-
Kuhn, H. W. (1953). Extensive games and the problem of information. In Kuhn, H. W., & Tucker, A. W. (Eds.), Contributions to the Theory of Games II, pp. 193-216. Princeton University Press. Reprinted in (Kuhn, 1997).
-
(1953)
Contributions to the Theory of Games II
, pp. 193-216
-
-
Kuhn, H.W.1
-
20
-
-
0035501436
-
Bargaining with limited computation: Deliberation equilibrium
-
Larson, K., & Sandholm, T. (2001). Bargaining with limited computation: Deliberation equilibrium. Artificial Intelligence, 132(2), 183-217.
-
(2001)
Artificial Intelligence
, vol.132
, Issue.2
, pp. 183-217
-
-
Larson, K.1
Sandholm, T.2
-
24
-
-
0001961616
-
A generalized reinforcement-learning model: Convergence and applications
-
Bari, Italy. Morgan Kaufmann
-
Littman, M. L., & Szepesvári, C. (1996). A generalized reinforcement-learning model: Convergence and applications. In Proceedings of the 13th International Conference on Machine Learning, pp. 310-318, Bari, Italy. Morgan Kaufmann.
-
(1996)
Proceedings of the 13th International Conference on Machine Learning
, pp. 310-318
-
-
Littman, M.L.1
Szepesvári, C.2
-
25
-
-
0029752592
-
Average reward reinforcement learning: Foundations, algorithms, and empirical results
-
Mahadevan, S. (1996). Average reward reinforcement learning: Foundations, algorithms, and empirical results. Machine Learning, 22, 159-196.
-
(1996)
Machine Learning
, vol.22
, pp. 159-196
-
-
Mahadevan, S.1
-
29
-
-
0027684215
-
Prioritized sweeping: Reinforcement learning with less data and less time
-
Moore, A. W., & Atkeson, C. G. (1993). Prioritized sweeping: Reinforcement learning with less data and less time. Machine Learning, 13, 103-130.
-
(1993)
Machine Learning
, vol.13
, pp. 103-130
-
-
Moore, A.W.1
Atkeson, C.G.2
-
30
-
-
0002021736
-
Equilibrium points in n-person games
-
Reprinted in (Kuhn, 1997)
-
Nash, Jr., J. F. (1950). Equilibrium points in n-person games. PNAS, 36, 48-49. Reprinted in (Kuhn, 1997).
-
(1950)
PNAS
, vol.36
, pp. 48-49
-
-
Nash Jr., J.F.1
-
31
-
-
84898967780
-
Policy search via density estimation
-
MIT Press
-
Ng, A. Y., Parr, R., & Koller, D. (1999). Policy search via density estimation. In Advances in Neural Information Processing Systems 12, pp. 1022-1028. MIT Press.
-
(1999)
Advances in Neural Information Processing Systems
, vol.12
, pp. 1022-1028
-
-
Ng, A.Y.1
Parr, R.2
Koller, D.3
-
32
-
-
0032021222
-
Soccer server: A tool for research on multi-agent systems
-
Noda, I., Matsubara, H., Hiraki, K., & Frank, I. (1998). Soccer server: A tool for research on multi-agent systems. Applied Artificial Intelligence, 12, 233-250.
-
(1998)
Applied Artificial Intelligence
, vol.12
, pp. 233-250
-
-
Noda, I.1
Matsubara, H.2
Hiraki, K.3
Frank, I.4
-
34
-
-
0345108843
-
Games with procedurally rational players
-
McMaster University, Department of Economics
-
Osborne, M. J., & Rubinstein, A. (1997). Games with procedurally rational players. Working papers 9702, McMaster University, Department of Economics.
-
(1997)
Working Papers
, vol.9702
-
-
Osborne, M.J.1
Rubinstein, A.2
-
37
-
-
0001402950
-
An iterative method of solving a game
-
Reprinted in (Kuhn, 1997)
-
Robinson, J. (1951). An iterative method of solving a game. Annals of Mathematics, 54, 296-301. Reprinted in (Kuhn, 1997).
-
(1951)
Annals of Mathematics
, vol.54
, pp. 296-301
-
-
Robinson, J.1
-
38
-
-
0018922522
-
Existence and uniqueness of equilibrium points for concave n-person games
-
Rosen, J. B. (1965). Existence and uniqueness of equilibrium points for concave n-person games. Econometrica, 33, 520-534.
-
(1965)
Econometrica
, vol.33
, pp. 520-534
-
-
Rosen, J.B.1
-
41
-
-
0000392613
-
Stochastic games
-
Reprinted in (Kuhn, 1997)
-
Shapley, L. S. (1953). Stochastic games. PNAS, 39, 1095-1100. Reprinted in (Kuhn, 1997).
-
(1953)
PNAS
, vol.39
, pp. 1095-1100
-
-
Shapley, L.S.1
-
42
-
-
0002298346
-
From substantive to procedural rationality
-
Latsis, S. J. (Ed.), Cambridge University Press, New York
-
Simon, H. A. (1976). From substantive to procedural rationality. In Latsis, S. J. (Ed.), Method and Appraisal in Economics, pp. 129-148. Cambridge University Press, New York.
-
(1976)
Method and Appraisal in Economics
, pp. 129-148
-
-
Simon, H.A.1
-
44
-
-
0033901602
-
Convergence results for single-step on-policy reinforcement-learning algorithms
-
Singh, S., Jaakkola, T., Littman, M. L., & Szepesvári, C. (2000). Convergence results for single-step on-policy reinforcement-learning algorithms. Machine Learning.
-
(2000)
Machine Learning
-
-
Singh, S.1
Jaakkola, T.2
Littman, M.L.3
Szepesvári, C.4
-
45
-
-
84898939480
-
Policy gradient methods for reinforcement learning with function approximation
-
MIT Press
-
Sutton, R. S., McAllester, D., Singh, S., & Mansour, Y. (2000). Policy gradient methods for reinforcement learning with function approximation. In Advances in Neural Information Processing Systems 12, pp. 1057-1063. MIT Press.
-
(2000)
Advances in Neural Information Processing Systems
, vol.12
, pp. 1057-1063
-
-
Sutton, R.S.1
McAllester, D.2
Singh, S.3
Mansour, Y.4
-
46
-
-
0002260073
-
Intra-option learning about temporally abstract actions
-
San Francisco. Morgan Kaufmann
-
Sutton, R. S., Precup, D., & Singh, S. (1998). Intra-option learning about temporally abstract actions. In Proceedings of the Fifteenth International Conference on Machine Learning, pp. 556-564, San Francisco. Morgan Kaufmann.
-
(1998)
Proceedings of the Fifteenth International Conference on Machine Learning
, pp. 556-564
-
-
Sutton, R.S.1
Precup, D.2
Singh, S.3
-
48
-
-
0004196515
-
Adversarial reinforcement learning
-
Carnegie Mellon University. Unpublished
-
Uther, W., & Veloso, M. (1997). Adversarial reinforcement learning. Tech. rep., Carnegie Mellon University. Unpublished.
-
(1997)
Tech. Rep.
-
-
Uther, W.1
Veloso, M.2
-
49
-
-
23144434851
-
-
Ph.D. thesis, Computer Science Department, Carnegie Mellon University, Pittsburgh, PA. Available as technical report CMU-CS-02-169
-
Uther, W. T. B. (2002). Tree Based Hierarchical Reinforcement Learning. Ph.D. thesis, Computer Science Department, Carnegie Mellon University, Pittsburgh, PA. Available as technical report CMU-CS-02-169.
-
(2002)
Tree Based Hierarchical Reinforcement Learning
-
-
Uther, W.T.B.1
-
52
-
-
0012252296
-
Tight performance bounds on greedy policies based on imperfect value functions
-
College of Computer Science, Northeastern University
-
Williams, R. J., & Baird, L. C. (1993). Tight performance bounds on greedy policies based on imperfect value functions. Technical report, College of Computer Science, Northeastern University.
-
(1993)
Technical Report
-
-
Williams, R.J.1
Baird, L.C.2