1. Aberdeen, D., & Baxter, J. (2001). Policy-gradient learning of controllers with internal state. Tech. rep., Australian National University.
3. Bartlett, P. L., & Baxter, J. (1999). Hebbian synaptic modifications in spiking neurons that learn. Tech. rep., Research School of Information Sciences and Engineering, Australian National University. http://csl.anu.edu.au/~bartlett/papers/BartlettBaxter-Nov99.ps.gz.
6. Barto, A. G., Sutton, R. S., & Anderson, C. W. (1983). Neuronlike adaptive elements that can solve difficult learning control problems. IEEE Transactions on Systems, Man, and Cybernetics, SMC-13, 834-846.
9. Baxter, J., Tridgell, A., & Weaver, L. (2000). Learning to play chess using temporal-differences. Machine Learning, 40(3), 243-263.
11. Cao, X.-R., & Chen, H.-F. (1997). Perturbation realization, potentials, and sensitivity analysis of Markov processes. IEEE Transactions on Automatic Control, 42, 1382-1393.
12. Cao, X.-R., & Wan, Y.-W. (1998). Algorithms for sensitivity analysis of Markov chains through potentials and perturbation realization. IEEE Transactions on Control Systems Technology, 6, 482-492.
14. Fu, M. C., & Hu, J. (1994). Smooth perturbation derivative estimation for Markov chains. Operations Research Letters, 15, 241-251.
16. Kimura, H., & Kobayashi, S. (1998). An analysis of actor/critic algorithms using eligibility traces: Reinforcement learning with imperfect value functions. In Proceedings of the Fifteenth International Conference on Machine Learning (ICML'98), pp. 278-286.
17. Kimura, H., Miyazaki, K., & Kobayashi, S. (1997). Reinforcement learning in POMDPs with function approximation. In Fisher, D. H. (Ed.), Proceedings of the Fourteenth International Conference on Machine Learning (ICML'97), pp. 152-160.
18. Kimura, H., Yamamura, M., & Kobayashi, S. (1995). Reinforcement learning by stochastic hill climbing on discounted reward. In Proceedings of the Twelfth International Conference on Machine Learning (ICML'95), pp. 295-303.
21. Marbach, P., & Tsitsiklis, J. N. (1998). Simulation-based optimization of Markov reward processes. Tech. rep., MIT.
23. Samuel, A. L. (1959). Some studies in machine learning using the game of checkers. IBM Journal of Research and Development, 3, 210-229.
26. Singh, S., Jaakkola, T., & Jordan, M. (1995). Reinforcement learning with soft state aggregation. In Tesauro, G., Touretzky, D., & Leen, T. (Eds.), Advances in Neural Information Processing Systems, Vol. 7. MIT Press, Cambridge, MA.
27. Sutton, R. (1988). Learning to predict by the method of temporal differences. Machine Learning, 3, 9-44.
29. Sutton, R. S., McAllester, D., Singh, S., & Mansour, Y. (2000). Policy gradient methods for reinforcement learning with function approximation. In Advances in Neural Information Processing Systems, Vol. 12. MIT Press, Cambridge, MA.
30. Tao, N., Baxter, J., & Weaver, L. (2001). A multi-agent, policy-gradient approach to network routing. Tech. rep., Australian National University.
31. Tesauro, G. (1992). Practical issues in temporal difference learning. Machine Learning, 8, 257-278.
32. Tesauro, G. (1994). TD-Gammon, a self-teaching backgammon program, achieves master-level play. Neural Computation, 6, 215-219.
33. Weaver, L., & Baxter, J. (1999). Reinforcement learning from state and temporal differences. Tech. rep., Australian National University.
34. Williams, R. J. (1992). Simple statistical gradient-following algorithms for connectionist reinforcement learning. Machine Learning, 8, 229-256.