1. Aberdeen, D., & Baxter, J. (2001). Policy-gradient learning of controllers with internal state. Tech. rep., Australian National University.
2. Aleksandrov, V. M., Sysoyev, V. I., & Shemeneva, V. V. (1968). Stochastic optimization. Engineering Cybernetics, 5, 11-16.
4. Bartlett, P. L., & Baxter, J. (1999). Hebbian synaptic modifications in spiking neurons that learn. Tech. rep., Research School of Information Sciences and Engineering, Australian National University. http://csl.anu.edu.au/~bartlett/papers/BartlettBaxter-Nov99.ps.gz.
5. Bartlett, P. L., & Baxter, J. (2001). Estimation and approximation bounds for gradient-based reinforcement learning. Journal of Computer and System Sciences, 62. Invited Paper: Special Issue on COLT 2000.
6. Barto, A. G., Sutton, R. S., & Anderson, C. W. (1983). Neuronlike adaptive elements that can solve difficult learning control problems. IEEE Transactions on Systems, Man, and Cybernetics, SMC-13, 834-846.
7. Baxter, J., Bartlett, P. L., & Weaver, L. (2001). Experiments with infinite-horizon, policy-gradient estimation. Journal of Artificial Intelligence Research. To appear.
8. Baxter, J., Tridgell, A., & Weaver, L. (2000). Learning to play chess using temporal-differences. Machine Learning, 40(3), 243-263.
12. Cao, X.-R., & Wan, Y.-W. (1998). Algorithms for sensitivity analysis of Markov chains through potentials and perturbation realization. IEEE Transactions on Control Systems Technology, 6, 482-492.
15. Glynn, P. W. (1990). Likelihood ratio gradient estimation for stochastic systems. Communications of the ACM, 33, 75-84.
16. Glynn, P. W., & L'Ecuyer, P. (1995). Likelihood ratio gradient estimation for regenerative stochastic recursions. Advances in Applied Probability, 27(4), 1019-1053.
18. Jaakkola, T., Singh, S. P., & Jordan, M. I. (1995). Reinforcement learning algorithm for partially observable Markov decision problems. In Tesauro, G., Touretzky, D., & Leen, T. (Eds.), Advances in Neural Information Processing Systems, Vol. 7. MIT Press, Cambridge, MA.
19. Kimura, H., & Kobayashi, S. (1998a). An analysis of actor/critic algorithms using eligibility traces: Reinforcement learning with imperfect value functions. In Fifteenth International Conference on Machine Learning, pp. 278-286.
20. Kimura, H., & Kobayashi, S. (1998b). Reinforcement learning for continuous action using stochastic gradient ascent. In Intelligent Autonomous Systems (IAS-5), pp. 288-295.
21. Kimura, H., Miyazaki, K., & Kobayashi, S. (1997). Reinforcement learning in POMDPs with function approximation. In Fisher, D. H. (Ed.), Proceedings of the Fourteenth International Conference on Machine Learning (ICML'97), pp. 152-160.
22. Kimura, H., Yamamura, M., & Kobayashi, S. (1995). Reinforcement learning by stochastic hill climbing on discounted reward. In Proceedings of the Twelfth International Conference on Machine Learning (ICML'95), pp. 295-303.
25. Marbach, P., & Tsitsiklis, J. N. (1998). Simulation-based optimization of Markov reward processes. Tech. rep., MIT.
26. Meuleau, N., Peshkin, L., Kaelbling, L. P., & Kim, K.-E. (2000). Off-policy policy search. Tech. rep., MIT Artificial Intelligence Laboratory.
27. Meuleau, N., Peshkin, L., Kim, K.-E., & Kaelbling, L. P. (1999). Learning finite-state controllers for partially observable environments. In Proceedings of the Fifteenth International Conference on Uncertainty in Artificial Intelligence.
28. Peshkin, L., Kim, K.-E., Meuleau, N., & Kaelbling, L. P. (2000). Learning to cooperate via policy search. In Proceedings of the Sixteenth International Conference on Uncertainty in Artificial Intelligence.
30. Reiman, M. I., & Weiss, A. (1989). Sensitivity analysis for simulations via likelihood ratios. Operations Research, 37.
32. Rubinstein, R. Y. (1991). How to optimize complex stochastic systems from a single sample path by the score function method. Annals of Operations Research, 27, 175-211.
33. Rubinstein, R. Y. (1992). Decomposable score function estimators for sensitivity analysis and optimization of queueing networks. Annals of Operations Research, 39, 195-229.
36. Samuel, A. L. (1959). Some studies in machine learning using the game of checkers. IBM Journal of Research and Development, 3, 210-229.
37. Shilov, G. E., & Gurevich, B. L. (1966). Integral, Measure and Derivative: A Unified Approach. Prentice-Hall, Englewood Cliffs, N.J.
40. Smallwood, R. D., & Sondik, E. J. (1973). The optimal control of partially observable Markov decision processes over a finite horizon. Operations Research, 21, 1071-1098.
41. Sondik, E. J. (1978). The optimal control of partially observable Markov decision processes over the infinite horizon: Discounted costs. Operations Research, 26.
43. Sutton, R. S., McAllester, D., Singh, S., & Mansour, Y. (2000). Policy gradient methods for reinforcement learning with function approximation. In Neural Information Processing Systems 1999. MIT Press.
44. Tao, N., Baxter, J., & Weaver, L. (2001). A multi-agent, policy-gradient approach to network routing. Tech. rep., Australian National University.
45. Tesauro, G. (1992). Practical issues in temporal difference learning. Machine Learning, 8, 257-278.
46. Tesauro, G. (1994). TD-Gammon, a self-teaching backgammon program, achieves master-level play. Neural Computation, 6, 215-219.
47. Tsitsiklis, J. N., & Van Roy, B. (1997). An analysis of temporal difference learning with function approximation. IEEE Transactions on Automatic Control, 42(5), 674-690.
48. Williams, R. J. (1992). Simple statistical gradient-following algorithms for connectionist reinforcement learning. Machine Learning, 8, 229-256.