Volume 15, 2001, Pages 351-381

Experiments with infinite-horizon, policy-gradient estimation

Author keywords

[No Author keywords available]

Indexed keywords

CONJUGATE-GRADIENTS; NATURAL INTERPRETATION; PARTIALLY OBSERVABLE MARKOV DECISION PROCESS (POMDP)

EID: 0013495368     PISSN: 1076-9757     EISSN: None     Source Type: Journal
DOI: 10.1613/jair.807     Document Type: Article
Times cited: 121
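
The "infinite-horizon, policy-gradient estimation" of the title refers to the GPOMDP family of estimators developed in the companion JAIR article by Baxter and Bartlett: a discounted eligibility trace of the policy's score function is accumulated along a single trajectory, and reward-weighted traces are averaged, via z_{t+1} = beta * z_t + grad log mu(u_t | theta, y_t) and Delta_{t+1} = Delta_t + (r(x_{t+1}) z_{t+1} - Delta_t) / (t + 1). The Python sketch below illustrates that recursion; the two-state chain and logistic policy are illustrative assumptions for this record, not taken from the paper.

    import numpy as np

    def run_gpomdp(theta, beta=0.95, T=50_000, seed=0):
        # GPOMDP-style gradient estimate on an illustrative two-state chain.
        # theta[x] is the logit of taking action 1 in state x (assumed policy).
        rng = np.random.default_rng(seed)
        # P[u, x, :] = next-state distribution: action 0 tends to stay put,
        # action 1 tends to switch states.
        P = np.array([[[0.9, 0.1], [0.1, 0.9]],
                      [[0.1, 0.9], [0.9, 0.1]]])
        r = np.array([0.0, 1.0])              # reward 1 for being in state 1
        x = 0
        z = np.zeros_like(theta)              # eligibility trace z_t
        grad = np.zeros_like(theta)           # running estimate Delta_t
        for t in range(T):
            p1 = 1.0 / (1.0 + np.exp(-theta[x]))   # P(u = 1 | x)
            u = int(rng.random() < p1)
            glog = np.zeros_like(theta)
            glog[x] = u - p1                  # grad of log mu(u | theta, x)
            x = rng.choice(2, p=P[u, x])      # transition to next state
            z = beta * z + glog               # discount old credit, add new
            grad += (r[x] * z - grad) / (t + 1)    # average of r(x_{t+1}) z_{t+1}
        return grad

    print(run_gpomdp(np.zeros(2)))  # roughly [+, -]: switch out of state 0, stay in 1

The discount beta trades the bias of the estimate against its variance, which is the central tension the paper's experiments probe.
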

References (35)
  • 1. Aberdeen, D., & Baxter, J. (2001). Policy-gradient learning of controllers with internal state. Tech. rep., Australian National University.
  • 3. Bartlett, P. L., & Baxter, J. (1999). Hebbian synaptic modifications in spiking neurons that learn. Tech. rep., Research School of Information Sciences and Engineering, Australian National University. http://csl.anu.edu.au/~bartlett/papers/BartlettBaxter-Nov99.ps.gz
  • 9. Baxter, J., Tridgell, A., & Weaver, L. (2000). Learning to play chess using temporal-differences. Machine Learning, 40(3), 243-263.
  • 11. Cao, X.-R., & Chen, H.-F. (1997). Perturbation realization, potentials, and sensitivity analysis of Markov processes. IEEE Transactions on Automatic Control, 42, 1382-1393.
  • 12. Cao, X.-R., & Wan, Y.-W. (1998). Algorithms for sensitivity analysis of Markov chains through potentials and perturbation realization. IEEE Transactions on Control Systems Technology, 6, 482-492.
  • 14. Fu, M. C., & Hu, J. (1994). Smooth perturbation derivative estimation for Markov chains. Operations Research Letters, 15, 241-251.
  • 16. Kimura, H., & Kobayashi, S. (1998). An analysis of actor/critic algorithms using eligibility traces: Reinforcement learning with imperfect value functions. In Fifteenth International Conference on Machine Learning, pp. 278-286.
  • 21. Marbach, P., & Tsitsiklis, J. N. (1998). Simulation-based optimization of Markov reward processes. Tech. rep., MIT.
  • 23. Samuel, A. L. (1959). Some studies in machine learning using the game of checkers. IBM Journal of Research and Development, 3, 210-229.
  • 26. Singh, S., Jaakkola, T., & Jordan, M. (1995). Reinforcement learning with soft state aggregation. In Tesauro, G., Touretzky, D., & Leen, T. (Eds.), Advances in Neural Information Processing Systems, Vol. 7. MIT Press, Cambridge, MA.
  • 27. Sutton, R. (1988). Learning to predict by the method of temporal differences. Machine Learning, 3, 9-44.
  • 30. Tao, N., Baxter, J., & Weaver, L. (2001). A multi-agent, policy-gradient approach to network routing. Tech. rep., Australian National University.
  • 31. Tesauro, G. (1992). Practical issues in temporal difference learning. Machine Learning, 8, 257-278.
  • 32. Tesauro, G. (1994). TD-Gammon, a self-teaching backgammon program, achieves master-level play. Neural Computation, 6, 215-219.
  • 33. Weaver, L., & Baxter, J. (1999). Reinforcement learning from state and temporal differences. Tech. rep., Australian National University.
  • 34. Williams, R. J. (1992). Simple statistical gradient-following algorithms for connectionist reinforcement learning. Machine Learning, 8, 229-256.
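
The "conjugate-gradients" indexed keyword reflects how such gradient estimates are typically consumed: as a noisy oracle inside conjugate-gradient ascent with a gradient-based line search. The sketch below is a generic Polak-Ribiere ascent driven by the run_gpomdp sketch above; it is a simplification under stated assumptions, not the paper's exact CONJPOMDP/GSEARCH procedure.

    def conjgrad_ascent(grad_fn, theta, iters=5):
        # Polak-Ribiere conjugate-gradient ascent on a noisy gradient oracle.
        g = grad_fn(theta)
        d = g.copy()
        for _ in range(iters):
            # Double the step until the directional derivative along d turns
            # negative, then take the midpoint -- a crude stand-in for a
            # gradient-based bracketing line search.
            step = 1.0
            while grad_fn(theta + step * d) @ d > 0 and step < 1e3:
                step *= 2.0
            theta = theta + 0.5 * step * d
            g_new = grad_fn(theta)
            # Clip the PR coefficient at zero so bad directions trigger a restart.
            pr = max(0.0, g_new @ (g_new - g) / (g @ g + 1e-12))
            d = g_new + pr * d
            g = g_new
        return theta

    theta_star = conjgrad_ascent(run_gpomdp, np.zeros(2))

Because the oracle is stochastic, the clipped Polak-Ribiere coefficient effectively restarts the search whenever noise makes successive gradients inconsistent, which keeps the ascent stable.
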


* This information was extracted by KISTI through analysis of Elsevier's SCOPUS database.