1. Aberdeen, D., & Baxter, J. (2001). Policy-gradient learning of controllers with internal state. Tech. rep., Australian National University.
2. Aleksandrov, V. M., Sysoyev, V. I., & Shemeneva, V. V. (1968). Stochastic optimization. Engineering Cybernetics, 5, 11-16.
4. Bartlett, P. L., & Baxter, J. (1999). Hebbian synaptic modifications in spiking neurons that learn. Tech. rep., Research School of Information Sciences and Engineering, Australian National University. http://csl.anu.edu.au/~bartlett/papers/BartlettBaxter-Nov99.ps.gz.
5. Bartlett, P. L., & Baxter, J. (2001). Estimation and approximation bounds for gradient-based reinforcement learning. Journal of Computer and System Sciences, 62. Invited Paper: Special Issue on COLT 2000.
6. Barto, A. G., Sutton, R. S., & Anderson, C. W. (1983). Neuronlike adaptive elements that can solve difficult learning control problems. IEEE Transactions on Systems, Man, and Cybernetics, SMC-13, 834-846.
7. Baxter, J., Bartlett, P. L., & Weaver, L. (2001). Experiments with infinite-horizon, policy-gradient estimation. Journal of Artificial Intelligence Research. To appear.
8. Baxter, J., Tridgell, A., & Weaver, L. (2000). Learning to play chess using temporal-differences. Machine Learning, 40(3), 243-263.
12. Cao, X.-R., & Wan, Y.-W. (1998). Algorithms for sensitivity analysis of Markov chains through potentials and perturbation realization. IEEE Transactions on Control Systems Technology, 6, 482-492.
15. Glynn, P. W. (1990). Likelihood ratio gradient estimation for stochastic systems. Communications of the ACM, 33, 75-84.
16. Glynn, P. W., & L'Ecuyer, P. (1995). Likelihood ratio gradient estimation for regenerative stochastic recursions. Advances in Applied Probability, 27(4), 1019-1053.
18. Jaakkola, T., Singh, S. P., & Jordan, M. I. (1995). Reinforcement learning algorithm for partially observable Markov decision problems. In Tesauro, G., Touretzky, D., & Leen, T. (Eds.), Advances in Neural Information Processing Systems, Vol. 7. MIT Press, Cambridge, MA.
19. Kimura, H., & Kobayashi, S. (1998a). An analysis of actor/critic algorithms using eligibility traces: Reinforcement learning with imperfect value functions. In Fifteenth International Conference on Machine Learning, pp. 278-286.
20. Kimura, H., & Kobayashi, S. (1998b). Reinforcement learning for continuous action using stochastic gradient ascent. In Intelligent Autonomous Systems (IAS-5), pp. 288-295.
21. Kimura, H., Miyazaki, K., & Kobayashi, S. (1997). Reinforcement learning in POMDPs with function approximation. In Fisher, D. H. (Ed.), Proceedings of the Fourteenth International Conference on Machine Learning (ICML'97), pp. 152-160.
22. Kimura, H., Yamamura, M., & Kobayashi, S. (1995). Reinforcement learning by stochastic hill climbing on discounted reward. In Proceedings of the Twelfth International Conference on Machine Learning (ICML'95), pp. 295-303.
25. Marbach, P., & Tsitsiklis, J. N. (1998). Simulation-based optimization of Markov reward processes. Tech. rep., MIT.
26. Meuleau, N., Peshkin, L., Kaelbling, L. P., & Kim, K.-E. (2000). Off-policy policy search. Tech. rep., MIT Artificial Intelligence Laboratory.
27. Meuleau, N., Peshkin, L., Kim, K.-E., & Kaelbling, L. P. (1999). Learning finite-state controllers for partially observable environments. In Proceedings of the Fifteenth International Conference on Uncertainty in Artificial Intelligence.
28. Peshkin, L., Kim, K.-E., Meuleau, N., & Kaelbling, L. P. (2000). Learning to cooperate via policy search. In Proceedings of the Sixteenth International Conference on Uncertainty in Artificial Intelligence.
30. Reiman, M. I., & Weiss, A. (1989). Sensitivity analysis for simulations via likelihood ratios. Operations Research, 37.
32. Rubinstein, R. Y. (1991). How to optimize complex stochastic systems from a single sample path by the score function method. Annals of Operations Research, 27, 175-211.
33. Rubinstein, R. Y. (1992). Decomposable score function estimators for sensitivity analysis and optimization of queueing networks. Annals of Operations Research, 39, 195-229.
36. Samuel, A. L. (1959). Some studies in machine learning using the game of checkers. IBM Journal of Research and Development, 3, 210-229.
37. Shilov, G. E., & Gurevich, B. L. (1966). Integral, Measure and Derivative: A Unified Approach. Prentice-Hall, Englewood Cliffs, N.J.
40. Smallwood, R. D., & Sondik, E. J. (1973). The optimal control of partially observable Markov decision processes over a finite horizon. Operations Research, 21, 1071-1098.
41. Sondik, E. J. (1978). The optimal control of partially observable Markov decision processes over the infinite horizon: Discounted costs. Operations Research, 26.
43. Sutton, R. S., McAllester, D., Singh, S., & Mansour, Y. (2000). Policy gradient methods for reinforcement learning with function approximation. In Neural Information Processing Systems 1999. MIT Press.
44. Tao, N., Baxter, J., & Weaver, L. (2001). A multi-agent, policy-gradient approach to network routing. Tech. rep., Australian National University.
45. Tesauro, G. (1992). Practical issues in temporal difference learning. Machine Learning, 8, 257-278.
46. Tesauro, G. (1994). TD-Gammon, a self-teaching backgammon program, achieves master-level play. Neural Computation, 6, 215-219.
47. Tsitsiklis, J. N., & Van Roy, B. (1997). An analysis of temporal difference learning with function approximation. IEEE Transactions on Automatic Control, 42(5), 674-690.
48. Williams, R. J. (1992). Simple statistical gradient-following algorithms for connectionist reinforcement learning. Machine Learning, 8, 229-256.