SCOPUS Information Search Platform

Journal of Computer and System Sciences

Volume 64, Issue 1, 2002, Pages 133-150

Estimation and approximation bounds for gradient-based reinforcement learning

(2) Bartlett, Peter L. a; Baxter, Jonathan b

a BIOwulf Technologies (United States)

b WhizBang Labs (United States)

Author keywords

[No Author keywords available]

Indexed keywords

ALGORITHMS; APPROXIMATION THEORY; ARTIFICIAL INTELLIGENCE; CONVERGENCE OF NUMERICAL METHODS; MARKOV PROCESSES; MATHEMATICAL MODELS; REINFORCEMENT; SET THEORY;

APPROXIMATION BOUNDS;

DECISION THEORY;

EID: 0036477347
PISSN: 00220000
EISSN: None
Source Type: Journal
DOI: 10.1006/jcss.2001.1793
Document Type: Article

Times cited : (13)

References (20)

1
- 0009056093
- Technical Report, Australian National University, January
- (2001) Policy-Gradient Learning of Controllers with Internal State
- Aberdeen, D.¹ Baxter, J.²

2
- 0002686204
- Stochastic optimization
- (1968) Engrg. Cybernetics , vol.5 , pp. 11-16
- Aleksandrov, V.M.¹ Sysoyev, V.I.² Shemeneva, V.V.³

3
- 0003924391
- Cambridge Univ. Press, Cambridge, UK
- (1999) Neural Network Learning: Theoretical Foundations
- Anthony, M.¹ Bartlett, P.L.²

4
- 84898958374
- Gradient descent for general reinforcement learning
- MIT Press, Cambridge, MA
- (1999) Advances in Neural Information Processing Systems 11 , vol.11
- Baird, L.¹ Moore, A.²

5
- 0009056095
- On some algorithms for infinite-horizon policy-gradient estimation
- (2001) J. Artificial Intelligence Res. , vol.14
- Baxter, J.¹ Bartlett, P.L.²

6
- 0001578564
- Learning dynamical systems in a stationary environment
- (1998) Systems Control Lett. , vol.34 , pp. 125-132
- Campi, M.C.¹ Kumar, P.R.²

7
- 0032122986
- Algorithms for sensitivity analysis of Markov chains through potentials and perturbation realization
- (1998) IEEE Trans. Control Systems Tech. , vol.6 , pp. 482-492
- Cao, X.-R.¹ Wan, Y.-W.²

8
- 0022882413
- Stochastic approximation for Monte-Carlo optimization
- (1986) Proceedings of the 1986 Winter Simulation Conference, 1986 , pp. 356-365
- Glynn, P.W.¹

9
- 0004207439
- Springer-Verlag, New York
- (1974) Measure Theory
- Halmos, P.R.¹

10
- 84947403595
- Probability inequalities for sums of bounded random variables
- (1963) J. Amer. Statist. Assoc. , vol.58 , Issue.301 , pp. 13-30
- Hoeffding, W.¹

11
- 0001251942
- Reinforcement learning in POMDPs with function approximation
- D. H. Fisher, Ed.
- (1997) Proceedings of the Fourteenth International Conference on Machine Learning (ICML'97), 1997 , pp. 152-160
- Kimura, H.¹ Miyazaki, K.² Kobayashi, S.³

12
- 84898938510
- Actor-critic algorithms
- MIT Press, Cambridge, MA
- (2000) Neural Information Processing Systems 1999
- Konda, V.R.¹ Tsitsiklis, J.N.²

13
- 0009011171
- Ph.D. thesis, Laboratory for Information and Decision Systems, MIT
- (1998) Simulation-Based Methods for Markov Decision Processes
- Marbach, P.¹

14
- 0009011171
- Technical Report, MIT
- (1998) Simulation-Based Optimization of Markov Reward Processes
- Marbach, P.¹ Tsitsiklis, J.N.²

15
- 0033904367
- Nonparametric time series prediction through adaptive model selection
- (2000) Mach. Learning , vol.39 , pp. 5-34
- Meir, R.¹

16
- 0000973081
- Minimum complexity regression estimation with weakly dependent observations
- (1996) IEEE Trans. Inform. Theory , vol.42
- Modha, D.S.¹ Masry, E.²

17
- 0022906632
- Sensitivity analysis via likelihood ratios
- (1986) Proceedings of the 1986 Winter Simulation Conference, 1986
- Reiman, M.I.¹ Weiss, A.²

18
- 0004202917
- Ph.D. thesis, Rigas Polytechnical Institute, Latvija, Riga
- (1969) Some Problems in Monte Carlo Optimization
- Rubinstein, R.Y.¹

19
- 84898939480
- Policy gradient methods for reinforcement learning with function approximation
- MIT Press, Cambridge, MA
- (2000) Neural Information Processing Systems 1999
- Sutton, R.S.¹ McAllester, D.² Singh, S.³ Mansour, Y.⁴

20
- 0000337576
- Simple statistical gradient-following algorithms for connectionist reinforcement learning
- (1992) Mach. Learning , vol.8 , pp. 229-256
- Williams, R.J.¹

* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.