SCOPUS Information Retrieval Platform

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

Volume 4005 LNAI, 2006, Pages 574-588

Learning near-optimal policies with Bellman-residual minimization based fitted policy iteration and a single sample path

(3) Antos, András a; Szepesvári, Csaba a; Munos, Rémi b

a INSTITUTE OF EXPERIMENTAL MEDICINE (Hungary)

b CENTRE DE MATHÉMATIQUES APPLIQUÉES (France)

Author keywords

[No Author keywords available]

Indexed keywords

COMPUTATIONAL COMPLEXITY; DECISION MAKING; ITERATIVE METHODS; LEARNING ALGORITHMS; MARKOV PROCESSES;

BEHAVIOUR POLICY; BELLMAN-RESIDUAL MINIMIZATION; MARKOVIAN DECISION PROBLEMS;

PROBLEM SOLVING;

EID: 33746032553 PISSN: 03029743 EISSN: 16113349 Source Type: Book Series
DOI: 10.1007/11776420_42 Document Type: Conference Paper

Times cited : (21)

References (23)

1
- 4644323293
- Least-squares policy iteration
- M. Lagoudakis and R. Parr. Least-squares policy iteration. Journal of Machine Learning Research, 4:1107-1149, 2003.
- (2003) Journal of Machine Learning Research , vol.4 , pp. 1107-1149
- Lagoudakis, M.¹ Parr, R.²

2
- 84941157238
- Learning near-optimal policies with fitted policy iteration and a single sample path: Approximate iterative policy evaluation
- A. Antos, Cs. Szepesvári, and R. Munos. Learning near-optimal policies with fitted policy iteration and a single sample path: approximate iterative policy evaluation (submitted to ICML'2006), 2006.
- (2006) ICML'2006
- Antos, A.¹ Szepesvári, Cs.² Munos, R.³

3
- 0003923091
- Academic Press, New York
- D. P. Bertsekas and S.E. Shreve. Stochastic Optimal Control (The Discrete Time Case). Academic Press, New York, 1978.
- (1978) Stochastic Optimal Control (The Discrete Time Case)
- Bertsekas, D.P.¹ Shreve, S.E.²

4
- 33746052738
- Toward a modern theory of adaptive networks: Expectation and prediction
- Erlbaum, Hillsdale, NJ, USA
- R.S. Sutton and A.G. Barto. Toward a modern theory of adaptive networks: Expectation and prediction. In Proc. of the Ninth Annual Conference of Cognitive Science Society. Erlbaum, Hillsdale, NJ, USA, 1987.
- (1987) Proc. of the Ninth Annual Conference of Cognitive Science Society
- Sutton, R.S.¹ Barto, A.G.²

5
- 1942516880
- Error bounds for approximate policy iteration
- R. Munos. Error bounds for approximate policy iteration. 19th International Conference on Machine Learning, pages 560-567, 2003.
- (2003) 19th International Conference on Machine Learning , pp. 560-567
- Munos, R.¹

6
- 0003637131
- Springer-Verlag, New York
- S.P. Meyn and R. Tweedie. Markov Chains and Stochastic Stability. Springer-Verlag, New York, 1993.
- (1993) Markov Chains and Stochastic Stability
- Meyn, S.P.¹ Tweedie, R.²

7
- 0003924391
- Cambridge University Press
- M. Anthony and P. L. Bartlett. Neural Network Learning: Theoretical Foundations. Cambridge University Press, 1999.
- (1999) Neural Network Learning: Theoretical Foundations
- Anthony, M.¹ Bartlett, P.L.²

8
- 0003161174
- Rates of convergence for empirical processes of stationary mixing sequences
- January
- B. Yu. Rates of convergence for empirical processes of stationary mixing sequences. The Annals of Probability, 22(1):94-116, January 1994.
- (1994) The Annals of Probability , vol.22 , Issue.1 , pp. 94-116
- Yu, B.¹

9
- 0030489341
- Histogram regression estimation using data-dependent partitions
- A. Nobel. Histogram regression estimation using data-dependent partitions. Annals of Statistics, 24(3):1084-1105, 1996.
- (1996) Annals of Statistics , vol.24 , Issue.3 , pp. 1084-1105
- Nobel, A.¹

10
- 0000996139
- Sphere packing numbers for subsets of the boolean n-cube with bounded Vapnik-Chervonenkis dimension
- D. Haussler. Sphere packing numbers for subsets of the boolean n-cube with bounded Vapnik-Chervonenkis dimension. Journal of Combinatorial Theory Series A, 69:217-232, 1995.
- (1995) Journal of Combinatorial Theory Series A , vol.69 , pp. 217-232
- Haussler, D.¹

11
- 0001201756
- Some studies in machine learning using the game of checkers
- A.L. Samuel. Some studies in machine learning using the game of checkers. IBM Journal of Research and Development, pages 210-229, 1959.
- (1959) IBM Journal of Research and Development , pp. 210-229
- Samuel, A.L.¹

12
- 0004242550
- Reprinted, E.A. Feigenbaum and J. Feldman, editors, McGraw-Hill, New York
- Reprinted in Computers and Thought, E.A. Feigenbaum and J. Feldman, editors, McGraw-Hill, New York, 1963.
- (1963) Computers and Thought

13
- 84968519017
- Functional approximation and dynamic programming
- R.E. Bellman and S.E. Dreyfus. Functional approximation and dynamic programming. Math. Tables and other Aids Comp., 13:247-251, 1959.
- (1959) Math. Tables and Other Aids Comp. , vol.13 , pp. 247-251
- Bellman, R.E.¹ Dreyfus, S.E.²

14
- 0003487482
- Athena Scientific
- Dimitri P. Bertsekas and J. Tsitsiklis. Neuro-Dynamic Programming. Athena Scientific, 1996.
- (1996) Neuro-dynamic Programming
- Bertsekas, D.P.¹ Tsitsiklis, J.²

15
- 0008321896
- Reinforcement learning: An introduction
- Richard S. Sutton and Andrew G. Barto. Reinforcement learning: An introduction. Bradford Book, 1998.
- (1998) Reinforcement Learning: An Introduction
- Sutton, R.S.¹ Barto, A.G.²

16
- 84880694195
- Stable function approximation in dynamic programming
- Armand Prieditis and Stuart Russell, editors, San Francisco, CA. Morgan Kaufmann
- Geoffrey J. Gordon. Stable function approximation in dynamic programming. In Armand Prieditis and Stuart Russell, editors, Proceedings of the Twelfth International Conference on Machine Learning, pages 261-268, San Francisco, CA, 1995. Morgan Kaufmann.
- (1995) Proceedings of the Twelfth International Conference on Machine Learning , pp. 261-268
- Gordon, G.J.¹

17
- 0029752470
- Feature-based methods for large scale dynamic programming
- J. N. Tsitsiklis and B. Van Roy. Feature-based methods for large scale dynamic programming. Machine Learning, 22:59-94, 1996.
- (1996) Machine Learning , vol.22 , pp. 59-94
- Tsitsiklis, J.N.¹ Van Roy, B.²

18
- 84880898477
- Max-norm projections for factored MDPs
- Carlos Guestrin, Daphne Koller, and Ronald Parr. Max-norm projections for factored MDPs. Proceedings of the International Joint Conference on Artificial Intelligence, 2001.
- (2001) Proceedings of the International Joint Conference on Artificial Intelligence
- Guestrin, C.¹ Koller, D.² Parr, R.³

19
- 21844465127
- Tree-based batch mode reinforcement learning
- D. Ernst, P. Geurts, and L. Wehenkel. Tree-based batch mode reinforcement learning. Journal of Machine Learning Research, 6:503-556, 2005.
- (2005) Journal of Machine Learning Research , vol.6 , pp. 503-556
- Ernst, D.¹ Geurts, P.² Wehenkel, L.³

20
- 0006292876
- Efficient value function approximation using regression trees
- Stockholm, Sweden
- X. Wang and T.G. Dietterich. Efficient value function approximation using regression trees. In Proceedings of the IJCAI Workshop on Statistical Machine Learning for Large-Scale Optimization, Stockholm, Sweden, 1999.
- (1999) Proceedings of the IJCAI Workshop on Statistical Machine Learning for Large-scale Optimization
- Wang, X.¹ Dietterich, T.G.²

21
- 84899029004
- Batch value function approximation via support vectors
- T. G. Dietterich, S. Becker, and Z. Ghahramani, editors, Cambridge, MA. MIT Press
- T. G. Dietterich and X. Wang. Batch value function approximation via support vectors. In T. G. Dietterich, S. Becker, and Z. Ghahramani, editors, Advances in Neural Information Processing Systems 14, Cambridge, MA, 2002. MIT Press.
- (2002) Advances in Neural Information Processing Systems , vol.14
- Dietterich, T.G.¹ Wang, X.²

22
- 31844456754
- Finite time bounds for sampling based fitted value iteration
- Cs. Szepesvári and R. Munos. Finite time bounds for sampling based fitted value iteration. In ICML'2005, 2005.
- (2005) ICML'2005
- Szepesvári, Cs.¹ Munos, R.²

23
- 0033904367
- Nonparametric time series prediction through adaptive model selection
- April
- R. Meir. Nonparametric time series prediction through adaptive model selection. Machine Learning, 39(1):5-34, April 2000.
- (2000) Machine Learning , vol.39 , Issue.1 , pp. 5-34
- Meir, R.¹

* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.