Volume 5323 LNAI, 2008, Pages 55-68

Regularized fitted Q-Iteration: Application to planning

Author keywords

[No Author keywords available]

Indexed keywords

PROBABILITY DENSITY FUNCTION; REINFORCEMENT LEARNING;

EID: 58449110583     PISSN: 03029743     EISSN: 16113349     Source Type: Book Series
DOI: 10.1007/978-3-540-89722-4_5     Document Type: Conference Paper
Times cited : (20)

References (23)
  • 1
    • Antos, A., Szepesvári, C., Munos, R.: Value-iteration based fitted policy iteration: learning with a single trajectory. In: IEEE ADPRL, pp. 330-337 (2007)
  • 2
    • Antos, A., Munos, R., Szepesvári, C.: Fitted Q-iteration in continuous action-space MDPs. In: Advances in Neural Information Processing Systems 20, NIPS 2007 (in print, 2008)
  • 3
    • Antos, A., Szepesvári, C., Munos, R.: Learning near-optimal policies with Bellman-residual minimization based fitted policy iteration and a single sample path. Machine Learning 71, 89-129 (2008)
  • 11
    • Jung, T., Polani, D.: Least squares SVM for least squares TD learning. In: ECAI, pp. 499-503 (2006)
  • 12
    • Kearns, M., Mansour, Y., Ng, A.Y.: A sparse sampling algorithm for near-optimal planning in large Markovian decision processes. In: Proceedings of IJCAI 1999, pp. 1324-1331 (1999)
  • 13
    • Lagoudakis, M.G., Parr, R.: Reinforcement learning as classification: Leveraging modern classifiers. In: ICML 2003, pp. 424-431 (2003)
  • 15
    • Mannor, S., Menache, I., Shimkin, N.: Basis function adaptation in temporal difference reinforcement learning. Annals of Operations Research 134, 215-238 (2005)
  • 18
    • Ormoneit, D., Sen, S.: Kernel-based reinforcement learning. Machine Learning 49, 161-178 (2002)
  • 19
    • Parr, R., Painter-Wakefield, C., Li, L., Littman, M.L.: Analyzing feature generation for value-function approximation. In: ICML, pp. 737-744 (2007)
  • 21
    • Srebro, N., Ben-David, S.: Learning bounds for support vector machines with learned kernels. In: Lugosi, G., Simon, H.U. (eds.) COLT 2006. LNCS, vol. 4005, pp. 169-183. Springer, Heidelberg (2006)
  • 22
    • Xu, X., Hu, D., Lu, X.: Kernel-based least squares policy iteration for reinforcement learning. IEEE Trans. on Neural Networks 18, 973-992 (2007)
  • 23
    • Zhou, D.-X.: Capacity of reproducing kernel spaces in learning theory. IEEE Transactions on Information Theory 49, 1743-1752 (2003)


* This information was extracted by KISTI through analysis of Elsevier's SCOPUS DB.