SCOPUS Information Retrieval Platform

IEEE SSCI 2011: Symposium Series on Computational Intelligence - ADPRL 2011: 2011 IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning

Volume , Issue , 2011, Pages 9-16

Parametric value function approximation: A unified view

(2) Geist, Matthieu a Pietquin, Olivier b

a UMI Georgia Tech CNRS 2958 (France)

b CNRS (France)

Author keywords

Reinforcement learning; survey; value function approximation

Indexed keywords

ARTIFICIAL INTELLIGENCE; COST FUNCTIONS; DYNAMIC PROGRAMMING; OPTIMAL CONTROL SYSTEMS; QUALITY CONTROL; STOCHASTIC SYSTEMS; SURVEYING; SURVEYS;

EXACT REPRESENTATIONS; OPTIMAL CONTROL POLICY; OPTIMAL CONTROL PROBLEM; RECURSIVE LEAST SQUARE (RLS); RELATED ALGORITHMS; STATE-OF-THE-ART METHODS; STOCHASTIC GRADIENT DESCENT; VALUE FUNCTION APPROXIMATION;

REINFORCEMENT LEARNING;

EID: 80052232599 PISSN: None EISSN: None Source Type: Conference Proceeding
DOI: 10.1109/ADPRL.2011.5967355 Document Type: Conference Paper

Times cited : (34)

References (54)

1
- 0003487482
- Athena Scientific
- D. P. Bertsekas and J. N. Tsitsiklis, Neuro-Dynamic Programming (Optimization and Neural Computation Series, 3). Athena Scientific, 1996.
- (1996) Neuro-Dynamic Programming (Optimization and Neural Computation Series , vol.3
- Bertsekas, D.P.¹ Tsitsiklis, J.N.²

2
- 0004102479
- 3rd ed. The MIT Press, March
- R. S. Sutton and A. G. Barto, Reinforcement Learning: An Introduction, 3rd ed. The MIT Press, March 1998.
- (1998) Reinforcement Learning: An Introduction
- Sutton, R.S.¹ Barto, A.G.²

3
- 80052257863
- Wiley - ISTE
- O. Sigaud and O. Buffet, Eds., Markov Decision Processes and Artificial Intelligence. Wiley - ISTE, 2010.
- (2010) Markov Decision Processes and Artificial Intelligence
- Sigaud, O.¹ Buffet, O.²

4
- 85102627959
- Wiley-Interscience
- M. L. Puterman, Markov Decision Processes: Discrete Stochastic Dynamic Programming. Wiley-Interscience, 1994.
- (1994) Markov Decision Processes: Discrete Stochastic Dynamic Programming
- Puterman, M.L.¹

5
- 0003787146
- 6th ed. Dover Publications
- R. Bellman, Dynamic Programming, 6th ed. Dover Publications, 1957.
- (1957) Dynamic Programming
- Bellman, R.¹

6
- 80052238397
- Supélec, Tech. Rep., September
- M. Geist and O. Pietquin, "A Brief Survey of Parametric Value Function Approximation," Supélec, Tech. Rep., September 2010.
- (2010) A Brief Survey of Parametric Value Function Approximation
- Geist, M.¹ Pietquin, O.²

7
- 33646435300
- A generalized Kalman filter for fixed point approximation and efficient temporal-difference learning
- D. Choi and B. Van Roy, "A Generalized Kalman Filter for Fixed Point Approximation and Efficient Temporal-Difference Learning," Discrete Event Dynamic Systems, vol. 16, pp. 207-239, 2006.
- (2006) Discrete Event Dynamic Systems , vol.16 , pp. 207-239
- Choi, D.¹ Van Roy, B.²

8
- 0031143730
- An analysis of temporal-difference learning with function approximation
- J. N. Tsitsiklis and B. Van Roy, "An analysis of temporal-difference learning with function approximation," IEEE Transactions on Automatic Control, vol. 42, 1997
- (1997) IEEE Transactions on Automatic Control , vol.42
- Tsitsiklis, J.N.¹ Van Roy, B.²

9
- 0029276036
- Temporal difference learning and TD-gammon
- March
- G. Tesauro, "Temporal Difference Learning and TD-Gammon," Communications of the ACM, vol. 38, no. 3, March 1995.
- (1995) Communications of the ACM , vol.38 , Issue.3
- Tesauro, G.¹

10
- 56449091120
- An analysis of reinforcement learning with function approximation
- F. S. Melo, S. P. Meyn, and M. I. Ribeiro, "An analysis of reinforcement learning with function approximation," in Proceedings of the 25th International Conference on Machine Learning, 2009, pp. 664-671.
- (2009) Proceedings of the 25th International Conference on Machine Learning , pp. 664-671
- Melo, F.S.¹ Meyn, S.P.² Ribeiro, M.I.³

11
- 40849145988
- Learning near-optimal policies with Bellman-residual minimization based fitted policy iteration and a single sample path
- A. Antos, C. Szepesvári, and R. Munos, "Learning near-optimal policies with Bellman-residual minimization based fitted policy iteration and a single sample path," Machine Learning, vol. 71, no. 1, pp. 89-129, 2008.
- (2008) Machine Learning , vol.71 , Issue.1 , pp. 89-129
- Antos, A.¹ Szepesvári, C.² Munos, R.³

12
- 85151728371
- Residual algorithms: Reinforcement learning with function approximation
- L. C. Baird, "Residual Algorithms: Reinforcement Learning with Function Approximation," in Proc. of the International Conference on Machine Learning (ICML 95), 1995, pp. 30-37.
- (1995) Proc. of the International Conference on Machine Learning (ICML 95) , pp. 30-37
- Baird, L.C.¹

13
- 1942421151
- Bayes meets Bellman: The Gaussian process approach to temporal difference learning
- Y. Engel, S. Mannor, and R. Meir, "Bayes Meets Bellman: The Gaussian Process Approach to Temporal Difference Learning," in Proceedings of the International Conference on Machine Learning (ICML 03), 2003, pp. 154-161.
- (2003) Proceedings of the International Conference on Machine Learning (ICML 03) , pp. 154-161
- Engel, Y.¹ Mannor, S.² Meir, R.³

14
- 25444448065
- The MIT Press
- C. E. Rasmussen and C. K. I. Williams, Gaussian Processes for Machine Learning. The MIT Press, 2006.
- (2006) Gaussian Processes for Machine Learning
- Rasmussen, C.E.¹ Williams, C.K.I.²

15
- 3543096272
- The kernel recursive least squares algorithm
- [Online]
- Y. Engel, S. Mannor, and R. Meir, "The Kernel Recursive Least Squares Algorithm," IEEE Transactions on Signal Processing, vol. 52, pp. 2275-2285, 2004. [Online]. Available: http://www.cs.ualberta.ca/yaki/
- (2004) IEEE Transactions on Signal Processing , vol.52 , pp. 2275-2285
- Engel, Y.¹ Mannor, S.² Meir, R.³

16
- 31844456714
- Ph.D. dissertation, Hebrew University, April
- Y. Engel, "Algorithms and Representations for Reinforcement Learning," Ph.D. dissertation, Hebrew University, April 2005.
- (2005) Algorithms and Representations for Reinforcement Learning
- Engel, Y.¹

17
- 3543103331
- Adaptive Systems Lab, McMaster University, Tech. Rep.
- Z. Chen, "Bayesian Filtering : From Kalman Filters to Particle Filters, and Beyond," Adaptive Systems Lab, McMaster University, Tech. Rep., 2003.
- (2003) Bayesian Filtering : From Kalman Filters to Particle Filters, and beyond
- Chen, Z.¹

18
- 31844451013
- Reinforcement learning with Gaussian processes
- Y. Engel, S. Mannor, and R. Meir, "Reinforcement Learning with Gaussian Processes," in Proceedings of the International Conference on Machine Learning (ICML 05), 2005.
- (2005) Proceedings of the International Conference on Machine Learning (ICML 05)
- Engel, Y.¹ Mannor, S.² Meir, R.³

19
- 0242393653
- Eligibility traces for off-policy policy evaluation
- San Francisco, CA, USA: Morgan Kaufmann Publishers Inc.
- D. Precup, R. S. Sutton, and S. P. Singh, "Eligibility Traces for Off-Policy Policy Evaluation," in Proceedings of the Seventeenth International Conference on Machine Learning (ICML 00). San Francisco, CA, USA: Morgan Kaufmann Publishers Inc., 2000, pp. 759-766.
- (2000) Proceedings of the Seventeenth International Conference on Machine Learning (ICML 00) , pp. 759-766
- Precup, D.¹ Sutton, R.S.² Singh, S.P.³

20
- 67650458797
- Kalman temporal differences: The deterministic case
- Nashville, TN, USA, April
- M. Geist, O. Pietquin, and G. Fricout, "Kalman Temporal Differences: the deterministic case," in Proceedings of the IEEE International Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL 2009), Nashville, TN, USA, April 2009.
- (2009) Proceedings of the IEEE International Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL 2009)
- Geist, M.¹ Pietquin, O.² Fricout, G.³

21
- 78651465938
- Kalman temporal differences
- M. Geist and O. Pietquin, "Kalman Temporal Differences," Journal of Artificial Intelligence Research (JAIR), 2010.
- (2010) Journal of Artificial Intelligence Research (JAIR)
- Geist, M.¹ Pietquin, O.²

22
- 79951485912
- Eligibility traces through colored noises
- Moscow (Russia): IEEE, October
- -, "Eligibility Traces through Colored Noises," in Proceedings of the IEEE International Conference on Ultra Modern Control Systems (ICUMT 2010). Moscow (Russia): IEEE, October 2010.
- (2010) Proceedings of the IEEE International Conference on Ultra Modern Control Systems (ICUMT 2010)
- Geist, M.¹ Pietquin, O.²

23
- 34547974097
- Tracking value function dynamics to improve reinforcement learning with piecewise linear function approximation
- C. W. Phua and R. Fitch, "Tracking value function dynamics to improve reinforcement learning with piecewise linear function approximation," in Proceedings of the International Conference on Machine Learning (ICML 07), 2007.
- (2007) Proceedings of the International Conference on Machine Learning (ICML 07)
- Phua, C.W.¹ Fitch, R.²

24
- 76649127744
- Tracking in reinforcement learning
- Bangkok (Thailand): Springer
- M. Geist, O. Pietquin, and G. Fricout, "Tracking in Reinforcement Learning," in Proceedings of the 16th International Conference on Neural Information Processing (ICONIP 2009). Bangkok (Thailand): Springer, 2009.
- (2009) Proceedings of the 16th International Conference on Neural Information Processing (ICONIP 2009)
- Geist, M.¹ Pietquin, O.² Fricout, G.³

25
- 85024429815
- A new approach to linear filtering and prediction problems
- R. E. Kalman, "A new approach to linear filtering and prediction problems," Transactions of the ASME-Journal of Basic Engineering, vol. 82, no. Series D, pp. 35-45, 1960.
- (1960) Transactions of the ASME-Journal of Basic Engineering , vol.82 , Issue.SERIES D , pp. 35-45
- Kalman, R.E.¹

26
- 78449267579
- Statistically linearized recursive least squares
- Kittilä (Finland): IEEE
- M. Geist and O. Pietquin, "Statistically Linearized Recursive Least Squares," in Proceedings of the IEEE International Workshop on Machine Learning for Signal Processing (MLSP 2010). Kittilä (Finland): IEEE, 2010.
- (2010) Proceedings of the IEEE International Workshop on Machine Learning for Signal Processing (MLSP 2010)
- Geist, M.¹ Pietquin, O.²

27
- 80052208068
- Managing uncertainty within value function approximation in reinforcement learning
- Active Learning and Experimental Design Workshop (collocated with AISTATS 2010) , Sardinia, Italy
- -, "Managing Uncertainty within Value Function Approximation in Reinforcement Learning," in Active Learning and Experimental Design Workshop (collocated with AISTATS 2010), ser. Journal of Machine Learning Research - Workshop and Conference Proceedings, Sardinia, Italy, 2010.
- (2010) Journal of Machine Learning Research - Workshop and Conference Proceedings
- Geist, M.¹ Pietquin, O.²

28
- 0001771345
- Linear least-squares algorithms for temporal difference learning
- S. J. Bradtke and A. G. Barto, "Linear Least-Squares algorithms for temporal difference learning," Machine Learning, vol. 22, no. 1-3, pp. 33-57, 1996. (Pubitemid 126724362)
- (1996) Machine Learning , vol.22 , Issue.1-3 , pp. 33-57
- Bradtke, S.J.¹ Barto, A.G.²

29
- 0036236260
- Instrumental variable methods for system identification
- DOI 10.1007/BF01211647
- T. Söderström and P. Stoica, "Instrumental variable methods for system identification," Circuits, Systems, and Signal Processing, vol. 21, pp. 1-9, 2002. (Pubitemid 34414642)
- (2002) Circuits, Systems, and Signal Processing , vol.21 , Issue.1 , pp. 1-9
- Soderstrom, T.¹ Stoica, P.²

30
- 4644323293
- Least-squares policy iteration
- M. G. Lagoudakis and R. Parr, "Least-squares policy iteration," Journal of Machine Learning Research, vol. 4, pp. 1107-1149, 2003.
- (2003) Journal of Machine Learning Research , vol.4 , pp. 1107-1149
- Lagoudakis, M.G.¹ Parr, R.²

31
- 0036832950
- Technical update: Least-squares temporal difference learning
- DOI 10.1023/A:1017936530646
- J. A. Boyan, "Technical Update: Least-Squares Temporal Difference Learning," Machine Learning, vol. 49, no. 2-3, pp. 233-246, 1999. (Pubitemid 34325688)
- (2002) Machine Learning , vol.49 , Issue.2-3 , pp. 233-246
- Boyan, J.A.¹

32
- 77956517288
- Convergence of least squares temporal difference methods under general conditions
- H. Yu, "Convergence of Least Squares Temporal Difference Methods Under General Conditions," in International Conference on Machine Learning (ICML 2010), 2010, pp. 1207-1214.
- (2010) International Conference on Machine Learning (ICML 2010) , pp. 1207-1214
- Yu, H.¹

33
- 79951499926
- Statistically linearized least-squares temporal differences
- Moscow (Russia): IEEE, October
- M. Geist and O. Pietquin, "Statistically Linearized Least-Squares Temporal Differences," in Proceedings of the IEEE International Conference on Ultra Modern Control Systems (ICUMT 2010). Moscow (Russia): IEEE, October 2010.
- (2010) Proceedings of the IEEE International Conference on Ultra Modern Control Systems (ICUMT 2010)
- Geist, M.¹ Pietquin, O.²

34
- 71149099079
- Fast gradient-descent methods for temporal-difference learning with linear function approximation
- New York, NY, USA: ACM
- R. S. Sutton, H. R. Maei, D. Precup, S. Bhatnagar, D. Silver, C. Szepesvári, and E. Wiewiora, "Fast gradient-descent methods for temporal-difference learning with linear function approximation," in ICML '09: Proceedings of the 26th Annual International Conference on Machine Learning. New York, NY, USA: ACM, 2009, pp. 993-1000.
- (2009) ICML '09: Proceedings of the 26th Annual International Conference on Machine Learning , pp. 993-1000
- Sutton, R.S.¹ Maei, H.R.² Precup, D.³ Bhatnagar, S.⁴ Silver, D.⁵ Szepesvári, C.⁶ Wiewiora, E.⁷

35
- 70349982705
- Incremental natural actor-critic algorithms
- Vancouver, Canada
- S. Bhatnagar, R. S. Sutton, M. Ghavamzadeh, and M. Lee, "Incremental natural actor-critic algorithms," in Conference on Neural Information Processing Systems (NIPS), Vancouver, Canada, 2007.
- (2007) Conference on Neural Information Processing Systems (NIPS)
- Bhatnagar, S.¹ Sutton, R.S.² Ghavamzadeh, M.³ Lee, M.⁴

36
- 79951481923
- Convergent temporal-difference learning with arbitrary smooth function approximation
- Y. Bengio, D. Schuurmans, J. Lafferty, C. K. I. Williams, and A. Culotta, Eds.
- H. Maei, C. Szepesvari, S. Bhatnagar, D. Precup, D. Silver, and R. S. Sutton, "Convergent temporal-difference learning with arbitrary smooth function approximation," in Advances in Neural Information Processing Systems 22, Y. Bengio, D. Schuurmans, J. Lafferty, C. K. I. Williams, and A. Culotta, Eds., 2009, pp. 1204-1212.
- (2009) Advances in Neural Information Processing Systems , vol.22 , pp. 1204-1212
- Maei, H.¹ Szepesvari, C.² Bhatnagar, S.³ Precup, D.⁴ Silver, D.⁵ Sutton, R.S.⁶

37
- 77954101982
- GQ(λ): A general gradient algorithm for temporal-differences prediction learning with eligibility traces
- H. R. Maei and R. S. Sutton, "GQ(λ): a general gradient algorithm for temporal-differences prediction learning with eligibility traces," in Third Conference on Artificial General Intelligence, 2010.
- (2010) Third Conference on Artificial General Intelligence
- Maei, H.R.¹ Sutton, R.S.²

38
- 77956541799
- Toward off-policy learning control with function approximation
- H. R. Maei, C. Szepesvari, S. Bhatnagar, and R. S. Sutton, "Toward Off-Policy Learning Control with Function Approximation," in 27th conference on Machine Learning (ICML 2010), 2010.
- (2010) 27th Conference on Machine Learning (ICML 2010)
- Maei, H.R.¹ Szepesvari, C.² Bhatnagar, S.³ Sutton, R.S.⁴

39
- 0001201756
- Some studies in machine learning using the game of checkers
- A. Samuel, "Some studies in machine learning using the game of checkers," IBM Journal on Research and Development, pp. 210-229, 1959.
- (1959) IBM Journal on Research and Development , pp. 210-229
- Samuel, A.¹

40
- 84880694195
- Stable function approximation in dynamic programming
- G. Gordon, "Stable Function Approximation in Dynamic Programming," in Proceedings of the International Conference on Machine Learning (ICML 95), 1995.
- (1995) Proceedings of the International Conference on Machine Learning (ICML 95)
- Gordon, G.¹

41
- 40949107944
- Performance bounds in Lp norm for approximate value iteration
- R. Munos, "Performance Bounds in Lp norm for Approximate Value Iteration," SIAM Journal on Control and Optimization, 2007.
- (2007) SIAM Journal on Control and Optimization
- Munos, R.¹

42
- 0036832956
- Kernel-based reinforcement learning
- DOI 10.1023/A:1017928328829
- D. Ormoneit and S. Sen, "Kernel-Based Reinforcement Learning," Machine Learning, vol. 49, pp. 161-178, 2002. (Pubitemid 34325684)
- (2002) Machine Learning , vol.49 , Issue.2-3 , pp. 161-178
- Ormoneit, D.¹ Sen, S.²

43
- 33646687423
- Neural fitted Q iteration - First experiences with a data efficient neural reinforcement learning method
- M. Riedmiller, "Neural Fitted Q Iteration - First Experiences with a Data Efficient Neural Reinforcement Learning Method," in European Conference on Machine Learning (ECML), 2005.
- (2005) European Conference on Machine Learning (ECML)
- Riedmiller, M.¹

44
- 21844465127
- Tree-based batch mode reinforcement learning
- D. Ernst, P. Geurts, and L. Wehenkel, "Tree-Based Batch Mode Reinforcement Learning," Journal of Machine Learning Research, vol. 6, pp. 503-556, 2005.
- (2005) Journal of Machine Learning Research , vol.6 , pp. 503-556
- Ernst, D.¹ Geurts, P.² Wehenkel, L.³

45
- 0037288398
- Least squares policy evaluation algorithms with linear function approximation
- A. Nedić and D. P. Bertsekas, "Least Squares Policy Evaluation Algorithms with Linear Function Approximation," Discrete Event Dynamic Systems: Theory and Applications, vol. 13, pp. 79-110, 2003.
- (2003) Discrete Event Dynamic Systems: Theory and Applications , vol.13 , pp. 79-110
- Nedić, A.¹ Bertsekas, D.P.²

46
- 18544370594
- IEEE Press, ch. Improved Temporal Difference Methods with Linear Function Approximation
- D. P. Bertsekas, V. Borkar, and A. Nedic, Learning and Approximate Dynamic Programming. IEEE Press, 2004, ch. Improved Temporal Difference Methods with Linear Function Approximation, pp. 231-235.
- (2004) Learning and Approximate Dynamic Programming , pp. 231-235
- Bertsekas, D.P.¹ Borkar, V.² Nedic, A.³

47
- 61849106433
- Projected equation methods for approximate solution of large linear systems
- D. P. Bertsekas and H. Yu, "Projected Equation Methods for Approximate Solution of Large Linear Systems," Journal of Computational and Applied Mathematics, vol. 227, pp. 27-50, 2007.
- (2007) Journal of Computational and Applied Mathematics , vol.227 , pp. 27-50
- Bertsekas, D.P.¹ Yu, H.²

48
- 84876115077
- Projected equations, variational inequalities, and temporal difference methods
- D. P. Bertsekas, "Projected Equations, Variational Inequalities, and Temporal Difference Methods," in IEEE International Symposium on Adaptive Dynamic Programming and Reinforcement Learning, 2009.
- (2009) IEEE International Symposium on Adaptive Dynamic Programming and Reinforcement Learning
- Bertsekas, D.P.¹

49
- 84927748655
- Q-learning algorithms for optimal stopping based on least squares
- Kos, Greece
- H. Yu and D. P. Bertsekas, "Q-Learning Algorithms for Optimal Stopping Based on Least Squares," in Proceedings of European Control Conference, Kos, Greece, 2007.
- (2007) Proceedings of European Control Conference
- Yu, H.¹ Bertsekas, D.P.²

50
- 34547098844
- Kernel-based least squares policy iteration for reinforcement learning
- DOI 10.1109/TNN.2007.899161, Neural Networks for Feedback Control Systems
- X. Xu, D. Hu, and X. Lu, "Kernel-Based Least Squares Policy Iteration for Reinforcement Learning," IEEE Transactions on Neural Networks, vol. 18, no. 4, pp. 973-992, July 2007. (Pubitemid 47098876)
- (2007) IEEE Transactions on Neural Networks , vol.18 , Issue.4 , pp. 973-992
- Xu, X.¹ Hu, D.² Lu, X.³

51
- 34548765672
- Kernelizing LSPE(λ)
- T. Jung and D. Polani, "Kernelizing LSPE(λ)," in IEEE Symposium on Approximate Dynamic Programming and Reinforcement Learning, 2007, pp. 338-345.
- (2007) IEEE Symposium on Approximate Dynamic Programming and Reinforcement Learning , pp. 338-345
- Jung, T.¹ Polani, D.²

52
- 33750737011
- Incremental least-squares temporal difference learning
- Proceedings of the 21st National Conference on Artificial Intelligence and the 18th Innovative Applications of Artificial Intelligence Conference, AAAI-06/IAAI-06
- A. Geramifard, M. Bowling, and R. S. Sutton, "Incremental Least-Squares Temporal Difference Learning," in 21st Conference of American Association for Artificial Intelligence (AAAI 06), 2006, pp. 356-361. (Pubitemid 44705310)
- (2006) Proceedings of the National Conference on Artificial Intelligence , vol.1 , pp. 356-361
- Geramifard, A.¹ Bowling, M.² Sutton, R.S.³

53
- 70049096468
- Regularized policy iteration
- Vancouver, Canada
- A. Farahmand, M. Ghavamzadeh, C. Szepesvári, and S. Mannor, "Regularized policy iteration," in 22nd Annual Conference on Neural Information Processing Systems (NIPS 21), Vancouver, Canada, 2008.
- (2008) 22nd Annual Conference on Neural Information Processing Systems (NIPS 21)
- Farahmand, A.¹ Ghavamzadeh, M.² Szepesvári, C.³ Mannor, S.⁴

54
- 71149121683
- Regularization and feature selection in least-squares temporal difference learning
- Montreal Canada
- J. Z. Kolter and A. Y. Ng, "Regularization and Feature Selection in Least-Squares Temporal Difference Learning," in proceedings of the 26th International Conference on Machine Learning (ICML 2009), Montreal Canada, 2009.
- (2009) Proceedings of the 26th International Conference on Machine Learning (ICML 2009)
- Kolter, J.Z.¹ Ng, A.Y.²

* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.