
Volume 24, Issue 6, 2013, Pages 845-867

Algorithmic survey of parametric value function approximation

Author keywords

Reinforcement learning (RL); survey; value function approximation

Indexed keywords

Exact representations; minimization methods; optimal control policy; optimal control problem; recursive least squares; state-of-the-art methods; stochastic gradient descent; value function approximation

EID: 84876108223     Print ISSN: 2162-237X     Electronic ISSN: 2162-2388     Source Type: Journal
DOI: 10.1109/TNNLS.2013.2247418     Document Type: Article
Times cited: 73

References (98)
  • 8. D. Choi and B. Van Roy, "A generalized Kalman filter for fixed point approximation and efficient temporal-difference learning," Discrete Event Dyn. Syst., vol. 16, no. 2, pp. 207-239, Apr. 2006.
  • 9. L. C. Baird, "Residual algorithms: Reinforcement learning with function approximation," in Proc. Int. Conf. Mach. Learn., 1995, pp. 30-37.
  • 10. Y. Engel, "Algorithms and representations for reinforcement learning," Ph.D. dissertation, Interdisciplinary Center Neural Comput., Hebrew Univ., Jerusalem, Israel, Apr. 2005.
  • 11. M. Geist and O. Pietquin, "Kalman temporal differences," J. Artif. Intell. Res., vol. 39, no. 1, pp. 483-532, 2010.
  • 12. S. J. Bradtke and A. G. Barto, "Linear least-squares algorithms for temporal difference learning," Mach. Learn., vol. 22, nos. 1-3, pp. 33-57, 1996.
  • 15. H. Maei, C. Szepesvari, S. Bhatnagar, D. Precup, D. Silver, and R. S. Sutton, "Convergent temporal-difference learning with arbitrary smooth function approximation," in Advances in Neural Information Processing Systems, Y. Bengio, D. Schuurmans, J. Lafferty, C. K. I. Williams, and A. Culotta, Eds. Cambridge, MA, USA: MIT Press, 2009, pp. 1204-1212.
  • 16. H. R. Maei and R. S. Sutton, "GQ(λ): A general gradient algorithm for temporal-difference prediction learning with eligibility traces," in Proc. Conf. Artif. General Intell., 2010, pp. 1-6.
  • 18. A. Nedic and D. P. Bertsekas, "Least squares policy evaluation algorithms with linear function approximation," Discrete Event Dyn. Syst., Theory Appl., vol. 13, nos. 1-2, pp. 79-110, Jan. 2003.
  • 19. H. Yu and D. P. Bertsekas, "Q-learning algorithms for optimal stopping based on least squares," in Proc. Eur. Control Conf., Jul. 2007, pp. 1-15.
  • 20. G. Gordon, "Stable function approximation in dynamic programming," in Proc. Int. Conf. Mach. Learn., 1995, pp. 1-8.
  • 21. D. Ernst, P. Geurts, and L. Wehenkel, "Tree-based batch mode reinforcement learning," J. Mach. Learn. Res., vol. 6, pp. 503-556, Apr. 2005.
  • 22. R. S. Sutton, "Learning to predict by the methods of temporal differences," Mach. Learn., vol. 3, no. 1, pp. 9-44, 1988.
  • 24. C. J. Watkins and P. Dayan, "Q-learning," Mach. Learn., vol. 8, no. 3, pp. 279-292, 1992.
  • 25. Y. Engel, S. Mannor, and R. Meir, "Bayes meets Bellman: The Gaussian process approach to temporal difference learning," in Proc. Int. Conf. Mach. Learn., 2003, pp. 154-161.
  • 26. J. N. Tsitsiklis and B. Van Roy, "An analysis of temporal-difference learning with function approximation," IEEE Trans. Autom. Control, vol. 42, no. 5, pp. 674-690, May 1997.
  • 27. G. Tesauro, "Temporal difference learning and TD-Gammon," Commun. ACM, vol. 38, no. 3, pp. 58-68, Mar. 1995.
  • 30. F. S. Melo, S. P. Meyn, and M. I. Ribeiro, "An analysis of reinforcement learning with function approximation," in Proc. Int. Conf. Mach. Learn., 2009, pp. 664-671.
  • 31. A. Antos, C. Szepesvári, and R. Munos, "Learning near-optimal policies with Bellman-residual minimization based fitted policy iteration and a single sample path," Mach. Learn., vol. 71, no. 1, pp. 89-129, Apr. 2008.
  • 32. A. Y. Kruger, "On Fréchet subdifferentials," J. Math. Sci., vol. 116, no. 3, pp. 3325-3358, 2003.
  • 37. S. J. Julier, "The scaled unscented transformation," in Proc. Amer. Control Conf., vol. 6, 2002, pp. 4555-4559.
  • 38. P. M. Nørgard, N. K. Poulsen, and O. Ravn, "New developments in state estimation for nonlinear systems," Automatica, vol. 36, no. 11, pp. 1627-1638, Nov. 2000.
  • 41. M. Geist and O. Pietquin, "Managing uncertainty within the KTD framework," J. Mach. Learn. Res. (W&C Proc.), vol. 15, pp. 157-168, Mar. 2011.
  • 43. M. G. Lagoudakis and R. Parr, "Least-squares policy iteration," J. Mach. Learn. Res., vol. 4, pp. 1107-1149, Dec. 2003.
  • 44. T. Söderström and P. Stoica, "Instrumental variable methods for system identification," Circuits Syst. Signal Process., vol. 21, no. 1, pp. 1-9, Jan.-Feb. 2002.
  • 45. A. Geramifard, M. Bowling, and R. S. Sutton, "Incremental least-squares temporal difference learning," in Proc. 21st Nat. Conf. Artif. Intell., 2006, pp. 356-361.
  • 46. J. Johns, M. Petrik, and S. Mahadevan, "Hybrid least-squares algorithms for approximate policy evaluation," Mach. Learn., vol. 76, nos. 2-3, pp. 243-256, Sep. 2009.
  • 48. R. S. Sutton, C. Szepesvari, and H. R. Maei, "A convergent O(n) algorithm for off-policy temporal-difference learning with linear function approximation," in Advances in Neural Information Processing Systems. Vancouver, BC, Canada: MIT Press, 2008.
  • 50. A. L. Samuel, "Some studies in machine learning using the game of checkers," IBM J. Res. Develop., vol. 3, no. 3, pp. 210-229, Jul. 1959.
  • 51. R. Munos, "Performance bounds in Lp-norm for approximate value iteration," SIAM J. Control Optim., vol. 46, no. 2, pp. 541-561, Jan. 2007.
  • 52. D. Ormoneit and S. Sen, "Kernel-based reinforcement learning," Mach. Learn., vol. 49, nos. 2-3, pp. 161-178, Nov. 2002.
  • 53. M. Riedmiller, "Neural fitted Q iteration - First experiences with a data efficient neural reinforcement learning method," in Proc. Eur. Conf. Mach. Learn., 2005, pp. 317-328.
  • 54. D. P. Bertsekas and S. Ioffe, "Temporal differences-based policy iteration and applications in neuro-dynamic programming," Lab. for Information and Decision Systems, MIT, Cambridge, MA, USA, Tech. Rep. LIDS-P-2349, 1996.
  • 55. D. P. Bertsekas, V. Borkar, and A. Nedic, "Improved temporal difference methods with linear function approximation," in Learning and Approximate Dynamic Programming. Piscataway, NJ, USA: IEEE Press, 2004, pp. 231-235.
  • 56. D. P. Bertsekas and H. Yu, "Projected equation methods for approximate solution of large linear systems," J. Comput. Appl. Math., vol. 227, no. 1, pp. 27-50, May 2009.
  • 58. D. P. de Farias and B. Van Roy, "The linear programming approach to approximate dynamic programming," Oper. Res., vol. 51, no. 6, pp. 850-865, 2003.
  • 61. S. Kakade and J. Langford, "Approximately optimal approximate reinforcement learning," in Proc. 19th Int. Conf. Mach. Learn., 2002, pp. 267-274.
  • 62. A. Barto, R. Sutton, and C. Anderson, "Neuronlike adaptive elements that can solve difficult learning control problems," IEEE Trans. Syst., Man, Cybern., vol. 13, no. 5, pp. 834-846, Sep.-Oct. 1983.
  • 63. V. R. Konda and J. N. Tsitsiklis, "On actor-critic algorithms," SIAM J. Control Optim., vol. 42, no. 4, pp. 1143-1166, 2003.
  • 64. R. S. Sutton, D. A. McAllester, S. P. Singh, and Y. Mansour, "Policy gradient methods for reinforcement learning with function approximation," in Neural Information Processing Systems. Cambridge, MA, USA: MIT Press, 1999, pp. 1057-1063.
  • 65. J. Peters and S. Schaal, "Natural actor-critic," Neurocomputing, vol. 71, nos. 7-9, pp. 1180-1190, 2008.
  • 68. B. Scherrer and M. Geist, "Recursive least-squares learning with eligibility traces," in Proc. Eur. Workshop Reinforce. Learn., 2011, pp. 11-12.
  • 69. R. Munos, "Error bounds for approximate policy iteration," in Proc. Int. Conf. Mach. Learn., 2003, pp. 560-567.
  • 70. B. Scherrer, "Should one compute the temporal difference fix point or minimize the Bellman residual? The unified oblique projection view," in Proc. Int. Conf. Mach. Learn., 2010, pp. 1-8.
  • 72. O. Pietquin, M. Geist, and S. Chandramohan, "Sample efficient on-line learning of optimal dialogue policies with Kalman temporal differences," in Proc. Int. Joint Conf. Artif. Intell., Jul. 2011, pp. 1878-1883.
  • 73. J. A. Boyan, "Technical update: Least-squares temporal difference learning," Mach. Learn., vol. 49, nos. 2-3, pp. 233-246, 2002.
  • 75. C. W. Phua and R. Fitch, "Tracking value function dynamics to improve reinforcement learning with piecewise linear function approximation," in Proc. 24th Int. Conf. Mach. Learn., 2007, pp. 751-758.
  • 76. M. Keramati, A. Dezfouli, and P. Piray, "Speed/accuracy trade-off between the habitual and the goal-directed processes," PLoS Comput. Biol., vol. 7, no. 5, p. e1002055, 2011.
  • 77. O. Pietquin, M. Geist, S. Chandramohan, and H. Frezza-Buet, "Sample-efficient batch reinforcement learning for dialogue management optimization," ACM Trans. Speech Lang. Process., vol. 7, no. 3, pp. 7:1-7:21, 2011.
  • 78. M. Kearns and S. Singh, "Bias-variance error bounds for temporal difference updates," in Proc. Conf. Learn. Theory, 2000, pp. 142-147.
  • 80. X. Xu, D. Hu, and X. Lu, "Kernel-based least squares policy iteration for reinforcement learning," IEEE Trans. Neural Netw., vol. 18, no. 4, pp. 973-992, Jul. 2007.
  • 82. G. Taylor and R. Parr, "Kernelized value function approximation for reinforcement learning," in Proc. 26th Annu. Int. Conf. Mach. Learn., 2009, pp. 1017-1024.
  • 84. J. Z. Kolter and A. Y. Ng, "Regularization and feature selection in least-squares temporal difference learning," in Proc. 26th Annu. Int. Conf. Mach. Learn., 2009, pp. 521-528.
  • 85. J. Johns, C. Painter-Wakefield, and R. Parr, "Linear complementarity for regularized policy evaluation and improvement," in Advances in Neural Information Processing Systems, J. Lafferty, C. K. I. Williams, J. Shawe-Taylor, R. Zemel, and A. Culotta, Eds. Durham, NC, USA: Duke Univ. Press, 2010, pp. 1009-1017.
  • 89. I. Menache, S. Mannor, and N. Shimkin, "Basis function adaptation in temporal difference reinforcement learning," Ann. Oper. Res., vol. 134, no. 1, pp. 215-238, 2005.
  • 90. P. W. Keller, S. Mannor, and D. Precup, "Automatic basis function construction for approximate dynamic programming and reinforcement learning," in Proc. Int. Conf. Mach. Learn., 2006, pp. 449-456.
  • 92. R. Parr, L. Li, G. Taylor, C. Painter-Wakefield, and M. L. Littman, "An analysis of linear models, linear value-function approximation, and feature selection for reinforcement learning," in Proc. 25th Int. Conf. Mach. Learn., 2008, pp. 752-759.
  • 93. J. Wu and R. Givan, "Automatic induction of Bellman-error features for probabilistic planning," J. Artif. Intell. Res., vol. 38, no. 1, pp. 687-755, 2010.
  • 94. S. Mahadevan and M. Maggioni, "Proto-value functions: A Laplacian framework for learning representation and control in Markov decision processes," J. Mach. Learn. Res., vol. 8, no. 10, pp. 2169-2231, 2007.
  • 95. D. Bertsekas and D. Castanon, "Adaptive aggregation methods for infinite horizon dynamic programming," IEEE Trans. Autom. Control, vol. 34, no. 6, pp. 589-598, Jun. 1989.


* This information was extracted by KISTI through analysis of Elsevier's SCOPUS database.