
Volume 24, Issue 6, 2013, Pages 845-867

Algorithmic survey of parametric value function approximation

Author keywords

Reinforcement learning (RL); survey; value function approximation

Indexed keywords

Exact representations; minimization methods; optimal control policy; optimal control problem; recursive least squares; state-of-the-art methods; stochastic gradient descent; value function approximation

EID: 84876108223     Print ISSN: 2162-237X     Electronic ISSN: 2162-2388     Source Type: Journal
DOI: 10.1109/TNNLS.2013.2247418     Document Type: Article
Times cited: 73

References (98)
  • 8. D. Choi and B. Van Roy, "A generalized Kalman filter for fixed point approximation and efficient temporal-difference learning," Discrete Event Dyn. Syst., vol. 16, no. 2, pp. 207-239, Apr. 2006.
  • 9. L. C. Baird, "Residual algorithms: Reinforcement learning with function approximation," in Proc. Int. Conf. Mach. Learn., 1995, pp. 30-37.
  • 10. Y. Engel, "Algorithms and representations for reinforcement learning," Ph.D. dissertation, Interdisciplinary Center Neural Comput., Hebrew Univ., Jerusalem, Israel, Apr. 2005.
  • 11. M. Geist and O. Pietquin, "Kalman temporal differences," J. Artif. Intell. Res., vol. 39, no. 1, pp. 483-532, 2010.
  • 12. S. J. Bradtke and A. G. Barto, "Linear least-squares algorithms for temporal difference learning," Mach. Learn., vol. 22, nos. 1-3, pp. 33-57, 1996.
  • 15. H. Maei, C. Szepesvari, S. Bhatnagar, D. Precup, D. Silver, and R. S. Sutton, "Convergent temporal-difference learning with arbitrary smooth function approximation," in Advances in Neural Information Processing Systems, Y. Bengio, D. Schuurmans, J. Lafferty, C. K. I. Williams, and A. Culotta, Eds. Cambridge, MA, USA: MIT Press, 2009, pp. 1204-1212.
  • 16. H. R. Maei and R. S. Sutton, "GQ(λ): A general gradient algorithm for temporal-difference prediction learning with eligibility traces," in Proc. Conf. Artif. General Intell., 2010, pp. 1-6.
  • 18. A. Nedic and D. P. Bertsekas, "Least squares policy evaluation algorithms with linear function approximation," Discrete Event Dyn. Syst., Theory Appl., vol. 13, nos. 1-2, pp. 79-110, Jan. 2003.
  • 19. H. Yu and D. P. Bertsekas, "Q-learning algorithms for optimal stopping based on least squares," in Proc. Eur. Control Conf., Jul. 2007, pp. 1-15.
  • 20. G. Gordon, "Stable function approximation in dynamic programming," in Proc. Int. Conf. Mach. Learn., 1995, pp. 1-8.
  • 21. D. Ernst, P. Geurts, and L. Wehenkel, "Tree-based batch mode reinforcement learning," J. Mach. Learn. Res., vol. 6, pp. 503-556, Apr. 2005.
  • 22. R. S. Sutton, "Learning to predict by the methods of temporal differences," Mach. Learn., vol. 3, no. 1, pp. 9-44, 1988.
  • 24. C. J. Watkins and P. Dayan, "Q-learning," Mach. Learn., vol. 8, no. 3, pp. 279-292, 1992.
  • 25. Y. Engel, S. Mannor, and R. Meir, "Bayes meets Bellman: The Gaussian process approach to temporal difference learning," in Proc. Int. Conf. Mach. Learn., 2003, pp. 154-161.
  • 26. J. N. Tsitsiklis and B. Van Roy, "An analysis of temporal-difference learning with function approximation," IEEE Trans. Autom. Control, vol. 42, no. 5, pp. 674-690, May 1997.
  • 27. G. Tesauro, "Temporal difference learning and TD-Gammon," Commun. ACM, vol. 38, no. 3, pp. 58-68, Mar. 1995.
  • 30. F. S. Melo, S. P. Meyn, and M. I. Ribeiro, "An analysis of reinforcement learning with function approximation," in Proc. Int. Conf. Mach. Learn., 2009, pp. 664-671.
  • 31. A. Antos, C. Szepesvári, and R. Munos, "Learning near-optimal policies with Bellman-residual minimization based fitted policy iteration and a single sample path," Mach. Learn., vol. 71, no. 1, pp. 89-129, Apr. 2008.
  • 32. A. Y. Kruger, "On Fréchet subdifferentials," J. Math. Sci., vol. 116, no. 3, pp. 3325-3358, 2003.
  • 37. S. J. Julier, "The scaled unscented transformation," in Proc. Amer. Control Conf., vol. 6, 2002, pp. 4555-4559.
  • 38. P. M. Nørgard, N. K. Poulsen, and O. Ravn, "New developments in state estimation for nonlinear systems," Automatica, vol. 36, no. 11, pp. 1627-1638, Nov. 2000.
  • 41. M. Geist and O. Pietquin, "Managing uncertainty within the KTD framework," J. Mach. Learn. Res. (W&C Proc.), vol. 15, pp. 157-168, Mar. 2011.
  • 43. M. G. Lagoudakis and R. Parr, "Least-squares policy iteration," J. Mach. Learn. Res., vol. 4, pp. 1107-1149, Dec. 2003.
  • 44. T. Söderström and P. Stoica, "Instrumental variable methods for system identification," Circuits Syst. Signal Process., vol. 21, no. 1, pp. 1-9, Jan.-Feb. 2002.
  • 45. A. Geramifard, M. Bowling, and R. S. Sutton, "Incremental least-squares temporal difference learning," in Proc. 21st Nat. Conf. Artif. Intell., 2006, pp. 356-361.
  • 46. J. Johns, M. Petrik, and S. Mahadevan, "Hybrid least-squares algorithms for approximate policy evaluation," Mach. Learn., vol. 76, nos. 2-3, pp. 243-256, Sep. 2009.
  • 48. R. S. Sutton, C. Szepesvari, and H. R. Maei, "A convergent O(n) algorithm for off-policy temporal-difference learning with linear function approximation," in Advances in Neural Information Processing Systems. Vancouver, BC, Canada: MIT Press, 2008.
  • 50. A. L. Samuel, "Some studies in machine learning using the game of checkers," IBM J. Res. Develop., vol. 3, no. 3, pp. 210-229, Jul. 1959.
  • 51. R. Munos, "Performance bounds in Lp-norm for approximate value iteration," SIAM J. Control Optim., vol. 46, no. 2, pp. 541-561, Jan. 2007.
  • 52. D. Ormoneit and S. Sen, "Kernel-based reinforcement learning," Mach. Learn., vol. 49, nos. 2-3, pp. 161-178, Nov. 2002.
  • 53. M. Riedmiller, "Neural fitted Q iteration - First experiences with a data efficient neural reinforcement learning method," in Proc. Eur. Conf. Mach. Learn., 2005, pp. 317-328.
  • 54. D. P. Bertsekas and S. Ioffe, "Temporal differences-based policy iteration and applications in neuro-dynamic programming," Lab. for Information and Decision Systems, MIT, Cambridge, MA, USA, Tech. Rep. LIDS-P-2349, 1996.
  • 55. D. P. Bertsekas, V. Borkar, and A. Nedic, "Improved temporal difference methods with linear function approximation," in Learning and Approximate Dynamic Programming. Piscataway, NJ, USA: IEEE Press, 2004, pp. 231-235.
  • 56. D. P. Bertsekas and H. Yu, "Projected equation methods for approximate solution of large linear systems," J. Comput. Appl. Math., vol. 227, no. 1, pp. 27-50, May 2009.
  • 58. D. P. de Farias and B. Van Roy, "The linear programming approach to approximate dynamic programming," Oper. Res., vol. 51, no. 6, pp. 850-865, 2003.
  • 61. S. Kakade and J. Langford, "Approximately optimal approximate reinforcement learning," in Proc. 19th Int. Conf. Mach. Learn., 2002, pp. 267-274.
  • 62. A. Barto, R. Sutton, and C. Anderson, "Neuronlike adaptive elements that can solve difficult learning control problems," IEEE Trans. Syst., Man, Cybern., vol. 13, no. 5, pp. 834-846, Sep.-Oct. 1983.
  • 63. V. R. Konda and J. N. Tsitsiklis, "On actor-critic algorithms," SIAM J. Control Optim., vol. 42, no. 4, pp. 1143-1166, 2003.
  • 64. R. S. Sutton, D. A. McAllester, S. P. Singh, and Y. Mansour, "Policy gradient methods for reinforcement learning with function approximation," in Neural Information Processing Systems. Cambridge, MA, USA: MIT Press, 1999, pp. 1057-1063.
  • 65. J. Peters and S. Schaal, "Natural actor-critic," Neurocomputing, vol. 71, nos. 7-9, pp. 1180-1190, 2008.
  • 68. B. Scherrer and M. Geist, "Recursive least-squares learning with eligibility traces," in Proc. Eur. Workshop Reinforce. Learn., 2011, pp. 11-12.
  • 69. R. Munos, "Error bounds for approximate policy iteration," in Proc. Int. Conf. Mach. Learn., 2003, pp. 560-567.
  • 70. B. Scherrer, "Should one compute the temporal difference fix point or minimize the Bellman residual? The unified oblique projection view," in Proc. Int. Conf. Mach. Learn., 2010, pp. 1-8.
  • 72. O. Pietquin, M. Geist, and S. Chandramohan, "Sample efficient on-line learning of optimal dialogue policies with Kalman temporal differences," in Proc. Int. Joint Conf. Artif. Intell., Jul. 2011, pp. 1878-1883.
  • 73. J. A. Boyan, "Technical update: Least-squares temporal difference learning," Mach. Learn., vol. 49, nos. 2-3, pp. 233-246, 2002.
  • 75. C. W. Phua and R. Fitch, "Tracking value function dynamics to improve reinforcement learning with piecewise linear function approximation," in Proc. 24th Int. Conf. Mach. Learn., 2007, pp. 751-758.
  • 76. M. Keramati, A. Dezfouli, and P. Piray, "Speed/accuracy trade-off between the habitual and the goal-directed processes," PLoS Comput. Biol., vol. 7, no. 5, p. e1002055, 2011.
  • 77. O. Pietquin, M. Geist, S. Chandramohan, and H. Frezza-Buet, "Sample-efficient batch reinforcement learning for dialogue management optimization," ACM Trans. Speech Lang. Process., vol. 7, no. 3, pp. 7:1-7:21, 2011.
  • 78. M. Kearns and S. Singh, "Bias-variance error bounds for temporal difference updates," in Proc. Conf. Learn. Theory, 2000, pp. 142-147.
  • 80. X. Xu, D. Hu, and X. Lu, "Kernel-based least squares policy iteration for reinforcement learning," IEEE Trans. Neural Netw., vol. 18, no. 4, pp. 973-992, Jul. 2007.
  • 82. G. Taylor and R. Parr, "Kernelized value function approximation for reinforcement learning," in Proc. 26th Annu. Int. Conf. Mach. Learn., 2009, pp. 1017-1024.
  • 84. J. Z. Kolter and A. Y. Ng, "Regularization and feature selection in least-squares temporal difference learning," in Proc. 26th Annu. Int. Conf. Mach. Learn., 2009, pp. 521-528.
  • 85. J. Johns, C. Painter-Wakefield, and R. Parr, "Linear complementarity for regularized policy evaluation and improvement," in Advances in Neural Information Processing Systems, J. Lafferty, C. K. I. Williams, J. Shawe-Taylor, R. Zemel, and A. Culotta, Eds. Durham, NC, USA: Duke Univ. Press, 2010, pp. 1009-1017.
  • 89. I. Menache, S. Mannor, and N. Shimkin, "Basis function adaptation in temporal difference reinforcement learning," Ann. Oper. Res., vol. 134, no. 1, pp. 215-238, 2005.
  • 90. P. W. Keller, S. Mannor, and D. Precup, "Automatic basis function construction for approximate dynamic programming and reinforcement learning," in Proc. Int. Conf. Mach. Learn., 2006, pp. 449-456.
  • 92. R. Parr, L. Li, G. Taylor, C. Painter-Wakefield, and M. L. Littman, "An analysis of linear models, linear value-function approximation, and feature selection for reinforcement learning," in Proc. 25th Int. Conf. Mach. Learn., 2008, pp. 752-759.
  • 93. J. Wu and R. Givan, "Automatic induction of Bellman-error features for probabilistic planning," J. Artif. Intell. Res., vol. 38, no. 1, pp. 687-755, 2010.
  • 94. S. Mahadevan and M. Maggioni, "Proto-value functions: A Laplacian framework for learning representation and control in Markov decision processes," J. Mach. Learn. Res., vol. 8, no. 10, pp. 2169-2231, 2007.
  • 95. D. Bertsekas and D. Castanon, "Adaptive aggregation methods for infinite horizon dynamic programming," IEEE Trans. Autom. Control, vol. 34, no. 6, pp. 589-598, Jun. 1989.


* This information was extracted by KISTI through analysis of Elsevier's SCOPUS database.