



Volume , Issue , 2011, Pages 9-16

Parametric value function approximation: A unified view

Author keywords

Reinforcement learning; survey; value function approximation

Indexed keywords

ARTIFICIAL INTELLIGENCE; COST FUNCTIONS; DYNAMIC PROGRAMMING; OPTIMAL CONTROL SYSTEMS; QUALITY CONTROL; STOCHASTIC SYSTEMS; SURVEYING; SURVEYS;

EID: 80052232599     PISSN: None     EISSN: None     Source Type: Conference Proceeding    
DOI: 10.1109/ADPRL.2011.5967355     Document Type: Conference Paper
Times cited : (34)

References (54)
  • 7
    • 33646435300
    • A generalized Kalman filter for fixed point approximation and efficient temporal-difference learning
    • D. Choi and B. Van Roy, "A Generalized Kalman Filter for Fixed Point Approximation and Efficient Temporal-Difference Learning," Discrete Event Dynamic Systems, vol. 16, pp. 207-239, 2006.
    • (2006) Discrete Event Dynamic Systems , vol.16 , pp. 207-239
    • Choi, D.1    Van Roy, B.2
  • 8
  • 9
    • 0029276036
    • Temporal difference learning and TD-Gammon
    • March
    • G. Tesauro, "Temporal Difference Learning and TD-Gammon," Communications of the ACM, vol. 38, no. 3, March 1995.
    • (1995) Communications of the ACM , vol.38 , Issue.3
    • Tesauro, G.1
  • 11
    • 40849145988
    • Learning near-optimal policies with Bellman-residual minimization based fitted policy iteration and a single sample path
    • A. Antos, C. Szepesvári, and R. Munos, "Learning near-optimal policies with Bellman-residual minimization based fitted policy iteration and a single sample path," Machine Learning, vol. 71, no. 1, pp. 89-129, 2008.
    • (2008) Machine Learning , vol.71 , Issue.1 , pp. 89-129
    • Antos, A.1    Szepesvári, C.2    Munos, R.3
  • 15
    • 3543096272
    • The kernel recursive least squares algorithm
    • [Online]
    • Y. Engel, S. Mannor, and R. Meir, "The Kernel Recursive Least Squares Algorithm," IEEE Transactions on Signal Processing, vol. 52, pp. 2275-2285, 2004. [Online]. Available: http://www.cs.ualberta.ca/yaki/
    • (2004) IEEE Transactions on Signal Processing , vol.52 , pp. 2275-2285
    • Engel, Y.1    Mannor, S.2    Meir, R.3
  • 25
    • 85024429815
    • A new approach to linear filtering and prediction problems
    • R. E. Kalman, "A new approach to linear filtering and prediction problems," Transactions of the ASME-Journal of Basic Engineering, vol. 82, no. Series D, pp. 35-45, 1960.
    • (1960) Transactions of the ASME-Journal of Basic Engineering , vol.82 , Issue.SERIES D , pp. 35-45
    • Kalman, R.E.1
  • 27
    • 80052208068
    • Managing uncertainty within value function approximation in reinforcement learning
    • Active Learning and Experimental Design Workshop (collocated with AISTATS 2010) , Sardinia, Italy
    • M. Geist and O. Pietquin, "Managing Uncertainty within Value Function Approximation in Reinforcement Learning," in Active Learning and Experimental Design Workshop (collocated with AISTATS 2010), ser. Journal of Machine Learning Research - Workshop and Conference Proceedings, Sardinia, Italy, 2010.
    • (2010) Journal of Machine Learning Research - Workshop and Conference Proceedings
    • Geist, M.1    Pietquin, O.2
  • 28
    • 0001771345
    • Linear least-squares algorithms for temporal difference learning
    • S. J. Bradtke and A. G. Barto, "Linear Least-Squares algorithms for temporal difference learning," Machine Learning, vol. 22, no. 1-3, pp. 33-57, 1996. (Pubitemid 126724362)
    • (1996) Machine Learning , vol.22 , Issue.1-3 , pp. 33-57
    • Bradtke, S.J.1    Barto, A.G.2
  • 29
    • 0036236260
    • Instrumental variable methods for system identification
    • DOI 10.1007/BF01211647
    • T. Söderström and P. Stoica, "Instrumental variable methods for system identification," Circuits, Systems, and Signal Processing, vol. 21, pp. 1-9, 2002. (Pubitemid 34414642)
    • (2002) Circuits, Systems, and Signal Processing , vol.21 , Issue.1 , pp. 1-9
    • Söderström, T.1    Stoica, P.2
  • 31
    • 0036832950
    • Technical update: Least-squares temporal difference learning
    • DOI 10.1023/A:1017936530646
    • J. A. Boyan, "Technical Update: Least-Squares Temporal Difference Learning," Machine Learning, vol. 49, no. 2-3, pp. 233-246, 2002. (Pubitemid 34325688)
    • (2002) Machine Learning , vol.49 , Issue.2-3 , pp. 233-246
    • Boyan, J.A.1
  • 32
    • 77956517288
    • Convergence of least squares temporal difference methods under general conditions
    • H. Yu, "Convergence of Least Squares Temporal Difference Methods Under General Conditions," in International Conference on Machine Learning (ICML 2010), 2010, pp. 1207-1214.
    • (2010) International Conference on Machine Learning (ICML 2010) , pp. 1207-1214
    • Yu, H.1
  • 36
    • 79951481923
    • Convergent temporal-difference learning with arbitrary smooth function approximation
    • Y. Bengio, D. Schuurmans, J. Lafferty, C. K. I. Williams, and A. Culotta, Eds.
    • H. Maei, C. Szepesvári, S. Bhatnagar, D. Precup, D. Silver, and R. S. Sutton, "Convergent temporal-difference learning with arbitrary smooth function approximation," in Advances in Neural Information Processing Systems 22, Y. Bengio, D. Schuurmans, J. Lafferty, C. K. I. Williams, and A. Culotta, Eds., 2009, pp. 1204-1212.
    • (2009) Advances in Neural Information Processing Systems , vol.22 , pp. 1204-1212
    • Maei, H.1    Szepesvári, C.2    Bhatnagar, S.3    Precup, D.4    Silver, D.5    Sutton, R.S.6
  • 37
    • 77954101982
    • GQ(λ): A general gradient algorithm for temporal-differences prediction learning with eligibility traces
    • H. R. Maei and R. S. Sutton, "GQ(λ): a general gradient algorithm for temporal-differences prediction learning with eligibility traces," in Third Conference on Artificial General Intelligence, 2010.
    • (2010) Third Conference on Artificial General Intelligence
    • Maei, H.R.1    Sutton, R.S.2
  • 39
    • 0001201756
    • Some studies in machine learning using the game of checkers
    • A. Samuel, "Some studies in machine learning using the game of checkers," IBM Journal of Research and Development, pp. 210-229, 1959.
    • (1959) IBM Journal of Research and Development , pp. 210-229
    • Samuel, A.1
  • 41
    • 40949107944
    • Performance bounds in Lp norm for approximate value iteration
    • R. Munos, "Performance Bounds in Lp norm for Approximate Value Iteration," SIAM Journal on Control and Optimization, 2007.
    • (2007) SIAM Journal on Control and Optimization
    • Munos, R.1
  • 42
    • 0036832956
    • Kernel-based reinforcement learning
    • DOI 10.1023/A:1017928328829
    • D. Ormoneit and S. Sen, "Kernel-Based Reinforcement Learning," Machine Learning, vol. 49, pp. 161-178, 2002. (Pubitemid 34325684)
    • (2002) Machine Learning , vol.49 , Issue.2-3 , pp. 161-178
    • Ormoneit, D.1    Sen, S.2
  • 43
    • 33646687423
    • Neural fitted Q iteration - First experiences with a data efficient neural reinforcement learning method
    • M. Riedmiller, "Neural Fitted Q Iteration - First Experiences with a Data Efficient Neural Reinforcement Learning Method," in European Conference on Machine Learning (ECML), 2005.
    • (2005) European Conference on Machine Learning (ECML)
    • Riedmiller, M.1
  • 46
    • 18544370594
    • IEEE Press, ch. Improved Temporal Difference Methods with Linear Function Approximation
    • D. P. Bertsekas, V. Borkar, and A. Nedic, Learning and Approximate Dynamic Programming. IEEE Press, 2004, ch. Improved Temporal Difference Methods with Linear Function Approximation, pp. 231-235.
    • (2004) Learning and Approximate Dynamic Programming , pp. 231-235
    • Bertsekas, D.P.1    Borkar, V.2    Nedic, A.3
  • 47
    • 61849106433
    • Projected equation methods for approximate solution of large linear systems
    • D. P. Bertsekas and H. Yu, "Projected Equation Methods for Approximate Solution of Large Linear Systems," Journal of Computational and Applied Mathematics, vol. 227, pp. 27-50, 2007.
    • (2007) Journal of Computational and Applied Mathematics , vol.227 , pp. 27-50
    • Bertsekas, D.P.1    Yu, H.2
  • 49
    • 84927748655
    • Q-learning algorithms for optimal stopping based on least squares
    • Kos, Greece
    • H. Yu and D. P. Bertsekas, "Q-Learning Algorithms for Optimal Stopping Based on Least Squares," in Proceedings of European Control Conference, Kos, Greece, 2007.
    • (2007) Proceedings of European Control Conference
    • Yu, H.1    Bertsekas, D.P.2
  • 50
    • 34547098844
    • Kernel-based least squares policy iteration for reinforcement learning
    • DOI 10.1109/TNN.2007.899161, Neural Networks for Feedback Control Systems
    • X. Xu, D. Hu, and X. Lu, "Kernel-Based Least Squares Policy Iteration for Reinforcement Learning," IEEE Transactions on Neural Networks, vol. 18, no. 4, pp. 973-992, July 2007. (Pubitemid 47098876)
    • (2007) IEEE Transactions on Neural Networks , vol.18 , Issue.4 , pp. 973-992
    • Xu, X.1    Hu, D.2    Lu, X.3
  • 52
    • 33750737011
    • Incremental least-squares temporal difference learning
    • Proceedings of the 21st National Conference on Artificial Intelligence and the 18th Innovative Applications of Artificial Intelligence Conference, AAAI-06/IAAI-06
    • A. Geramifard, M. Bowling, and R. S. Sutton, "Incremental Least-Squares Temporal Difference Learning," in 21st Conference of American Association for Artificial Intelligence (AAAI 06), 2006, pp. 356-361. (Pubitemid 44705310)
    • (2006) Proceedings of the National Conference on Artificial Intelligence , vol.1 , pp. 356-361
    • Geramifard, A.1    Bowling, M.2    Sutton, R.S.3


* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.