Volume 227, Issue 1, 2009, Pages 27-50

Projected equation methods for approximate solution of large linear systems

Author keywords

Dynamic programming; Jacobi method; Linear equations; Projected equations; Simulation; Temporal differences; Value iteration

Indexed keywords

APPROXIMATION ALGORITHMS; DIFFERENCE EQUATIONS; DYNAMIC PROGRAMMING; LINEAR SYSTEMS; SYSTEMS ENGINEERING

EID: 61849106433     PISSN: 03770427     EISSN: None     Source Type: Journal    
DOI: 10.1016/j.cam.2008.07.037     Document Type: Article
Times cited: 63

References (30)
  • 1
    • 61849084542
    • A learning algorithm for risk-sensitive cost
    • Tech. Report No. 2006/25, Dept. of Math, Indian Institute of Science, Bangalore, India
    • A. Basu, Bhattacharyya, V. Borkar, A learning algorithm for risk-sensitive cost, Tech. Report No. 2006/25, Dept. of Math., Indian Institute of Science, Bangalore, India, 2006
    • (2006)
    • Basu, A.1    Bhattacharyya2    Borkar, V.3
  • 2
    • 85036496976
    • Improved temporal difference methods with linear function approximation
    • Si J., Barto A., and Powell W. (Eds), IEEE Press, NY
    • Bertsekas D.P., Borkar V., and Nedić A. Improved temporal difference methods with linear function approximation. In: Si J., Barto A., and Powell W. (Eds). Learning and Approximate Dynamic Programming (2004), IEEE Press, NY
    • (2004) Learning and Approximate Dynamic Programming
    • Bertsekas, D.P.1    Borkar, V.2    Nedić, A.3
  • 4
    • 4243567726
    • Temporal differences-based policy iteration and applications in neuro-dynamic programming, Lab. for Info. and Dec. Sys
    • Report LIDS-P-2349, MIT, Cambridge, MA
    • D.P. Bertsekas, S. Ioffe, Temporal differences-based policy iteration and applications in neuro-dynamic programming, Lab. for Info. and Dec. Sys. Report LIDS-P-2349, MIT, Cambridge, MA, 1996
    • (1996)
    • Bertsekas, D.P.1    Ioffe, S.2
  • 5
    • 79960439092
    • Solution of large systems of equations using approximate dynamic programming methods, Lab. for Info. and Dec. Sys
    • Report LIDS-2754, MIT, Cambridge, MA
    • D.P. Bertsekas, H. Yu, Solution of large systems of equations using approximate dynamic programming methods, Lab. for Info. and Dec. Sys. Report LIDS-2754, MIT, Cambridge, MA, 2007
    • (2007)
    • Bertsekas, D.P.1    Yu, H.2
  • 8
    • 0036832950
    • Technical update: Least-squares temporal difference learning
    • Boyan J.A. Technical update: Least-squares temporal difference learning. Machine Learning 49 (2002) 1-15
    • (2002) Machine Learning , vol.49 , pp. 1-15
    • Boyan, J.A.1
  • 9
    • 0001771345
    • Linear least-squares algorithms for temporal difference learning
    • Bradtke S.J., and Barto A.G. Linear least-squares algorithms for temporal difference learning. Machine Learning 22 (1996) 33-57
    • (1996) Machine Learning , vol.22 , pp. 33-57
    • Bradtke, S.J.1    Barto, A.G.2
  • 10
    • 33646435300
    • A generalized Kalman filter for fixed point approximation and efficient temporal-difference learning
    • Choi D.S., and Van Roy B. A generalized Kalman filter for fixed point approximation and efficient temporal-difference learning. Discrete Event Dynamic Systems: Theory and Applications 16 (2006) 207-239
    • (2006) Discrete Event Dynamic Systems: Theory and Applications , vol.16 , pp. 207-239
    • Choi, D.S.1    Van Roy, B.2
  • 11
    • 85162635208
    • Monte Carlo methods for the iteration of linear operators
    • Curtiss J.H. Monte Carlo methods for the iteration of linear operators. Journal of Mathematics and Physics 32 (1953) 209-232
    • (1953) Journal of Mathematics and Physics , vol.32 , pp. 209-232
    • Curtiss, J.H.1
  • 12
    • 0041541978
    • A theoretical comparison of the efficiencies of two classical methods and a Monte Carlo method for computing one component of the solution of a set of linear algebraic equations
    • Meyer H.A. (Ed), Wiley, New York, NY
    • Curtiss J.H. A theoretical comparison of the efficiencies of two classical methods and a Monte Carlo method for computing one component of the solution of a set of linear algebraic equations. In: Meyer H.A. (Ed). Symposium on Monte Carlo Methods (1954), Wiley, New York, NY 191-233
    • (1954) Symposium on Monte Carlo Methods , pp. 191-233
    • Curtiss, J.H.1
  • 14
    • 0014705837
    • A retrospective and prospective survey of the Monte Carlo method
    • Halton J.H. A retrospective and prospective survey of the Monte Carlo method. SIAM Review 12 (1970) 1-63
    • (1970) SIAM Review , vol.12 , pp. 1-63
    • Halton, J.H.1
  • 17
    • 17444414191
    • Basis function adaptation in temporal difference reinforcement learning
    • Menache I., Mannor S., and Shimkin N. Basis function adaptation in temporal difference reinforcement learning. Annals of Operations Research 134 (2005)
    • (2005) Annals of Operations Research , vol.134
    • Menache, I.1    Mannor, S.2    Shimkin, N.3
  • 23
    • 33847202724
    • Learning to predict by the methods of temporal differences
    • Sutton R.S. Learning to predict by the methods of temporal differences. Machine Learning 3 (1988) 9-44
    • (1988) Machine Learning , vol.3 , pp. 9-44
    • Sutton, R.S.1
  • 24
    • 0031143730
    • An analysis of temporal-difference learning with function approximation
    • Tsitsiklis J.N., and Van Roy B. An analysis of temporal-difference learning with function approximation. IEEE Transactions on Automatic Control 42 (1997) 674-690
    • (1997) IEEE Transactions on Automatic Control , vol.42 , pp. 674-690
    • Tsitsiklis, J.N.1    Van Roy, B.2
  • 25
    • 0033221519
    • Average cost temporal-difference learning
    • Tsitsiklis J.N., and Van Roy B. Average cost temporal-difference learning. Automatica 35 (1999) 1799-1808
    • (1999) Automatica , vol.35 , pp. 1799-1808
    • Tsitsiklis, J.N.1    Van Roy, B.2
  • 26
    • 0033351917
    • Optimal stopping of Markov processes: Hilbert space theory, approximation algorithms, and an application to pricing financial derivatives
    • Tsitsiklis J.N., and Van Roy B. Optimal stopping of Markov processes: Hilbert space theory, approximation algorithms, and an application to pricing financial derivatives. IEEE Transactions on Automatic Control 44 (1999) 1840-1851
    • (1999) IEEE Transactions on Automatic Control , vol.44 , pp. 1840-1851
    • Tsitsiklis, J.N.1    Van Roy, B.2
  • 27
    • 61849175825
    • Approximate dynamic programming with applications in multi-agent systems, Ph.D. Thesis, Dept. of Electrical Engineering and Computer Science, MIT
    • M.J. Valenti, Approximate dynamic programming with applications in multi-agent systems, Ph.D. Thesis, Dept. of Electrical Engineering and Computer Science, MIT, 2007
    • (2007)
    • Valenti, M.J.1
  • 29
    • 34547991475
    • Convergence results for some temporal difference methods based on least squares, Lab. for Info. and Dec. Sys
    • Report 2697, MIT, 2006
    • H. Yu, D.P. Bertsekas, Convergence results for some temporal difference methods based on least squares, Lab. for Info. and Dec. Sys. Report 2697, MIT, 2006
    • Yu, H.1    Bertsekas, D.P.2
  • 30
    • 61849092475
    • A least squares Q-learning algorithm for optimal stopping problems, Lab. for Info. and Dec. Sys
    • Report 2731, MIT, 2007
    • H. Yu, D.P. Bertsekas, A least squares Q-learning algorithm for optimal stopping problems, Lab. for Info. and Dec. Sys. Report 2731, MIT, 2007
    • Yu, H.1    Bertsekas, D.P.2


* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS DB.