



Volume 16, Issue 2, 2006, Pages 207-239

A generalized Kalman filter for fixed point approximation and efficient temporal-difference learning

Author keywords

Dynamic programming; Kalman filter; Optimal stopping; Queueing; Recursive least squares; Reinforcement learning; Temporal difference learning

Indexed keywords

ALGORITHMS; APPROXIMATION THEORY; DYNAMIC PROGRAMMING; LEARNING SYSTEMS; QUEUEING THEORY; RANDOM PROCESSES;

EID: 33646435300     PISSN: 09246703     EISSN: None     Source Type: Journal    
DOI: 10.1007/s10626-006-8134-8     Document Type: Article
Times cited: 60

References (30)
  • 1
    • 85156187730
    • Improving elevator performance using reinforcement learning
    • Barto A, Crites R 1996. Improving elevator performance using reinforcement learning, Adv Neural Inf Process Syst, 8:1017-1023.
    • (1996) Adv Neural Inf Process Syst , vol.8 , pp. 1017-1023
    • Barto, A.1    Crites, R.2
  • 2
    • 84968519017
    • Functional approximations and dynamic programming
    • Bellman R, Dreyfus S 1959. Functional approximations and dynamic programming, Math Tables Other Aids Comput, 13:247-251.
    • (1959) Math Tables Other Aids Comput , vol.13 , pp. 247-251
    • Bellman, R.1    Dreyfus, S.2
  • 6
    • 84898972974
    • Reinforcement learning for dynamic channel allocation in cellular telephone systems
    • MIT
    • Bertsekas DP, Singh S 1997. Reinforcement learning for dynamic channel allocation in cellular telephone systems. Adv Neural Inf Process Syst. MIT, vol. 9, p. 974.
    • (1997) Adv Neural Inf Process Syst. , vol.9 , pp. 974
    • Bertsekas, D.P.1    Singh, S.2
  • 10
    • 0036832950
    • Technical update: Least-squares temporal difference learning
    • Boyan J 2002. Technical update: least-squares temporal difference learning, Mach Learn, 49(2):233-246.
    • (2002) Mach Learn , vol.49 , Issue.2 , pp. 233-246
    • Boyan, J.1
  • 11
    • 0001771345
    • Linear least-squares algorithms for temporal-difference learning
    • Bradtke SJ, Barto AG 1996. Linear least-squares algorithms for temporal-difference learning, Mach Learn. 22:33-57.
    • (1996) Mach Learn , vol.22 , pp. 33-57
    • Bradtke, S.J.1    Barto, A.G.2
  • 13
    • 0000430514
    • The convergence of TD(λ) for general (λ)
    • Dayan PD 1992. The convergence of TD(λ) for general (λ), Mach Learn, 8:341-362.
    • (1992) Mach Learn , vol.8 , pp. 341-362
    • Dayan, P.D.1
  • 14
    • 0034342516
    • On the existence of fixed points for approximate value iteration and temporal-difference learning
    • de Farias DP, Van Roy B 2000. On the existence of fixed points for approximate value iteration and temporal-difference learning, J Optim Theory Appl, 105(3).
    • (2000) J Optim Theory Appl , vol.105 , Issue.3
    • De Farias, D.P.1    Van Roy, B.2
  • 15
    • 0003786198
    • Incremental learning of evaluation functions for absorbing Markov chains
    • preprint
    • Gurvits L, Lin LJ, and Hanson SJ 1994. Incremental learning of evaluation functions for absorbing Markov chains: New methods and theorems, preprint.
    • (1994) New Methods and Theorems
    • Gurvits, L.1    Lin, L.J.2    Hanson, S.J.3
  • 18
    • 33646436235
    • Policy evaluation algorithms with linear function approximation
    • MIT Laboratory for Information and Decision Systems, December 2001
    • Nedic A, Bertsekas DP 2001. Policy evaluation algorithms with linear function approximation. Tech. Rep. LIDS-P-2537, MIT Laboratory for Information and Decision Systems, December 2001.
    • (2001) Tech. Rep. , vol.LIDS-P-2537
    • Nedic, A.1    Bertsekas, D.P.2
  • 19
    • 0003276733
    • Mean-field analysis for batched TD(λ)
    • Pineda F 1997. Mean-field analysis for batched TD(λ). Neural Comput, 1403-1419.
    • (1997) Neural Comput , pp. 1403-1419
    • Pineda, F.1
  • 20
    • 33847202724
    • Learning to predict by the method of temporal differences
    • Sutton RS 1988. Learning to predict by the method of temporal differences, Mach Learn, 3:9-44.
    • (1988) Mach Learn , vol.3 , pp. 9-44
    • Sutton, R.S.1
  • 21
    • 0035283402
    • On the convergence of temporal-difference learning with linear function approximation
    • Tadić V 2001. On the convergence of temporal-difference learning with linear function approximation, Mach Learn, 42:241-267.
    • (2001) Mach Learn , vol.42 , pp. 241-267
    • Tadić, V.1
  • 22
    • 0029276036
    • Temporal difference learning and TD-Gammon
    • Tesauro G 1995. Temporal difference learning and TD-Gammon, Communications of the ACM, 38(3).
    • (1995) Communications of the ACM , vol.38 , Issue.3
    • Tesauro, G.1
  • 23
    • 0031143730
    • An analysis of temporal-difference learning with function approximation
    • Tsitsiklis JN, Van Roy B 1997. An analysis of temporal-difference learning with function approximation, IEEE Trans Automat Contr, 42:674-690.
    • (1997) IEEE Trans Automat Contr , vol.42 , pp. 674-690
    • Tsitsiklis, J.N.1    Van Roy, B.2
  • 24
    • 0033351917
    • Optimal stopping of Markov processes: Hilbert space theory, approximation algorithms, and an application to pricing high-dimensional financial derivatives
    • Tsitsiklis JN, Van Roy B 1999. Optimal stopping of Markov processes: Hilbert space theory, approximation algorithms, and an application to pricing high-dimensional financial derivatives, IEEE Trans Automat Contr, 44(10):1840-1851.
    • (1999) IEEE Trans Automat Contr , vol.44 , Issue.10 , pp. 1840-1851
    • Tsitsiklis, J.N.1    Van Roy, B.2
  • 27
    • 0022060331
    • Extensions of the multiarmed bandit problem: The discounted case
    • Varaiya P, Walrand J, and Buyukkoc C 1985. Extensions of the multiarmed bandit problem: the discounted case, IEEE Trans Automat Contr, 30(5).
    • (1985) IEEE Trans Automat Contr , vol.30 , Issue.5
    • Varaiya, P.1    Walrand, J.2    Buyukkoc, C.3
  • 29
    • 0013419177
    • On the worst-case analysis of temporal-difference learning algorithms
    • 2
    • Warmuth M, Schapire R 1997. On the worst-case analysis of temporal-difference learning algorithms, Journal of Machine Learning, 22(1,2,3):95-121.
    • (1997) Journal of Machine Learning , vol.22 , Issue.1-3 , pp. 95-121
    • Warmuth, M.1    Schapire, R.2


* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.