SCOPUS Information Retrieval Platform

Discrete Event Dynamic Systems: Theory and Applications

Volume 16, Issue 2, 2006, Pages 207-239

A generalized Kalman filter for fixed point approximation and efficient temporal-difference learning

MASSACHUSETTS INSTITUTE OF TECHNOLOGY (United States)

Author keywords

Dynamic programming; Kalman filter; Optimal stopping; Queueing; Recursive least squares; Reinforcement learning; Temporal difference learning

Indexed keywords

ALGORITHMS; APPROXIMATION THEORY; DYNAMIC PROGRAMMING; LEARNING SYSTEMS; QUEUEING THEORY; RANDOM PROCESSES; OPTIMAL STOPPING; QUEUEING; RECURSIVE LEAST-SQUARES; REINFORCEMENT LEARNING; TEMPORAL-DIFFERENCE LEARNING; KALMAN FILTERING

EID: 33646435300
PISSN: 09246703
EISSN: None
Source Type: Journal
DOI: 10.1007/s10626-006-8134-8
Document Type: Article

Times cited: 60

References (30)

1
- 85156187730
- Improving elevator performance using reinforcement learning
- Barto A, Crites R 1996. Improving elevator performance using reinforcement learning, Adv Neural Inf Process Syst, 8:1017-1023.
- (1996) Adv Neural Inf Process Syst , vol.8 , pp. 1017-1023
- Barto, A.¹ Crites, R.²

2
- 84968519017
- Functional approximations and dynamic programming
- Bellman R, Dreyfus S 1959. Functional approximations and dynamic programming, Math Tables Other Aids Comput, 13:247-251.
- (1959) Math Tables Other Aids Comput , vol.13 , pp. 247-251
- Bellman, R.¹ Dreyfus, S.²

3
- 0003778897
- Berlin Heidelberg New York: Springer-Verlag
- Benveniste A, Métivier M, and Priouret P 1991. Adaptive Algorithms and Stochastic Approximations. Berlin Heidelberg New York: Springer-Verlag
- (1991) Adaptive Algorithms and Stochastic Approximations
- Benveniste, A.¹ Métivier, M.² Priouret, P.³

4
- 0003713964
- Athena Scientific
- Bertsekas DP 1995a. Nonlinear Programming. Athena Scientific.
- (1995) Nonlinear Programming
- Bertsekas, D.P.¹

5
- 0003565783
- Athena Scientific
- Bertsekas DP 1995b. Dynamic Programming and Optimal Control. Athena Scientific.
- (1995) Dynamic Programming and Optimal Control
- Bertsekas, D.P.¹

6
- 84898972974
- Reinforcement learning for dynamic channel allocation in cellular telephone systems
- MIT
- Bertsekas DP, Singh S 1997. Reinforcement learning for dynamic channel allocation in cellular telephone systems. Adv Neural Inf Process Syst. MIT, vol. 9, p. 974.
- (1997) Adv Neural Inf Process Syst. , vol.9 , pp. 974
- Bertsekas, D.P.¹ Singh, S.²

7
- 0003487482
- Athena Scientific
- Bertsekas DP, Tsitsiklis JN 1995. Neuro-Dynamic Programming. Athena Scientific.
- (1995) Neuro-dynamic Programming
- Bertsekas, D.P.¹ Tsitsiklis, J.N.²

8
- 0003500973
- Berlin Heidelberg New York: Springer-Verlag
- Borkar V 1995. Probability theory: an advanced course. Berlin Heidelberg New York: Springer-Verlag
- (1995) Probability Theory: An Advanced Course
- Borkar, V.¹

9
- 0038595396
- Least-squares temporal difference learning
- Boyan J 1999. Least-squares temporal difference learning. Proceedings of the Sixteenth International Conference on Machine Learning (ICML), pp. 49-56.
- (1999) Proceedings of the Sixteenth International Conference on Machine Learning (ICML) , pp. 49-56
- Boyan, J.¹

10
- 0036832950
- Technical update: Least-squares temporal difference learning
- Boyan J 2002. Technical update: least-squares temporal difference learning, Mach Learn, 49(2):233-246.
- (2002) Mach Learn , vol.49 , Issue.2 , pp. 233-246
- Boyan, J.¹

11
- 0001771345
- Linear least-squares algorithms for temporal-difference learning
- Bradtke SJ, Barto AG 1996. Linear least-squares algorithms for temporal-difference learning, Mach Learn, 22:33-57.
- (1996) Mach Learn , vol.22 , pp. 33-57
- Bradtke, S.J.¹ Barto, A.G.²

12
- 33646386989
- A generalized Kalman filter for fixed point approximation and efficient temporal-difference learning
- Choi DS, Van Roy B 2001. A generalized Kalman filter for fixed point approximation and efficient temporal-difference learning, Proceedings of the International Joint Conference on Machine Learning.
- (2001) Proceedings of the International Joint Conference on Machine Learning
- Choi, D.S.¹ Van Roy, B.²

13
- 0000430514
- The convergence of TD(λ) for general (λ)
- Dayan PD 1992. The convergence of TD(λ) for general (λ), Mach Learn, 8:341-362.
- (1992) Mach Learn , vol.8 , pp. 341-362
- Dayan, P.D.¹

14
- 0034342516
- On the existence of fixed points for approximate value iteration and temporal-difference learning
- de Farias DP, Van Roy B 2000. On the existence of fixed points for approximate value iteration and temporal-difference learning, J Optim Theory Appl, 105(3).
- (2000) J Optim Theory Appl , vol.105 , Issue.3
- De Farias, D.P.¹ Van Roy, B.²

15
- 0003786198
- Incremental learning of evaluation functions for absorbing Markov chains
- preprint
- Gurvits L, Lin LJ, and Hanson SJ 1994. Incremental learning of evaluation functions for absorbing Markov chains: new methods and theorems, preprint.
- (1994) New Methods and Theorems
- Gurvits, L.¹ Lin, L.J.² Hanson, S.J.³

16
- 0004011154
- Berlin Heidelberg New York: Springer
- Karatzas I, Shreve SE 1998. Methods of Mathematical Finance. Berlin Heidelberg New York: Springer.
- (1998) Methods of Mathematical Finance
- Karatzas, I.¹ Shreve, S.E.²

17
- 84898963274
- Model-free least-squares policy iteration
- Lagoudakis M, Parr R 2001. Model-free least-squares policy iteration. Neural Inf Process Syst (NIPS-14).
- (2001) Neural Inf Process Syst (NIPS-14)
- Lagoudakis, M.¹ Parr, R.²

18
- 33646436235
- Policy evaluation algorithms with linear function approximation
- MIT Laboratory for Information and Decision Systems, December 2001
- Nedic A, Bertsekas DP 2001. Policy evaluation algorithms with linear function approximation. Tech. Rep. LIDS-P-2537, MIT Laboratory for Information and Decision Systems, December 2001.
- (2001) Tech. Rep. , vol.LIDS-P-2537
- Nedic, A.¹ Bertsekas, D.P.²

19
- 0003276733
- Mean-field analysis for batched TD(λ)
- Pineda F 1997. Mean-field analysis for batched TD(λ). Neural Comput, 1403-1419.
- (1997) Neural Comput , pp. 1403-1419
- Pineda, F.¹

20
- 33847202724
- Learning to predict by the method of temporal differences
- Sutton RS 1988. Learning to predict by the method of temporal differences, Mach Learn, 3:9-44.
- (1988) Mach Learn , vol.3 , pp. 9-44
- Sutton, R.S.¹

21
- 0035283402
- On the convergence of temporal-difference learning with linear function approximation
- Tadić V 2001. On the convergence of temporal-difference learning with linear function approximation, Mach Learn, 42:241-267.
- (2001) Mach Learn , vol.42 , pp. 241-267
- Tadić, V.¹

22
- 0029276036
- Temporal difference learning and TD-Gammon
- Tesauro G 1995. Temporal difference learning and TD-Gammon, Communications of the ACM, 38(3).
- (1995) Communications of the ACM , vol.38 , Issue.3
- Tesauro, G.¹

23
- 0031143730
- An analysis of temporal-difference learning with function approximation
- Tsitsiklis JN, Van Roy B 1997. An analysis of temporal-difference learning with function approximation, IEEE Trans Automat Contr, 42:674-690.
- (1997) IEEE Trans Automat Contr , vol.42 , pp. 674-690
- Tsitsiklis, J.N.¹ Van Roy, B.²

24
- 0033351917
- Optimal stopping of Markov processes: Hilbert space theory, approximation algorithms, and an application to pricing high-dimensional financial derivatives
- Tsitsiklis JN, Van Roy B 1999. Optimal stopping of Markov processes: Hilbert space theory, approximation algorithms, and an application to pricing high-dimensional financial derivatives, IEEE Trans Automat Contr, 44(10):1840-1851.
- (1999) IEEE Trans Automat Contr , vol.44 , Issue.10 , pp. 1840-1851
- Tsitsiklis, J.N.¹ Van Roy, B.²

25
- 0003787427
- Ph.D. dissertation, MIT
- Van Roy B 1998. Learning and value function approximation in complex decision processes, Ph.D. dissertation, MIT.
- (1998) Learning and Value Function Approximation in Complex Decision Processes
- Van Roy, B.¹

26
- 33646430921
- A neuro-dynamic programming approach to retailer inventory management
- Van Roy B, Bertsekas DP, Lee Y, and Tsitsiklis JN 1999. A neuro-dynamic programming approach to retailer inventory management, Proc. of the IEEE Conf Decis Contr.
- (1999) Proc. of the IEEE Conf Decis Contr.
- Van Roy, B.¹ Bertsekas, D.P.² Lee, Y.³ Tsitsiklis, J.N.⁴

27
- 0022060331
- Extensions of the multiarmed bandit problem: The discounted case
- Varaiya P, Walrand J, and Buyukkoc C 1985. Extensions of the multiarmed bandit problem: the discounted case, IEEE Trans Automat Contr, 30(5).
- (1985) IEEE Trans Automat Contr , vol.30 , Issue.5
- Varaiya, P.¹ Walrand, J.² Buyukkoc, C.³

28
- 0000885533
- Relative loss bounds for temporal-difference learning
- Warmuth M, Forster J 2000. Relative loss bounds for temporal-difference learning. Proc. of the Seventeenth International Conference on Machine Learning, pp. 295-302.
- (2000) Proc. of the Seventeenth International Conference on Machine Learning , pp. 295-302
- Warmuth, M.¹ Forster, J.²

29
- 0013419177
- On the worst-case analysis of temporal-difference learning algorithms
- Warmuth M, Schapire R 1997. On the worst-case analysis of temporal-difference learning algorithms, Journal of Machine Learning, 22(1,2,3):95-121.
- (1997) Journal of Machine Learning , vol.22 , Issue.1-3 , pp. 95-121
- Warmuth, M.¹ Schapire, R.²

30
- 84918834208
- A reinforcement learning approach to job-shop scheduling
- Zhang W, Dietterich TG 1995. A reinforcement learning approach to job-shop scheduling. Proc. of the International Joint Conference on Artificial Intelligence.
- (1995) Proc. of the International Joint Conference on Artificial Intelligence
- Zhang, W.¹ Dietterich, T.G.²

* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.