Neural Computation

Volume 9, Issue 7, 1997, Pages 1403-1419

Mean-Field Theory for Batched TD(λ)

Pineda, Fernando J. (a)

(a) Johns Hopkins University Applied Physics Laboratory (United States)

EID: 0003276733 PISSN: 08997667 EISSN: None Source Type: Journal
DOI: 10.1162/neco.1997.9.7.1403 Document Type: Article

Times cited: 19

References (17)

1
- 0003778897
- Benveniste, A., Métivier, M., & Priouret, P. (1990). Adaptive Algorithms and Stochastic Approximations. Berlin: Springer-Verlag.

2
- 0003487482
- Bertsekas, D. P., & Tsitsiklis, J. N. (1996). Neuro-dynamic programming. Belmont, MA: Athena Scientific.

3
- 0000430514
- Dayan, P. (1992). The convergence of TD(λ) for general lambda. Machine Learning, 8, 341-362.

4
- 0028388685
- Dayan, P., & Sejnowski, T. J. (1994). TD(λ) converges with probability 1. Machine Learning, 14, 295-301.

5
- 0004236492
- Golub, G. H., & Van Loan, C. F. (1996). Matrix computations (3rd ed.). Baltimore: Johns Hopkins University Press.

6
- 85156203891
- Gordon, G. J. (1996). Stable fitted reinforcement learning. In G. Tesauro, D. Touretzky, & T. Leen (Eds.), Advances in Neural Information Processing Systems: Proceedings of the 1996 Conference (pp. 1052-1058). Cambridge, MA: MIT Press.

7
- 0000439891
- Jaakkola, T., Jordan, M. I., & Singh, S. P. (1994). On the convergence of stochastic iterative dynamic programming algorithms. Neural Computation, 6, 1185-1201.

8
- 0003452601
- Kushner, H. J., & Clark, D. S. (1978). Stochastic approximation methods for constrained and unconstrained systems. New York: Springer-Verlag.

9
- 0346575867
- Pineda, F. J. (1995, August 23-26). Generalization in TD(λ). Theoretical Physics Institute, University of Minnesota.

10
- 0346575866
- Singh, S. P., & Dayan, P. (1996). Analytical mean squared error curves in temporal difference learning. In M. Mozer, M. Jordan, & T. Petsche (Eds.), Advances in Neural Information Processing Systems, 9. Cambridge, MA: MIT Press.

11
- 85153965130
- Singh, S. P., Jaakkola, T., & Jordan, M. (1995). Reinforcement learning with soft state aggregation. In G. Tesauro, D. Touretzky, & T. Leen (Eds.), Advances in Neural Information Processing Systems: Proceedings of the 1995 Conference (pp. 361-368). Cambridge, MA: MIT Press.

12
- 33847202724
- Sutton, R. S. (1988). Learning to predict by the methods of temporal differences. Machine Learning, 3, 9-44.

13
- 0001046225
- Tesauro, G. (1992). Practical issues in temporal difference learning. Machine Learning, 8, 257-277.

14
- 0029276036
- Tesauro, G. (1995). Temporal difference learning and TD-Gammon. Communications of the ACM, 38, 58-68.

15
- 0028497630
- Tsitsiklis, J. N. (1994). Asynchronous stochastic approximation and Q-learning. Machine Learning, 16, 185-202.

16
- 0029752470
- Tsitsiklis, J. N., & Van Roy, B. (1996a). Feature-based methods for large scale dynamic programming. Machine Learning, 22, 59-94.

17
- 0008813539
- Tsitsiklis, J. N., & Van Roy, B. (1996b). An analysis of temporal-difference learning with function approximation (Tech. Rep. No. LIDS-P-2322). Cambridge, MA: MIT Laboratory for Information and Decision Systems.

* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.