IEEE Transactions on Automatic Control, Volume 42, Issue 5, 1997, Pages 674-690

An analysis of temporal-difference learning with function approximation

Author keywords

Dynamic programming; Function approximation; Markov chains; Neuro-dynamic programming; Reinforcement learning; Temporal difference learning

Indexed keywords

APPROXIMATION THEORY; CONVERGENCE OF NUMERICAL METHODS; COSTS; DYNAMIC PROGRAMMING; MARKOV PROCESSES; OPTIMAL CONTROL SYSTEMS; PARAMETER ESTIMATION; PROBABILITY; STATE SPACE METHODS

EID: 0031143730     PISSN: 0018-9286     EISSN: None     Source Type: Journal
DOI: 10.1109/9.580874     Document Type: Article
Times cited: 1135
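
The author keywords above situate the paper: it analyzes TD(λ), temporal-difference learning with eligibility traces, combined with a linear function approximator for estimating the cost-to-go of a Markov chain. As a point of reference for readers of this record, the sketch below is a minimal, illustrative implementation of that setting; the function name `td_lambda`, the three-state chain, the feature matrix, and the constants (`gamma`, `lam`, `alpha`) are assumptions invented for this example, not values taken from the paper.

```python
import numpy as np

# Minimal sketch of TD(lambda) with a linear function approximator:
# V(s) is approximated by phi[s] @ theta, and theta is updated along a
# single simulated trajectory of the Markov chain using an eligibility
# trace. All numbers below are illustrative assumptions.

def td_lambda(P, rewards, phi, gamma=0.9, lam=0.8, alpha=0.01,
              steps=50_000, seed=0):
    """Run TD(lambda) on a chain with transition matrix P, per-state
    rewards, and feature matrix phi (one row of features per state)."""
    rng = np.random.default_rng(seed)
    n_states, n_features = phi.shape
    theta = np.zeros(n_features)  # weight vector being learned
    z = np.zeros(n_features)      # eligibility trace
    s = 0
    for _ in range(steps):
        s_next = rng.choice(n_states, p=P[s])
        # temporal-difference error of the current approximation
        delta = rewards[s] + gamma * phi[s_next] @ theta - phi[s] @ theta
        z = gamma * lam * z + phi[s]  # decay the trace, add current features
        theta = theta + alpha * delta * z
        s = s_next
    return theta

# Tiny three-state example (hypothetical chain, rewards, and features).
P = np.array([[0.5, 0.5, 0.0],
              [0.1, 0.6, 0.3],
              [0.2, 0.2, 0.6]])
rewards = np.array([1.0, 0.0, -1.0])
phi = np.array([[1.0, 0.0],
                [0.5, 0.5],
                [0.0, 1.0]])
theta = td_lambda(P, rewards, phi)
print("approximate state values:", phi @ theta)
```

The convergence guarantee established in the paper applies to this on-policy linear case, where states are sampled by following the chain itself; several of the references below (Baird; Boyan and Moore; Bertsekas) document how the iteration can misbehave outside that setting.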

References (25)
  • 2. R. S. Sutton, "Learning to predict by the methods of temporal differences," Mach. Learning, vol. 3, pp. 9-44, 1988.
  • 4. J. N. Tsitsiklis, "Asynchronous stochastic approximation and Q-learning," Mach. Learning, vol. 16, pp. 185-202, 1994.
  • 5. T. Jaakkola, M. I. Jordan, and S. P. Singh, "On the convergence of stochastic iterative dynamic programming algorithms," Neural Comp., vol. 6, no. 6, pp. 1185-1201, 1994.
  • 6. P. D. Dayan and T. J. Sejnowski, "TD(λ) converges with probability 1," Mach. Learning, vol. 14, pp. 295-301, 1994.
  • 8. P. D. Dayan, "The convergence of TD(λ) for general λ," Mach. Learning, vol. 8, pp. 341-362, 1992.
  • 9. R. E. Schapire and M. K. Warmuth, "On the worst-case analysis of temporal-difference learning algorithms," Mach. Learning, vol. 22, pp. 95-122, 1996.
  • 10. J. N. Tsitsiklis and B. Van Roy, "Feature-based methods for large scale dynamic programming," Mach. Learning, vol. 22, pp. 59-94, 1996.
  • 12. S. P. Singh, T. Jaakkola, and M. I. Jordan, "Reinforcement learning with soft state aggregation," in Advances in Neural Information Processing Systems, vol. 7, G. Tesauro, D. S. Touretzky, and T. K. Leen, Eds. Cambridge, MA: MIT Press, 1995.
  • 13. L. C. Baird, "Residual algorithms: Reinforcement learning with function approximation," in Machine Learning: Proceedings of the 12th Int. Conf., July 9-12, Prieditis and Russell, Eds. San Francisco, CA: Morgan Kaufmann, 1995.
  • 14. J. A. Boyan and A. W. Moore, "Generalization in reinforcement learning: Safely approximating the value function," in Advances in Neural Information Processing Systems, vol. 7. Cambridge, MA: MIT Press, 1995.
  • 15. R. S. Sutton, "On the virtues of linear learning and trajectory distributions," in Proc. Wkshp. Value Function Approximation, Mach. Learning Conf., Carnegie Mellon Univ., Tech. Rep. CMU-CS-95-206, 1995.
  • 17. L. Gurvits, private communication, 1996.
  • 21. G. D. Stamoulis and J. N. Tsitsiklis, "On the settling time of the congested GI/G/1 queue," Adv. Appl. Probability, vol. 22, pp. 929-956, 1990.
  • 22. P. Konstantopoulos and F. Baccelli, "On the cut-off phenomenon in some queueing systems," J. Appl. Probability, vol. 28, pp. 683-694, 1991.
  • 23. D. P. Bertsekas, "A counterexample to temporal-difference learning," Neural Comp., vol. 7, pp. 270-279, 1995.
  • 24. S. P. Singh and R. S. Sutton, "Reinforcement learning with replacing eligibility traces," Mach. Learning, vol. 22, pp. 123-158, 1996.
  • 25. R. S. Sutton, "Generalization in reinforcement learning: Successful examples using sparse coarse coding," in Advances in Neural Information Processing Systems, vol. 8, D. S. Touretzky, M. C. Mozer, and M. E. Hasselmo, Eds. Cambridge, MA: MIT Press, 1996.


* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS DB.