IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics), Volume 38, Issue 4, 2008, Pages 930-936

Ensemble algorithms in reinforcement learning

Author keywords

Dynamic mazes; Ensemble algorithms; Partially observable environments; Reinforcement learning (RL)

Indexed keywords

ALGORITHMS; CHLORINE COMPOUNDS; EDUCATION; LEARNING SYSTEMS; MATHEMATICAL PROGRAMMING; PROBABILITY; REINFORCEMENT; REINFORCEMENT LEARNING; SYSTEMS ENGINEERING;

EID: 49049105169     PISSN: 1083-4419     EISSN: None     Source Type: Journal
DOI: 10.1109/TSMCB.2008.920231     Document Type: Article
Times cited: 196

References (23)
  • 3. C. J. C. H. Watkins, "Learning from delayed rewards," Ph.D. dissertation, King's College, Cambridge, U.K., 1989.
  • 4. G. Rummery and M. Niranjan, "On-line Q-learning using connectionist systems," Cambridge Univ., Cambridge, U.K., Tech. Rep. CUED/F-INFENG-TR 166, 1994.
  • 5. R. S. Sutton, "Generalization in reinforcement learning: Successful examples using sparse coarse coding," in Advances in Neural Information Processing Systems, vol. 8, D. S. Touretzky, M. C. Mozer, and M. E. Hasselmo, Eds. Cambridge, MA: MIT Press, 1996, pp. 1038-1045.
  • 7. R. Sutton, D. McAllester, S. Singh, and Y. Mansour, "Policy gradient methods for reinforcement learning with function approximation," in Advances in Neural Information Processing Systems, vol. 12. Cambridge, MA: MIT Press, 2000, pp. 1057-1063.
  • 8. J. Baxter and P. Bartlett, "Infinite-horizon policy-gradient estimation," J. Artif. Intell. Res., vol. 15, pp. 319-350, 2001.
  • 9. A. W. Moore and C. G. Atkeson, "Prioritized sweeping: Reinforcement learning with less data and less time," Mach. Learn., vol. 13, no. 1, pp. 103-130, Oct. 1993.
  • 10. M. Riedmiller, "Neural fitted Q iteration - First experiences with a data efficient neural reinforcement learning method," in Proc. 16th ECML, 2005, pp. 317-328.
  • 11. L. Breiman, "Bagging predictors," Mach. Learn., vol. 24, no. 2, pp. 123-140, Aug. 1996.
  • 14. S. P. Singh, "The efficient learning of multiple task sequences," in Advances in Neural Information Processing Systems, vol. 4, J. Moody, S. Hanson, and R. Lippman, Eds. San Mateo, CA: Morgan Kaufmann, 1992, pp. 251-258.
  • 15. C. Tham, "Reinforcement learning of multiple tasks using a hierarchical CMAC architecture," Robot. Auton. Syst., vol. 15, no. 4, pp. 247-274, 1995.
  • 16. R. Sun and T. Peterson, "Multi-agent reinforcement learning: Weighting and partitioning," Neural Netw., vol. 12, no. 4/5, pp. 727-753, Jun. 1999.
  • 17. D. Ernst, P. Geurts, and L. Wehenkel, "Tree-based batch mode reinforcement learning," J. Mach. Learn. Res., vol. 6, pp. 503-556, Dec. 2005.
  • 18. C. J. C. H. Watkins and P. Dayan, "Q-learning," Mach. Learn., vol. 8, no. 3/4, pp. 279-292, 1992.
  • 19. R. S. Sutton, "Learning to predict by the methods of temporal differences," Mach. Learn., vol. 3, no. 1, pp. 9-44, Aug. 1988.
  • 21. J. A. Boyan and A. W. Moore, "Generalization in reinforcement learning: Safely approximating the value function," in Advances in Neural Information Processing Systems, vol. 7, G. Tesauro, D. S. Touretzky, and T. K. Leen, Eds. Cambridge, MA: MIT Press, 1995, pp. 369-376.
  • 23. P. Werbos and X. Pang, "Generalized maze navigation: SRN critics solve what feedforward or Hebbian nets cannot," in Proc. IEEE Int. Conf. Syst., Man, Cybern., 1996, vol. 3, pp. 1764-1769.


* This information was extracted and analyzed by KISTI from Elsevier's SCOPUS database.