Volume 2006, 2006, Pages 4159-4164

Reinforcement learning with supervision by combining multiple learnings and expert advices

Author keywords

[No Author keywords available]

Indexed keywords

BASE AGENTS; EXPERT ADVICES; REINFORCEMENT FUNCTION;

EID: 34047195105     PISSN: 07431619     EISSN: None     Source Type: Conference Proceeding    
DOI: None     Document Type: Conference Paper
Times cited: 6

References (28)
  • 2
    • 0004007508
    • Reinforcement Learning
    • A. G. Barto, "Reinforcement Learning," in Handbook of Learning and Approximate Dynamic Programming, J. Si, A. G. Barto, W. B. Powell, and D. Wunsch (eds.), pp. 804-809, Wiley-IEEE Press, Piscataway, NJ, 2004.
    • (2004) Handbook of Learning and Approximate Dynamic Programming , pp. 804-809
    • Barto, A.G.1
  • 3
    • 84979715630
    • Supervised Actor-Critic Reinforcement Learning
    • A. G. Barto and M. T. Rosenstein, "Supervised Actor-Critic Reinforcement Learning," in Handbook of Learning and Approximate Dynamic Programming, J. Si, A. G. Barto, W. B. Powell, and D. Wunsch (eds.), pp. 359-380, Wiley-IEEE Press, Piscataway, NJ, 2004.
    • (2004) Handbook of Learning and Approximate Dynamic Programming , pp. 359-380
    • Barto, A.G.1    Rosenstein, M.T.2
  • 5
    • 15744397544
    • An Ant System Based Exploration-Exploitation for Reinforcement Learning
    • H. S. Chang, "An Ant System Based Exploration-Exploitation for Reinforcement Learning," in Proc. of the IEEE Conf. on Systems, Man, and Cybernetics, Vol. 4, 2004, pp. 3805-3810.
    • (2004) Proc. of the IEEE Conf. on Systems, Man, and Cybernetics , vol.4 , pp. 3805-3810
    • Chang, H.S.1
  • 6
    • 0004033139
    • D. Corne, F. Glover, and M. Dorigo (eds.), New Ideas in Optimization, McGraw-Hill, 1999.
    • (1999) New Ideas in Optimization
  • 7
    • 0043247546
    • Accelerating Reinforcement Learning by Composing Solutions of Automatically Identified Subtasks
    • C. Drummond, "Accelerating Reinforcement Learning by Composing Solutions of Automatically Identified Subtasks," J. of Artificial Intelligence Research, vol. 16, 2002, pp. 59-104.
    • (2002) J. of Artificial Intelligence Research , vol.16 , pp. 59-104
    • Drummond, C.1
  • 8
    • 0002012598
    • The ant colony optimization metaheuristic
    • M. Dorigo and G. Di Caro, "The ant colony optimization metaheuristic," New Ideas in Optimization, D. Corne, M. Dorigo (eds.), pp. 11-32, McGraw-Hill, NY, USA, 1999.
    • (1999) New Ideas in Optimization , pp. 11-32
    • Dorigo, M.1    Di Caro, G.2
  • 9
  • 12
    • 0002357911
    • Convergence of Indirect Adaptive Asynchronous Value Iteration Algorithms
    • V. Gullapalli and A. G. Barto, "Convergence of Indirect Adaptive Asynchronous Value Iteration Algorithms", Advances in Neural Information Processing Systems, J. D. Cowan, G. Tesauro, and J. Alspector (eds.), Morgan Kaufmann Publishers, Inc., vol. 6, 1994, pp. 695-702.
    • (1994) Advances in Neural Information Processing Systems , vol.6 , pp. 695-702
    • Gullapalli, V.1    Barto, A.G.2
  • 15
    • 0000123778
    • Self-improving reactive agents based on reinforcement learning, planning and teaching
    • L. J. Lin, "Self-improving reactive agents based on reinforcement learning, planning and teaching," Machine Learning, vol. 8, 1992, pp. 294-321.
    • (1992) Machine Learning , vol.8 , pp. 294-321
    • Lin, L.J.1
  • 16
    • 0029732210
    • Creating advice-taking reinforcement learners
    • R. Maclin and J.W. Shavlik, "Creating advice-taking reinforcement learners," Machine Learning, vol. 22, 1996, pp. 251-282.
    • (1996) Machine Learning , vol.22 , pp. 251-282
    • Maclin, R.1    Shavlik, J.W.2
  • 19
    • 0141596576
    • Policy invariance under reward transformations: Theory and application to reward shaping
    • A. Y. Ng, D. Harada, and S. Russell, "Policy invariance under reward transformations: theory and application to reward shaping," in Proc. of the 16th Int. Conf. on Machine Learning, 1999, pp. 278-287.
    • (1999) Proc. of the 16th Int. Conf. on Machine Learning , pp. 278-287
    • Ng, A.Y.1    Harada, D.2    Russell, S.3
  • 21
    • 8744269435
    • Reinforcement learning with supervision by a stable controller
    • M. Rosenstein and A.G. Barto, "Reinforcement learning with supervision by a stable controller," in Proc. of the American Control Conf., 2004, pp. 4517-4522.
    • (2004) Proc. of the American Control Conf , pp. 4517-4522
    • Rosenstein, M.1    Barto, A.G.2
  • 23
    • 0033901602
    • Convergence results for single-step on-policy reinforcement learning algorithms
    • S. Singh, T. Jaakkola, M. Littman, and C. Szepesvari, "Convergence results for single-step on-policy reinforcement learning algorithms," Machine Learning, vol. 38, pp. 287-308, 2000.
    • (2000) Machine Learning , vol.38 , pp. 287-308
    • Singh, S.1    Jaakkola, T.2    Littman, M.3    Szepesvari, C.4
  • 26
    • 0028497630
    • Asynchronous stochastic approximation and Q-learning
    • J. N. Tsitsiklis, "Asynchronous stochastic approximation and Q-learning," Machine Learning, vol. 16, pp. 185-202, 1994.
    • (1994) Machine Learning , vol.16 , pp. 185-202
    • Tsitsiklis, J.N.1
  • 27
    • 0039967456
    • Analysis of some incremental variants of policy iteration: First steps toward understanding actor-critic learning systems
    • R. J. Williams and L. C. Baird, "Analysis of some incremental variants of policy iteration: first steps toward understanding actor-critic learning systems," Tech. Rep. NU-CCS-93-11. 1993.
    • (1993)
    • Williams, R.J.1    Baird, L.C.2


* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.