Volume 2, 2010, Pages 709-714

Optimal policy switching algorithms for reinforcement learning

Author keywords

Markov Decision Processes; Policy gradient; Reinforcement learning; Temporal abstraction

Indexed keywords

GRADIENT METHODS; LEARNING ALGORITHMS; MARKOV PROCESSES; MULTI AGENT SYSTEMS; OPTIMIZATION; REINFORCEMENT LEARNING;

EID: 80053022338     PISSN: 1548-8403     EISSN: 1558-2914     Source Type: Conference Proceeding
DOI: None     Document Type: Conference Paper
Times cited : (34)

References (14)
  1. A. G. Barto and S. Mahadevan. Recent advances in hierarchical reinforcement learning. Discrete Event Dynamic Systems, 13(4):341-379, 2003.
  2. Ö. Şimşek, A. P. Wolfe, and A. G. Barto. Identifying useful subgoals in reinforcement learning by local graph partitioning. In ICML, pages 816-823, 2005.
  3. T. G. Dietterich. Hierarchical reinforcement learning with the MAXQ value function decomposition. Journal of Artificial Intelligence Research, 13:227-303, 1999.
  4. B. Hengst. Discovering hierarchy in reinforcement learning with HEXQ. In ICML, pages 243-250, 2002.
  5.
  6. A. McGovern and A. G. Barto. Automatic discovery of subgoals in reinforcement learning using diverse density. In ICML, pages 361-368, 2001.
  7. N. Mehta, S. Ray, P. Tadepalli, and T. G. Dietterich. Automatic discovery and transfer of MAXQ hierarchies. In ICML, pages 648-655, 2008.
  8. I. Menache, S. Mannor, and N. Shimkin. Q-cut - dynamic discovery of sub-goals in reinforcement learning. In ECML, pages 295-306, 2002.
  9. R. Parr and S. Russell. Reinforcement learning with hierarchies of machines. In NIPS, 1998.
  12. R. S. Sutton, D. Precup, and S. Singh. Between MDPs and semi-MDPs: A framework for temporal abstraction in reinforcement learning. Artificial Intelligence, 112:181-211, 1999.
  14. R. S. Sutton, D. McAllester, S. Singh, and Y. Mansour. Policy gradient methods for reinforcement learning with function approximation. In NIPS, pages 1057-1063, 2000.


* This information was extracted by KISTI through analysis of Elsevier's SCOPUS database.