SCOPUS Information Search Platform

Proceedings of the International Joint Conference on Autonomous Agents and Multiagent Systems, AAMAS

Volume 2, 2010, Pages 709-714

Optimal policy switching algorithms for reinforcement learning

(2) Comanici, Gheorghe ᵃ; Precup, Doina ᵃ

ᵃ McGill University (Canada)

Author keywords

Markov Decision Processes; Policy gradient; Reinforcement learning; Temporal abstraction

Indexed keywords

GRADIENT METHODS; LEARNING ALGORITHMS; MARKOV PROCESSES; MULTI AGENT SYSTEMS; OPTIMIZATION; REINFORCEMENT LEARNING;

GRADIENT BASED ALGORITHM; LONG-TERM RETURNS; MARKOV DECISION PROCESSES; POLICY GRADIENT; SEQUENTIAL DECISION MAKING; SWITCHING ALGORITHMS; TEMPORAL ABSTRACTION; TERMINATION CONDITION;

AUTONOMOUS AGENTS;

EID: 80053022338; PISSN: 15488403; EISSN: 15582914; Source Type: Conference Proceeding
DOI: None; Document Type: Conference Paper

Times cited: 34

References (14)

1
- 0141988716
- Recent advances in hierarchical reinforcement learning
- A. G. Barto and S. Mahadevan. Recent advances in hierarchical reinforcement learning. Discrete Event Dynamic Systems, 13(4):341-379, 2003.
- (2003) Discrete Event Dynamic Systems , vol.13 , Issue.4 , pp. 341-379
- Barto, A.G.¹ Mahadevan, S.²

2
- 31844447221
- Identifying useful subgoals in reinforcement learning by local graph partitioning
- Ö. Şimşek, A. P. Wolfe, and A. G. Barto. Identifying useful subgoals in reinforcement learning by local graph partitioning. In ICML, pages 816-823, 2005.
- (2005) ICML , pp. 816-823
- Şimşek, Ö.¹ Wolfe, A.P.² Barto, A.G.³

3
- 0002278788
- Hierarchical reinforcement learning with the MAXQ value function decomposition
- T. G. Dietterich. Hierarchical reinforcement learning with the MAXQ value function decomposition. Journal of Artificial Intelligence Research, 13:227-303, 1999.
- (1999) Journal of Artificial Intelligence Research , vol.13 , pp. 227-303
- Dietterich, T.G.¹

4
- 0013465036
- Discovering hierarchy in reinforcement learning with HEXQ
- B. Hengst. Discovering hierarchy in reinforcement learning with HEXQ. In ICML, pages 243-250, 2002.
- (2002) ICML , pp. 243-250
- Hengst, B.¹

5
- 33750705246
- Causal graph based decomposition of factored MDPs
- A. Jonsson and A. Barto. Causal graph based decomposition of factored MDPs. Journal of Machine Learning Research, 7:2259-2301, 2006.
- (2006) Journal of Machine Learning Research , vol.7 , pp. 2259-2301
- Jonsson, A.¹ Barto, A.²

6
- 0013465187
- Automatic discovery of subgoals in reinforcement learning using diverse density
- A. McGovern and A. G. Barto. Automatic discovery of subgoals in reinforcement learning using diverse density. In ICML, pages 361-368, 2001.
- (2001) ICML , pp. 361-368
- McGovern, A.¹ Barto, A.G.²

7
- 56449130136
- Automatic discovery and transfer of MAXQ hierarchies
- N. Mehta, S. Ray, P. Tadepalli, and T. G. Dietterich. Automatic discovery and transfer of MAXQ hierarchies. In ICML, pages 648-655, 2008.
- (2008) ICML , pp. 648-655
- Mehta, N.¹ Ray, S.² Tadepalli, P.³ Dietterich, T.G.⁴

8
- 84945250000
- Q-cut - Dynamic discovery of sub-goals in reinforcement learning
- I. Menache, S. Mannor, and N. Shimkin. Q-cut - dynamic discovery of sub-goals in reinforcement learning. In ECML, pages 295-306, 2002.
- (2002) ECML , pp. 295-306
- Menache, I.¹ Mannor, S.² Shimkin, N.³

9
- 84898956770
- Reinforcement learning with hierarchies of machines
- R. Parr and S. Russell. Reinforcement learning with hierarchies of machines. In NIPS, 1998.
- (1998) NIPS
- Parr, R.¹ Russell, S.²

10
- 0003392384
- PhD thesis, University of Massachusetts, Amherst
- D. Precup. Temporal abstraction in reinforcement learning. PhD thesis, University of Massachusetts, Amherst, 2000.
- (2000) Temporal Abstraction in Reinforcement Learning
- Precup, D.¹

11
- 85102627959
- Wiley
- M. L. Puterman. Markov Decision Processes: Discrete Stochastic Dynamic Programming. Wiley, 1994.
- (1994) Markov Decision Processes: Discrete Stochastic Dynamic Programming
- Puterman, M.L.¹

12
- 0033170372
- Between MDPs and semi-MDPs: A framework for temporal abstraction in reinforcement learning
- R. S. Sutton, D. Precup, and S. Singh. Between MDPs and Semi-MDPs: A Framework for Temporal Abstraction in Reinforcement Learning. Artificial Intelligence, 112:181-211, 1999.
- (1999) Artificial Intelligence , vol.112 , pp. 181-211
- Sutton, R.S.¹ Precup, D.² Singh, S.³

13
- 0004102479
- MIT Press
- R.S. Sutton and A.G. Barto. Reinforcement Learning: An Introduction. MIT Press, 1998.
- (1998) Reinforcement Learning: An Introduction
- Sutton, R.S.¹ Barto, A.G.²

14
- 84898939480
- Policy gradient methods for reinforcement learning with function approximation
- R.S. Sutton, D. McAllester, S. Singh, and Y. Mansour. Policy gradient methods for reinforcement learning with function approximation. In NIPS, pages 1057-1063, 2000.
- (2000) NIPS , pp. 1057-1063
- Sutton, R.S.¹ McAllester, D.² Singh, S.³ Mansour, Y.⁴

* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.