SCOPUS Information Search Platform - View Paper


31st AAAI Conference on Artificial Intelligence, AAAI 2017

Volume , Issue , 2017, Pages 1726-1734

The option-critic architecture

Authors (3): Bacon, Pierre Luc (a); Harb, Jean (a); Precup, Doina (a)

(a) McGill University (Canada)

Author keywords

[No Author keywords available]

Indexed keywords

ABSTRACTING; REINFORCEMENT LEARNING;

NEW OPTIONS; POLICY GRADIENT; SCALING-UP; SUBGOALS; TEMPORAL ABSTRACTION; TERMINATION CONDITION;

ARTIFICIAL INTELLIGENCE;

EID: 85030457046 PISSN: None EISSN: None Source Type: Conference Proceeding
DOI: None Document Type: Conference Paper

Times cited: 1022

References (30)

1
- 0004370245
- Advantage updating
- Wright Laboratory
- Baird, L. C. 1993. Advantage updating. Technical Report WL-TR-93-1146, Wright Laboratory.
- (1993) Technical Report WL-TR-93-1146
- Baird, L.C.¹

2
- 84879976780
- The arcade learning environment: An evaluation platform for general agents
- Bellemare, M. G.; Naddaf, Y.; Veness, J.; and Bowling, M. 2013. The arcade learning environment: An evaluation platform for general agents. Journal of Artificial Intelligence Research 47: 253-279.
- (2013) Journal of Artificial Intelligence Research , vol.47 , pp. 253-279
- Bellemare, M.G.¹ Naddaf, Y.² Veness, J.³ Bowling, M.⁴

3
- 80053022338
- Optimal policy switching algorithms for reinforcement learning
- Comanici, G., and Precup, D. 2010. Optimal policy switching algorithms for reinforcement learning. In AAMAS, 709-714.
- (2010) AAMAS , pp. 709-714
- Comanici, G.¹ Precup, D.²

4
- 78651097494
- Skill characterization based on betweenness
- Şimşek, O., and Barto, A. G. 2009. Skill characterization based on betweenness. In NIPS 21, 1497-1504.
- (2009) NIPS , vol.21 , pp. 1497-1504
- Şimşek, O.¹ Barto, A.G.²

5
- 84982855450
- Probabilistic inference for determining options in reinforcement learning
- Daniel, C.; van Hoof, H.; Peters, J.; and Neumann, G. 2016. Probabilistic inference for determining options in reinforcement learning. Machine Learning, Special Issue 104 (2): 337-357.
- (2016) Machine Learning, Special Issue , vol.104 , Issue.2 , pp. 337-357
- Daniel, C.¹ Van Hoof, H.² Peters, J.³ Neumann, G.⁴

6
- 85030452198
- Master's thesis, McGill University
- Harb, J. 2016. Learning options in deep reinforcement learning. Master's thesis, McGill University.
- (2016) Learning Options in Deep Reinforcement Learning
- Harb, J.¹

7
- 84898938510
- Actor-critic algorithms
- Konda, V. R., and Tsitsiklis, J. N. 2000. Actor-critic algorithms. In NIPS 12, 1008-1014.
- (2000) NIPS , vol.12 , pp. 1008-1014
- Konda, V.R.¹ Tsitsiklis, J.N.²

8
- 80055032021
- Skill discovery in continuous reinforcement learning domains using skill chaining
- Konidaris, G., and Barto, A. 2009. Skill discovery in continuous reinforcement learning domains using skill chaining. In NIPS 22, 1015-1023.
- (2009) NIPS , vol.22 , pp. 1015-1023
- Konidaris, G.¹ Barto, A.²

9
- 84992756326
- Autonomous skill acquisition on a mobile manipulator
- Konidaris, G.; Kuindersma, S.; Grupen, R. A.; and Barto, A. G. 2011. Autonomous skill acquisition on a mobile manipulator. In AAAI.
- (2011) AAAI
- Konidaris, G.¹ Kuindersma, S.² Grupen, R.A.³ Barto, A.G.⁴

10
- 84989329430
- CoRR abs/1605.05359
- Krishnamurthy, R.; Lakshminarayanan, A. S.; Kumar, P.; and Ravindran, B. 2016. Hierarchical reinforcement learning using spatio-temporal abstractions and deep neural networks. CoRR abs/1605.05359.
- (2016) Hierarchical Reinforcement Learning Using Spatio-temporal Abstractions and Deep Neural Networks
- Krishnamurthy, R.¹ Lakshminarayanan, A.S.² Kumar, P.³ Ravindran, B.⁴

11
- 85019246453
- Hierarchical deep reinforcement learning: Integrating temporal abstraction and intrinsic motivation
- Kulkarni, T.; Narasimhan, K.; Saeedi, A.; and Tenenbaum, J. 2016. Hierarchical deep reinforcement learning: Integrating temporal abstraction and intrinsic motivation. In NIPS 29.
- (2016) NIPS , vol.29
- Kulkarni, T.¹ Narasimhan, K.² Saeedi, A.³ Tenenbaum, J.⁴

12
- 84861674687
- Unified inter and intra options learning using policy gradient methods
- Levy, K. Y., and Shimkin, N. 2011. Unified inter and intra options learning using policy gradient methods. In EWRL, 153-164.
- (2011) EWRL , pp. 153-164
- Levy, K.Y.¹ Shimkin, N.²

13
- 85018897525
- Adaptive skills, adaptive partitions (ASAP)
- Mankowitz, D. J.; Mann, T. A.; and Mannor, S. 2016. Adaptive skills, adaptive partitions (ASAP). In NIPS 29.
- (2016) NIPS , vol.29
- Mankowitz, D.J.¹ Mann, T.A.² Mannor, S.³

14
- 84919807958
- Time-regularized interrupting options (TRIO)
- Mann, T. A.; Mankowitz, D. J.; and Mannor, S. 2014. Time-regularized interrupting options (TRIO). In ICML, 1350-1358.
- (2014) ICML , pp. 1350-1358
- Mann, T.A.¹ Mankowitz, D.J.² Mannor, S.³

15
- 84938498958
- Approximate value iteration with temporally extended actions
- Mann, T. A.; Mannor, S.; and Precup, D. 2015. Approximate value iteration with temporally extended actions. Journal of Artificial Intelligence Research 53: 375-438.
- (2015) Journal of Artificial Intelligence Research , vol.53 , pp. 375-438
- Mann, T.A.¹ Mannor, S.² Precup, D.³

16
- 0013465187
- Automatic discovery of subgoals in reinforcement learning using diverse density
- McGovern, A., and Barto, A. G. 2001. Automatic discovery of subgoals in reinforcement learning using diverse density. In ICML, 361-368.
- (2001) ICML , pp. 361-368
- McGovern, A.¹ Barto, A.G.²

17
- 84945250000
- Q-cut - dynamic discovery of sub-goals in reinforcement learning
- Menache, I.; Mannor, S.; and Shimkin, N. 2002. Q-cut - dynamic discovery of sub-goals in reinforcement learning. In ECML, 295-306.
- (2002) ECML , pp. 295-306
- Menache, I.¹ Mannor, S.² Shimkin, N.³

18
- 84904867557
- CoRR abs/1312.5602
- Mnih, V.; Kavukcuoglu, K.; Silver, D.; Graves, A.; Antonoglou, I.; Wierstra, D.; and Riedmiller, M. A. 2013. Playing Atari with deep reinforcement learning. CoRR abs/1312.5602.
- (2013) Playing Atari with Deep Reinforcement Learning
- Mnih, V.¹ Kavukcuoglu, K.² Silver, D.³ Graves, A.⁴ Antonoglou, I.⁵ Wierstra, D.⁶ Riedmiller, M.A.⁷

19
- 84999036937
- Asynchronous methods for deep reinforcement learning
- Mnih, V.; Badia, A. P.; Mirza, M.; Graves, A.; Lillicrap, T. P.; Harley, T.; Silver, D.; and Kavukcuoglu, K. 2016. Asynchronous methods for deep reinforcement learning. In ICML.
- (2016) ICML
- Mnih, V.¹ Badia, A.P.² Mirza, M.³ Graves, A.⁴ Lillicrap, T.P.⁵ Harley, T.⁶ Silver, D.⁷ Kavukcuoglu, K.⁸

20
- 84899793528
- Ph.D. Dissertation, University of Massachusetts, Amherst
- Niekum, S. 2013. Semantically Grounded Learning from Unstructured Demonstrations. Ph.D. Dissertation, University of Massachusetts, Amherst.
- (2013) Semantically Grounded Learning from Unstructured Demonstrations
- Niekum, S.¹

21
- 0003392384
- Ph.D. Dissertation, University of Massachusetts, Amherst
- Precup, D. 2000. Temporal abstraction in reinforcement learning. Ph.D. Dissertation, University of Massachusetts, Amherst.
- (2000) Temporal Abstraction in Reinforcement Learning
- Precup, D.¹

22
- 0003998452
- John Wiley & Sons, Inc
- Puterman, M. L. 1994. Markov Decision Processes: Discrete Stochastic Dynamic Programming. John Wiley & Sons, Inc.
- (1994) Markov Decision Processes: Discrete Stochastic Dynamic Programming
- Puterman, M.L.¹

23
- 84867135062
- Compositional planning using optimal option models
- Silver, D., and Ciosek, K. 2012. Compositional planning using optimal option models. In ICML.
- (2012) ICML
- Silver, D.¹ Ciosek, K.²

24
- 84868298774
- Linear options
- Sorg, J., and Singh, S. P. 2010. Linear options. In AAMAS, 31-38.
- (2010) AAMAS , pp. 31-38
- Sorg, J.¹ Singh, S.P.²

25
- 84912073624
- Learning options in reinforcement learning
- Stolle, M., and Precup, D. 2002. Learning options in reinforcement learning. In Abstraction, Reformulation and Approximation, 5th International Symposium, SARA Proceedings, 212-223.
- (2002) Abstraction, Reformulation and Approximation, 5th International Symposium, SARA Proceedings , pp. 212-223
- Stolle, M.¹ Precup, D.²

26
- 84898939480
- Policy gradient methods for reinforcement learning with function approximation
- Sutton, R. S.; McAllester, D. A.; Singh, S. P.; and Mansour, Y. 2000. Policy gradient methods for reinforcement learning with function approximation. In NIPS 12, 1057-1063.
- (2000) NIPS 12 , pp. 1057-1063
- Sutton, R.S.¹ McAllester, D.A.² Singh, S.P.³ Mansour, Y.⁴

27
- 0033170372
- Between MDPs and semi-MDPs: A framework for temporal abstraction in reinforcement learning
- Sutton, R. S.; Precup, D.; and Singh, S. P. 1999. Between MDPs and semi-MDPs: A framework for temporal abstraction in reinforcement learning. Artificial Intelligence 112 (1-2): 181-211.
- (1999) Artificial Intelligence , vol.112 , Issue.1-2 , pp. 181-211
- Sutton, R.S.¹ Precup, D.² Singh, S.P.³

28
- 0003617454
- Ph.D. Dissertation
- Sutton, R. S. 1984. Temporal Credit Assignment in Reinforcement Learning. Ph.D. Dissertation.
- (1984) Temporal Credit Assignment in Reinforcement Learning
- Sutton, R.S.¹

29
- 84919794661
- Bias in natural actor-critic algorithms
- Thomas, P. 2014. Bias in natural actor-critic algorithms. In ICML, 441-448.
- (2014) ICML , pp. 441-448
- Thomas, P.¹

30
- 85017563254
- Strategic attentive writer for learning macro-actions
- Vezhnevets, A. S.; Mnih, V.; Agapiou, J.; Osindero, S.; Graves, A.; Vinyals, O.; and Kavukcuoglu, K. 2016. Strategic attentive writer for learning macro-actions. In NIPS 29.
- (2016) NIPS 29
- Vezhnevets, A.S.¹ Mnih, V.² Agapiou, J.³ Osindero, S.⁴ Graves, A.⁵ Vinyals, O.⁶ Kavukcuoglu, K.⁷

* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS DB.