



Volume , Issue , 2017, Pages 1726-1734

The option-critic architecture

Author keywords

[No Author keywords available]

Indexed keywords

ABSTRACTING; REINFORCEMENT LEARNING;

EID: 85030457046     PISSN: None     EISSN: None     Source Type: Conference Proceeding    
DOI: None     Document Type: Conference Paper
Times cited : (1022)

References (30)
  • 3
    • Comanici, G., and Precup, D. 2010. Optimal policy switching algorithms for reinforcement learning. In AAMAS, 709-714.
  • 4
    • Şimşek, O., and Barto, A. G. 2009. Skill characterization based on betweenness. In NIPS 21, 1497-1504.
  • 5
    • Daniel, C.; van Hoof, H.; Peters, J.; and Neumann, G. 2016. Probabilistic inference for determining options in reinforcement learning. Machine Learning 104 (2): 337-357.
  • 7
    • Konda, V. R., and Tsitsiklis, J. N. 2000. Actor-critic algorithms. In NIPS 12, 1008-1014.
  • 8
    • Konidaris, G., and Barto, A. 2009. Skill discovery in continuous reinforcement learning domains using skill chaining. In NIPS 22, 1015-1023.
  • 11
    • Kulkarni, T.; Narasimhan, K.; Saeedi, A.; and Tenenbaum, J. 2016. Hierarchical deep reinforcement learning: Integrating temporal abstraction and intrinsic motivation. In NIPS 29.
  • 12
    • Levy, K. Y., and Shimkin, N. 2011. Unified inter and intra options learning using policy gradient methods. In EWRL, 153-164.
  • 13
    • Mankowitz, D. J.; Mann, T. A.; and Mannor, S. 2016. Adaptive skills, adaptive partitions (ASAP). In NIPS 29.
  • 14
    • Mann, T. A.; Mankowitz, D. J.; and Mannor, S. 2014. Time-regularized interrupting options (TRIO). In ICML, 1350-1358.
  • 16
    • McGovern, A., and Barto, A. G. 2001. Automatic discovery of subgoals in reinforcement learning using diverse density. In ICML, 361-368.
  • 17
    • Menache, I.; Mannor, S.; and Shimkin, N. 2002. Q-cut - dynamic discovery of sub-goals in reinforcement learning. In ECML, 295-306.
  • 23
    • Silver, D., and Ciosek, K. 2012. Compositional planning using optimal option models. In ICML.
  • 24
    • Sorg, J., and Singh, S. P. 2010. Linear options. In AAMAS, 31-38.
  • 26
    • Sutton, R. S.; McAllester, D. A.; Singh, S. P.; and Mansour, Y. 2000. Policy gradient methods for reinforcement learning with function approximation. In NIPS 12, 1057-1063.
  • 27
    • Sutton, R. S.; Precup, D.; and Singh, S. P. 1999. Between MDPs and semi-MDPs: A framework for temporal abstraction in reinforcement learning. Artificial Intelligence 112 (1-2): 181-211.
  • 29
    • Thomas, P. 2014. Bias in natural actor-critic algorithms. In ICML, 441-448.


* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.