



Volume , Issue , 2016, Pages 4033-4041

Deep exploration via bootstrapped DQN

Author keywords

[No Author keywords available]

Indexed keywords

COMPUTER AIDED INSTRUCTION; DEEP LEARNING; REINFORCEMENT LEARNING;

EID: 85019259487     PISSN: 10495258     EISSN: None     Source Type: Conference Proceeding    
DOI: None     Document Type: Conference Paper
Times cited : (1225)

References (27)
  • 2
  • 4
  • 8
    • Arthur Guez, David Silver, and Peter Dayan. Efficient Bayes-adaptive reinforcement learning using sample-based search. In Advances in Neural Information Processing Systems, pages 1025-1033, 2012. [Scopus EID: 84877781573]
  • 9
    • Thomas Jaksch, Ronald Ortner, and Peter Auer. Near-optimal regret bounds for reinforcement learning. Journal of Machine Learning Research, 11: 1563-1600, 2010. [Scopus EID: 77951952841]
  • 11
    • Tze Leung Lai and Herbert Robbins. Asymptotically efficient adaptive allocation rules. Advances in Applied Mathematics, 6(1): 4-22, 1985. [Scopus EID: 0002899547]
  • 12
    • Volodymyr Mnih et al. Human-level control through deep reinforcement learning. Nature, 518(7540): 529-533, 2015. [Scopus EID: 84924051598]
  • 13
    • Ian Osband, Daniel Russo, and Benjamin Van Roy. (More) efficient reinforcement learning via posterior sampling. In NIPS, pages 3003-3011. Curran Associates, Inc., 2013. [Scopus EID: 84899019264]
  • 17
    • Art B. Owen, Dean Eckles, et al. Bootstrapping data arrays of arbitrary order. The Annals of Applied Statistics, 6(3): 895-927, 2012. [Scopus EID: 84863522108]
  • 21
    • Malcolm J. A. Strens. A Bayesian framework for reinforcement learning. In ICML, pages 943-950, 2000. [Scopus EID: 14344258433]
  • 23
    • Gerald Tesauro. Temporal difference learning and TD-Gammon. Communications of the ACM, 38(3): 58-68, 1995. [Scopus EID: 0029276036]
  • 24
    • W. R. Thompson. On the likelihood that one unknown probability exceeds another in view of the evidence of two samples. Biometrika, 25(3/4): 285-294, 1933. [Scopus EID: 0001395850]
  • 27
    • Zheng Wen and Benjamin Van Roy. Efficient exploration and value function generalization in deterministic systems. In NIPS, pages 3021-3029, 2013. [Scopus EID: 84899020590]


* This information was extracted by KISTI through analysis of Elsevier's SCOPUS DB.