



Volume , Issue , 2016, Pages 4033-4041

Deep exploration via bootstrapped DQN

Author keywords

[No Author keywords available]

Indexed keywords

COMPUTER AIDED INSTRUCTION; DEEP LEARNING; REINFORCEMENT LEARNING;

EID: 85019259487     PISSN: 10495258     EISSN: None     Source Type: Conference Proceeding    
DOI: None     Document Type: Conference Paper
Times cited : (1225)

References (27)
  • 2
  • 4
  • 8
    • Arthur Guez, David Silver, and Peter Dayan. Efficient Bayes-adaptive reinforcement learning using sample-based search. In Advances in Neural Information Processing Systems, pages 1025-1033, 2012. [Scopus EID: 84877781573]
  • 9
    • Thomas Jaksch, Ronald Ortner, and Peter Auer. Near-optimal regret bounds for reinforcement learning. Journal of Machine Learning Research, 11: 1563-1600, 2010. [Scopus EID: 77951952841]
  • 11
    • Tze Leung Lai and Herbert Robbins. Asymptotically efficient adaptive allocation rules. Advances in Applied Mathematics, 6(1): 4-22, 1985. [Scopus EID: 0002899547]
  • 12
    • Volodymyr Mnih et al. Human-level control through deep reinforcement learning. Nature, 518(7540): 529-533, 2015. [Scopus EID: 84924051598]
  • 13
    • Ian Osband, Daniel Russo, and Benjamin Van Roy. (More) efficient reinforcement learning via posterior sampling. In NIPS, pages 3003-3011. Curran Associates, Inc., 2013. [Scopus EID: 84899019264]
  • 17
    • Art B. Owen, Dean Eckles, et al. Bootstrapping data arrays of arbitrary order. The Annals of Applied Statistics, 6(3): 895-927, 2012. [Scopus EID: 84863522108]
  • 21
    • Malcolm J. A. Strens. A Bayesian framework for reinforcement learning. In ICML, pages 943-950, 2000. [Scopus EID: 14344258433]
  • 23
    • Gerald Tesauro. Temporal difference learning and TD-Gammon. Communications of the ACM, 38(3): 58-68, 1995. [Scopus EID: 0029276036]
  • 24
    • W. R. Thompson. On the likelihood that one unknown probability exceeds another in view of the evidence of two samples. Biometrika, 25(3/4): 285-294, 1933. [Scopus EID: 0001395850]
  • 27
    • Zheng Wen and Benjamin Van Roy. Efficient exploration and value function generalization in deterministic systems. In NIPS, pages 3021-3029, 2013. [Scopus EID: 84899020590]


* This information was extracted by KISTI through analysis of Elsevier's SCOPUS DB.