SCOPUS Information Retrieval Platform

33rd International Conference on Machine Learning, ICML 2016

Volume 4, 2016, Pages 2809-2822

Graying the black box: Understanding DQNs

(3) Zahavy, Tom (a); Zrihem, Nir Ben (a); Mannor, Shie (a)

(a) TECHNION ISRAEL INSTITUTE OF TECHNOLOGY (Israel)

Author keywords

[No Author keywords available]

Indexed keywords

ARTIFICIAL INTELLIGENCE; LEARNING SYSTEMS;

BLACK BOXES; DEEP NEURAL NETWORKS;

REINFORCEMENT LEARNING;

EID: 84998679057 PISSN: None EISSN: None Source Type: Conference Proceeding
DOI: None Document Type: Conference Paper

Times cited : (61)

References (42)

1
- 84879678310
- arXiv preprint arXiv:1207.4708
- Bellemare, Marc G, Naddaf, Yavar, Veness, Joel, and Bowling, Michael. The arcade learning environment: An evaluation platform for general agents. arXiv preprint arXiv:1207.4708, 2012.
- (2012) The Arcade Learning Environment: An Evaluation Platform for General Agents
- Bellemare, M.G.¹ Naddaf, Y.² Veness, J.³ Bowling, M.⁴

2
- 84998591851
- arXiv preprint arXiv:1512.04860
- Bellemare, Marc G, Ostrovski, Georg, Guez, Arthur, Thomas, Philip S, and Munos, Remi. Increasing the action gap: New operators for reinforcement learning. arXiv preprint arXiv:1512.04860, 2015.
- (2015) Increasing the Action Gap: New Operators for Reinforcement Learning
- Bellemare, M.G.¹ Ostrovski, G.² Guez, A.³ Thomas, P.S.⁴ Munos, R.⁵

3
- 0001234682
- Morgan Kaufmann Publishers
- Dayan, Peter and Hinton, Geoffrey E. Feudal reinforcement learning, pp. 271-278. Morgan Kaufmann Publishers, 1993.
- (1993) Feudal Reinforcement Learning , pp. 271-278
- Dayan, P.¹ Hinton, G.E.²

4
- 0006424007
- Citeseer
- Dean, Thomas and Lin, Shieu-Hong. Decomposition techniques for planning in stochastic domains. Citeseer, 1995.
- (1995) Decomposition Techniques for Planning in Stochastic Domains
- Dean, T.¹ Lin, S.-H.²

5
- 0002278788
- Hierarchical reinforcement learning with the MAXQ value function decomposition
- Dietterich, Thomas G. Hierarchical reinforcement learning with the MAXQ value function decomposition. J. Artif. Intell. Res.(JAIR), 13:227-303, 2000.
- (2000) J. Artif. Intell. Res.(JAIR) , vol.13 , pp. 227-303
- Dietterich, T.G.¹

6
- 84998993668
- Learning embedded maps of markov processes
- Citeseer
- Engel, Yaakov and Mannor, Shie. Learning embedded maps of markov processes. In Proceedings of ICML 2001. Citeseer, 2001.
- (2001) Proceedings of ICML 2001
- Engel, Y.¹ Mannor, S.²

7
- 77949524387
- Dept. IRO, Universite de Montreal, Tech. Rep, 4323
- Erhan, Dumitru, Bengio, Yoshua, Courville, Aaron, and Vincent, Pascal. Visualizing higher-layer features of a deep network. Dept. IRO, Universite de Montreal, Tech. Rep, 4323, 2009.
- (2009) Visualizing Higher-layer Features of a Deep Network
- Erhan, D.¹ Bengio, Y.² Courville, A.³ Vincent, P.⁴

8
- 0038595393
- Gordon, Geoffrey J. Stable function approximation in dynamic programming. 1995.
- (1995) Stable Function Approximation in Dynamic Programming
- Gordon, G.J.¹

9
- 0006419533
- Hierarchical solution of Markov decision processes using macro-actions
- Morgan Kaufmann Publishers Inc
- Hauskrecht, Milos, Meuleau, Nicolas, Kaelbling, Leslie Pack, Dean, Thomas, and Boutilier, Craig. Hierarchical solution of Markov decision processes using macro-actions. In Proceedings of the Fourteenth conference on Uncertainty in artificial intelligence, pp. 220-229. Morgan Kaufmann Publishers Inc., 1998.
- (1998) Proceedings of the Fourteenth Conference on Uncertainty in Artificial Intelligence , pp. 220-229
- Hauskrecht, M.¹ Meuleau, N.² Kaelbling, L.P.³ Dean, T.⁴ Boutilier, C.⁵

10
- 84943767635
- arXiv preprint arXiv:1504.00702
- Levine, Sergey, Finn, Chelsea, Darrell, Trevor, and Abbeel, Pieter. End-to-end training of deep visuomotor policies. arXiv preprint arXiv:1504.00702, 2015.
- (2015) End-to-end Training of Deep Visuomotor Policies
- Levine, S.¹ Finn, C.² Darrell, T.³ Abbeel, P.⁴

11
- 0003673017
- Technical report, DTIC Document
- Lin, Long-Ji. Reinforcement learning for robots using neural networks. Technical report, DTIC Document, 1993.
- (1993) Reinforcement Learning for Robots Using Neural Networks
- Lin, L.-J.¹

12
- 85032751123
- Manifold-learning-based feature extraction for classification of hyperspectral data: A review of advances in manifold learning
- IEEE
- Lunga, Dalton, Prasad, Santasriya, Crawford, Melba M, and Ersoy, Ozan. Manifold-learning-based feature extraction for classification of hyperspectral data: A review of advances in manifold learning. Signal Processing Magazine, IEEE, 31(1):55-66, 2014.
- (2014) Signal Processing Magazine , vol.31 , Issue.1 , pp. 55-66
- Lunga, D.¹ Prasad, S.² Crawford, M.M.³ Ersoy, O.⁴

13
- 84938498958
- Approximate value iteration with temporally extended actions
- Mann, Timothy A, Mannor, Shie, and Precup, Doina. Approximate value iteration with temporally extended actions. Journal of Artificial Intelligence Research, 53(1):375-438, 2015.
- (2015) Journal of Artificial Intelligence Research , vol.53 , Issue.1 , pp. 375-438
- Mann, T.A.¹ Mannor, S.² Precup, D.³

14
- 14344250635
- Dynamic abstraction in reinforcement learning via clustering
- ACM
- Mannor, Shie, Menache, Ishai, Hoze, Amit, and Klein, Uri. Dynamic abstraction in reinforcement learning via clustering. In Proceedings of the twenty-first international conference on Machine learning, pp. 71. ACM, 2004.
- (2004) Proceedings of the Twenty-first International Conference on Machine Learning , pp. 71
- Mannor, S.¹ Menache, I.² Hoze, A.³ Klein, U.⁴

15
- 84945250000
- Q-cut - dynamic discovery of sub-goals in reinforcement learning
- Springer
- Menache, Ishai, Mannor, Shie, and Shimkin, Nahum. Q-cut - dynamic discovery of sub-goals in reinforcement learning. In Machine Learning: ECML 2002, pp. 295-306. Springer, 2002.
- (2002) Machine Learning: ECML 2002 , pp. 295-306
- Menache, I.¹ Mannor, S.² Shimkin, N.³

16
- 84904867557
- arXiv preprint arXiv:1312.5602
- Mnih, Volodymyr, Kavukcuoglu, Koray, Silver, David, Graves, Alex, Antonoglou, Ioannis, Wierstra, Daan, and Riedmiller, Martin. Playing atari with deep reinforcement learning. arXiv preprint arXiv:1312.5602, 2013.
- (2013) Playing Atari with Deep Reinforcement Learning
- Mnih, V.¹ Kavukcuoglu, K.² Silver, D.³ Graves, A.⁴ Antonoglou, I.⁵ Wierstra, D.⁶ Riedmiller, M.⁷

17
- 84924051598
- Human-level control through deep reinforcement learning
- Mnih, Volodymyr, Kavukcuoglu, Koray, Silver, David, Rusu, Andrei A, Veness, Joel, Bellemare, Marc G, Graves, Alex, Riedmiller, Martin, Fidjeland, Andreas K, Ostrovski, Georg, et al. Human-level control through deep reinforcement learning. Nature, 518(7540):529-533, 2015.
- (2015) Nature , vol.518 , Issue.7540 , pp. 529-533
- Mnih, V.¹ Kavukcuoglu, K.² Silver, D.³ Rusu, A.A.⁴ Veness, J.⁵ Bellemare, M.G.⁶ Graves, A.⁷ Riedmiller, M.⁸ Fidjeland, A.K.⁹ Ostrovski, G.¹⁰

18
- 84980007683
- arXiv preprint arXiv:1507.04296
- Nair, Arun, Srinivasan, Praveen, Blackwell, Sam, Alcicek, Cagdas, Fearon, Rory, De Maria, Alessandro, Panneershelvam, Vedavyas, Suleyman, Mustafa, Beattie, Charles, Petersen, Stig, et al. Massively parallel methods for deep reinforcement learning. arXiv preprint arXiv:1507.04296, 2015.
- (2015) Massively Parallel Methods for Deep Reinforcement Learning
- Nair, A.¹ Srinivasan, P.² Blackwell, S.³ Alcicek, C.⁴ Fearon, R.⁵ De Maria, A.⁶ Panneershelvam, V.⁷ Suleyman, M.⁸ Beattie, C.⁹ Petersen, S.¹⁰

19
- 84932160299
- arXiv preprint arXiv:1412.1897
- Nguyen, Anh, Yosinski, Jason, and Clune, Jeff. Deep neural networks are easily fooled: High confidence predictions for unrecognizable images. arXiv preprint arXiv:1412.1897, 2014.
- (2014) Deep Neural Networks are Easily Fooled: High Confidence Predictions for Unrecognizable Images
- Nguyen, A.¹ Yosinski, J.² Clune, J.³

20
- 84980041074
- Parisotto, Emilio, Ba, Jimmy Lei, and Salakhutdinov, Ruslan. Actor-mimic: Deep multitask and transfer reinforcement learning, 2015.
- (2015) Actor-mimic: Deep Multitask and Transfer Reinforcement Learning
- Parisotto, E.¹ Ba, J.L.² Salakhutdinov, R.³

21
- 0346738900
- Flexible decomposition algorithms for weakly coupled Markov decision problems
- Morgan Kaufmann Publishers Inc
- Parr, Ronald. Flexible decomposition algorithms for weakly coupled Markov decision problems. In Proceedings of the Fourteenth conference on Uncertainty in artificial intelligence, pp. 422-430. Morgan Kaufmann Publishers Inc., 1998.
- (1998) Proceedings of the Fourteenth Conference on Uncertainty in Artificial Intelligence , pp. 422-430
- Parr, R.¹

22
- 21344435992
- Invariant visual representation by single neurons in the human brain
- Quiroga, R Quian, Reddy, Leila, Kreiman, Gabriel, Koch, Christof, and Fried, Itzhak. Invariant visual representation by single neurons in the human brain. Nature, 435(7045):1102-1107, 2005.
- (2005) Nature , vol.435 , Issue.7045 , pp. 1102-1107
- Quiroga, R.Q.¹ Reddy, L.² Kreiman, G.³ Koch, C.⁴ Fried, I.⁵

23
- 79952580566
- 3D visualization of scientific data
- Ramachandran, P. and Varoquaux, G. Mayavi: 3D Visualization of Scientific Data. Computing in Science & Engineering, 13(2):40-51, 2011. ISSN 1521-9615.
- (2011) Computing in Science & Engineering , vol.13 , Issue.2 , pp. 40-51
- Ramachandran, P.¹ Varoquaux, G.M.²

24
- 33646398129
- Neural fitted Q iteration-first experiences with a data efficient neural reinforcement learning method
- Springer
- Riedmiller, Martin. Neural fitted Q iteration-first experiences with a data efficient neural reinforcement learning method. In Machine Learning: ECML 2005, pp. 317-328. Springer, 2005.
- (2005) Machine Learning: ECML 2005 , pp. 317-328
- Riedmiller, M.¹

25
- 84980003817
- Rusu, Andrei A., Colmenarejo, Sergio Gomez, Gulcehre, Caglar, Desjardins, Guillaume, Kirkpatrick, James, Pascanu, Razvan, Mnih, Volodymyr, Kavukcuoglu, Koray, and Hadsell, Raia. Policy distillation, 2015.
- (2015) Policy Distillation
- Rusu, A.A.¹ Colmenarejo, S.G.² Gulcehre, C.³ Desjardins, G.⁴ Kirkpatrick, J.⁵ Pascanu, R.⁶ Mnih, V.⁷ Kavukcuoglu, K.⁸ Hadsell, R.⁹

26
- 84980041049
- arXiv preprint arXiv:1511.05952
- Schaul, Tom, Quan, John, Antonoglou, Ioannis, and Silver, David. Prioritized experience replay. arXiv preprint arXiv:1511.05952, 2015.
- (2015) Prioritized Experience Replay
- Schaul, T.¹ Quan, J.² Antonoglou, I.³ Silver, D.⁴

27
- 84925410541
- arXiv preprint arXiv:1409.1556
- Simonyan, Karen and Zisserman, Andrew. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014.
- (2014) Very Deep Convolutional Networks for Large-scale Image Recognition
- Simonyan, K.¹ Zisserman, A.²

28
- 31844447221
- Identifying useful subgoals in reinforcement learning by local graph partitioning
- ACM
- Simsek, Ozgur, Wolfe, Alicia P, and Barto, Andrew G. Identifying useful subgoals in reinforcement learning by local graph partitioning. In Proceedings of the 22nd international conference on Machine learning, pp. 816-823. ACM, 2005.
- (2005) Proceedings of the 22nd International Conference on Machine Learning , pp. 816-823
- Simsek, O.¹ Wolfe, A.P.² Barto, A.G.³

29
- 85153965130
- Reinforcement learning with soft state aggregation
- Singh, Satinder P, Jaakkola, Tommi, and Jordan, Michael I. Reinforcement learning with soft state aggregation. Advances in neural information processing systems, pp. 361-368, 1995.
- (1995) Advances in Neural Information Processing Systems , pp. 361-368
- Singh, S.P.¹ Jaakkola, T.² Jordan, M.I.³

30
- 34250746921
- PhD thesis, McGill University
- Stolle, Martin. Automated discovery of options in reinforcement learning. PhD thesis, McGill University, 2004.
- (2004) Automated Discovery of Options in Reinforcement Learning
- Stolle, M.¹

31
- 0033170372
- Between MDPs and semi-MDPs: A framework for temporal abstraction in reinforcement learning
- Sutton, Richard S, Precup, Doina, and Singh, Satinder. Between MDPs and semi-MDPs: A framework for temporal abstraction in reinforcement learning. Artificial Intelligence, 112(1):181-211, 1999.
- (1999) Artificial Intelligence , vol.112 , Issue.1 , pp. 181-211
- Sutton, R.S.¹ Precup, D.² Singh, S.³

32
- 84925331214
- arXiv preprint arXiv:1312.6199
- Szegedy, Christian, Zaremba, Wojciech, Sutskever, Ilya, Bruna, Joan, Erhan, Dumitru, Goodfellow, Ian, and Fergus, Rob. Intriguing properties of neural networks. arXiv preprint arXiv:1312.6199, 2013.
- (2013) Intriguing Properties of Neural Networks
- Szegedy, C.¹ Zaremba, W.² Sutskever, I.³ Bruna, J.⁴ Erhan, D.⁵ Goodfellow, I.⁶ Fergus, R.⁷

33
- 0034704229
- A global geometric framework for nonlinear dimensionality reduction
- Tenenbaum, Joshua B, De Silva, Vin, and Langford, John C. A global geometric framework for nonlinear dimensionality reduction. Science, 290(5500):2319-2323, 2000.
- (2000) Science , vol.290 , Issue.5500 , pp. 2319-2323
- Tenenbaum, J.B.¹ De Silva, V.² Langford, J.C.³

34
- 0029276036
- Temporal difference learning and TD-Gammon
- Tesauro, Gerald. Temporal difference learning and TD-Gammon. Communications of the ACM, 38(3):58-68, 1995.
- (1995) Communications of the ACM , vol.38 , Issue.3 , pp. 58-68
- Tesauro, G.¹

35
- 0031998630
- Learning metric-topological maps for indoor mobile robot navigation
- Thrun, Sebastian. Learning metric-topological maps for indoor mobile robot navigation. Artificial Intelligence, 99(1):21-71, 1998.
- (1998) Artificial Intelligence , vol.99 , Issue.1 , pp. 21-71
- Thrun, S.¹

36
- 0031143730
- An analysis of temporal-difference learning with function approximation
- Tsitsiklis, John N and Van Roy, Benjamin. An analysis of temporal-difference learning with function approximation. Automatic Control, IEEE Transactions on, 42(5):674-690, 1997.
- (1997) Automatic Control, IEEE Transactions on , vol.42 , Issue.5 , pp. 674-690
- Tsitsiklis, J.N.¹ Van Roy, B.²

37
- 84919775831
- Accelerating t-SNE using tree-based algorithms
- Van Der Maaten, Laurens. Accelerating t-SNE using tree-based algorithms. The Journal of Machine Learning Research, 15(1):3221-3245, 2014.
- (2014) The Journal of Machine Learning Research , vol.15 , Issue.1 , pp. 3221-3245
- Van Der Maaten, L.¹

38
- 57249084011
- Visualizing data using t-SNE
- 2579-2605
- Van der Maaten, Laurens and Hinton, Geoffrey. Visualizing data using t-SNE. Journal of Machine Learning Research, 9:2579-2605, 2008.
- (2008) Journal of Machine Learning Research , vol.9 , pp. 2579-2605
- Van Der Maaten, L.¹ Hinton, G.²

39
- 84980007690
- arXiv preprint arXiv:1509.06461
- Van Hasselt, Hado, Guez, Arthur, and Silver, David. Deep reinforcement learning with double q-learning. arXiv preprint arXiv:1509.06461, 2015.
- (2015) Deep Reinforcement Learning with Double Q-learning
- Van Hasselt, H.¹ Guez, A.² Silver, D.³

40
- 84998595997
- arXiv preprint arXiv:1511.06581
- Wang, Ziyu, de Freitas, Nando, and Lanctot, Marc. Dueling network architectures for deep reinforcement learning. arXiv preprint arXiv:1511.06581, 2015.
- (2015) Dueling Network Architectures for Deep Reinforcement Learning
- Wang, Z.¹ De Freitas, N.² Lanctot, M.³

41
- 84937508363
- How transferable are features in deep neural networks?
- Yosinski, Jason, Clune, Jeff, Bengio, Yoshua, and Lipson, Hod. How transferable are features in deep neural networks? In Advances in Neural Information Processing Systems, pp. 3320-3328, 2014.
- (2014) Advances in Neural Information Processing Systems , pp. 3320-3328
- Yosinski, J.¹ Clune, J.² Bengio, Y.³ Lipson, H.⁴

42
- 84906489074
- Visualizing and understanding convolutional networks
- Springer
- Zeiler, Matthew D and Fergus, Rob. Visualizing and understanding convolutional networks. In Computer Vision - ECCV 2014, pp. 818-833. Springer, 2014.
- (2014) Computer Vision - ECCV 2014 , pp. 818-833
- Zeiler, M.D.¹ Fergus, R.²

* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS DB.