SCOPUS Information Search Platform

Volume , Issue , 2016, Pages

Actor-mimic: Deep multitask and transfer reinforcement learning

Author keywords

[No Author keywords available]

Indexed keywords

AUTONOMOUS AGENTS; MACHINE LEARNING; REINFORCEMENT LEARNING; MODEL COMPRESSION; MULTIPLE TASKS; POLICY NETWORKS; TESTING ENVIRONMENT; TRANSFER LEARNING; DEEP LEARNING

EID: 85083953433 PISSN: None EISSN: None Source Type: Conference Proceeding
DOI: None Document Type: Conference Paper

Times cited: 262

References (19)

1
- 84937961091
- Do deep nets really need to be deep?
- Ba, Jimmy and Caruana, Rich. Do deep nets really need to be deep? In Advances in Neural Information Processing Systems, pp. 2654–2662, 2014.
- (2014) Advances in Neural Information Processing Systems , pp. 2654-2662
- Ba, J.¹ Caruana, R.²

2
- 84880904080
- General game learning using knowledge transfer
- Banerjee, Bikramjit and Stone, Peter. General game learning using knowledge transfer. In International Joint Conferences on Artificial Intelligence, pp. 672–677, 2007.
- (2007) International Joint Conferences on Artificial Intelligence , pp. 672-677
- Banerjee, B.¹ Stone, P.²

3
- 84879976780
- The arcade learning environment: An evaluation platform for general agents
- Bellemare, Marc G., Naddaf, Yavar, Veness, Joel, and Bowling, Michael. The arcade learning environment: An evaluation platform for general agents. Journal of Artificial Intelligence Research, 47:253–279, 2013.
- (2013) Journal of Artificial Intelligence Research , vol.47 , pp. 253-279
- Bellemare, M.G.¹ Naddaf, Y.² Veness, J.³ Bowling, M.⁴

4
- 0003565783
- Dynamic programming and optimal control
- Bertsekas, Dimitri P. Dynamic programming and optimal control, volume 1. Athena Scientific Belmont, MA, 1995.
- (1995) Dynamic Programming and Optimal Control , vol.1
- Bertsekas, D.P.¹

5
- 84937779024
- Deep learning for real-time atari game play using offline monte-carlo tree search planning
- Guo, Xiaoxiao, Singh, Satinder, Lee, Honglak, Lewis, Richard L, and Wang, Xiaoshi. Deep learning for real-time atari game play using offline monte-carlo tree search planning. In Advances in Neural Information Processing Systems 27, pp. 3338–3346, 2014.
- (2014) Advances in Neural Information Processing Systems , vol.27 , pp. 3338-3346
- Guo, X.¹ Singh, S.² Lee, H.³ Lewis, R.L.⁴ Wang, X.⁵

6
- 84959176782
- Distilling the knowledge in a neural network
- Hinton, Geoffrey, Vinyals, Oriol, and Dean, Jeff. Distilling the knowledge in a neural network. arXiv preprint arXiv:1503.02531, 2015.
- (2015) Distilling the Knowledge in a Neural Network
- Hinton, G.¹ Vinyals, O.² Dean, J.³

7
- 85083951076
- Adam: A method for stochastic optimization
- Kingma, Diederik P. and Ba, Jimmy. Adam: A method for stochastic optimization. In International Conference on Learning Representations, 2015.
- (2015) International Conference on Learning Representations
- Kingma, D.P.¹ Ba, J.²

8
- 34250719248
- Autonomous shaping: Knowledge transfer in reinforcement learning
- Konidaris, George and Barto, Andrew G. Autonomous shaping: Knowledge transfer in reinforcement learning. In Proceedings of the 23rd international conference on Machine learning, pp. 489–496, 2006.
- (2006) Proceedings of the 23rd International Conference on Machine Learning , pp. 489-496
- Konidaris, G.¹ Barto, A.G.²

9
- 84897529781
- Guided policy search
- Levine, Sergey and Koltun, Vladlen. Guided policy search. In Proceedings of the 30th international conference on Machine Learning, 2013.
- (2013) Proceedings of the 30th International Conference on Machine Learning
- Levine, S.¹ Koltun, V.²

10
- 85028018890
- End-to-end training of deep visuomotor policies
- Levine, Sergey, Finn, Chelsea, Darrell, Trevor, and Abbeel, Pieter. End-to-end training of deep visuomotor policies. CoRR, abs/1504.00702, 2015.
- (2015) CoRR
- Levine, S.¹ Finn, C.² Darrell, T.³ Abbeel, P.⁴

11
- 85007167143
- Continuous control with deep reinforcement learning
- Lillicrap, Timothy P., Hunt, Jonathan J., Pritzel, Alexander, Heess, Nicholas, Erez, Tom, Tassa, Yuval, Silver, David, and Wierstra, Daan. Continuous control with deep reinforcement learning. CoRR, abs/1509.02971, 2015.
- (2015) CoRR
- Lillicrap, T.P.¹ Hunt, J.J.² Pritzel, A.³ Heess, N.⁴ Erez, T.⁵ Tassa, Y.⁶ Silver, D.⁷ Wierstra, D.⁸

12
- 84924051598
- Human-level control through deep reinforcement learning
- Mnih, Volodymyr, Kavukcuoglu, Koray, Silver, David, Rusu, Andrei A., Veness, Joel, Bellemare, Marc G., Graves, Alex, Riedmiller, Martin, Fidjeland, Andreas K., Ostrovski, Georg, Petersen, Stig, Beattie, Charles, Sadik, Amir, Antonoglou, Ioannis, King, Helen, Kumaran, Dharshan, Wierstra, Daan, Legg, Shane, and Hassabis, Demis. Human-level control through deep reinforcement learning. Nature, 518(7540):529–533, 2015.
- (2015) Nature , vol.518 , Issue.7540 , pp. 529-533
- Mnih, V.¹ Kavukcuoglu, K.² Silver, D.³ Rusu, A.A.⁴ Veness, J.⁵ Bellemare, M.G.⁶ Graves, A.⁷ Riedmiller, M.⁸ Fidjeland, A.K.⁹ Ostrovski, G.¹⁰ Petersen, S.¹¹ Beattie, C.¹² Sadik, A.¹³ Antonoglou, I.¹⁴ King, H.¹⁵ Kumaran, D.¹⁶ Wierstra, D.¹⁷ Legg, S.¹⁸ Hassabis, D.¹⁹

13
- 22944468429
- A convergent form of approximate policy iteration
- Perkins, Theodore J and Precup, Doina. A convergent form of approximate policy iteration. In Advances in neural information processing systems, pp. 1595–1602, 2002.
- (2002) Advances in Neural Information Processing Systems , pp. 1595-1602
- Perkins, T.J.¹ Precup, D.²

14
- 0000016172
- A stochastic approximation method
- Robbins, Herbert and Monro, Sutton. A stochastic approximation method. The annals of mathematical statistics, pp. 400–407, 1951.
- (1951) The Annals of Mathematical Statistics , pp. 400-407
- Robbins, H.¹ Monro, S.²

15
- 85083953559
- Fitnets: Hints for thin deep nets
- Romero, Adriana, Ballas, Nicolas, Kahou, Samira Ebrahimi, Chassang, Antoine, Gatta, Carlo, and Bengio, Yoshua. Fitnets: Hints for thin deep nets. In International Conference on Learning Representations, 2015.
- (2015) International Conference on Learning Representations
- Romero, A.¹ Ballas, N.² Kahou, S.E.³ Chassang, A.⁴ Gatta, C.⁵ Bengio, Y.⁶

16
- 84862273266
- A reduction of imitation learning and structured prediction to no-regret online learning
- Ross, Stephane, Gordon, Geoffrey, and Bagnell, Andrew. A reduction of imitation learning and structured prediction to no-regret online learning. Journal of Machine Learning Research, 15:627–635, 2011.
- (2011) Journal of Machine Learning Research , vol.15 , pp. 627-635
- Ross, S.¹ Gordon, G.² Bagnell, A.³

17
- 0037886159
- Sensitivity analysis, ergodicity coefficients, and rank-one updates for finite markov chains
- Seneta, E. Sensitivity analysis, ergodicity coefficients, and rank-one updates for finite markov chains. Numerical solution of Markov chains, 8:121–129, 1991.
- (1991) Numerical Solution of Markov Chains , vol.8 , pp. 121-129
- Seneta, E.¹

18
- 0004102479
- Reinforcement learning: An introduction
- Sutton, Richard S. and Barto, Andrew G. Reinforcement learning: An introduction. MIT Press Cambridge, 1998.
- (1998) Reinforcement Learning: An Introduction
- Sutton, R.S.¹ Barto, A.G.²

19
- 68949157375
- Transfer learning for reinforcement learning domains: A survey
- Taylor, Matthew E and Stone, Peter. Transfer learning for reinforcement learning domains: A survey. The Journal of Machine Learning Research, 10:1633–1685, 2009.
- (2009) The Journal of Machine Learning Research , vol.10 , pp. 1633-1685
- Taylor, M.E.¹ Stone, P.²

* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS DB.