SCOPUS Information Retrieval Platform

32nd International Conference on Machine Learning, ICML 2015

Volume 2, 2015, Pages 1312-1320

Universal value function approximators

Authors (4): Schaul, Tom (a); Horgan, Dan (a); Gregor, Karol (a); Silver, David (a)

(a) DeepMind (United Kingdom)

Author keywords

[No Author keywords available]

Indexed keywords

Artificial intelligence; Learning algorithms; Learning systems; Core components; Function approximators; Observed values; Value functions; Reinforcement learning

EID: 84969760283 | PISSN: None | EISSN: None | DOI: None
Source Type: Conference Proceeding | Document Type: Conference Paper

Times cited: 1188

References (26)

1
- Bellemare, Marc G., Naddaf, Yavar, Veness, Joel, and Bowling, Michael. The Arcade Learning Environment: An evaluation platform for general agents. arXiv preprint arXiv:1207.4708, 2012.
- Scopus EID: 84879678310

2
- Caruana, Rich. Multitask learning. Machine Learning, 28(1):41-75, 1997.
- Scopus EID: 0031189914

3
- Collobert, Ronan, Kavukcuoglu, Koray, and Farabet, Clément. Torch7: A Matlab-like environment for machine learning. In Big Learn, NIPS Workshop, 2011a.
- Scopus EID: 84888340666

4
- Collobert, Ronan, Weston, Jason, Bottou, Léon, Karlen, Michael, Kavukcuoglu, Koray, and Kuksa, Pavel. Natural language processing (almost) from scratch. The Journal of Machine Learning Research, 12:2493-2537, 2011b.
- Scopus EID: 80053558787

5
- Da Silva, Bruno, Konidaris, George, and Barto, Andrew. Learning parameterized skills. arXiv preprint arXiv:1206.6398, 2012.
- Scopus EID: 84908178089

6
- Deisenroth, Marc Peter, Englert, Peter, Peters, Jan, and Fox, Dieter. Multi-task policy search for robotics. In 2014 IEEE International Conference on Robotics and Automation (ICRA), pp. 3876-3881. IEEE, 2014.
- Scopus EID: 84919821063

7
- Foster, David and Dayan, Peter. Structure in the space of value functions. Machine Learning, 49(2-3):325-346, 2002.
- Scopus EID: 0036832959

8
- Kaelbling, Leslie Pack. Hierarchical learning in stochastic domains: Preliminary results. In Proceedings of the Tenth International Conference on Machine Learning, volume 951, pp. 167-173, 1993.
- Scopus EID: 85143168613

9
- Keshavan, Raghunandan H., Oh, Sewoong, and Montanari, Andrea. Matrix completion from a few entries. CoRR, abs/0901.3150, 2009.
- Scopus EID: 70449487160

10
- Kingma, Diederik P. and Ba, Jimmy. Adam: A method for stochastic optimization. CoRR, abs/1412.6980, 2014.
- Scopus EID: 84941620184

11
- Kober, Jens, Wilhelm, Andreas, Oztop, Erhan, and Peters, Jan. Reinforcement learning to adjust parametrized motor primitives to new situations. Autonomous Robots, 33(4):361-379, 2012.
- Scopus EID: 84868358933

12
- Konidaris, George, Scheidwasser, Ilya, and Barto, Andrew G. Transfer in reinforcement learning via shared features. The Journal of Machine Learning Research, 13(1):1333-1371, 2012.
- Scopus EID: 84862001711

13
- Mikolov, Tomas, Sutskever, Ilya, Chen, Kai, Corrado, Greg S, and Dean, Jeff. Distributed representations of words and phrases and their compositionality. In Advances in Neural Information Processing Systems, pp. 3111-3119, 2013.
- Scopus EID: 84898956512

14
- Mnih, Volodymyr, Kavukcuoglu, Koray, Silver, David, Graves, Alex, Antonoglou, Ioannis, Wierstra, Daan, and Riedmiller, Martin. Playing Atari with deep reinforcement learning. arXiv preprint arXiv:1312.5602, 2013.
- Scopus EID: 84904867557

15
- Modayil, Joseph, White, Adam, and Sutton, Richard S. Multi-timescale nexting in a reinforcement learning robot. Adaptive Behavior, 22(2):146-160, 2014.
- Scopus EID: 84896357393

16
- Precup, Doina, Sutton, Richard S, and Dasgupta, Sanjoy. Off-policy temporal-difference learning with function approximation. In ICML, pp. 417-424, 2001.
- Scopus EID: 4644328593

17
- Ring, Mark Bishop. Continual Learning in Reinforcement Environments. PhD thesis, University of Texas at Austin, 1994.
- Scopus EID: 0003588579

18
- Schaul, Tom and Ring, Mark. Better generalization with forecasts. In Proceedings of the Twenty-Third International Joint Conference on Artificial Intelligence, pp. 1656-1662. AAAI Press, 2013.
- Scopus EID: 84896063904

19
- Singh, Satinder, Littman, Michael L, Jong, Nicholas K, Pardoe, David, and Stone, Peter. Learning predictive state representations. In ICML, pp. 712-719, 2003.
- Scopus EID: 1942452236

20
- Sutton, Richard S and Barto, Andrew G. Introduction to reinforcement learning. MIT Press, 1998.
- Scopus EID: 0003420416

21
- Sutton, Richard S and Tanner, Brian. Temporal-difference networks. In Saul, L.K., Weiss, Y., and Bottou, L. (eds.), Advances in Neural Information Processing Systems 17, pp. 1377-1384. MIT Press, 2005.
- Scopus EID: 84899003536

22
- Sutton, Richard S, Precup, Doina, and Singh, Satinder. Between MDPs and semi-MDPs: A framework for temporal abstraction in reinforcement learning. Artificial Intelligence, 112(1):181-211, 1999.
- Scopus EID: 0033170372

23
- Sutton, Richard S, Modayil, Joseph, Delp, Michael, Degris, Thomas, Pilarski, Patrick M, White, Adam, and Precup, Doina. Horde: A scalable real-time architecture for learning knowledge from unsupervised sensorimotor interaction. In The 10th International Conference on Autonomous Agents and Multiagent Systems, Volume 2, pp. 761-768, 2011.
- Scopus EID: 84899464022

24
- Van der Maaten, Laurens and Hinton, Geoffrey. Visualizing data using t-SNE. Journal of Machine Learning Research, 9:2579-2605, 2008.
- Scopus EID: 57249084011

25
- van Otterlo, Martijn. The logic of adaptive behavior: knowledge representation and algorithms for adaptive sequential decision making under uncertainty in first-order and relational domains, volume 192. IOS Press, 2009.
- Scopus EID: 70049091599

26
- Yao, Hengshuai, Szepesvári, Csaba, Sutton, Richard S, Modayil, Joseph, and Bhatnagar, Shalabh. Universal option models. In Advances in Neural Information Processing Systems, pp. 990-998, 2014.
- Scopus EID: 84937951926

* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.