Bakker, B. Reinforcement learning with long short-term memory. In NIPS, pp. 1475-1482, 2001.
Bellemare, M. G., Naddaf, Y., Veness, J., and Bowling, M. The Arcade Learning Environment: An evaluation platform for general agents. J. Artif. Intell. Res., 47:253-279, 2013.
Bertsekas, D. P. and Tsitsiklis, J. N. Neuro-dynamic programming: An overview. In CDC, pp. 560-564, 1995.
Dann, C., Neumann, G., and Peters, J. Policy evaluation with temporal differences: A survey and comparison. J. Mach. Learn. Res., 15(1):809-883, 2014.
Degris, T., Bechu, J., White, A., Modayil, J., Pilarski, P. M., and Denk, C. RLPark. http://rlpark.github.io, 2013.
Deisenroth, M. P., Neumann, G., and Peters, J. A survey on policy search for robotics. Found. Trends Robotics, 2(1-2):1-142, 2013.
DeJong, G. and Spong, M. W. Swinging up the Acrobot: An example of intelligent control. In ACC, pp. 2158-2162, 1994.
Deng, J., Dong, W., Socher, R., Li, L.-J., Li, K., and Fei-Fei, L. ImageNet: A large-scale hierarchical image database. In CVPR, pp. 248-255, 2009.
Dietterich, T. G. Hierarchical reinforcement learning with the MAXQ value function decomposition. J. Artif. Intell. Res., 13:227-303, 2000.
Dimitrakakis, C., Li, G., and Tziortziotis, N. The reinforcement learning competition 2014. AI Magazine, 35(3):61-65, 2014.
Donaldson, P. E. K. Error decorrelation: A technique for matching a class of functions. In Proc. 3rd Intl. Conf. Medical Electronics, pp. 173-178, 1960.
Doya, K. Reinforcement learning in continuous time and space. Neural Comput., 12(1):219-245, 2000.
Dutech, A., Edmunds, T., Kok, J., Lagoudakis, M., Littman, M., Riedmiller, M., Russell, B., Scherrer, B., Sutton, R., Timmer, S., et al. Reinforcement learning benchmarks and bake-offs II. Advances in Neural Information Processing Systems (NIPS), 17, 2005.
Erez, T., Tassa, Y., and Todorov, E. Infinite horizon model predictive control for nonlinear periodic tasks. Manuscript under review, 4, 2011.
Everingham, M., Van Gool, L., Williams, C. K. I., Winn, J., and Zisserman, A. The PASCAL visual object classes (VOC) challenge. Int. J. Comput. Vision, 88(2):303-338, 2010.
Fei-Fei, L., Fergus, R., and Perona, P. One-shot learning of object categories. IEEE Trans. Pattern Anal. Mach. Intell., 28(4):594-611, 2006.
Furuta, K., Okutani, T., and Sone, H. Computer control of a double inverted pendulum. Comput. Electr. Eng., 5(1):67-84, 1978.
Garofolo, J. S., Lamel, L. F., Fisher, W. M., Fiscus, J. G., and Pallett, D. S. DARPA TIMIT acoustic-phonetic continuous speech corpus CD-ROM. NIST speech disc 1-1.1. NASA STI/Recon Technical Report N, 93, 1993.
Godfrey, J. J., Holliman, E. C., and McDaniel, J. SWITCHBOARD: Telephone speech corpus for research and development. In ICASSP, pp. 517-520, 1992.
Gomez, F. and Miikkulainen, R. 2-D pole balancing with recurrent evolutionary networks. In ICANN, pp. 425-430, 1998.
Guo, X., Singh, S., Lee, H., Lewis, R. L., and Wang, X. Deep learning for real-time Atari game play using offline Monte-Carlo tree search planning. In NIPS, pp. 3338-3346, 2014.
Hansen, N. and Ostermeier, A. Completely derandomized self-adaptation in evolution strategies. Evol. Comput., 9(2):159-195, 2001.
Heess, N., Hunt, J., Lillicrap, T., and Silver, D. Memory-based control with recurrent neural networks. arXiv:1512.04455, 2015a.
Heess, N., Wayne, G., Silver, D., Lillicrap, T., Erez, T., and Tassa, Y. Learning continuous control policies by stochastic value gradients. In NIPS, pp. 2926-2934, 2015b.
Hester, T. and Stone, P. The open-source TEXPLORE code release for reinforcement learning on robots. In RoboCup 2013: Robot World Cup XVII, pp. 536-543, 2013.
Hinton, G., Deng, L., Yu, D., Mohamed, A.-R., Jaitly, N., Senior, A., Vanhoucke, V., Nguyen, P., Sainath, T., Dahl, G., and Kingsbury, B. Deep neural networks for acoustic modeling in speech recognition. IEEE Signal Process. Mag., 29(6):82-97, 2012.
Kakade, S. M. A natural policy gradient. In NIPS, pp. 1531-1538, 2002.
Kimura, H. and Kobayashi, S. Stochastic real-valued reinforcement learning to solve a nonlinear control problem. In IEEE SMC, pp. 510-515, 1999.
Kober, J. and Peters, J. Policy search for motor primitives in robotics. In NIPS, pp. 849-856, 2009.
Krizhevsky, A., Sutskever, I., and Hinton, G. ImageNet classification with deep convolutional neural networks. In NIPS, pp. 1097-1105, 2012.
Levine, S. and Koltun, V. Guided policy search. In ICML, pp. 1-9, 2013.
Levine, S., Finn, C., Darrell, T., and Abbeel, P. End-to-end training of deep visuomotor policies. arXiv:1504.00702, 2015.
Lillicrap, T., Hunt, J., Pritzel, A., Heess, N., Erez, T., Tassa, Y., Silver, D., and Wierstra, D. Continuous control with deep reinforcement learning. arXiv:1509.02971, 2015.
Martin, D., Fowlkes, C., Tal, D., and Malik, J. A database of human segmented natural images and its application to evaluating segmentation algorithms and measuring ecological statistics. In ICCV, pp. 416-423, 2001.
Michie, D. and Chambers, R. A. BOXES: An experiment in adaptive control. Machine Intelligence, 2:137-152, 1968.
Mnih, V., Kavukcuoglu, K., Silver, D., Rusu, A. A., Veness, J., Bellemare, M. G., Graves, A., Riedmiller, M., Fidjeland, A. K., Ostrovski, G., Petersen, S., Beattie, C., Sadik, A., Antonoglou, I., King, H., Kumaran, D., Wierstra, D., Legg, S., and Hassabis, D. Human-level control through deep reinforcement learning. Nature, 518(7540):529-533, 2015.
Murthy, S. S. and Raibert, M. H. 3D balance in legged locomotion: Modeling and simulation for the one-legged case. ACM SIGGRAPH Computer Graphics, 18(1):27-27, 1984.
Papis, B. and Wawrzynski, P. dotRL: A platform for rapid reinforcement learning methods development and validation. In FedCSIS, pp. 129-136, 2013.
Peters, J. and Schaal, S. Reinforcement learning by reward-weighted regression for operational space control. In ICML, pp. 745-750, 2007.
Peters, J. and Schaal, S. Reinforcement learning of motor skills with policy gradients. Neural Networks, 21(4):682-697, 2008.
Peters, J., Mülling, K., and Altün, Y. Relative entropy policy search. In AAAI, pp. 1607-1612, 2010.
Purcell, E. M. Life at low Reynolds number. Am. J. Phys., 45(1):3-11, 1977.
Riedmiller, M., Blum, M., and Lampe, T. CLS2: Closed loop simulation system. http://ml.informatik.uni-freiburg.de/research/clsquare, 2012.
Rubinstein, R. The cross-entropy method for combinatorial and continuous optimization. Methodol. Comput. Appl. Probab., 1(2):127-190, 1999.
Schäfer, A. M. and Udluft, S. Solving partially observable reinforcement learning problems with recurrent neural networks. In ECML Workshops, pp. 71-81, 2005.
Schaul, T., Bayer, J., Wierstra, D., Sun, Y., Felder, M., Sehnke, F., Rückstieß, T., and Schmidhuber, J. PyBrain. J. Mach. Learn. Res., 11:743-746, 2010.
Schulman, J., Levine, S., Abbeel, P., Jordan, M. I., and Moritz, P. Trust region policy optimization. In ICML, pp. 1889-1897, 2015a.
Schulman, J., Moritz, P., Levine, S., Jordan, M. I., and Abbeel, P. High-dimensional continuous control using generalized advantage estimation. arXiv:1506.02438, 2015b.
Stephenson, A. On induced stability. Philos. Mag., 15(86):233-236, 1908.
Stone, P., Kuhlmann, G., Taylor, M. E., and Liu, Y. Keepaway soccer: From machine learning testbed to benchmark. In RoboCup 2005: Robot Soccer World Cup IX, pp. 93-105. Springer, 2005.
Sutton, R. S., Precup, D., and Singh, S. Between MDPs and semi-MDPs: A framework for temporal abstraction in reinforcement learning. Artificial Intelligence, 112(1):181-211, 1999.
Szita, I. and Lorincz, A. Learning Tetris using the noisy cross-entropy method. Neural Comput., 18(12):2936-2941, 2006.
Szita, I., Takacs, B., and Lorincz, A. ϵ-MDPs: Learning in varying environments. J. Mach. Learn. Res., 3:145-174, 2003.
Tassa, Y., Erez, T., and Todorov, E. Synthesis and stabilization of complex behaviors through online trajectory optimization. In IROS, pp. 4906-4913, 2012.
Tesauro, G. Temporal difference learning and TD-Gammon. Commun. ACM, 38(3):58-68, 1995.
Todorov, E., Erez, T., and Tassa, Y. MuJoCo: A physics engine for model-based control. In IROS, pp. 5026-5033, 2012.
van Hoof, H., Peters, J., and Neumann, G. Learning of nonparametric control policies with high-dimensional state features. In AISTATS, pp. 995-1003, 2015.
Watter, M., Springenberg, J., Boedecker, J., and Riedmiller, M. Embed to control: A locally linear latent dynamics model for control from raw images. In NIPS, pp. 2728-2736, 2015.
Wawrzynski, P. Learning to control a 6-degree-of-freedom walking robot. In IEEE EUROCON, pp. 698-705, 2007.
Widrow, B. Pattern recognition and adaptive control. IEEE Trans. Ind. Appl., 83(74):269-277, 1964.
Wierstra, D., Foerster, A., Peters, J., and Schmidhuber, J. Solving deep memory POMDPs with recurrent policy gradients. In ICANN, pp. 697-706, 2007.
Williams, R. J. Simple statistical gradient-following algorithms for connectionist reinforcement learning. Mach. Learn., 8:229-256, 1992.
Yamaguchi, A. and Ogasawara, T. SkyAI: Highly modularized reinforcement learning library. In IEEE-RAS Humanoids, pp. 118-123, 2010.
Yu, D., Ju, Y.-C., Wang, Y.-Y., Zweig, G., and Acero, A. Automated directory assistance system - from theory to practice. In Interspeech, pp. 2709-2712, 2007.