Volume 3, 2016, Pages 2001-2014

Benchmarking deep reinforcement learning for continuous control

Author keywords

[No Author keywords available]

Indexed keywords

ARTIFICIAL INTELLIGENCE; LEARNING ALGORITHMS; LEARNING SYSTEMS; PERSONNEL TRAINING;

EID: 84999018287     PISSN: None     EISSN: None     Source Type: Conference Proceeding    
DOI: None     Document Type: Conference Paper
Times cited: 660

References (84)
  • 3. Bakker, B. Reinforcement learning with long short-term memory. In NIPS, pp. 1475-1482, 2001.
  • 4. Bellemare, M. G., Naddaf, Y., Veness, J., and Bowling, M. The Arcade Learning Environment: An evaluation platform for general agents. J. Artif. Intell. Res., 47:253-279, 2013.
  • 6. Bertsekas, D. P. and Tsitsiklis, J. N. Neuro-dynamic programming: An overview. In CDC, pp. 560-564, 1995.
  • 10. Dann, C., Neumann, G., and Peters, J. Policy evaluation with temporal differences: A survey and comparison. J. Mach. Learn. Res., 15(1):809-883, 2014.
  • 12. Deisenroth, M. P., Neumann, G., and Peters, J. A survey on policy search for robotics. Found. Trends Robotics, 2(1-2):1-142, 2013.
  • 13. DeJong, G. and Spong, M. W. Swinging up the Acrobot: An example of intelligent control. In ACC, pp. 2158-2162, 1994.
  • 14. Deng, J., Dong, W., Socher, R., Li, L.-J., Li, K., and Fei-Fei, L. ImageNet: A large-scale hierarchical image database. In CVPR, pp. 248-255, 2009.
  • 15. Dietterich, T. G. Hierarchical reinforcement learning with the MAXQ value function decomposition. J. Artif. Intell. Res., 13:227-303, 2000.
  • 17. Dimitrakakis, C., Li, G., and Tziortziotis, N. The reinforcement learning competition 2014. AI Magazine, 35(3):61-65, 2014.
  • 18. Donaldson, P. E. K. Error decorrelation: A technique for matching a class of functions. In Proc. 3rd Intl. Conf. Medical Electronics, pp. 173-178, 1960.
  • 19. Doya, K. Reinforcement learning in continuous time and space. Neural Comput., 12(1):219-245, 2000.
  • 21. Erez, T., Tassa, Y., and Todorov, E. Infinite horizon model predictive control for nonlinear periodic tasks. Manuscript under review, 2011.
  • 24. Furuta, K., Okutani, T., and Sone, H. Computer control of a double inverted pendulum. Comput. Electr. Eng., 5(1):67-84, 1978.
  • 26. Godfrey, J. J., Holliman, E. C., and McDaniel, J. SWITCHBOARD: Telephone speech corpus for research and development. In ICASSP, pp. 517-520, 1992.
  • 27. Gomez, F. and Miikkulainen, R. 2-D pole balancing with recurrent evolutionary networks. In ICANN, pp. 425-430, 1998.
  • 28. Guo, X., Singh, S., Lee, H., Lewis, R. L., and Wang, X. Deep learning for real-time Atari game play using offline Monte-Carlo tree search planning. In NIPS, pp. 3338-3346, 2014.
  • 29. Hansen, N. and Ostermeier, A. Completely derandomized self-adaptation in evolution strategies. Evol. Comput., 9(2):159-195, 2001.
  • 31. Heess, N., Wayne, G., Silver, D., Lillicrap, T., Erez, T., and Tassa, Y. Learning continuous control policies by stochastic value gradients. In NIPS, pp. 2926-2934, 2015.
  • 32. Hester, T. and Stone, P. The open-source TEXPLORE code release for reinforcement learning on robots. In RoboCup 2013: Robot World Cup XVII, pp. 536-543, 2013.
  • 36. Kakade, S. M. A natural policy gradient. In NIPS, pp. 1531-1538, 2002.
  • 37. Kimura, H. and Kobayashi, S. Stochastic real-valued reinforcement learning to solve a nonlinear control problem. In IEEE SMC, pp. 510-515, 1999.
  • 38. Kober, J. and Peters, J. Policy search for motor primitives in robotics. In NIPS, pp. 849-856, 2009.
  • 41. Krizhevsky, A., Sutskever, I., and Hinton, G. ImageNet classification with deep convolutional neural networks. In NIPS, pp. 1097-1105, 2012.
  • 43. Levine, S. and Koltun, V. Guided policy search. In ICML, pp. 1-9, 2013.
  • 46. Martin, D., Fowlkes, C., Tal, D., and Malik, J. A database of human segmented natural images and its application to evaluating segmentation algorithms and measuring ecological statistics. In ICCV, pp. 416-423, 2001.
  • 48. Michie, D. and Chambers, R. A. BOXES: An experiment in adaptive control. Machine Intelligence, 2:137-152, 1968.
  • 52. Murthy, S. S. and Raibert, M. H. 3D balance in legged locomotion: Modeling and simulation for the one-legged case. ACM SIGGRAPH Computer Graphics, 18(1):27, 1984.
  • 54. Papis, B. and Wawrzynski, P. dotRL: A platform for rapid reinforcement learning methods development and validation. In FedCSIS, pp. 129-136, 2013.
  • 57. Peters, J. and Schaal, S. Reinforcement learning by reward-weighted regression for operational space control. In ICML, pp. 745-750, 2007.
  • 58. Peters, J. and Schaal, S. Reinforcement learning of motor skills with policy gradients. Neural Networks, 21(4):682-697, 2008.
  • 60. Peters, J., Mülling, K., and Altün, Y. Relative entropy policy search. In AAAI, pp. 1607-1612, 2010.
  • 61. Purcell, E. M. Life at low Reynolds number. Am. J. Phys., 45(1):3-11, 1977.
  • 64. Rubinstein, R. The cross-entropy method for combinatorial and continuous optimization. Methodol. Comput. Appl. Probab., 1(2):127-190, 1999.
  • 65. Schäfer, A. M. and Udluft, S. Solving partially observable reinforcement learning problems with recurrent neural networks. In ECML Workshops, pp. 71-81, 2005.
  • 69. Stephenson, A. On induced stability. Philos. Mag., 15(86):233-236, 1908.
  • 71. Sutton, R. S., Precup, D., and Singh, S. Between MDPs and semi-MDPs: A framework for temporal abstraction in reinforcement learning. Artificial Intelligence, 112(1):181-211, 1999.
  • 72. Szita, I. and Lorincz, A. Learning Tetris using the noisy cross-entropy method. Neural Comput., 18(12):2936-2941, 2006.
  • 73. Szita, I., Takacs, B., and Lorincz, A. ϵ-MDPs: Learning in varying environments. J. Mach. Learn. Res., 3:145-174, 2003.
  • 75. Tesauro, G. Temporal difference learning and TD-Gammon. Commun. ACM, 38(3):58-68, 1995.
  • 76. Todorov, E., Erez, T., and Tassa, Y. MuJoCo: A physics engine for model-based control. In IROS, pp. 5026-5033, 2012.
  • 77. van Hoof, H., Peters, J., and Neumann, G. Learning of nonparametric control policies with high-dimensional state features. In AISTATS, pp. 995-1003, 2015.
  • 78. Watter, M., Springenberg, J., Boedecker, J., and Riedmiller, M. Embed to control: A locally linear latent dynamics model for control from raw images. In NIPS, pp. 2728-2736, 2015.
  • 79. Wawrzynski, P. Learning to control a 6-degree-of-freedom walking robot. In IEEE EUROCON, pp. 698-705, 2007.
  • 80. Widrow, B. Pattern recognition and adaptive control. IEEE Trans. Ind. Appl., 83(74):269-277, 1964.
  • 81. Wierstra, D., Foerster, A., Peters, J., and Schmidhuber, J. Solving deep memory POMDPs with recurrent policy gradients. In ICANN, pp. 697-706, 2007.
  • 82. Williams, R. J. Simple statistical gradient-following algorithms for connectionist reinforcement learning. Mach. Learn., 8:229-256, 1992.
  • 83. Yamaguchi, A. and Ogasawara, T. SkyAI: Highly modularized reinforcement learning library. In IEEE-RAS Humanoids, pp. 118-123, 2010.
  • 84. Yu, D., Ju, Y.-C., Wang, Y.-Y., Zweig, G., and Acero, A. Automated directory assistance system - from theory to practice. In Interspeech, pp. 2709-2712, 2007.


* This information was extracted by KISTI through analysis of Elsevier's SCOPUS database.