



Volume 5, 2017, Pages 27091-27102

System Design Perspective for Human-Level Agents Using Deep Reinforcement Learning: A Survey

Author keywords

Deep learning; human level agents; reinforcement learning; robotics; survey; system design

Indexed keywords

ARTIFICIAL INTELLIGENCE; DEEP LEARNING; INTELLIGENT ROBOTS; LEARNING SYSTEMS; MACHINE DESIGN; MATHEMATICAL MODELS; MONTE CARLO METHODS; ROBOTICS; ROBOTS; SURVEYING; SURVEYS; SYSTEMS ANALYSIS

EID: 85035762670     PISSN: None     EISSN: 21693536     Source Type: Journal    
DOI: 10.1109/ACCESS.2017.2777827     Document Type: Review
Times cited : (94)

References (75)
  • 2
    • J. Kober, J. A. Bagnell, and J. Peters, "Reinforcement learning in robotics: A survey," Int. J. Robot. Res., vol. 32, no. 11, pp. 1238-1274, 2013.
  • 3
    • S. Mahadevan and J. Connell, "Automatic programming of behavior-based robots using reinforcement learning," Artif. Intell., vol. 55, nos. 2-3, pp. 311-365, Jun. 1992.
  • 5
    • J.-L. Lin, K.-S. Hwang, W.-C. Jiang, and Y.-J. Chen, "Gait balance and acceleration of a biped robot based on Q-learning," IEEE Access, vol. 4, pp. 2439-2449, 2016.
  • 6
    • K. Mülling, J. Kober, O. Kroemer, and J. Peters, "Learning to select and generalize striking movements in robot table tennis," Int. J. Robot. Res., vol. 32, no. 3, pp. 263-279, 2013.
  • 7
    • M. Riedmiller, T. Gabel, R. Hafner, and S. Lange, "Reinforcement learning for robot soccer," Auton. Robots, vol. 27, no. 1, pp. 55-73, 2009.
  • 8
    • E. L. Thorndike, "Animal intelligence: An experimental study of the associative processes in animals," Amer. Psychol., vol. 53, no. 10, pp. 1125-1127, 1998.
  • 9
    • W. Schultz, P. Dayan, and P. R. Montague, "A neural substrate of prediction and reward," Science, vol. 275, no. 5306, pp. 1593-1599, 1997.
  • 10
    • R. Bellman, Dynamic Programming. Princeton, NJ, USA: Princeton Univ. Press, 2010.
  • 11
    • L. Deng and D. Yu, "Deep learning: Methods and applications," Found. Trends Signal Process., vol. 7, nos. 3-4, pp. 197-387, Jun. 2014.
  • 12
    • L. Fei-Fei, J. Deng, and K. Li, "ImageNet: Constructing a large-scale image database," J. Vis., vol. 9, no. 8, p. 1037, 2009.
  • 16
    • O. Abdel-Hamid, A. Mohamed, H. Jiang, and G. Penn, "Applying convolutional neural networks concepts to hybrid NN-HMM model for speech recognition," in Proc. IEEE Int. Conf. Acoust., Speech, Signal Process. (ICASSP), Mar. 2012, pp. 4277-4280.
  • 18
    • H. A. Pierson and M. S. Gashler, "Deep learning in robotics: A review of recent research," Adv. Robot., vol. 31, no. 16, pp. 821-835, 2017.
  • 19
    • C. J. C. H. Watkins and P. Dayan, "Q-learning," Mach. Learn., vol. 8, nos. 3-4, pp. 279-292, 1992.
  • 20
    • V. Mnih et al., "Human-level control through deep reinforcement learning," Nature, vol. 518, no. 7540, pp. 529-533, 2015.
  • 21
    • M. G. Bellemare, Y. Naddaf, J. Veness, and M. Bowling, "The arcade learning environment: An evaluation platform for general agents," J. Artif. Intell. Res., vol. 47, pp. 253-279, May 2013.
  • 22
    • D. Silver et al., "Mastering the game of Go with deep neural networks and tree search," Nature, vol. 529, no. 7578, pp. 484-489, Jan. 2016.
  • 24
    • J. N. Tsitsiklis and B. Van Roy, "An analysis of temporal-difference learning with function approximation," in Proc. Adv. Neural Inf. Process. Syst., 1997, pp. 1075-1081.
  • 27
    • R. Coulom, "Efficient selectivity and backup operators in Monte-Carlo tree search," in Proc. Int. Conf. Comput. Games, 2006, pp. 72-83.
  • 28
    • C. B. Browne et al., "A survey of Monte Carlo tree search methods," IEEE Trans. Comput. Intell. AI Games, vol. 4, no. 1, pp. 1-43, Mar. 2012.
  • 31
    • G. Tesauro, "Temporal difference learning and TD-Gammon," Commun. ACM, vol. 38, no. 3, pp. 58-68, Mar. 1995.
  • 32
    • R. M. French, "Catastrophic forgetting in connectionist networks," Trends Cognit. Sci., vol. 3, no. 4, pp. 128-135, 1999.
  • 33
    • D. Kumaran, D. Hassabis, and J. L. McClelland, "What learning systems do intelligent agents need? Complementary learning systems theory updated," Trends Cognit. Sci., vol. 20, no. 7, pp. 512-534, 2016.
  • 35
    • F. Girosi, M. Jones, and T. Poggio, "Regularization theory and neural networks architectures," Neural Comput., vol. 7, no. 2, pp. 219-269, Mar. 1995.
  • 39
    • A. A. Rusu et al., "Policy distillation," Nov. 2015. [Online]. Available: https://arxiv.org/abs/1511.06295
  • 41
    • H. Yin and S. J. Pan, "Knowledge transfer for deep reinforcement learning with hierarchical experience replay," in Proc. AAAI Conf. Artif. Intell., Jan. 2017, pp. 1640-1646.
  • 42
    • J. Kirkpatrick et al., "Overcoming catastrophic forgetting in neural networks," Proc. Nat. Acad. Sci. USA, vol. 114, no. 13, pp. 3521-3526, 2017.
  • 43
    • C. Clopath, "Synaptic consolidation: An approach to long-term learning," Cognit. Neurodyn., vol. 6, no. 3, pp. 251-257, Jun. 2012.
  • 44
    • H. J. Sussmann, "Uniqueness of the weights for minimal feedforward nets with a given input-output map," Neural Netw., vol. 5, no. 4, pp. 589-593, Jul./Aug. 1992.
  • 45
    • R. Olfati-Saber, "Flocking for multi-agent dynamic systems: Algorithms and theory," IEEE Trans. Autom. Control, vol. 51, no. 3, pp. 401-420, Mar. 2006.
  • 46
    • M. L. Littman, "Markov games as a framework for multi-agent reinforcement learning," in Proc. Int. Conf. Mach. Learn., 1994, pp. 157-163.
  • 47
    • A. Tampuu et al., "Multiagent cooperation and competition with deep reinforcement learning," PLoS ONE, vol. 12, no. 4, p. e0172395, Apr. 2017.
  • 48
    • L. Kraemer and B. Banerjee, "Multi-agent reinforcement learning as a rehearsal for decentralized planning," Neurocomputing, vol. 190, pp. 82-94, May 2016.
  • 55
    • V. Mnih et al., "Asynchronous methods for deep reinforcement learning," in Proc. Int. Conf. Mach. Learn., 2016, pp. 1928-1937.
  • 56
    • M. Hausknecht and P. Stone, "Deep recurrent Q-learning for partially observable MDPs," in Proc. AAAI Symp. Seq. Decis. Mak. Intell. Agents, Nov. 2015. [Online]. Available: https://www.cs.utexas.edu/~pstone/Papers/bib2html/b2hd-SDMIA15-Hausknecht.html
  • 60
    • A. G. Barto and S. Mahadevan, "Recent advances in hierarchical reinforcement learning," Discrete Event Dyn. Syst., vol. 13, no. 4, pp. 341-379, Oct. 2003.
  • 61
    • R. S. Sutton, D. Precup, and S. Singh, "Between MDPs and semi-MDPs: A framework for temporal abstraction in reinforcement learning," Artif. Intell., vol. 112, nos. 1-2, pp. 181-211, Aug. 1999.
  • 62
    • C. Guestrin, D. Koller, R. Parr, and S. Venkataraman, "Efficient solution algorithms for factored MDPs," J. Artif. Intell. Res., vol. 19, pp. 399-468, 2003. [Online]. Available: http://www.jair.org/contents.html
  • 63
    • T. G. Dietterich, "Hierarchical reinforcement learning with the MAXQ value function decomposition," J. Artif. Intell. Res., vol. 13, pp. 227-303, Nov. 2000.
  • 66
    • G. Brockman et al., "OpenAI Gym," 2016. [Online]. Available: https://arxiv.org/abs/1606.01540
  • 68
    • J. Bergstra et al., "Theano: Deep learning on GPUs with Python," in Proc. BigLearn Workshop NIPS, 2011.
  • 69
    • F. Chollet, "Keras," 2015. [Online]. Available: http://keras.io
  • 70
    • F. Pedregosa et al., "Scikit-learn: Machine learning in Python," J. Mach. Learn. Res., vol. 12, pp. 2825-2830, Oct. 2011.
  • 71
    • O. Klimov and J. Schulman, "Roboschool," 2017. [Online]. Available: https://blog.openai.com/roboschool/
  • 73
    • B. Tanner and A. White, "RL-glue: Language-independent software for reinforcement-learning experiments," J. Mach. Learn. Res., vol. 10, pp. 2133-2136, Sep. 2009.
  • 75
    • P. Abbeel and A. Y. Ng, "Inverse reinforcement learning," in Encyclopedia of Machine Learning. New York, NY, USA: Springer, 2011, pp. 554-558.


* This information was extracted by KISTI through analysis of Elsevier's SCOPUS database.