Volume 42, Issue 2, 2012, Pages 201-212

Experience replay for real-time reinforcement learning control

Author keywords

Experience replay (ER); Q learning; real time control; reinforcement learning (RL); robotics; SARSA

Indexed keywords

EXPERIENCE REPLAY (ER); OPTIMAL CONTROL STRATEGY; PENDULUM SWING-UP PROBLEM; Q-LEARNING; REAL-TIME EXPERIMENT; REAL-TIME LEARNING; REINFORCEMENT LEARNING CONTROL; SARSA; SARSA ALGORITHM; SIMULATED SYSTEM; SIMULATION STUDIES; VISION BASED CONTROL;

EID: 84857501996     PISSN: 10946977     EISSN: None     Source Type: Journal    
DOI: 10.1109/TSMCC.2011.2106494     Document Type: Article
Times cited : (255)
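The keywords above combine experience replay (ER) with Q-learning for real-time control. As a rough, hypothetical sketch of that idea (not code from the paper: the 5-state chain MDP, hyperparameters, and all names here are illustrative assumptions), stored transitions can be replayed to squeeze extra value updates out of each real interaction:

```python
import random
from collections import deque

# Hypothetical sketch of experience replay with tabular Q-learning on a
# 5-state chain MDP; parameters and names are illustrative, not from the paper.
N_STATES = 5
ALPHA, GAMMA, EPS = 0.1, 0.95, 0.1

def step(s, a):
    """Deterministic chain: action 0 moves left, 1 moves right; the last state pays reward 1."""
    s2 = max(0, s - 1) if a == 0 else min(N_STATES - 1, s + 1)
    done = s2 == N_STATES - 1
    return s2, (1.0 if done else 0.0), done

Q = [[0.0, 0.0] for _ in range(N_STATES)]
memory = deque(maxlen=1000)  # replay memory of (s, a, r, s2, done) tuples

def q_update(s, a, r, s2, done):
    """One Q-learning backup toward the bootstrapped target."""
    target = r + (0.0 if done else GAMMA * max(Q[s2]))
    Q[s][a] += ALPHA * (target - Q[s][a])

random.seed(0)
for episode in range(200):
    s, done = 0, False
    while not done:
        # epsilon-greedy action selection (ties broken toward "right")
        if random.random() < EPS:
            a = random.choice((0, 1))
        else:
            a = 1 if Q[s][1] >= Q[s][0] else 0
        s2, r, done = step(s, a)
        memory.append((s, a, r, s2, done))
        q_update(s, a, r, s2, done)                # online update from the fresh sample
        for t in random.sample(list(memory), min(8, len(memory))):
            q_update(*t)                            # replayed updates reuse stored experience
        s = s2

print(round(Q[3][1], 2))  # → 1.0 (value of stepping into the goal state)
```

The replay loop is what distinguishes ER from plain online Q-learning: each real transition is stored and then revisited many times, which is the sample-efficiency argument the paper's title points at.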

References (37)
  • 1
    • L. Baird, "Residual algorithms: Reinforcement learning with function approximation," in Proc. 12th Int. Conf. Mach. Learning, Tahoe City, CA, Jul. 9-12, 1995, pp. 30-37.
  • 2
    • F. Bernardo, R. Agustí, J. Pérez-Romero, and O. Sallent, "An application of reinforcement learning for efficient spectrum usage in next-generation mobile cellular networks," IEEE Trans. Syst., Man, Cybern. C: Appl. Rev., vol. 40, no. 4, pp. 477-484, Jul. 2010.
  • 7
    • L. Buşoniu, D. Ernst, B. De Schutter, and R. Babuška, "Online least-squares policy iteration for reinforcement learning control," in Proc. Amer. Control Conf., Baltimore, MD, Jun. 30-Jul. 2, 2010, pp. 486-491.
  • 8
    • P. Cichosz, "An analysis of experience replay in temporal difference learning," Cybern. Syst., vol. 30, pp. 341-363, 1999.
  • 9
    • L. T. Dung, T. Komeda, and M. Takagi, "Efficient experience reuse in non-Markovian environments," in Proc. Int. Conf. Instrum., Control Inf. Technol., Tokyo, Japan, Aug. 20-22, 2008, pp. 3327-3332.
  • 11
    • D. Ernst, P. Geurts, and L. Wehenkel, "Tree-based batch mode reinforcement learning," J. Mach. Learning Res., vol. 6, pp. 503-556, 2005.
  • 12
    • D. Ernst, G.-B. Stan, J. Gonçalves, and L. Wehenkel, "Clinical data based optimal STI strategies for HIV: A reinforcement learning approach," in Proc. 45th IEEE Conf. Decis. Control, San Diego, CA, Dec. 13-15, 2006, pp. 667-672.
  • 13
    • D. Gu and H. Hu, "Integration of coordination architecture and behavior fuzzy learning in quadruped walking robots," IEEE Trans. Syst., Man, Cybern. C: Appl. Rev., vol. 37, no. 4, pp. 670-681, Jul. 2007.
  • 14
    • T. Jaakkola, M. I. Jordan, and S. P. Singh, "On the convergence of stochastic iterative dynamic programming algorithms," Neural Comput., vol. 6, no. 6, pp. 1185-1201, 1994.
  • 15
    • L. Jouffe, "Fuzzy inference system learning by reinforcement methods," IEEE Trans. Syst., Man, Cybern. C: Appl. Rev., vol. 28, no. 3, pp. 338-355, Aug. 1998.
  • 17
    • M. G. Lagoudakis and R. Parr, "Least-squares policy iteration," J. Mach. Learning Res., vol. 4, pp. 1107-1149, 2003.
  • 19
    • L.-J. Lin, "Self-improving reactive agents based on reinforcement learning, planning and teaching," Mach. Learning, vol. 8, no. 3/4 (Special issue on reinforcement learning), pp. 293-321, Aug. 1992.
  • 20
    • F. S. Melo, S. P. Meyn, and M. I. Ribeiro, "An analysis of reinforcement learning with function approximation," in Proc. 25th Int. Conf. Mach. Learning, Helsinki, Finland, Jul. 5-9, 2008, pp. 664-671.
  • 21
    • A. W. Moore and C. G. Atkeson, "Prioritized sweeping: Reinforcement learning with less data and less time," Mach. Learning, vol. 13, pp. 103-130, 1993.
  • 22
    • K. Park, Y. Kim, and J. Kim, "Modular Q-learning based multi-agent cooperation for robot soccer," Robot. Auton. Syst., vol. 35, pp. 109-122, 2001.
  • 23
    • T. J. Perkins and D. Precup, "A convergent form of approximate policy iteration," in Advances in Neural Information Processing Systems, vol. 15, S. Becker, S. Thrun, and K. Obermayer, Eds. Cambridge, MA: MIT Press, 2003, pp. 1595-1602.
  • 24
    • J. Peters and S. Schaal, "Reinforcement learning of motor skills with policy gradients," Neural Netw., vol. 21, pp. 682-697, 2008.
  • 25
    • G. A. Rummery and M. Niranjan, "On-line Q-learning using connectionist systems," Tech. Rep. CUED/F-INFENG/TR166, Engineering Department, Cambridge University, U.K., Sep. 1994. [Online]. Available: http://mi.eng.cam.ac.uk/reports/svr-ftp/rummery-tr166.ps.Z
  • 26
    • S. Singh and R. Sutton, "Reinforcement learning with replacing eligibility traces," Mach. Learning, vol. 22, pp. 123-158, 1996.
  • 27
    • S. P. Singh, T. Jaakkola, and M. I. Jordan, "Reinforcement learning with soft state aggregation," in Advances in Neural Information Processing Systems, vol. 7, G. Tesauro, D. S. Touretzky, and T. K. Leen, Eds. Cambridge, MA: MIT Press, 1995, pp. 361-368.
  • 29
    • W. D. Smart and L. P. Kaelbling, "Practical reinforcement learning in continuous spaces," in Proc. 17th Int. Conf. Mach. Learning, Stanford University, Stanford, CA, Jun. 29-Jul. 2, 2000, pp. 903-910.
  • 30
    • P. Stone, R. Sutton, and G. Kuhlmann, "Reinforcement learning for RoboCup soccer keepaway," Adaptive Behav., vol. 13, pp. 165-188, 2005.
  • 31
    • R. S. Sutton, "Integrated architectures for learning, planning, and reacting based on approximating dynamic programming," in Proc. 7th Int. Conf. Mach. Learning, Austin, TX, Jun. 21-23, 1990, pp. 216-224.
  • 35
    • J. Van Ast and R. Babuska, "Dynamic exploration in Q(λ)-learning," in Proc. Int. Joint Conf. Neural Netw., Vancouver, BC, Canada, Jul. 16-21, 2006, pp. 41-46.
  • 37
    • P. Wawrzynski, "Real-time reinforcement learning by sequential actor-critics and experience replay," Neural Netw., vol. 22, no. 10, pp. 1484-1497, 2009.


* This information was extracted and analyzed by KISTI from Elsevier's SCOPUS database.