Volume 42, Issue 2, 2012, Pages 201-212

Experience replay for real-time reinforcement learning control

Author keywords

Experience replay (ER); Q learning; real time control; reinforcement learning (RL); robotics; SARSA

Indexed keywords

EXPERIENCE REPLAY (ER); OPTIMAL CONTROL STRATEGY; PENDULUM SWING-UP PROBLEM; Q-LEARNING; REAL-TIME EXPERIMENT; REAL-TIME LEARNING; REINFORCEMENT LEARNING CONTROL; SARSA; SARSA ALGORITHM; SIMULATED SYSTEM; SIMULATION STUDIES; VISION BASED CONTROL;

EID: 84857501996     PISSN: 10946977     EISSN: None     Source Type: Journal    
DOI: 10.1109/TSMCC.2011.2106494     Document Type: Article
Times cited : (255)
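The keywords above combine experience replay (ER) with Q-learning for real-time control. As a rough, hypothetical sketch of that idea (not code from the paper: the 5-state chain MDP, hyperparameters, and all names here are illustrative assumptions), stored transitions can be replayed to squeeze extra value updates out of each real interaction:

```python
import random
from collections import deque

# Hypothetical sketch of experience replay with tabular Q-learning on a
# 5-state chain MDP; parameters and names are illustrative, not from the paper.
N_STATES = 5
ALPHA, GAMMA, EPS = 0.1, 0.95, 0.1

def step(s, a):
    """Deterministic chain: action 0 moves left, 1 moves right; the last state pays reward 1."""
    s2 = max(0, s - 1) if a == 0 else min(N_STATES - 1, s + 1)
    done = s2 == N_STATES - 1
    return s2, (1.0 if done else 0.0), done

Q = [[0.0, 0.0] for _ in range(N_STATES)]
memory = deque(maxlen=1000)  # replay memory of (s, a, r, s2, done) tuples

def q_update(s, a, r, s2, done):
    """One Q-learning backup toward the bootstrapped target."""
    target = r + (0.0 if done else GAMMA * max(Q[s2]))
    Q[s][a] += ALPHA * (target - Q[s][a])

random.seed(0)
for episode in range(200):
    s, done = 0, False
    while not done:
        # epsilon-greedy action selection (ties broken toward "right")
        if random.random() < EPS:
            a = random.choice((0, 1))
        else:
            a = 1 if Q[s][1] >= Q[s][0] else 0
        s2, r, done = step(s, a)
        memory.append((s, a, r, s2, done))
        q_update(s, a, r, s2, done)                # online update from the fresh sample
        for t in random.sample(list(memory), min(8, len(memory))):
            q_update(*t)                            # replayed updates reuse stored experience
        s = s2

print(round(Q[3][1], 2))  # → 1.0 (value of stepping into the goal state)
```

The replay loop is what distinguishes ER from plain online Q-learning: each real transition is stored and then revisited many times, which is the sample-efficiency argument the paper's title points at.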

References (37)
  • 1
    • L. Baird, "Residual algorithms: Reinforcement learning with function approximation," in Proc. 12th Int. Conf. Mach. Learning, Tahoe City, CA, Jul. 9-12, 1995, pp. 30-37.
  • 2
    • F. Bernardo, R. Agustí, J. Pérez-Romero, and O. Sallent, "An application of reinforcement learning for efficient spectrum usage in next-generation mobile cellular networks," IEEE Trans. Syst., Man, Cybern. C: Appl. Rev., vol. 40, no. 4, pp. 477-484, Jul. 2010.
  • 7
    • L. Buşoniu, D. Ernst, B. De Schutter, and R. Babuška, "Online least-squares policy iteration for reinforcement learning control," in Proc. Amer. Control Conf., Baltimore, MD, Jun. 30-Jul. 2, 2010, pp. 486-491.
  • 8
    • P. Cichosz, "An analysis of experience replay in temporal difference learning," Cybern. Syst., vol. 30, pp. 341-363, 1999.
  • 9
    • L. T. Dung, T. Komeda, and M. Takagi, "Efficient experience reuse in non-Markovian environments," in Proc. Int. Conf. Instrum., Control Inf. Technol., Tokyo, Japan, Aug. 20-22, 2008, pp. 3327-3332.
  • 11
    • D. Ernst, P. Geurts, and L. Wehenkel, "Tree-based batch mode reinforcement learning," J. Mach. Learning Res., vol. 6, pp. 503-556, 2005.
  • 12
    • D. Ernst, G.-B. Stan, J. Gonçalves, and L. Wehenkel, "Clinical data based optimal STI strategies for HIV: A reinforcement learning approach," in Proc. 45th IEEE Conf. Decis. Control, San Diego, CA, Dec. 13-15, 2006, pp. 667-672.
  • 13
    • D. Gu and H. Hu, "Integration of coordination architecture and behavior fuzzy learning in quadruped walking robots," IEEE Trans. Syst., Man, Cybern. C: Appl. Rev., vol. 37, no. 4, pp. 670-681, Jul. 2007.
  • 14
    • T. Jaakkola, M. I. Jordan, and S. P. Singh, "On the convergence of stochastic iterative dynamic programming algorithms," Neural Comput., vol. 6, no. 6, pp. 1185-1201, 1994.
  • 15
    • L. Jouffe, "Fuzzy inference system learning by reinforcement methods," IEEE Trans. Syst., Man, Cybern. C: Appl. Rev., vol. 28, no. 3, pp. 338-355, Aug. 1998.
  • 17
    • M. G. Lagoudakis and R. Parr, "Least-squares policy iteration," J. Mach. Learning Res., vol. 4, pp. 1107-1149, 2003.
  • 19
    • L.-J. Lin, "Self-improving reactive agents based on reinforcement learning, planning and teaching," Mach. Learning, vol. 8, no. 3/4 (Special issue on reinforcement learning), pp. 293-321, Aug. 1992.
  • 20
    • F. S. Melo, S. P. Meyn, and M. I. Ribeiro, "An analysis of reinforcement learning with function approximation," in Proc. 25th Int. Conf. Mach. Learning, Helsinki, Finland, Jul. 5-9, 2008, pp. 664-671.
  • 21
    • A. W. Moore and C. G. Atkeson, "Prioritized sweeping: Reinforcement learning with less data and less time," Mach. Learning, vol. 13, pp. 103-130, 1993.
  • 22
    • K. Park, Y. Kim, and J. Kim, "Modular Q-learning based multi-agent cooperation for robot soccer," Robot. Auton. Syst., vol. 35, pp. 109-122, 2001.
  • 23
    • T. J. Perkins and D. Precup, "A convergent form of approximate policy iteration," in Advances in Neural Information Processing Systems, vol. 15, S. Becker, S. Thrun, and K. Obermayer, Eds. Cambridge, MA: MIT Press, 2003, pp. 1595-1602.
  • 24
    • J. Peters and S. Schaal, "Reinforcement learning of motor skills with policy gradients," Neural Netw., vol. 21, pp. 682-697, 2008.
  • 25
    • G. A. Rummery and M. Niranjan, "On-line Q-learning using connectionist systems," Tech. Rep. CUED/F-INFENG/TR166, Engineering Department, Cambridge University, U.K., Sep. 1994. [Online]. Available: http://mi.eng.cam.ac.uk/reports/svr-ftp/rummery-tr166.ps.Z
  • 26
    • S. Singh and R. Sutton, "Reinforcement learning with replacing eligibility traces," Mach. Learning, vol. 22, pp. 123-158, 1996.
  • 27
    • S. P. Singh, T. Jaakkola, and M. I. Jordan, "Reinforcement learning with soft state aggregation," in Advances in Neural Information Processing Systems, vol. 7, G. Tesauro, D. S. Touretzky, and T. K. Leen, Eds. Cambridge, MA: MIT Press, 1995, pp. 361-368.
  • 29
    • W. D. Smart and L. P. Kaelbling, "Practical reinforcement learning in continuous spaces," in Proc. 17th Int. Conf. Mach. Learning, Stanford University, Stanford, CA, Jun. 29-Jul. 2, 2000, pp. 903-910.
  • 30
    • P. Stone, R. Sutton, and G. Kuhlmann, "Reinforcement learning for RoboCup soccer keepaway," Adaptive Behav., vol. 13, pp. 165-188, 2005.
  • 31
    • R. S. Sutton, "Integrated architectures for learning, planning, and reacting based on approximating dynamic programming," in Proc. 7th Int. Conf. Mach. Learning, Austin, TX, Jun. 21-23, 1990, pp. 216-224.
  • 35
    • J. Van Ast and R. Babuska, "Dynamic exploration in Q(λ)-learning," in Proc. Int. Joint Conf. Neural Netw., Vancouver, BC, Canada, Jul. 16-21, 2006, pp. 41-46.
  • 37
    • P. Wawrzynski, "Real-time reinforcement learning by sequential actor-critics and experience replay," Neural Netw., vol. 22, no. 10, pp. 1484-1497, 2009.


* This information was extracted and analyzed by KISTI from Elsevier's SCOPUS database.