[1] L. Baird, "Residual algorithms: Reinforcement learning with function approximation," in Proc. 12th Int. Conf. Mach. Learning, Tahoe City, CA, Jul. 9-12, 1995, pp. 30-37.
[2] F. Bernardo, R. Agustí, J. Pérez-Romero, and O. Sallent, "An application of reinforcement learning for efficient spectrum usage in next-generation mobile cellular networks," IEEE Trans. Syst., Man, Cybern. C: Appl. Rev., vol. 40, no. 4, pp. 477-484, Jul. 2010.
[6] L. Buşoniu, R. Babuška, B. De Schutter, and D. Ernst, Reinforcement Learning and Dynamic Programming Using Function Approximators (Automation and Control Engineering). New York/Boca Raton, FL: Taylor & Francis/CRC, 2010.
[7] L. Buşoniu, D. Ernst, B. De Schutter, and R. Babuška, "Online least-squares policy iteration for reinforcement learning control," in Proc. Amer. Control Conf., Baltimore, MD, Jun. 30-Jul. 2, 2010, pp. 486-491.
[8] P. Cichosz, "An analysis of experience replay in temporal difference learning," Cybern. Syst., vol. 30, pp. 341-363, 1999.
[9] L. T. Dung, T. Komeda, and M. Takagi, "Efficient experience reuse in non-Markovian environments," in Proc. Int. Conf. Instrum., Control Inf. Technol., Tokyo, Japan, Aug. 20-22, 2008, pp. 3327-3332.
[10] D. Ernst, "Near optimal closed-loop control. Application to electric power systems," Ph.D. dissertation, University of Liège, Liège, Belgium, Mar. 2003.
[11] D. Ernst, P. Geurts, and L. Wehenkel, "Tree-based batch mode reinforcement learning," J. Mach. Learning Res., vol. 6, pp. 503-556, 2005.
[12] D. Ernst, G.-B. Stan, J. Gonçalves, and L. Wehenkel, "Clinical data based optimal STI strategies for HIV: A reinforcement learning approach," in Proc. 45th IEEE Conf. Decis. Control, San Diego, CA, Dec. 13-15, 2006, pp. 667-672.
[13] D. Gu and H. Hu, "Integration of coordination architecture and behavior fuzzy learning in quadruped walking robots," IEEE Trans. Syst., Man, Cybern. C: Appl. Rev., vol. 37, no. 4, pp. 670-681, Jul. 2007.
[14] T. Jaakkola, M. I. Jordan, and S. P. Singh, "On the convergence of stochastic iterative dynamic programming algorithms," Neural Comput., vol. 6, no. 6, pp. 1185-1201, 1994.
[15] L. Jouffe, "Fuzzy inference system learning by reinforcement methods," IEEE Trans. Syst., Man, Cybern. C: Appl. Rev., vol. 28, no. 3, pp. 338-355, Aug. 1998.
[16] S. Kalyanakrishnan and P. Stone, "Batch reinforcement learning in a complex domain," in Proc. 6th Int. Conf. Auton. Agents Multi-Agent Syst., Honolulu, HI, May 14-18, 2007, pp. 650-657.
[17] M. G. Lagoudakis and R. Parr, "Least-squares policy iteration," J. Mach. Learning Res., vol. 4, pp. 1107-1149, 2003.
[18] L. Li, M. L. Littman, and C. R. Mansley, "Online exploration in least-squares policy iteration," in Proc. 8th Int. Joint Conf. Auton. Agents Multiagent Syst., Budapest, Hungary, May 10-15, 2009, vol. 2, pp. 733-739.
[19] L.-J. Lin, "Self-improving reactive agents based on reinforcement learning, planning and teaching," Mach. Learning, vol. 8, no. 3/4 (Special issue on reinforcement learning), pp. 293-321, Aug. 1992.
[20] F. S. Melo, S. P. Meyn, and M. I. Ribeiro, "An analysis of reinforcement learning with function approximation," in Proc. 25th Int. Conf. Mach. Learning, Helsinki, Finland, Jul. 5-9, 2008, pp. 664-671.
[21] A. W. Moore and C. G. Atkeson, "Prioritized sweeping: Reinforcement learning with less data and less time," Mach. Learning, vol. 13, pp. 103-130, 1993.
[22] K. Park, Y. Kim, and J. Kim, "Modular Q-learning based multi-agent cooperation for robot soccer," Robot. Auton. Syst., vol. 35, pp. 109-122, 2001.
[23] T. J. Perkins and D. Precup, "A convergent form of approximate policy iteration," in Advances in Neural Information Processing Systems, vol. 15, S. Becker, S. Thrun, and K. Obermayer, Eds. Cambridge, MA: MIT Press, 2003, pp. 1595-1602.
[24] J. Peters and S. Schaal, "Reinforcement learning of motor skills with policy gradients," Neural Netw., vol. 21, pp. 682-697, 2008.
[25] G. A. Rummery and M. Niranjan, "On-line Q-learning using connectionist systems," Tech. Rep. CUED/F-INFENG/TR166, Engineering Department, Cambridge University, U.K., Sep. 1994. [Online]. Available: http://mi.eng.cam.ac.uk/reports/svr-ftp/rummery-tr166.ps.Z
[26] S. Singh and R. Sutton, "Reinforcement learning with replacing eligibility traces," Mach. Learning, vol. 22, pp. 123-158, 1996.
[27] S. P. Singh, T. Jaakkola, and M. I. Jordan, "Reinforcement learning with soft state aggregation," in Advances in Neural Information Processing Systems, vol. 7, G. Tesauro, D. S. Touretzky, and T. K. Leen, Eds. Cambridge, MA: MIT Press, 1995, pp. 361-368.
[28] W. Smart, "Making reinforcement learning work on real robots," Ph.D. dissertation, Brown University, Providence, RI, 2002.
[29] W. D. Smart and L. P. Kaelbling, "Practical reinforcement learning in continuous spaces," in Proc. 17th Int. Conf. Mach. Learning, Stanford University, Stanford, CA, Jun. 29-Jul. 2, 2000, pp. 903-910.
[30] P. Stone, R. Sutton, and G. Kuhlmann, "Reinforcement learning for RoboCup soccer keepaway," Adaptive Behav., vol. 13, pp. 165-188, 2005.
[31] R. S. Sutton, "Integrated architectures for learning, planning, and reacting based on approximating dynamic programming," in Proc. 7th Int. Conf. Mach. Learning, Austin, TX, Jun. 21-23, 1990, pp. 216-224.
[33] R. S. Sutton, Cs. Szepesvári, A. Geramifard, and M. H. Bowling, "Dyna-style planning with linear function approximation and prioritized sweeping," in Proc. 24th Conf. Uncertainty Artif. Intell., Helsinki, Finland, Jul. 9-12, 2008, pp. 528-536.
[34] Cs. Szepesvári and W. D. Smart, "Interpolation-based Q-learning," in Proc. 21st Int. Conf. Mach. Learning, Banff, AB, Canada, Jul. 4-8, 2004, pp. 791-798.
[35] J. Van Ast and R. Babuška, "Dynamic exploration in Q(λ)-learning," in Proc. Int. Joint Conf. Neural Netw., Vancouver, BC, Canada, Jul. 16-21, 2006, pp. 41-46.
[37] P. Wawrzynski, "Real-time reinforcement learning by sequential actor-critics and experience replay," Neural Netw., vol. 22, no. 10, pp. 1484-1497, 2009.