-
1
-
-
31844444663
-
Exploration and apprenticeship learning in reinforcement learning
-
ACM
-
Abbeel P., Ng A.Y. Exploration and apprenticeship learning in reinforcement learning. Proc. of the 22nd ICML 2005, 1-8. ACM.
-
(2005)
Proc. of the 22nd ICML
, pp. 1-8
-
-
Abbeel, P.1
Ng, A.Y.2
-
2
-
-
84857501996
-
Experience replay for real-time reinforcement learning control
-
Adam S., Busoniu L., Babuska R. Experience replay for real-time reinforcement learning control. IEEE Transactions on Systems, Man, and Cybernetics, Part C 2012, 42(2):201-212.
-
(2012)
IEEE Transactions on Systems, Man, and Cybernetics, Part C
, vol.42
, Issue.2
, pp. 201-212
-
-
Adam, S.1
Busoniu, L.2
Babuska, R.3
-
3
-
-
0020970738
-
Neuronlike adaptive elements that can learn difficult learning control problems
-
Barto A.G., Sutton R.S., Anderson C.W. Neuronlike adaptive elements that can learn difficult learning control problems. IEEE Transactions on Systems, Man, and Cybernetics 1983, 13:834-846.
-
(1983)
IEEE Transactions on Systems, Man, and Cybernetics
, vol.13
, pp. 834-846
-
-
Barto, A.G.1
Sutton, R.S.2
Anderson, C.W.3
-
4
-
-
33750119752
-
On adaptive learning rate that guarantees convergence in feedforward networks
-
Behera L., Kumar S., Patnaik A. On adaptive learning rate that guarantees convergence in feedforward networks. IEEE Transactions on Neural Networks 2006, 17(5):1116-1125.
-
(2006)
IEEE Transactions on Neural Networks
, vol.17
, Issue.5
, pp. 1116-1125
-
-
Behera, L.1
Kumar, S.2
Patnaik, A.3
-
5
-
-
70349984547
-
Natural actor-critic algorithms
-
Bhatnagar S., Sutton R., Ghavamzadeh M., Lee M. Natural actor-critic algorithms. Automatica 2009, 45:2471-2482.
-
(2009)
Automatica
, vol.45
, pp. 2471-2482
-
-
Bhatnagar, S.1
Sutton, R.2
Ghavamzadeh, M.3
Lee, M.4
-
7
-
-
0032649518
-
An analysis of experience replay in temporal difference learning
-
Cichosz P. An analysis of experience replay in temporal difference learning. Cybernetics and Systems 1999, 30:341-363.
-
(1999)
Cybernetics and Systems
, vol.30
, pp. 341-363
-
-
Cichosz, P.1
-
8
-
-
34548800682
-
Learning to control an octopus arm with Gaussian process temporal difference methods.
-
Engel, Y., Szabó, P., & Volkinshtein, D. (2005). Learning to control an octopus arm with Gaussian process temporal difference methods. In NIPS.
-
(2005)
In NIPS
-
-
Engel, Y.1
Szabó, P.2
Volkinshtein, D.3
-
9
-
-
33748998787
-
Adaptive stepsizes for recursive estimation with applications in approximate dynamic programming
-
George A.P., Powell W.B. Adaptive stepsizes for recursive estimation with applications in approximate dynamic programming. Machine Learning 2006, 65(1):167-198.
-
(2006)
Machine Learning
, vol.65
, Issue.1
, pp. 167-198
-
-
George, A.P.1
Powell, W.B.2
-
10
-
-
80053958320
-
Reward-weighted regression with sample reuse for direct policy search in reinforcement learning
-
Hachiya H., Peters J., Sugiyama M. Reward-weighted regression with sample reuse for direct policy search in reinforcement learning. Neural Computation 2011, 23(11):2798-2832.
-
(2011)
Neural Computation
, vol.23
, Issue.11
, pp. 2798-2832
-
-
Hachiya, H.1
Peters, J.2
Sugiyama, M.3
-
11
-
-
0024137490
-
Increased rates of convergence through learning rate adaptation
-
Jacobs R.A. Increased rates of convergence through learning rate adaptation. Neural Networks 1988, 1(4):295-308.
-
(1988)
Neural Networks
, vol.1
, Issue.4
, pp. 295-308
-
-
Jacobs, R.A.1
-
12
-
-
69249225560
-
Neighborhood based modified backpropagation algorithm using adaptive learning parameters for training feedforward neural networks
-
Kathirvalavakumar T., Subavathi S.J. Neighborhood based modified backpropagation algorithm using adaptive learning parameters for training feedforward neural networks. Neurocomputing 2009, 72:3915-3921.
-
(2009)
Neurocomputing
, vol.72
, pp. 3915-3921
-
-
Kathirvalavakumar, T.1
Subavathi, S.J.2
-
13
-
-
0008336447
-
An analysis of actor/critic algorithm using eligibility traces: reinforcement learning with imperfect value functions.
-
Kimura, H., & Kobayashi, S. (1998). An analysis of actor/critic algorithm using eligibility traces: reinforcement learning with imperfect value functions. In Proc. of the 15th ICML (pp. 278-286).
-
(1998)
In Proc. of the 15th ICML
, pp. 278-286
-
-
Kimura, H.1
Kobayashi, S.2
-
14
-
-
78049390740
-
Policy search for motor primitives in robotics
-
Kober J., Peters J. Policy search for motor primitives in robotics. Machine Learning 2011, 84(1-2):171-203.
-
(2011)
Machine Learning
, vol.84
, Issue.1-2
, pp. 171-203
-
-
Kober, J.1
Peters, J.2
-
16
-
-
84455205906
-
Bipedal walking energy minimization by reinforcement learning with evolving policy parameterization.
-
IROS. San Francisco, USA.
-
Kormushev, P., Ugurlu, B., Calinon, S., Tsagarakis, N., & Caldwell, D.G. (2011). Bipedal walking energy minimization by reinforcement learning with evolving policy parameterization. In Proc. IEEE/RSJ intl conf. on intelligent robots and systems, IROS (pp. 318-324). San Francisco, USA.
-
(2011)
In Proc. IEEE/RSJ intl conf. on intelligent robots and systems
, pp. 318-324
-
-
Kormushev, P.1
Ugurlu, B.2
Calinon, S.3
Tsagarakis, N.4
Caldwell, D.G.5
-
17
-
-
77955872473
-
Evolving neural networks in compressed weight space.
-
In Proceedings of the conference on genetic and evolutionary computation, GECCO-10
-
Koutnik, J., Gomez, F., & Schmidhuber, J. (2010). Evolving neural networks in compressed weight space. In Proceedings of the conference on genetic and evolutionary computation, GECCO-10 (pp. 619-626).
-
(2010)
, pp. 619-626
-
-
Koutnik, J.1
Gomez, F.2
Schmidhuber, J.3
-
19
-
-
33845545664
-
Field trials and testing of the octarm continuum manipulator.
-
In ICRA
-
McMahan, W., Chitrakaran, V.K., Csencsits, M.A., Dawson, D.M., Walker, I.D., & Jones, B.A. et al. (2006). Field trials and testing of the octarm continuum manipulator. In ICRA (pp. 2336-2341).
-
(2006)
, pp. 2336-2341
-
-
McMahan, W.1
Chitrakaran, V.K.2
Csencsits, M.A.3
Dawson, D.M.4
Walker, I.D.5
Jones, B.A.6
-
20
-
-
76649105482
-
Recursive adaptation of stepsize parameter for non-stationary environments
-
Noda I. Recursive adaptation of stepsize parameter for non-stationary environments. Principles of Practice in Multi-Agent Systems 2009, 525-533.
-
(2009)
Principles of Practice in Multi-Agent Systems
, pp. 525-533
-
-
Noda, I.1
-
21
-
-
84875903966
-
-
Octopus-sources
-
Octopus-sources (2006). http://www.cs.mcgill.ca/~dprecup/workshops/ICML06/Octopus/octopus-code-distribution.zip.
-
(2006)
-
-
-
22
-
-
34648858905
-
Bounds on sample size for policy evaluation in Markov environments.
-
on computational learning theory, Berlin: Springer.
-
Peshkin, L., & Mukherjee, S. (2001). Bounds on sample size for policy evaluation in Markov environments. In Proc. of the 14th annual conf. on computational learning theory, vol. 2111 (pp. 616-629). Berlin: Springer.
-
(2001)
In Proc. of the 14th annual conf.
, vol.2111
, pp. 616-629
-
-
Peshkin, L.1
Mukherjee, S.2
-
23
-
-
33646413135
-
Natural actor-critic
-
Springer-Verlag, Berlin Heidelberg
-
Peters J., Vijayakumar S., Schaal S. Natural actor-critic. Proc. of ECML 2005, 280-291. Springer-Verlag, Berlin Heidelberg.
-
(2005)
Proc. of ECML
, pp. 280-291
-
-
Peters, J.1
Vijayakumar, S.2
Schaal, S.3
-
26
-
-
84898942573
-
Online independent component analysis with local learning rate adaptation.
-
In Advances in NIPS
-
Schraudolph, N.N., & Giannakopoulos, X. (2000). Online independent component analysis with local learning rate adaptation. In Advances in NIPS, vol. 12 (pp. 789-795).
-
(2000)
, vol.12
, pp. 789-795
-
-
Schraudolph, N.N.1
Giannakopoulos, X.2
-
27
-
-
52149116388
-
Fast online policy gradient learning with SMD gain vector adaptation.
-
In Advances in NIPS
-
Schraudolph, N.N., Yu, J., & Aberdeen, D. (2006). Fast online policy gradient learning with SMD gain vector adaptation. In Advances in NIPS, vol. 18 (pp. 1185-1192).
-
(2006)
, vol.18
, pp. 1185-1192
-
-
Schraudolph, N.N.1
Yu, J.2
Aberdeen, D.3
-
29
-
-
85132026293
-
Integrated architectures for learning, planning, and reacting based on approximating dynamic programming
-
Morgan Kaufmann
-
Sutton R.S. Integrated architectures for learning, planning, and reacting based on approximating dynamic programming. Proc. of the 7th ICML 1990, 216-224. Morgan Kaufmann.
-
(1990)
Proc. of the 7th ICML
, pp. 216-224
-
-
Sutton, R.S.1
-
30
-
-
84875887627
-
-
Gain adaptation beats least squares? In YWALS
-
Sutton, R.S. (1992a). Gain adaptation beats least squares? In YWALS (pp. 161-166).
-
(1992)
, pp. 161-166
-
-
Sutton, R.S.1
-
31
-
-
0026971570
-
Adapting bias by gradient descent: an incremental version of delta-bar-delta.
-
Sutton, R.S. (1992b). Adapting bias by gradient descent: an incremental version of delta-bar-delta. In Proc. of the 10th NCAI (pp. 171-176).
-
(1992)
In Proc. of the 10th NCAI
, pp. 171-176
-
-
Sutton, R.S.1
-
33
-
-
84898939480
-
Policy gradient methods for reinforcement learning with function approximation
-
MIT Press
-
Sutton R.S., McAllester D., Singh S., Mansour Y. Policy gradient methods for reinforcement learning with function approximation. Advances in NIPS, vol. 12 2000, 1057-1063. MIT Press.
-
(2000)
Advances in NIPS, vol. 12
, pp. 1057-1063
-
-
Sutton, R.S.1
McAllester, D.2
Singh, S.3
Mansour, Y.4
-
34
-
-
71749106087
-
Real-time reinforcement learning by sequential actor-critics and experience replay
-
Wawrzyński P. Real-time reinforcement learning by sequential actor-critics and experience replay. Neural Networks 2009, 22:1484-1497.
-
(2009)
Neural Networks
, vol.22
, pp. 1484-1497
-
-
Wawrzyński, P.1
-
35
-
-
85180619113
-
-
Fixed point method of step-size estimation for on-line neural network training. In IJCNN
-
Wawrzyński, P. (2010). Fixed point method of step-size estimation for on-line neural network training. In IJCNN (pp. 2012-2017).
-
(2010)
, pp. 2012-2017
-
-
Wawrzyński, P.1
-
36
-
-
80052930723
-
Fixed point method for autonomous on-line neural network training
-
Wawrzyński P., Papis B. Fixed point method for autonomous on-line neural network training. Neurocomputing 2011, 74:2893-2905.
-
(2011)
Neurocomputing
, vol.74
, pp. 2893-2905
-
-
Wawrzyński, P.1
Papis, B.2
-
37
-
-
0000337576
-
Simple statistical gradient-following algorithms for connectionist reinforcement learning
-
Williams R.J. Simple statistical gradient-following algorithms for connectionist reinforcement learning. Machine Learning 1992, 229-256.
-
(1992)
Machine Learning
, pp. 229-256
-
-
Williams, R.J.1
-
39
-
-
23044435398
-
Dynamic model of the octopus arm. I. Biomechanics of the octopus reaching movement
-
Yekutieli Y., Sagiv-Zohar R., Aharonov R., Engel Y., Hochner B., Flash T. Dynamic model of the octopus arm. I. Biomechanics of the octopus reaching movement. Journal of Neurophysiology 2005, 94(2):1443-1458.
-
(2005)
Journal of Neurophysiology
, vol.94
, Issue.2
, pp. 1443-1458
-
-
Yekutieli, Y.1
Sagiv-Zohar, R.2
Aharonov, R.3
Engel, Y.4
Hochner, B.5
Flash, T.6