1. Baird, L.C., Moore, A.W.: Gradient descent for general reinforcement learning. In: Advances in Neural Information Processing Systems, vol. 11. MIT Press, Cambridge, MA (1995)
2. Barto, A.G., Sutton, R.S., Anderson, C.W.: Neuronlike adaptive elements that can solve difficult learning control problems. IEEE Trans. Syst. Man Cybern. 13(5), 834-846 (1983)
3. Baxter, J., Bartlett, P.L.: Infinite-horizon policy-gradient estimation. J. Artif. Intell. Res. 15, 319-350 (2001)
4. Berenji, H.R., Khedkar, P.: Learning and tuning fuzzy logic controllers through reinforcements. IEEE Trans. Neural Netw. 3(5), 724-740 (1992)
5. Berenji, H.R., Vengerov, D.: A convergent actor-critic-based fuzzy reinforcement learning algorithm with application to power management of wireless transmitters. IEEE Trans. Fuzzy Syst. 11(4), 478-485 (2003)
6. Bowling, M., Veloso, M.: Multiagent learning using a variable learning rate. Artif. Intell. 136(2), 215-250 (2002)
7. Grudic, G.Z., Kumar, V., Ungar, L.: Using policy gradient reinforcement learning on autonomous robot controllers. In: Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Las Vegas, Nevada, pp. 406-411 (2003)
8. Hu, J., Wellman, M.P.: Nash Q-learning for general-sum stochastic games. J. Mach. Learn. Res. 4, 1039-1069 (2003)
9. Kimura, H., Yamamura, M., Kobayashi, S.: Reinforcement learning by stochastic hill climbing on discounted reward. In: Proceedings of the 12th International Conference on Machine Learning, California, pp. 152-160 (1995)
13. Littman, M.L.: Value-function reinforcement learning in Markov games. Cogn. Syst. Res. 2(1), 55-66 (2000)
14. Olfati-Saber, R.: Flocking for multi-agent dynamic systems: Algorithms and theory. IEEE Trans. Automat. Contr. 51(3), 401-420 (2006)
15. Peshkin, L., Kim, K., Meuleau, N., Kaelbling, L.P.: Learning to cooperate via policy search. In: Proceedings of the 16th Conference on Uncertainty in Artificial Intelligence (UAI), pp. 307-314 (2000)
16. Reynolds, C.W.: Flocks, herds, and schools: A distributed behavioural model. Comput. Graph. 21(4), 25-34 (1987)
17. Singh, S., Kearns, M., Mansour, Y.: Nash convergence of gradient dynamics in general-sum games. In: Proceedings of the 16th Annual Conference on Uncertainty in Artificial Intelligence (UAI), Stanford University, Stanford, CA, pp. 541-548 (2000)
18. Sutton, R.S., McAllester, D., Singh, S., Mansour, Y.: Policy gradient methods for reinforcement learning with function approximation. In: Advances in Neural Information Processing Systems, vol. 12, pp. 1057-1063. MIT Press (2000)
20. Tao, N., Baxter, J., Weaver, L.: A multi-agent policy-gradient approach to network routing. In: Proceedings of the 18th International Conference on Machine Learning, Williamstown, MA, pp. 553-560 (2001)
21. Tedrake, R., Zhang, T., Seung, H.: Stochastic policy gradient reinforcement learning on a simple 3D biped. In: Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Sendai, Japan, pp. 2849-2854 (2004)
22. Williams, R.J.: Simple statistical gradient-following algorithms for connectionist reinforcement learning. Mach. Learn. 8, 229-256 (1992)
23. Yang, E., Gu, D., Hu, H.: Nonsingular formation control of cooperative mobile robots via feedback linearization. In: Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Edmonton, Canada, pp. 3652-3657 (2005)