Bakker, B. Reinforcement learning with long short-term memory. In NIPS, pp. 1475-1482, 2001.
Bellemare, M. G., Naddaf, Y., Veness, J., and Bowling, M. The Arcade Learning Environment: An evaluation platform for general agents. J. Artif. Intell. Res., 47:253-279, 2013.
Bertsekas, D. P. and Tsitsiklis, J. N. Neuro-dynamic programming: An overview. In CDC, pp. 560-564, 1995.
Dann, C., Neumann, G., and Peters, J. Policy evaluation with temporal differences: A survey and comparison. J. Mach. Learn. Res., 15(1):809-883, 2014.
Degris, T., Bechu, J., White, A., Modayil, J., Pilarski, P. M., and Denk, C. RLPark. http://rlpark.github.io, 2013.
Deisenroth, M. P., Neumann, G., and Peters, J. A survey on policy search for robotics. Found. Trends Robotics, 2(1-2):1-142, 2013.
DeJong, G. and Spong, M. W. Swinging up the Acrobot: An example of intelligent control. In ACC, pp. 2158-2162, 1994.
Deng, J., Dong, W., Socher, R., Li, L.-J., Li, K., and Fei-Fei, L. ImageNet: A large-scale hierarchical image database. In CVPR, pp. 248-255, 2009.
Dietterich, T. G. Hierarchical reinforcement learning with the MAXQ value function decomposition. J. Artif. Intell. Res., 13:227-303, 2000.
Dimitrakakis, C., Li, G., and Tziortziotis, N. The reinforcement learning competition 2014. AI Magazine, 35(3):61-65, 2014.
Donaldson, P. E. K. Error decorrelation: A technique for matching a class of functions. In Proc. 3rd Intl. Conf. Medical Electronics, pp. 173-178, 1960.
Doya, K. Reinforcement learning in continuous time and space. Neural Comput., 12(1):219-245, 2000.
Dutech, A., Edmunds, T., Kok, J., Lagoudakis, M., Littman, M., Riedmiller, M., Russell, B., Scherrer, B., Sutton, R., Timmer, S., et al. Reinforcement learning benchmarks and bake-offs II. Advances in Neural Information Processing Systems (NIPS), 17, 2005.
Erez, T., Tassa, Y., and Todorov, E. Infinite horizon model predictive control for nonlinear periodic tasks. Manuscript under review, 4, 2011.
Everingham, M., Van Gool, L., Williams, C. K. I., Winn, J., and Zisserman, A. The PASCAL visual object classes (VOC) challenge. Int. J. Comput. Vision, 88(2):303-338, 2010.
Fei-Fei, L., Fergus, R., and Perona, P. One-shot learning of object categories. IEEE Trans. Pattern Anal. Mach. Intell., 28(4):594-611, 2006.
Furuta, K., Okutani, T., and Sone, H. Computer control of a double inverted pendulum. Comput. Electr. Eng., 5(1):67-84, 1978.
Garofolo, J. S., Lamel, L. F., Fisher, W. M., Fiscus, J. G., and Pallett, D. S. DARPA TIMIT acoustic-phonetic continuous speech corpus CD-ROM. NIST speech disc 1-1.1. NASA STI/Recon Technical Report N, 93, 1993.
Godfrey, J. J., Holliman, E. C., and McDaniel, J. SWITCHBOARD: Telephone speech corpus for research and development. In ICASSP, pp. 517-520, 1992.
Gomez, F. and Miikkulainen, R. 2-D pole balancing with recurrent evolutionary networks. In ICANN, pp. 425-430, 1998.
Guo, X., Singh, S., Lee, H., Lewis, R. L., and Wang, X. Deep learning for real-time Atari game play using offline Monte-Carlo tree search planning. In NIPS, pp. 3338-3346, 2014.
Hansen, N. and Ostermeier, A. Completely derandomized self-adaptation in evolution strategies. Evol. Comput., 9(2):159-195, 2001.
Heess, N., Hunt, J., Lillicrap, T., and Silver, D. Memory-based control with recurrent neural networks. arXiv:1512.04455, 2015a.
Heess, N., Wayne, G., Silver, D., Lillicrap, T., Erez, T., and Tassa, Y. Learning continuous control policies by stochastic value gradients. In NIPS, pp. 2926-2934, 2015b.
Hester, T. and Stone, P. The open-source TEXPLORE code release for reinforcement learning on robots. In RoboCup 2013: Robot World Cup XVII, pp. 536-543, 2013.
Hinton, G., Deng, L., Yu, D., Mohamed, A.-R., Jaitly, N., Senior, A., Vanhoucke, V., Nguyen, P., Sainath, T., Dahl, G., and Kingsbury, B. Deep neural networks for acoustic modeling in speech recognition. IEEE Signal Process. Mag., 29(6):82-97, 2012.
Kakade, S. M. A natural policy gradient. In NIPS, pp. 1531-1538, 2002.
Kimura, H. and Kobayashi, S. Stochastic real-valued reinforcement learning to solve a nonlinear control problem. In IEEE SMC, pp. 510-515, 1999.
Kober, J. and Peters, J. Policy search for motor primitives in robotics. In NIPS, pp. 849-856, 2009.
Krizhevsky, A., Sutskever, I., and Hinton, G. ImageNet classification with deep convolutional neural networks. In NIPS, pp. 1097-1105, 2012.
Levine, S. and Koltun, V. Guided policy search. In ICML, pp. 1-9, 2013.
Levine, S., Finn, C., Darrell, T., and Abbeel, P. End-to-end training of deep visuomotor policies. arXiv:1504.00702, 2015.
Lillicrap, T., Hunt, J., Pritzel, A., Heess, N., Erez, T., Tassa, Y., Silver, D., and Wierstra, D. Continuous control with deep reinforcement learning. arXiv:1509.02971, 2015.
Martin, D., Fowlkes, C., Tal, D., and Malik, J. A database of human segmented natural images and its application to evaluating segmentation algorithms and measuring ecological statistics. In ICCV, pp. 416-423, 2001.
Michie, D. and Chambers, R. A. BOXES: An experiment in adaptive control. Machine Intelligence, 2:137-152, 1968.
Mnih, V., Kavukcuoglu, K., Silver, D., Rusu, A. A., Veness, J., Bellemare, M. G., Graves, A., Riedmiller, M., Fidjeland, A. K., Ostrovski, G., Petersen, S., Beattie, C., Sadik, A., Antonoglou, I., King, H., Kumaran, D., Wierstra, D., Legg, S., and Hassabis, D. Human-level control through deep reinforcement learning. Nature, 518(7540):529-533, 2015.
Murthy, S. S. and Raibert, M. H. 3D balance in legged locomotion: Modeling and simulation for the one-legged case. ACM SIGGRAPH Computer Graphics, 18(1):27-27, 1984.
Papis, B. and Wawrzynski, P. dotRL: A platform for rapid reinforcement learning methods development and validation. In FedCSIS, pp. 129-136, 2013.
Peters, J. and Schaal, S. Reinforcement learning by reward-weighted regression for operational space control. In ICML, pp. 745-750, 2007.
Peters, J. and Schaal, S. Reinforcement learning of motor skills with policy gradients. Neural Networks, 21(4):682-697, 2008.
Peters, J., Mülling, K., and Altün, Y. Relative entropy policy search. In AAAI, pp. 1607-1612, 2010.
Purcell, E. M. Life at low Reynolds number. Am. J. Phys., 45(1):3-11, 1977.
Riedmiller, M., Blum, M., and Lampe, T. CLS2: Closed loop simulation system. http://ml.informatik.uni-freiburg.de/research/clsquare, 2012.
Rubinstein, R. The cross-entropy method for combinatorial and continuous optimization. Methodol. Comput. Appl. Probab., 1(2):127-190, 1999.
Schäfer, A. M. and Udluft, S. Solving partially observable reinforcement learning problems with recurrent neural networks. In ECML Workshops, pp. 71-81, 2005.
Schaul, T., Bayer, J., Wierstra, D., Sun, Y., Felder, M., Sehnke, F., Rückstieß, T., and Schmidhuber, J. PyBrain. J. Mach. Learn. Res., 11:743-746, 2010.
Schulman, J., Levine, S., Abbeel, P., Jordan, M. I., and Moritz, P. Trust region policy optimization. In ICML, pp. 1889-1897, 2015a.
Schulman, J., Moritz, P., Levine, S., Jordan, M. I., and Abbeel, P. High-dimensional continuous control using generalized advantage estimation. arXiv:1506.02438, 2015b.
Stephenson, A. On induced stability. Philos. Mag., 15(86):233-236, 1908.
Stone, P., Kuhlmann, G., Taylor, M. E., and Liu, Y. Keepaway soccer: From machine learning testbed to benchmark. In RoboCup 2005: Robot Soccer World Cup IX, pp. 93-105. Springer, 2005.
Sutton, R. S., Precup, D., and Singh, S. Between MDPs and semi-MDPs: A framework for temporal abstraction in reinforcement learning. Artificial Intelligence, 112(1):181-211, 1999.
Szita, I. and Lorincz, A. Learning Tetris using the noisy cross-entropy method. Neural Comput., 18(12):2936-2941, 2006.
Szita, I., Takacs, B., and Lorincz, A. ϵ-MDPs: Learning in varying environments. J. Mach. Learn. Res., 3:145-174, 2003.
Tassa, Y., Erez, T., and Todorov, E. Synthesis and stabilization of complex behaviors through online trajectory optimization. In IROS, pp. 4906-4913, 2012.
Tesauro, G. Temporal difference learning and TD-Gammon. Commun. ACM, 38(3):58-68, 1995.
Todorov, E., Erez, T., and Tassa, Y. MuJoCo: A physics engine for model-based control. In IROS, pp. 5026-5033, 2012.
van Hoof, H., Peters, J., and Neumann, G. Learning of nonparametric control policies with high-dimensional state features. In AISTATS, pp. 995-1003, 2015.
Watter, M., Springenberg, J., Boedecker, J., and Riedmiller, M. Embed to control: A locally linear latent dynamics model for control from raw images. In NIPS, pp. 2728-2736, 2015.
Wawrzynski, P. Learning to control a 6-degree-of-freedom walking robot. In IEEE EUROCON, pp. 698-705, 2007.
Widrow, B. Pattern recognition and adaptive control. IEEE Trans. Ind. Appl., 83(74):269-277, 1964.
Wierstra, D., Foerster, A., Peters, J., and Schmidhuber, J. Solving deep memory POMDPs with recurrent policy gradients. In ICANN, pp. 697-706, 2007.
Williams, R. J. Simple statistical gradient-following algorithms for connectionist reinforcement learning. Mach. Learn., 8:229-256, 1992.
Yamaguchi, A. and Ogasawara, T. SkyAI: Highly modularized reinforcement learning library. In IEEE-RAS Humanoids, pp. 118-123, 2010.
Yu, D., Ju, Y.-C., Wang, Y.-Y., Zweig, G., and Acero, A. Automated directory assistance system - from theory to practice. In Interspeech, pp. 2709-2712, 2007.