-
1
-
-
0030149709
-
Purposive behavior acquisition for a real robot by vision-based reinforcement learning
-
Asada, M., Noda, S., Tawaratsumida, S. & Hosoda, K. (1996). Purposive behavior acquisition for a real robot by vision-based reinforcement learning. Machine Learning, 23:279-303.
-
(1996)
Machine Learning
, vol.23
, pp. 279-303
-
-
Asada, M.1
Noda, S.2
Tawaratsumida, S.3
Hosoda, K.4
-
2
-
-
0029210635
-
Learning to act using real-time dynamic programming
-
Barto, A.G., Bradtke, S.J. & Singh, S.P. (1995). Learning to act using real-time dynamic programming. Artificial Intelligence, 72(1):81-138.
-
(1995)
Artificial Intelligence
, vol.72
, Issue.1
, pp. 81-138
-
-
Barto, A.G.1
Bradtke, S.J.2
Singh, S.P.3
-
3
-
-
0003787146
-
-
Princeton University Press, Princeton, New Jersey
-
Bellman, R. (1957). Dynamic Programming. Princeton University Press, Princeton, New Jersey.
-
(1957)
Dynamic Programming
-
-
Bellman, R.1
-
5
-
-
0031185898
-
Modeling agents as qualitative decision makers
-
Brafman, R.I. & Tennenholtz, M. (1997). Modeling agents as qualitative decision makers. Artificial Intelligence, 94(1):217-268.
-
(1997)
Artificial Intelligence
, vol.94
, Issue.1
, pp. 217-268
-
-
Brafman, R.I.1
Tennenholtz, M.2
-
6
-
-
0003672832
-
-
PhD thesis, Laboratory of Information and Decision, MIT, 77 Massachusetts Avenue, Cambridge, MA 02139-4307 USA
-
Branicky, M.S. (1995). Studies in Hybrid Systems: Modeling, Analysis, and Control. PhD thesis, Laboratory of Information and Decision, MIT, 77 Massachusetts Avenue, Cambridge, MA 02139-4307 USA.
-
(1995)
Studies in Hybrid Systems: Modeling, Analysis, and Control
-
-
Branicky, M.S.1
-
7
-
-
0344226898
-
-
Technical report LIDS-P-2239, Laboratory for Information and Decision Systems, MIT, 77 Massachusetts Avenue, Cambridge, MA 02139-4307 USA
-
Branicky, M.S., Borkar, V.S. & Mitter, S.K. (1994). A unified framework for hybrid control: Background, model, and theory. Technical report LIDS-P-2239, Laboratory for Information and Decision Systems, MIT, 77 Massachusetts Avenue, Cambridge, MA 02139-4307 USA.
-
(1994)
A Unified Framework for Hybrid Control: Background, Model, and Theory
-
-
Branicky, M.S.1
Borkar, V.S.2
Mitter, S.K.3
-
10
-
-
0001937317
-
Elephants don't play chess
-
Bradford-MIT Press, 1991
-
Brooks, R.A. (1991b). Elephants don't play chess. In Designing Autonomous Agents. Bradford-MIT Press, 1991.
-
(1991)
Designing Autonomous Agents
-
-
Brooks, R.A.1
-
11
-
-
0026998041
-
Reinforcement learning with perceptual aliasing: The perceptual distinctions approach
-
San Jose, CA. AAAI Press
-
Chrisman, L. (1992). Reinforcement learning with perceptual aliasing: The perceptual distinctions approach. In Proceedings of the Tenth National Conference on Artificial Intelligence, pages 183-188, San Jose, CA. AAAI Press.
-
(1992)
Proceedings of the Tenth National Conference on Artificial Intelligence
, pp. 183-188
-
-
Chrisman, L.1
-
12
-
-
0030167564
-
Behavior analysis and training: A methodology for behavior engineering
-
Colombetti, M., Dorigo, M. & Borghi, G. (1996). Behavior analysis and training: A methodology for behavior engineering. IEEE Transactions on Systems, Man, and Cybernetics-Part B, 26(3):365-380.
-
(1996)
IEEE Transactions on Systems, Man, and Cybernetics-Part B
, vol.26
, Issue.3
, pp. 365-380
-
-
Colombetti, M.1
Dorigo, M.2
Borghi, G.3
-
13
-
-
0021577685
-
A qualitative physics based on confluences
-
de Kleer, J. & Brown, J.S. (1984). A qualitative physics based on confluences. Artificial Intelligence, 24(1-3):7-83.
-
(1984)
Artificial Intelligence
, vol.24
, Issue.1-3
, pp. 7-83
-
-
de Kleer, J.1
Brown, J.S.2
-
14
-
-
0029326107
-
Alecsys and the autonomouse: Learning to control a real robot by distributed classifier systems
-
Dorigo, M. (1995). Alecsys and the autonomouse: Learning to control a real robot by distributed classifier systems. Machine Learning, 19(3):209-240.
-
(1995)
Machine Learning
, vol.19
, Issue.3
, pp. 209-240
-
-
Dorigo, M.1
-
15
-
-
0028739953
-
Robot shaping: Developing autonomous agents through learning
-
Dorigo, M. & Colombetti, M. (1994). Robot shaping: Developing autonomous agents through learning. Artificial Intelligence, 71:321-370.
-
(1994)
Artificial Intelligence
, vol.71
, pp. 321-370
-
-
Dorigo, M.1
Colombetti, M.2
-
16
-
-
0343860991
-
-
Technical report 98-115, Research Group on Artificial Intelligence, JATE-MTA
-
Gábor, Z., Kalmár, Zs. & Szepesvári, Cs. (1998). Multi-criteria reinforcement learning. Technical report 98-115, Research Group on Artificial Intelligence, JATE-MTA.
-
(1998)
Multi-criteria Reinforcement Learning
-
-
Gábor, Z.1
Kalmár, Zs.2
Szepesvári, Cs.3
-
17
-
-
0004242478
-
-
Lecture Notes in Computer Science. Springer-Verlag, New York
-
Grossman, R.L., Nerode, A., Ravn, A. P. & Rischel, H. (1993). Hybrid Systems, volume 736 of Lecture Notes in Computer Science. Springer-Verlag, New York.
-
(1993)
Hybrid Systems
, vol.736
-
-
Grossman, R.L.1
Nerode, A.2
Ravn, A.P.3
Rischel, H.4
-
18
-
-
0029751418
-
The loss from imperfect value functions in expectation-based and minimax-based tasks
-
Heger, M. (1996). The loss from imperfect value functions in expectation-based and minimax-based tasks. Machine Learning, 22:197-225.
-
(1996)
Machine Learning
, vol.22
, pp. 197-225
-
-
Heger, M.1
-
19
-
-
0020749243
-
Vector-valued dynamic programming
-
Henig, M.I. (1983). Vector-valued dynamic programming. SIAM J. Control and Optimization, 21(3):490-499.
-
(1983)
SIAM J. Control and Optimization
, vol.21
, Issue.3
, pp. 490-499
-
-
Henig, M.I.1
-
20
-
-
0000439891
-
On the convergence of stochastic iterative dynamic programming algorithms
-
Jaakkola, T., Jordan, M.I. & Singh, S.P. (1994). On the convergence of stochastic iterative dynamic programming algorithms. Neural Computation, 6(6):1185-1201.
-
(1994)
Neural Computation
, vol.6
, Issue.6
, pp. 1185-1201
-
-
Jaakkola, T.1
Jordan, M.I.2
Singh, S.P.3
-
21
-
-
0029679044
-
Reinforcement learning: A survey
-
Kaelbling, L.P., Littman, M.L. & Moore, A.W. (1996). Reinforcement learning: A survey. Journal of Artificial Intelligence Research, 4:237-285.
-
(1996)
Journal of Artificial Intelligence Research
, vol.4
, pp. 237-285
-
-
Kaelbling, L.P.1
Littman, M.L.2
Moore, A.W.3
-
22
-
-
0028745065
-
Generalization in an autonomous agent
-
Orlando, Florida. IEEE Inc
-
Kalmár, Zs., Szepesvári, Cs. & Lorincz, A. (1994). Generalization in an autonomous agent. In Proc. of IEEE WCCI ICNN'94, volume 3, pages 1815-1817, Orlando, Florida. IEEE Inc.
-
(1994)
Proc. of IEEE WCCI ICNN'94
, vol.3
, pp. 1815-1817
-
-
Kalmár, Zs.1
Szepesvári, Cs.2
Lorincz, A.3
-
23
-
-
0029205333
-
Generalized dynamic concept model as a route to construct adaptive autonomous agents
-
Kalmár, Zs., Szepesvári, Cs. & Lorincz, A. (1995). Generalized dynamic concept model as a route to construct adaptive autonomous agents. Neural Network World, 5:353-360.
-
(1995)
Neural Network World
, vol.5
, pp. 353-360
-
-
Kalmár, Zs.1
Szepesvári, Cs.2
Lorincz, A.3
-
25
-
-
2342475494
-
Complexity analysis of real-time reinforcement learning applied to finding shortest paths in deterministic domains
-
Koenig, S. & Simmons, R.G. (1997). Complexity analysis of real-time reinforcement learning applied to finding shortest paths in deterministic domains. Machine Learning: A Special Issue on Reinforcement Learning, 12:234-345.
-
(1997)
Machine Learning: A Special Issue on Reinforcement Learning
, vol.12
, pp. 234-345
-
-
Koenig, S.1
Simmons, R.G.2
-
27
-
-
0022045044
-
Macro-operators: A weak method for learning
-
Korf, R.E. (1985b). Macro-operators: A weak method for learning. Artificial Intelligence, 26:35-77.
-
(1985)
Artificial Intelligence
, vol.26
, pp. 35-77
-
-
Korf, R.E.1
-
28
-
-
0023421864
-
Planning as search: A quantitative approach
-
Korf, R.E. (1987). Planning as search: A quantitative approach. Artificial Intelligence, 33:65-88.
-
(1987)
Artificial Intelligence
, vol.33
, pp. 65-88
-
-
Korf, R.E.1
-
30
-
-
0022062142
-
A survey of some results in stochastic adaptive control
-
Kumar, P.R. (1985). A survey of some results in stochastic adaptive control. SIAM Journal on Control and Optimization, 23:329-380.
-
(1985)
SIAM Journal on Control and Optimization
, vol.23
, pp. 329-380
-
-
Kumar, P.R.1
-
31
-
-
0003861655
-
-
PhD thesis, Department of Computer Science, Brown University. Also Technical Report CS-96-09
-
Littman, M.L. (1996). Algorithms for Sequential Decision Making. PhD thesis, Department of Computer Science, Brown University. Also Technical Report CS-96-09.
-
(1996)
Algorithms for Sequential Decision Making
-
-
Littman, M.L.1
-
32
-
-
0001961616
-
A Generalized Reinforcement Learning Model: Convergence and applications
-
Littman, M.L. & Szepesvári, Cs. (1996). A Generalized Reinforcement Learning Model: Convergence and applications. In Int. Conf. on Machine Learning, pages 310-318.
-
(1996)
Int. Conf. on Machine Learning
, pp. 310-318
-
-
Littman, M.L.1
Szepesvári, Cs.2
-
33
-
-
2342628252
-
A design framework for hierarchical, hybrid control
-
submitted
-
Lygeros, J., Godbole, D.N. & Sastry, S.S. (1997). A design framework for hierarchical, hybrid control. IEEE Transactions on Automatic Control, special issue on Hybrid Systems, (submitted).
-
(1997)
IEEE Transactions on Automatic Control, Special Issue on Hybrid Systems
-
-
Lygeros, J.1
Godbole, D.N.2
Sastry, S.S.3
-
35
-
-
0002765109
-
A bottom-up mechanism for behavior selection in an artificial creature
-
J.A. Meyer and S. Wilson, editors, MIT Press
-
Maes, P. (1991b). A bottom-up mechanism for behavior selection in an artificial creature. In J.A. Meyer and S. Wilson, editors, Proc. of the First International Conference on Simulation of Adaptive Behavior. MIT Press.
-
(1991)
Proc. of the First International Conference on Simulation of Adaptive Behavior
-
-
Maes, P.1
-
37
-
-
84976813028
-
Learning to coordinate behaviors
-
Boston, MA
-
Maes, P. & Brooks, R.A. (1990). Learning to coordinate behaviors. In Proc. of AAAI-90, pages 796-802, Boston, MA.
-
(1990)
Proc. of AAAI-90
, pp. 796-802
-
-
Maes, P.1
Brooks, R.A.2
-
38
-
-
0026880130
-
Automatic programming of behavior-based robots using reinforcement learning
-
Mahadevan, S. & Connell, J. (1992). Automatic programming of behavior-based robots using reinforcement learning. Artificial Intelligence, 55:311-365.
-
(1992)
Artificial Intelligence
, vol.55
, pp. 311-365
-
-
Mahadevan, S.1
Connell, J.2
-
39
-
-
0030647149
-
Reinforcement learning in the multi-robot domain
-
Matarić, M. (1997). Reinforcement learning in the multi-robot domain. Autonomous Robots, 4.
-
(1997)
Autonomous Robots
, vol.4
-
-
Matarić, M.1
-
40
-
-
85151432208
-
Overcoming incomplete perception with utile distinction memory
-
Amherst, Massachusetts. Morgan Kaufmann
-
McCallum, R.A. (1993). Overcoming incomplete perception with utile distinction memory. In Proceedings of the Tenth International Conference on Machine Learning, pages 190-196, Amherst, Massachusetts. Morgan Kaufmann.
-
(1993)
Proceedings of the Tenth International Conference on Machine Learning
, pp. 190-196
-
-
McCallum, R.A.1
-
41
-
-
84947933152
-
Finite-element methods with local triangulation refinement for continuous reinforcement learning problems
-
M. van Someren and G. Widmer, editors, Lecture Notes in Artificial Intelligence, Springer, Berlin
-
Munos, R. (1997). Finite-element methods with local triangulation refinement for continuous reinforcement learning problems. In M. van Someren and G. Widmer, editors, Machine Learning: ECML'97 (9th European Conf. on Machine Learning, Proceedings), volume 1224 of Lecture Notes in Artificial Intelligence, pages 170-183. Springer, Berlin.
-
(1997)
Machine Learning: ECML'97 (9th European Conf. on Machine Learning, Proceedings)
, vol.1224
, pp. 170-183
-
-
Munos, R.1
-
44
-
-
0004235832
-
-
Princeton University Press, Princeton, NJ
-
Pólya, Gy. (1945). How to solve it? Princeton University Press, Princeton, NJ.
-
(1945)
How to Solve It?
-
-
Pólya, Gy.1
-
47
-
-
0016069798
-
Planning in a hierarchy of abstraction spaces
-
Sacerdoti, E.D. (1974). Planning in a hierarchy of abstraction spaces. Artificial Intelligence, 5:115-135.
-
(1974)
Artificial Intelligence
, vol.5
, pp. 115-135
-
-
Sacerdoti, E.D.1
-
49
-
-
0030145238
-
Qualitative system identification: Deriving structure from behavior
-
Say, A.C.C. & Kuru, S. (1996). Qualitative system identification: Deriving structure from behavior. Artificial Intelligence, 83(1):75-141.
-
(1996)
Artificial Intelligence
, vol.83
, Issue.1
, pp. 75-141
-
-
Say, A.C.C.1
Kuru, S.2
-
51
-
-
2342564758
-
On the convergence of single-step on-policy reinforcement-learning algorithms
-
accepted
-
Singh, S., Jaakkola, T., Littman, M.L. & Szepesvári, Cs. (1997). On the convergence of single-step on-policy reinforcement-learning algorithms. Machine Learning. Accepted.
-
(1997)
Machine Learning
-
-
Singh, S.1
Jaakkola, T.2
Littman, M.L.3
Szepesvári, Cs.4
-
54
-
-
0000723997
-
Generalization in reinforcement learning: Successful examples using sparse coarse coding
-
Sutton, R.S. (1996). Generalization in reinforcement learning: Successful examples using sparse coarse coding. Advances in Neural Information Processing Systems, 8.
-
(1996)
Advances in Neural Information Processing Systems
, vol.8
-
-
Sutton, R.S.1
-
56
-
-
0028742076
-
Dynamic Concept Model learns optimal policies
-
Orlando, Florida. IEEE Inc
-
Szepesvári, Cs. (1994). Dynamic Concept Model learns optimal policies. In Proc. of IEEE WCCI ICNN'94, volume 3, pages 1738-1742, Orlando, Florida. IEEE Inc.
-
(1994)
Proc. of IEEE WCCI ICNN'94
, vol.3
, pp. 1738-1742
-
-
Szepesvári, Cs.1
-
57
-
-
84947910334
-
Learning and exploitation do not conflict under minimax optimality
-
M. van Someren and G. Widmer, editors, Lecture Notes in Artificial Intelligence, Springer, Berlin
-
Szepesvári, Cs. (1997a). Learning and exploitation do not conflict under minimax optimality. In M. van Someren and G. Widmer, editors, Machine Learning: ECML'97 (9th European Conf. on Machine Learning, Proceedings), volume 1224 of Lecture Notes in Artificial Intelligence, pages 242-249. Springer, Berlin.
-
(1997)
Machine Learning: ECML'97 (9th European Conf. on Machine Learning, Proceedings)
, vol.1224
, pp. 242-249
-
-
Szepesvári, Cs.1
-
58
-
-
0008876345
-
-
PhD thesis, Bolyai Institute of Mathematics, University of Szeged, Szeged, Aradi vrt. tere 1, HUNGARY, 6720
-
Szepesvári, Cs. (1997b). Static and Dynamic Aspects of Optimal Sequential Decision Making. PhD thesis, Bolyai Institute of Mathematics, University of Szeged, Szeged, Aradi vrt. tere 1, HUNGARY, 6720.
-
(1997)
Static and Dynamic Aspects of Optimal Sequential Decision Making
-
-
Szepesvári, Cs.1
-
59
-
-
2342450292
-
Generalized Markov Decision Processes: Dynamic programming and reinforcement learning algorithms
-
in preparation
-
Szepesvári, Cs. & Littman, M.L. (1997). Generalized Markov Decision Processes: Dynamic programming and reinforcement learning algorithms. Neural Computation. in preparation.
-
(1997)
Neural Computation
-
-
Szepesvári, Cs.1
Littman, M.L.2
-
60
-
-
84977014241
-
Behavior of an adaptive self-organizing autonomous agent working with cues and competing concepts
-
Szepesvári, Cs. & Lorincz, A. (1994). Behavior of an adaptive self-organizing autonomous agent working with cues and competing concepts. Adaptive Behavior, 2(2):131-160.
-
(1994)
Adaptive Behavior
, vol.2
, Issue.2
, pp. 131-160
-
-
Szepesvári, Cs.1
Lorincz, A.2
-
62
-
-
33749882712
-
Finding structure in reinforcement learning
-
Gerald Tesauro, David S. Touretzky, and Todd K. Leen, editors, The MIT Press, Cambridge
-
Thrun, S. & Schwartz, A. (1995). Finding structure in reinforcement learning. In Gerald Tesauro, David S. Touretzky, and Todd K. Leen, editors, Advances in Neural Information Processing Systems, volume 7, pages 385-392. The MIT Press, Cambridge.
-
(1995)
Advances in Neural Information Processing Systems
, vol.7
, pp. 385-392
-
-
Thrun, S.1
Schwartz, A.2
-
63
-
-
0344195285
-
Genetic algorithm with alphabet optimization
-
Tóth, G.J., Kovács, Sz. & Lorincz, A. (1995). Genetic algorithm with alphabet optimization. Biological Cybernetics, 73:61-68.
-
(1995)
Biological Cybernetics
, vol.73
, pp. 61-68
-
-
Tóth, G.J.1
Kovács, Sz.2
Lorincz, A.3
-
64
-
-
2342562099
-
Asynchronous stochastic approximation and Q-learning
-
Tsitsiklis, J.N. (1994). Asynchronous stochastic approximation and Q-learning. Machine Learning, 8(3-4):257-277.
-
(1994)
Machine Learning
, vol.8
, Issue.3-4
, pp. 257-277
-
-
Tsitsiklis, J.N.1
-
65
-
-
0029752470
-
Feature-based methods for large scale dynamic programming
-
Tsitsiklis, J.N. & Van Roy, B. (1996). Feature-based methods for large scale dynamic programming. Machine Learning, 22:59-94.
-
(1996)
Machine Learning
, vol.22
, pp. 59-94
-
-
Tsitsiklis, J.N.1
Van Roy, B.2
-
70
-
-
0002557583
-
Advanced forecasting methods for global crisis warning and models of intelligence
-
Werbos, P.J. (1977). Advanced forecasting methods for global crisis warning and models of intelligence. General Systems Yearbook, 22:25-38.
-
(1977)
General Systems Yearbook
, vol.22
, pp. 25-38
-
-
Werbos, P.J.1
-
72
-
-
2342534771
-
Optimal control by means of switching
-
Zabczyk, J. (1973). Optimal control by means of switching. Studia Mathematica, 65:161-171.
-
(1973)
Studia Mathematica
, vol.65
, pp. 161-171
-
-
Zabczyk, J.1