Journal of Machine Learning Research, Volume 3, Issue 1, 2003, Pages 145-174

ε-MDPs: Learning in varying environments

Author keywords

ε-MDP; Convergence; Event learning; Generalized MDP; MDP; Reinforcement learning; SARSA; SDS controller

Indexed keywords

CONVERGENCE OF NUMERICAL METHODS; DECISION MAKING; LEARNING ALGORITHMS; SOFTWARE AGENTS; THEOREM PROVING; VIRTUAL REALITY;

EID: 0042967671     PISSN: 15324435     EISSN: None     Source Type: Journal    
DOI: 10.1162/153244303768966148     Document Type: Article
Times cited : (36)

References (40)
  • 3  R. Bellman. Dynamic Programming. Princeton University Press, Princeton, New Jersey, 1957.
  • 6  T. G. Dietterich. Hierarchical reinforcement learning with the MAXQ value function decomposition. Journal of Artificial Intelligence Research, 13:227-303, 2000.
  • 7  K. Doya. Temporal difference learning in continuous time and space. In D. S. Touretzky, M. C. Mozer, and M. E. Hasselmo, editors, Advances in Neural Information Processing Systems 8, Cambridge, MA, 1996. MIT Press.
  • 8  K. Doya. Reinforcement learning in continuous time and space. Neural Computation, 12:243-269, 2000.
  • 10  R. Givan, S. M. Leach, and T. Dean. Bounded-parameter Markov decision processes. Artificial Intelligence, 122(1-2):71-109, 2000. URL citeseer.nj.nec.com/article/givan97bounded.html.
  • 11  V. Gullapalli and A. G. Barto. Convergence of indirect adaptive asynchronous value iteration algorithms. In J. D. Cowan, G. Tesauro, and J. Alspector, editors, Advances in Neural Information Processing Systems, volume 6, pages 695-702, San Mateo, CA, 1994. Morgan Kaufmann.
  • 13  Y. K. Hwang and N. Ahuja. Gross motion planning - a survey. ACM Computing Surveys, 24(3):219-291, 1992.
  • 14  T. Jaakkola, M. I. Jordan, and S. P. Singh. On the convergence of stochastic iterative dynamic programming algorithms. Neural Computation, 6(6):1185-1201, November 1994.
  • 17  Z. Kalmár, Cs. Szepesvári, and A. Lorincz. Module-based reinforcement learning: Experiments with a real robot. Machine Learning, 31:55-85, 1998.
  • 18  A. Lorincz, I. Pólik, and I. Szita. Event-learning and robust policy heuristics. Cognitive Systems Research, 2002, forthcoming. URL http://people.inf.elte.hu/lorincz/Files/NIPG-ELU-14-05-2001.ps.
  • 19  M. L. Littman. Markov games as a framework for multi-agent reinforcement learning. In Proceedings of the Eleventh International Conference on Machine Learning, pages 157-163, San Francisco, CA, 1994. Morgan Kaufmann.
  • 21  S. Mahadevan and J. Connell. Automatic programming of behavior-based robots using reinforcement learning. Artificial Intelligence, 55:311-365, 1992.
  • 26  S. P. Singh. Scaling reinforcement learning algorithms by learning variable temporal resolution models. In Proceedings of the Ninth International Conference on Machine Learning, MLC-92, San Mateo, CA, 1992. Morgan Kaufmann.
  • 28  R. Sutton, D. Precup, and S. Singh. Between MDPs and semi-MDPs: Learning, planning and representing knowledge at multiple temporal scales. Journal of Artificial Intelligence Research, 1:1-39, 1998.
  • 30  Cs. Szepesvári, Sz. Cimmer, and A. Lorincz. Neurocontroller using dynamic state feedback for compensatory control. Neural Networks, 10(9):1691-1708, 1997.
  • 31  Cs. Szepesvári, Sz. Cimmer, and A. Lorincz. Dynamic state feedback neurocontroller for compensatory control. Neural Networks, 10:1691-1708, 1997.
  • 33  Cs. Szepesvári and A. Lorincz. Approximate inverse-dynamics based robust control using static and dynamic feedback. In J. Kalkkuhl, K. J. Hunt, R. Zbikowski, and A. Dzielinski, editors, Applications of Neural Adaptive Control Theory, volume 2, pages 151-179. World Scientific, Singapore, 1997.
  • 34  Cs. Szepesvári and A. Lorincz. An integrated architecture for motion-control and path-planning. Journal of Robotic Systems, 15:1-15, 1998.
  • 35  I. Szita, B. Takács, and A. Lorincz. Event-learning with a non-Markovian controller. In F. van Harmelen, editor, 15th European Conference on Artificial Intelligence, Lyon, pages 365-369. IOS Press, Amsterdam, 2002.
  • 37  J. N. Tsitsiklis. Asynchronous stochastic approximation and Q-learning. Machine Learning, 16(3):185-202, September 1994.


* This information was extracted by KISTI through analysis of Elsevier's SCOPUS database.