Journal of Machine Learning Research, Volume 3, Issue 1, 2003, Pages 145-174

ε-MDPs: Learning in varying environments

Author keywords

ε-MDP; Convergence; Event learning; Generalized MDP; MDP; Reinforcement learning; SARSA; SDS controller

Indexed keywords

CONVERGENCE OF NUMERICAL METHODS; DECISION MAKING; LEARNING ALGORITHMS; SOFTWARE AGENTS; THEOREM PROVING; VIRTUAL REALITY;

EID: 0042967671     PISSN: 15324435     EISSN: None     Source Type: Journal    
DOI: 10.1162/153244303768966148     Document Type: Article
Times cited : (36)

References (40)
  • 3  R. Bellman. Dynamic Programming. Princeton University Press, Princeton, New Jersey, 1957.
  • 6  T. G. Dietterich. Hierarchical reinforcement learning with the MAXQ value function decomposition. Journal of Artificial Intelligence Research, 13:227-303, 2000.
  • 7  K. Doya. Temporal difference learning in continuous time and space. In D. S. Touretzky, M. C. Mozer, and M. E. Hasselmo, editors, Advances in Neural Information Processing Systems 8, Cambridge, MA, 1996. MIT Press.
  • 8  K. Doya. Reinforcement learning in continuous time and space. Neural Computation, 12:243-269, 2000.
  • 10  R. Givan, S. M. Leach, and T. Dean. Bounded-parameter Markov decision processes. Artificial Intelligence, 122(1-2):71-109, 2000. URL citeseer.nj.nec.com/article/givan97bounded.html.
  • 11  V. Gullapalli and A. G. Barto. Convergence of indirect adaptive asynchronous value iteration algorithms. In J. D. Cowan, G. Tesauro, and J. Alspector, editors, Advances in Neural Information Processing Systems, volume 6, pages 695-702, San Mateo, CA, 1994. Morgan Kaufmann.
  • 13  Y. K. Hwang and N. Ahuja. Gross motion planning - a survey. ACM Computing Surveys, 24(3):219-291, 1992.
  • 14  T. Jaakkola, M. I. Jordan, and S. P. Singh. On the convergence of stochastic iterative dynamic programming algorithms. Neural Computation, 6(6):1185-1201, November 1994.
  • 17  Z. Kalmár, Cs. Szepesvári, and A. Lorincz. Module-based reinforcement learning: Experiments with a real robot. Machine Learning, 31:55-85, 1998.
  • 18  A. Lorincz, I. Pólik, and I. Szita. Event-learning and robust policy heuristics. Cognitive Systems Research, 2002, forthcoming. URL http://people.inf.elte.hu/lorincz/Files/NIPG-ELU-14-05-2001.ps.
  • 19  M. L. Littman. Markov games as a framework for multi-agent reinforcement learning. In Proceedings of the Eleventh International Conference on Machine Learning, pages 157-163, San Francisco, CA, 1994. Morgan Kaufmann.
  • 21  S. Mahadevan and J. Connell. Automatic programming of behavior-based robots using reinforcement learning. Artificial Intelligence, 55:311-365, 1992.
  • 26  S. P. Singh. Scaling reinforcement learning algorithms by learning variable temporal resolution models. In Proceedings of the Ninth International Conference on Machine Learning, MLC-92, San Mateo, CA, 1992. Morgan Kaufmann.
  • 28  R. Sutton, D. Precup, and S. Singh. Between MDPs and semi-MDPs: Learning, planning and representing knowledge at multiple temporal scales. Journal of Artificial Intelligence Research, 1:1-39, 1998.
  • 30  Cs. Szepesvári, Sz. Cimmer, and A. Lorincz. Neurocontroller using dynamic state feedback for compensatory control. Neural Networks, 10(9):1691-1708, 1997.
  • 31  Cs. Szepesvári, Sz. Cimmer, and A. Lorincz. Dynamic state feedback neurocontroller for compensatory control. Neural Networks, 10:1691-1708, 1997.
  • 33  Cs. Szepesvári and A. Lorincz. Approximate inverse-dynamics based robust control using static and dynamic feedback. In J. Kalkkuhl, K. J. Hunt, R. Zbikowski, and A. Dzielinski, editors, Applications of Neural Adaptive Control Theory, volume 2, pages 151-179. World Scientific, Singapore, 1997.
  • 34  Cs. Szepesvári and A. Lorincz. An integrated architecture for motion-control and path-planning. Journal of Robotic Systems, 15:1-15, 1998.
  • 35  I. Szita, B. Takács, and A. Lorincz. Event-learning with a non-Markovian controller. In F. van Harmelen, editor, 15th European Conference on Artificial Intelligence, Lyon, pages 365-369. IOS Press, Amsterdam, 2002.
  • 37  J. N. Tsitsiklis. Asynchronous stochastic approximation and Q-learning. Machine Learning, 16(3):185-202, September 1994.


* This information was extracted by KISTI through analysis of Elsevier's SCOPUS database.