SCOPUS Information Retrieval Platform

Journal of Intelligent and Robotic Systems: Theory and Applications

Volume 21, Issue 1, 1998, Pages 51-71

Embedding a Priori Knowledge in Reinforcement Learning

(1) Ribeiro, Carlos H C a

a IMPERIAL COLLEGE LONDON (United Kingdom)

Author keywords

Experience generalisation; Q learning algorithm; Reinforcement learning

Indexed keywords

KNOWLEDGE BASED SYSTEMS; LEARNING ALGORITHMS; STATE SPACE METHODS; EXPERIENCE GENERALIZATION; REINFORCEMENT LEARNING; LEARNING SYSTEMS

EID: 0031607078 PISSN: 09210296 EISSN: None Source Type: Journal
DOI: 10.1023/A:1007968115863 Document Type: Article

Times cited: (20)

References (21)

1
- 0003778897
- Springer
- Benveniste A., Métivier M., and Priouret P.: 1990, Adaptive Algorithms and Stochastic Approximations, Springer.
- (1990) Adaptive Algorithms and Stochastic Approximations
- Benveniste, A.¹ Métivier, M.² Priouret, P.³

2
- 0003565779
- Prentice-Hall
- Bertsekas D. P.: 1987, Dynamic Programming: Deterministic and Stochastic Models, Prentice-Hall.
- (1987) Dynamic Programming: Deterministic and Stochastic Models
- Bertsekas, D.P.¹

3
- 0001133021
- Generalization in reinforcement learning: Safely approximating the value function
- G. Tesauro, D. S. Touretzky, and T. K. Leen (eds), MIT Press
- Boyan J. A. and Moore A. W.: 1995, Generalization in reinforcement learning: Safely approximating the value function, in: G. Tesauro, D. S. Touretzky, and T. K. Leen (eds), Advances in Neural Information Processing Systems Vol. 7, MIT Press.
- (1995) Advances in Neural Information Processing Systems , vol.7
- Boyan, J.A.¹ Moore, A.W.²

4
- 0002192119
- Input generalization in delayed reinforcement learning: An algorithm and performance comparisons
- Chapman D. and Kaelbling L. P.: 1991, Input generalization in delayed reinforcement learning: An algorithm and performance comparisons, in: Proc. of the International Joint Conf. on Artificial Intelligence (IJCAI'91), pp. 726-731.
- (1991) Proc. of the International Joint Conf. on Artificial Intelligence (IJCAI'91) , pp. 726-731
- Chapman, D.¹ Kaelbling, L.P.²

5
- 84968515237
- Scattered data interpolation: Tests of some methods
- Franke R.: 1982, Scattered data interpolation: Tests of some methods. Mathematics of Computation 38(157), 181-200.
- (1982) Mathematics of Computation , vol.38 , Issue.157 , pp. 181-200
- Franke, R.¹

6
- 0000439891
- On the convergence of stochastic iterative dynamic programming algorithms
- Jaakkola T., Jordan M. I., and Singh S. P.: 1994, On the convergence of stochastic iterative dynamic programming algorithms, Neural Computation 6(6), 1185-1201.
- (1994) Neural Computation , vol.6 , Issue.6 , pp. 1185-1201
- Jaakkola, T.¹ Jordan, M.I.² Singh, S.P.³

7
- 0003730487
- Academic Press
- Karlin S. and Taylor H. M.: 1975, A First Course in Stochastic Processes, Academic Press.
- (1975) A First Course in Stochastic Processes
- Karlin, S.¹ Taylor, H.M.²

8
- 0000123778
- Self-improving reactive agents based on reinforcement learning, planning and teaching
- Lin L.-J.: 1992, Self-improving reactive agents based on reinforcement learning, planning and teaching, Machine Learning 8, 293-321.
- (1992) Machine Learning , vol.8 , pp. 293-321
- Lin, L.-J.¹

9
- 0001961616
- A generalized reinforcement learning model: Convergence and applications
- Littman M. L. and Szepesvári C.: 1996, A generalized reinforcement learning model: Convergence and applications, in: Procs. of the Thirteenth International Conf. on Machine Learning (ICML'96), pp. 310-318.
- (1996) Procs. of the Thirteenth International Conf. on Machine Learning (ICML'96) , pp. 310-318
- Littman, M.L.¹ Szepesvári, C.²

10
- 0026880130
- Automatic programming of behavior-based robots using reinforcement learning
- Mahadevan S. and Connell J.: 1992, Automatic programming of behavior-based robots using reinforcement learning, Artificial Intelligence 55, 311-365.
- (1992) Artificial Intelligence , vol.55 , pp. 311-365
- Mahadevan, S.¹ Connell, J.²

11
- 17144419347
- The NSF workshop on reinforcement learning: Summary and observations
- in press
- Mahadevan S. and Kaelbling L. P.: 1996, The NSF workshop on reinforcement learning: Summary and observations, AI Magazine, in press.
- (1996) AI Magazine
- Mahadevan, S.¹ Kaelbling, L.P.²

12
- 0010220924
- Q-Learning combined with spreading: Convergence and results
- Ribeiro C. H. C. and Szepesvári C.: 1996, Q-Learning combined with spreading: Convergence and results, in: Procs. of the ISRF-IEE International Conf. on Intelligent and Cognitive Systems (Neural Networks Symposium), pp. 32-36.
- (1996) Procs. of the ISRF-IEE International Conf. on Intelligent and Cognitive Systems (Neural Networks Symposium) , pp. 32-36
- Ribeiro, C.H.C.¹ Szepesvári, C.²

13
- 0039753967
- Attentional mechanisms as a strategy for generalisation in the Q-learning algorithm
- F. Fogelman-Soulié and P. Gallinari (eds), EC2 et Cie
- Ribeiro C. H. C.: 1995, Attentional mechanisms as a strategy for generalisation in the Q-learning algorithm, in: F. Fogelman-Soulié and P. Gallinari (eds), Procs. of the International Conf. on Artificial Neural Networks (ICANN'95), Vol. 1, EC2 et Cie, pp. 455-460.
- (1995) Procs. of the International Conf. on Artificial Neural Networks (ICANN'95) , vol.1 , pp. 455-460
- Ribeiro, C.H.C.¹

14
- 0014432211
- A two-dimensional interpolation function for irregularly spaced data
- Shepard D.: 1968, A two-dimensional interpolation function for irregularly spaced data, in: Procs. of the 23rd National Conf. ACM, pp. 517-523.
- (1968) Procs. of the 23rd National Conf. ACM , pp. 517-523
- Shepard, D.¹

15
- 85156221438
- Generalization in reinforcement learning: Successful examples using sparse coarse coding
- D. S. Touretzky, M. C. Mozer, and M. E. Hasselmo (eds), MIT Press
- Sutton R. S.: 1996, Generalization in reinforcement learning: Successful examples using sparse coarse coding, in: D. S. Touretzky, M. C. Mozer, and M. E. Hasselmo (eds), Advances in Neural Information Processing Systems Vol. 8, MIT Press, pp. 1038-1044.
- (1996) Advances in Neural Information Processing Systems , vol.8 , pp. 1038-1044
- Sutton, R.S.¹

16
- 0003629453
- Brown University, Department of Computer Science, Providence
- Szepesvári C. and Littman M. L.: 1996, Generalized Markov decision processes: Dynamic-programming and reinforcement-learning algorithms, CS-96-11, Brown University, Department of Computer Science, Providence.
- (1996) Generalized Markov Decision Processes: Dynamic-programming and Reinforcement-learning Algorithms, CS-96-11
- Szepesvári, C.¹ Littman, M.L.²

17
- 0001046225
- Practical issues in temporal difference learning
- Tesauro G.: 1992, Practical issues in temporal difference learning, Machine Learning 8, 257-277.
- (1992) Machine Learning , vol.8 , pp. 257-277
- Tesauro, G.¹

18
- 2342572884
- PhD thesis, University of Cambridge
- Tham C. K.: 1994, Modular on-line function approximation for scaling up reinforcement learning, PhD thesis, University of Cambridge.
- (1994) Modular On-line Function Approximation for Scaling Up Reinforcement Learning
- Tham, C.K.¹

19
- 0029752470
- Feature-based methods for large scale dynamic programming
- Tsitsiklis J. N. and Van Roy B.: 1996, Feature-based methods for large scale dynamic programming, Machine Learning 22, 59-94.
- (1996) Machine Learning , vol.22 , pp. 59-94
- Tsitsiklis, J.N.¹ Van Roy, B.²

20
- 0031143730
- An analysis of temporal-difference learning with function approximation
- in press
- Tsitsiklis J. N. and Van Roy B.: 1997, An analysis of temporal-difference learning with function approximation, IEEE Transactions on Automatic Control, in press.
- (1997) IEEE Transactions on Automatic Control
- Tsitsiklis, J.N.¹ Van Roy, B.²

21
- 0004049893
- PhD thesis, University of Cambridge
- Watkins C. J. C. H.: 1989, Learning from delayed rewards, PhD thesis, University of Cambridge.
- (1989) Learning from Delayed Rewards
- Watkins, C.J.C.H.¹

* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.