SCOPUS Information Search Platform

Machine Learning

Volume 49, Issue 2-3, 2002, Pages 325-346

Structure in the space of value functions

Authors (2): Foster, David (a, b); Dayan, Peter (b)

a UNIVERSITY OF EDINBURGH (United Kingdom)

b UNIVERSITY COLLEGE LONDON (United Kingdom)

Author keywords

Density estimation; Dynamic programming; Mixture models; Reinforcement learning; Unsupervised learning; Value functions

Indexed keywords

DENSITY ESTIMATION; MARKOV DECISION PROBLEMS; MIXTURE MODELS; REINFORCEMENT LEARNING; UNSUPERVISED LEARNING; VALUE FUNCTIONS; DYNAMIC PROGRAMMING; HIERARCHICAL SYSTEMS; MARKOV PROCESSES; MATHEMATICAL MODELS; OPTIMAL CONTROL SYSTEMS; LEARNING SYSTEMS

EID: 0036832959 PISSN: 08856125 EISSN: None Source Type: Journal
DOI: 10.1023/A:1017944732463 Document Type: Article

Times cited: 65

References (41)

1
- 70349670157
- Learning structure of latent variable models by variational Bayes
- In S. A. Solla, T. K. Leen, & K.-R. Müller (Eds.); Cambridge, MA: MIT Press
- (2000) Advances in Neural Information Processing Systems , vol.13
- Attias, H.¹

2
- 0020970738
- Neuronlike elements that can solve difficult learning problems
- (1983) IEEE Transactions on Systems, Man, and Cybernetics , vol.13 , pp. 834-846
- Barto, A.G.¹ Sutton, R.S.² Anderson, C.W.³

3
- 0002201501
- Learning and sequential decision making
- In M. Gabriel & J. Moore (Eds.); Cambridge, MA: MIT Press, Bradford Books
- (1990) Learning and Computational Neuroscience: Foundations of Adaptive Networks
- Barto, A.G.¹ Sutton, R.S.² Watkins, C.J.C.H.³

4
- 0346942368
- Decision theoretic planning: Structural assumptions and computational leverage
- (2000) Journal of Artificial Intelligence Research , vol.11 , pp. 1-94
- Boutilier, C.¹ Dean, T.² Hanks, S.³

5
- 0020737631
- The Laplacian pyramid as a compact image code
- (1983) IEEE Transactions on Communications , vol.31 , pp. 532-540
- Burt, P.J.¹ Adelson, E.H.²

6
- 0027693887
- Finite element methods for active contour models and balloons for 2d and 3d images
- (1993) IEEE Transactions on Pattern Analysis and Machine Intelligence , vol.15 , pp. 1131-1147
- Cohen, L.D.¹ Cohen, I.²

7
- 84889281816
- New York, NY: Wiley
- (1991) Elements of Information Theory
- Cover, T.M.¹ Thomas, J.A.²

8
- 0003259931
- Improving elevator performance using reinforcement learning
- In D. S. Touretzky, M. C. Mozer, & M. E. Hasselmo (Eds.); Cambridge, MA: MIT Press
- (1996) Advances in Neural Information Processing Systems , vol.9
- Crites, R.H.¹ Barto, A.G.²

9
- 0026255231
- O-Plan: The open planning architecture
- (1991) Artificial Intelligence , vol.52 , pp. 49-86
- Currie, K.W.¹ Tate, A.²

10
- 0001234682
- Feudal reinforcement learning
- In S. J. Hanson, J. D. Cowan, & C. L. Giles (Eds.); San Mateo, CA: Morgan Kaufmann
- (1993) Advances in Neural Information Processing Systems , vol.5 , pp. 271-278
- Dayan, P.¹ Hinton, G.E.²

11
- 0002629270
- Maximum likelihood from incomplete data via the EM algorithm
- (1977) Journal of the Royal Statistical Society, B , vol.39 , pp. 1-38
- Dempster, A.P.¹ Laird, N.M.² Rubin, D.B.³

12
- 0008813540
- Hierarchical reinforcement learning with the MAXQ value function decomposition
- San Mateo, CA: Morgan Kaufmann Publishers
- (1997) Proceedings of the 15th International Conference on Machine Learning
- Dietterich, T.G.¹

13
- 84957069785
- Composing functions to speed up reinforcement learning in a changing world
- In C. Nedellec & C. Rouveirol (Eds.)
- (1998) Lecture Notes in Artificial Intelligence , vol.1398 , pp. 370-381
- Drummond, C.¹

14
- 0015440625
- Learning and executing generalized robot plans
- (1972) Artificial Intelligence , vol.3 , pp. 251-288
- Fikes, R.E.¹ Hart, P.E.² Nilsson, N.J.³

15
- 0017961288
- Multilayer control of large Markov chains
- (1978) IEEE Transactions on Automatic Control , vol.AC-23 , pp. 298-304
- Forestier, J.-P.¹ Varaiya, P.²

16
- 0008830333
- Efficient stochastic source coding and an application to a Bayesian network source model
- (1997) The Computer Journal , vol.40 , pp. 157-165
- Frey, B.J.¹ Hinton, G.E.²

17
- 84898934543
- Variational inference for Bayesian mixtures of factor analyzers
- In S. A. Solla, T. K. Leen, & K.-R. Müller (Eds.); Cambridge, MA: MIT Press
- (2000) Advances in Neural Information Processing Systems , vol.13
- Ghahramani, Z.¹ Beal, M.J.²

18
- 85156203891
- Stable fitted reinforcement learning
- In D. S. Touretzky, M. C. Mozer, & M. E. Hasselmo (Eds.); Cambridge, MA: MIT Press
- (1996) Advances in Neural Information Processing Systems , vol.8 , pp. 1052-1058
- Gordon, G.J.¹

19
- 24844453140
- Planning with temporally abstract actions
- Report CS-98-01, Department of Computer Science, Brown University, Providence, RI
- (1998)
- Hauskrecht, M.¹

20
- 0006419533
- Hierarchical solution of Markov-decision processes using macro-actions
- (1998) Proceedings of the 14th Annual Conference on Uncertainty in Artificial Intelligence
- Hauskrecht, M.¹ Meuleau, N.² Boutilier, C.³ Kaelbling, L.P.⁴ Dean, T.⁵

21
- 0031590130
- Generative models for discovering sparse distributed representations
- (1997) Philosophical Transactions of the Royal Society, Series B , vol.352 , pp. 1177-1190
- Hinton, G.E.¹ Ghahramani, Z.²

22
- 0002834189
- Autoencoders, minimum description length, and Helmholtz free energy
- In J. D. Cowan, G. Tesauro, & J. Alspector (Eds.); San Mateo, CA: Morgan Kaufmann
- (1994) Advances in Neural Information Processing Systems , vol.6
- Hinton, G.E.¹ Zemel, R.S.²

23
- 0000908087
- Hierarchical reinforcement learning: Preliminary results
- San Francisco, CA, USA: Morgan Kaufmann Publishers
- (1993) Proceedings of the 10th International Conference on Machine Learning , pp. 163
- Kaelbling, L.P.¹

24
- 0008815681
- Exponentiated gradient versus gradient descent for linear predictors
- (1997) Information and Computation , vol.132 , pp. 1-63
- Kivinen, J.¹ Warmuth, M.K.²

25
- 0022045044
- Macro-operators: A weak method for learning
- (1985) Artificial Intelligence , vol.26 , pp. 35-77
- Korf, R.E.¹

26
- 0003891734
- New York, NY: Marcel Dekker
- (1988) Mixture Models: Inference and Applications to Clustering
- McLachlan, G.J.¹ Basford, K.E.²

27
- 84880688141
- Multi-value functions: Efficient automatic action hierarchies for multiple goal MDPs
- (1999) International Joint Conference on Artificial Intelligence
- Moore, A.W.¹ Baird, L.² Kaelbling, L.P.³

28
- 0003989214
- Hierarchical control and learning for Markov decision processes
- Ph.D. Thesis, Computer Science Division, UC Berkeley
- (1998)
- Parr, R.¹

29
- 84898956770
- Reinforcement learning with hierarchies of machines
- In M. Mozer, M. I. Jordan, & T. Petsche (Eds.); Cambridge, MA: MIT Press
- (1998) Advances in Neural Information Processing Systems , vol.11
- Parr, R.¹ Russell, S.²

30
- 84899003140
- Multi-time models for temporally abstract planning
- In M. Mozer, M. I. Jordan, & T. Petsche (Eds.); Cambridge, MA: MIT Press
- (1998) Advances in Neural Information Processing Systems , vol.11 , pp. 1050-1056
- Precup, D.¹ Sutton, R.S.²

31
- 84957069070
- Theoretical results on reinforcement learning with temporally abstract options
- Berlin, Germany: Springer-Verlag
- (1998) Proceedings of the 10th European Conference on Machine Learning , pp. 382-393
- Precup, D.¹ Sutton, R.S.² Singh, S.P.³

32
- 0004087635
- Singapore: World Scientific
- (1989) Stochastic Complexity in Statistical Inquiry
- Rissanen, J.¹

33
- 0003636089
- On-line Q-learning using connectionist systems
- Technical Report CUED/F-INFENG/TR 166. Engineering Department, Cambridge University
- (1994)
- Rummery, G.A.¹ Niranjan, M.²

34
- 0026962175
- Reinforcement learning with a hierarchy of abstract models
- Menlo Park, CA: AAAI Press/MIT Press
- (1992) Proceedings of the 10th National Conference on Artificial Intelligence , pp. 202-207
- Singh, S.P.¹

35
- 85153965130
- Reinforcement learning with soft state aggregation
- In G. Tesauro, D. S. Touretzky, & T. Leen (Eds.); Cambridge, MA: MIT Press
- (1995) Advances in Neural Information Processing Systems , vol.7 , pp. 361-368
- Singh, S.P.¹ Jaakkola, T.² Jordan, M.I.³

36
- 84922015064
- TD models: Modeling the world at a mixture of time scales
- San Francisco, CA, USA: Morgan Kaufmann Publishers
- (1995) Proceedings of the Twelfth International Conference on Machine Learning , pp. 531-539
- Sutton, R.S.¹

37
- 0004102479
- Cambridge, MA: MIT Press
- (1998) Reinforcement Learning: An Introduction
- Sutton, R.S.¹ Barto, A.G.²

38
- 0003899594
- Between MDPs and semi-MDPs: Learning, planning and representing knowledge at multiple temporal scales
- Report 98-74, Department of Computer Science, University of Massachusetts, Amherst, MA
- (1998)
- Sutton, R.S.¹ Precup, D.² Singh, S.P.³

39
- 0002680667
- Generating project networks
- Cambridge, MA: IJCAI
- (1977) Proceedings of the 5th International Joint Conference on Artificial Intelligence , pp. 888-893
- Tate, A.¹

40
- 0000277836
- Finding structure in reinforcement learning
- In G. Tesauro, D. S. Touretzky, & T. Leen (Eds.); Cambridge, MA: MIT Press
- (1995) Advances in Neural Information Processing Systems , vol.7
- Thrun, S.¹ Schwartz, A.²

41
- 0004049893
- Learning from delayed rewards
- Ph.D. thesis, University of Cambridge, Cambridge, UK
- (1989)
- Watkins, C.J.C.H.¹

* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.