SCOPUS Information Retrieval Platform

Handbook of Learning and Approximate Dynamic Programming

Volume , Issue , 2004, Pages 47-63

Reinforcement learning and its relationship to supervised learning

(2) Barto, Andrew G. (a); Dietterich, Thomas G. (b)

a University of Massachusetts Amherst (United States)

b Oregon State University (United States)

Author keywords

Algorithm design and analysis; Learning; Loss measurement; Machine learning; Supervised learning; Training

Indexed keywords

ARTIFICIAL INTELLIGENCE; LEARNING SYSTEMS; PERSONNEL TRAINING; REINFORCEMENT LEARNING; SUPERVISED LEARNING;

ALGORITHM DESIGN AND ANALYSIS; APPROXIMATE DYNAMIC PROGRAMMING; LEARNING; LOSS MEASUREMENT; ON-MACHINES; REAL-WORLD;

DYNAMIC PROGRAMMING;

EID: 84986214645 PISSN: None EISSN: None Source Type: Book
DOI: 10.1109/9780470544785.ch2 Document Type: Chapter

Times cited : (69)

References (37)

1
- 0029210635
- Learning to act using real-time dynamic programming
- A. G. Barto, S. J. Bradtke, and S. P. Singh, Learning to act using real-time dynamic programming, Artificial Intelligence, vol. 72, pp. 81-138, 1995.
- (1995) Artificial Intelligence , vol.72 , pp. 81-138
- Barto, A.G.¹ Bradtke, S.J.² Singh, S.P.³

2
- 0013535965
- Infinite-horizon gradient-based policy search
- J. Baxter and P. L. Bartlett, Infinite-horizon gradient-based policy search, Journal of Artificial Intelligence Research, vol. 15, pp. 319-350, 2001.
- (2001) Journal of Artificial Intelligence Research , vol.15 , pp. 319-350
- Baxter, J.¹ Bartlett, P.L.²

3
- 0013495368
- Infinite-horizon gradient-based policy search: II. Gradient ascent algorithms and experiments
- J. Baxter, P. L. Bartlett, and L. Weaver, Infinite-horizon gradient-based policy search: II. Gradient ascent algorithms and experiments, Journal of Artificial Intelligence Research, vol. 15, pp. 351-381, 2001.
- (2001) Journal of Artificial Intelligence Research , vol.15 , pp. 351-381
- Baxter, J.¹ Bartlett, P.L.² Weaver, L.³

4
- 0003565779
- Prentice-Hall, Englewood Cliffs, NJ
- D. P. Bertsekas, Dynamic Programming: Deterministic and Stochastic Models, Prentice-Hall, Englewood Cliffs, NJ, 1987.
- (1987) Dynamic Programming: Deterministic and Stochastic Models
- Bertsekas, D.P.¹

5
- 0003487482
- Athena Scientific, Belmont, MA
- D. P. Bertsekas and J. N. Tsitsiklis, Neuro-Dynamic Programming, Athena Scientific, Belmont, MA, 1996.
- (1996) Neuro-Dynamic Programming
- Bertsekas, D.P.¹ Tsitsiklis, J.N.²

6
- 0003487601
- Oxford University Press, Oxford, England
- C. M. Bishop, Neural Networks for Pattern Recognition, Oxford University Press, Oxford, England, 1996.
- (1996) Neural Networks for Pattern Recognition
- Bishop, C.M.¹

7
- 0038595396
- Least-squares temporal difference learning
- I. Bratko and S. Džeroski (eds.)
- J. A. Boyan, Least-squares temporal difference learning, in I. Bratko and S. Džeroski (eds.), Machine Learning: Proc. of the 16th International Conference (ICML), 1999.
- (1999) Machine Learning: Proc. of the 16th International Conference (ICML)
- Boyan, J.A.¹

8
- 0001771345
- Linear least-squares algorithms for temporal difference learning
- S. J. Bradtke and A. G. Barto, Linear least-squares algorithms for temporal difference learning, Machine Learning, vol. 22, pp. 33-57, 1996.
- (1996) Machine Learning , vol.22 , pp. 33-57
- Bradtke, S.J.¹ Barto, A.G.²

9
- 0003802343
- Wadsworth and Brooks, Monterey, CA
- L. Breiman, J. H. Friedman, R. A. Olshen, and C. J. Stone, Classification and Regression Trees, Wadsworth and Brooks, Monterey, CA, 1984.
- (1984) Classification and Regression Trees
- Breiman, L.¹ Friedman, J.H.² Olshen, R.A.³ Stone, C.J.⁴

10
- 84899017487
- Motivated reinforcement learning
- T. G. Dietterich, S. Becker, and Z. Ghahramani (eds.), MIT Press, Cambridge, MA
- P. Dayan, Motivated reinforcement learning, in T. G. Dietterich, S. Becker, and Z. Ghahramani (eds.), Advances in Neural Information Processing Systems 14, Proc. of the 2002 Conference, pp. 11-18, MIT Press, Cambridge, MA, 2003.
- (2003) Advances in Neural Information Processing Systems 14, Proc. of the 2002 Conference , pp. 11-18
- Dayan, P.¹

11
- 84899029004
- Batch value function approximation via support vectors
- T. G. Dietterich, S. Becker, and Z. Ghahramani (eds.), MIT Press, Cambridge, MA
- T. G. Dietterich and X. Wang, Batch value function approximation via support vectors, in T. G. Dietterich, S. Becker, and Z. Ghahramani (eds.), Advances in Neural Information Processing Systems 14, Proc. of the 2002 Conference, pp. 1491-1498, MIT Press, Cambridge, MA, 2003.
- (2003) Advances in Neural Information Processing Systems 14, Proc. of the 2002 Conference , pp. 1491-1498
- Dietterich, T.G.¹ Wang, X.²

12
- 0003547856
- Norton, New York
- R. Dawkins, The Blind Watchmaker, Norton, New York, 1986.
- (1986) The Blind Watchmaker
- Dawkins, R.¹

13
- 0003922190
- Wiley, New York
- R. O. Duda, P. E. Hart, and D. G. Stork, Pattern Classification, Second Edition, Wiley, New York, 2001.
- (2001) Pattern Classification, Second Edition
- Duda, R.O.¹ Hart, P.E.² Stork, D.G.³

14
- 0004003001
- Academic Press, New York
- A. A. Feldbaum, Optimal Control Systems, Academic Press, New York, 1965.
- (1965) Optimal Control Systems
- Feldbaum, A.A.¹

15
- 0004232663
- Prentice-Hall, Upper Saddle River, NJ
- B. R. Hergenhahn and M. H. Olson, An Introduction to Theories of Learning (Sixth Edition), Prentice-Hall, Upper Saddle River, NJ, 2001.
- (2001) An Introduction to Theories of Learning (Sixth Edition)
- Hergenhahn, B.R.¹ Olson, M.H.²

16
- 0003879107
- MIT Press, Cambridge, MA
- M. J. Kearns and U. V. Vazirani, An Introduction to Computational Learning Theory, MIT Press, Cambridge, MA, 1994.
- (1994) An Introduction to Computational Learning Theory
- Kearns, M.J.¹ Vazirani, U.V.²

17
- 1942420814
- Reinforcement learning as classification: Leveraging modern classifiers
- T. G. Fawcett, N. Mishra (eds.), AAAI Press, Menlo Park, CA
- M. G. Lagoudakis and R. Parr, Reinforcement learning as classification: leveraging modern classifiers, in T. G. Fawcett, N. Mishra (eds.), Proc. 20th International Conference on Machine Learning, pp. 424-431, AAAI Press, Menlo Park, CA, 2003.
- (2003) Proc. 20th International Conference on Machine Learning , pp. 424-431
- Lagoudakis, M.G.¹ Parr, R.²

18
- 0004203240
- John Wiley & Sons, Inc., New York
- G. J. McLachlan and T. Krishnan, The EM Algorithm and Extensions, John Wiley & Sons, Inc., New York, 1997.
- (1997) The EM Algorithm and Extensions
- McLachlan, G.J.¹ Krishnan, T.²

19
- 77956759998
- Reinforcement learning control and pattern recognition systems
- J. M. Mendel and K. S. Fu (eds.), Academic Press, New York
- J. M. Mendel and R. W. McLaren, Reinforcement learning control and pattern recognition systems, in J. M. Mendel and K. S. Fu (eds.), Adaptive Learning and Pattern Recognition Systems: Theory and Applications, pp. 287-318, Academic Press, New York, 1970.
- (1970) Adaptive Learning and Pattern Recognition Systems: Theory and Applications , pp. 287-318
- Mendel, J.M.¹ McLaren, R.W.²

20
- 0347592013
- Behavioural clones and cognitive skill models
- K. Furukawa, D. Michie, and S. Muggleton (eds.), Oxford University Press, New York
- D. Michie and C. Sammut, Behavioural clones and cognitive skill models, in K. Furukawa, D. Michie, and S. Muggleton (eds.), Machine Intelligence 14: Applied Machine Intelligence, pp. 387-395, Oxford University Press, New York, 1996.
- (1996) Machine Intelligence 14: Applied Machine Intelligence , pp. 387-395
- Michie, D.¹ Sammut, C.²

21
- 0013500961
- Ph.D. dissertation, Princeton University
- M. L. Minsky, Theory of Neural-Analog Reinforcement Systems and Its Application to the Brain-Model Problem, Ph.D. dissertation, Princeton University, 1954.
- (1954) Theory of Neural-Analog Reinforcement Systems and Its Application to the Brain-Model Problem
- Minsky, M.L.¹

22
- 84937350040
- Steps toward artificial intelligence
- E. A. Feigenbaum and J. Feldman (eds.), Computers and Thought, pp. 406-450, McGraw-Hill, New York
- M. L. Minsky, Steps toward artificial intelligence, Proc. of the Institute of Radio Engineers, vol. 49, pp. 8-30, 1961. Reprinted in E. A. Feigenbaum and J. Feldman (eds.), Computers and Thought, pp. 406-450, McGraw-Hill, New York, 1963.
- (1961) Proc. of the Institute of Radio Engineers , vol.49 , pp. 8-30
- Minsky, M.L.¹

23
- 0004255908
- McGraw-Hill, New York
- T. Mitchell, Machine Learning, McGraw-Hill, New York, 1997.
- (1997) Machine Learning
- Mitchell, T.¹

24
- 0027684215
- Prioritized sweeping: Reinforcement learning with less data and less real time
- A. W. Moore and C. G. Atkeson, Prioritized sweeping: reinforcement learning with less data and less real time, Machine Learning, vol. 13, pp. 103-130, 1993.
- (1993) Machine Learning , vol.13 , pp. 103-130
- Moore, A.W.¹ Atkeson, C.G.²

25
- 0003212629
- Efficient training of artificial neural networks for autonomous navigation
- D. A. Pomerleau, Efficient training of artificial neural networks for autonomous navigation, Neural Computation, vol. 3, pp. 88-97, 1991.
- (1991) Neural Computation , vol.3 , pp. 88-97
- Pomerleau, D.A.¹

26
- 84867767727
- Morgan Kaufmann, San Francisco
- J. R. Quinlan, C4.5: Programs for Machine Learning, Morgan Kaufmann, San Francisco, 1993.
- (1993) C4.5: Programs for Machine Learning
- Quinlan, J.R.¹

27
- 0001201756
- Some studies in machine learning using the game of checkers
- Reprinted in E. A. Feigenbaum and J. Feldman (eds.), Computers and Thought, pp. 71-105, McGraw-Hill, New York
- A. L. Samuel, Some studies in machine learning using the game of checkers, IBM Journal of Research and Development, vol. 3, pp. 211-229, 1959. Reprinted in E. A. Feigenbaum and J. Feldman (eds.), Computers and Thought, pp. 71-105, McGraw-Hill, New York, 1963.
- (1959) IBM Journal of Research and Development , vol.3 , pp. 211-229
- Samuel, A.L.¹

28
- 0004286830
- Macmillan, New York
- H. A. Simon, Administrative Behavior, Macmillan, New York, 1947.
- (1947) Administrative Behavior
- Simon, H.A.¹

29
- 0004102479
- MIT Press, Cambridge, MA
- R. S. Sutton and A. G. Barto, Reinforcement Learning: An Introduction, MIT Press, Cambridge, MA, 1998.
- (1998) Reinforcement Learning: An Introduction
- Sutton, R.S.¹ Barto, A.G.²

30
- 33847202724
- Learning to predict by the method of temporal differences
- R. S. Sutton, Learning to predict by the method of temporal differences, Machine Learning, vol. 3, pp. 9-44, 1988.
- (1988) Machine Learning , vol.3 , pp. 9-44
- Sutton, R.S.¹

31
- 85156221438
- Generalization in reinforcement learning: Successful examples using coarse coding
- D. S. Touretzky, M. C. Mozer, and M. E. Hasselmo (eds.), MIT Press, Cambridge, MA
- R. S. Sutton, Generalization in reinforcement learning: successful examples using coarse coding, in D. S. Touretzky, M. C. Mozer, and M. E. Hasselmo (eds.), Advances in Neural Information Processing Systems, Proc. of the 1995 Conference, pp. 1038-1044, MIT Press, Cambridge, MA, 1996.
- (1996) Advances in Neural Information Processing Systems, Proc. of the 1995 Conference , pp. 1038-1044
- Sutton, R.S.¹

32
- 0001046225
- Practical issues in temporal difference learning
- G. J. Tesauro, Practical issues in temporal difference learning, Machine Learning, vol. 8, pp. 217-257, 1992.
- (1992) Machine Learning , vol.8 , pp. 217-257
- Tesauro, G.J.¹

33
- 0000985504
- TD-Gammon, A self-teaching backgammon program, achieves master-level play
- G. J. Tesauro, TD-Gammon, A self-teaching backgammon program, achieves master-level play, Neural Computation, vol. 6, pp. 215-219, 1994.
- (1994) Neural Computation , vol.6 , pp. 215-219
- Tesauro, G.J.¹

34
- 0029276036
- Temporal Difference Learning and TD-Gammon
- G. Tesauro, Temporal Difference Learning and TD-Gammon, Communications of the ACM, vol. 38, pp. 58-68, 1995.
- (1995) Communications of the ACM , vol.38 , pp. 58-68
- Tesauro, G.¹

35
- 0003998491
- Hafner, Darien, CT
- E. L. Thorndike, Animal Intelligence, Hafner, Darien, CT, 1911.
- (1911) Animal Intelligence
- Thorndike, E.L.¹

36
- 0002988210
- Computing machinery and intelligence
- Reprinted in E. A. Feigenbaum and J. Feldman (eds.), Computers and Thought, pp. 11-15, McGraw-Hill, New York, 1963
- A. M. Turing, Computing machinery and intelligence, Mind, vol. 59, pp. 433-460, 1950. Reprinted in E. A. Feigenbaum and J. Feldman (eds.), Computers and Thought, pp. 11-15, McGraw-Hill, New York, 1963.
- (1950) Mind , vol.59 , pp. 433-460
- Turing, A.M.¹

37
- 1942451973
- Model-based policy gradient reinforcement learning
- T. G. Fawcett, N. Mishra (eds.), AAAI Press, Menlo Park, CA
- X. Wang and T. G. Dietterich, Model-based policy gradient reinforcement learning, in T. G. Fawcett, N. Mishra (eds.), Proc. 20th International Conference on Machine Learning, pp. 776-783, AAAI Press, Menlo Park, CA, 2003.
- (2003) Proc. 20th International Conference on Machine Learning , pp. 776-783
- Wang, X.¹ Dietterich, T.G.²

* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS DB.