2004, Pages 47-63

Reinforcement learning and its relationship to supervised learning

Author keywords

Algorithm design and analysis; Learning; Loss measurement; Machine learning; Supervised learning; Training

Indexed keywords

ARTIFICIAL INTELLIGENCE; LEARNING SYSTEMS; PERSONNEL TRAINING; REINFORCEMENT LEARNING; SUPERVISED LEARNING

EID: 84986214645     PISSN: None     EISSN: None     Source Type: Book    
DOI: 10.1109/9780470544785.ch2     Document Type: Chapter
Times cited: 69

References (37)
  • 1
    • 0029210635
    • Learning to act using real-time dynamic programming
    • A. G. Barto, S. J. Bradtke, and S. P. Singh, Learning to act using real-time dynamic programming, Artificial Intelligence, vol. 72, pp. 81-138, 1995.
    • (1995) Artificial Intelligence , vol.72 , pp. 81-138
    • Barto, A.G.1    Bradtke, S.J.2    Singh, S.P.3
  • 3
    • 0013495368
    • Infinite-horizon gradient-based policy search: II. Gradient ascent algorithms and experiments
    • J. Baxter, P. L. Bartlett, and L. Weaver, Infinite-horizon gradient-based policy search: II. Gradient ascent algorithms and experiments, Journal of Artificial Intelligence Research, vol. 15, pp. 351-381, 2001.
    • (2001) Journal of Artificial Intelligence Research , vol.15 , pp. 351-381
    • Baxter, J.1    Bartlett, P.L.2    Weaver, L.3
  • 8
    • 0001771345
    • Linear least-squares algorithms for temporal difference learning
    • S. J. Bradtke and A. G. Barto, Linear least-squares algorithms for temporal difference learning, Machine Learning, vol. 22, pp. 33-57, 1996.
    • (1996) Machine Learning , vol.22 , pp. 33-57
    • Bradtke, S.J.1    Barto, A.G.2
  • 10
    • 84899017487
    • Motivated reinforcement learning
    • T. G. Dietterich, S. Becker, and Z. Ghahramani (eds.), MIT Press, Cambridge, MA
    • P. Dayan, Motivated reinforcement learning, in T. G. Dietterich, S. Becker, and Z. Ghahramani (eds.), Advances in Neural Information Processing Systems 14, Proc. of the 2002 Conference, pp. 11-18, MIT Press, Cambridge, MA, 2003.
    • (2003) Advances in Neural Information Processing Systems 14, Proc. of the 2002 Conference , pp. 11-18
    • Dayan, P.1
  • 11
    • 84899029004
    • Batch value function approximation via support vectors
    • T. G. Dietterich, S. Becker, and Z. Ghahramani (eds.), MIT Press, Cambridge, MA
    • T. G. Dietterich and X. Wang, Batch value function approximation via support vectors, in T. G. Dietterich, S. Becker, and Z. Ghahramani (eds.), Advances in Neural Information Processing Systems 14, Proc. of the 2002 Conference, pp. 1491-1498, MIT Press, Cambridge, MA, 2003.
    • (2003) Advances in Neural Information Processing Systems 14, Proc. of the 2002 Conference , pp. 1491-1498
    • Dietterich, T.G.1    Wang, X.2
  • 17
    • 1942420814
    • Reinforcement learning as classification: Leveraging modern classifiers
    • T. G. Fawcett, N. Mishra (eds.), AAAI Press, Menlo Park, CA
    • M. G. Lagoudakis and R. Parr, Reinforcement learning as classification: leveraging modern classifiers, in T. G. Fawcett, N. Mishra (eds.), Proc. 20th International Conference on Machine Learning, pp. 424-431, AAAI Press, Menlo Park, CA, 2003.
    • (2003) Proc. 20th International Conference on Machine Learning , pp. 424-431
    • Lagoudakis, M.G.1    Parr, R.2
  • 19
    • 77956759998
    • Reinforcement learning control and pattern recognition systems
    • J. M. Mendel and K. S. Fu (eds.), Academic Press, New York
    • J. M. Mendel and R. W. McLaren, Reinforcement learning control and pattern recognition systems, in J. M. Mendel and K. S. Fu (eds.), Adaptive Learning and Pattern Recognition Systems: Theory and Applications, pp. 287-318, Academic Press, New York, 1970.
    • (1970) Adaptive Learning and Pattern Recognition Systems: Theory and Applications , pp. 287-318
    • Mendel, J.M.1    McLaren, R.W.2
  • 20
    • 0347592013
    • Behavioural clones and cognitive skill models
    • K. Furukawa, D. Michie, and S. Muggleton (eds.), Oxford University Press, New York
    • D. Michie and C. Sammut, Behavioural clones and cognitive skill models, in K. Furukawa, D. Michie, and S. Muggleton (eds.), Machine Intelligence 14: Applied Machine Intelligence, pp. 387-395, Oxford University Press, New York, 1996.
    • (1996) Machine Intelligence 14: Applied Machine Intelligence , pp. 387-395
    • Michie, D.1    Sammut, C.2
  • 22
    • 84937350040
    • Steps toward artificial intelligence
    • Reprinted in E. A. Feigenbaum and J. Feldman (eds.), Computers and Thought, pp. 406-450, McGraw-Hill, New York
    • M. L. Minsky, Steps toward artificial intelligence, Proc. of the Institute of Radio Engineers, vol. 49, pp. 8-30, 1961. Reprinted in E. A. Feigenbaum and J. Feldman (eds.), Computers and Thought, pp. 406-450, McGraw-Hill, New York, 1963.
    • (1961) Proc. of the Institute of Radio Engineers , vol.49 , pp. 8-30
    • Minsky, M.L.1
  • 24
    • 0027684215
    • Prioritized sweeping: Reinforcement learning with less data and less real time
    • A. W. Moore and C. G. Atkeson, Prioritized sweeping: reinforcement learning with less data and less real time, Machine Learning, vol. 13, pp. 103-130, 1993.
    • (1993) Machine Learning , vol.13 , pp. 103-130
    • Moore, A.W.1    Atkeson, C.G.2
  • 25
    • 0003212629
    • Efficient training of artificial neural networks for autonomous navigation
    • D. A. Pomerleau, Efficient training of artificial neural networks for autonomous navigation, Neural Computation, vol. 3, pp. 88-97, 1991.
    • (1991) Neural Computation , vol.3 , pp. 88-97
    • Pomerleau, D.A.1
  • 27
    • 0001201756
    • Some studies in machine learning using the game of checkers
    • Reprinted in E. A. Feigenbaum and J. Feldman (eds.), Computers and Thought, pp. 71-105, McGraw-Hill, New York
    • A. L. Samuel, Some studies in machine learning using the game of checkers, IBM Journal of Research and Development, vol. 3, pp. 211-229, 1959. Reprinted in E. A. Feigenbaum and J. Feldman (eds.), Computers and Thought, pp. 71-105, McGraw-Hill, New York, 1963.
    • (1959) IBM Journal of Research and Development , vol.3 , pp. 211-229
    • Samuel, A.L.1
  • 30
    • 33847202724
    • Learning to predict by the method of temporal differences
    • R. S. Sutton, Learning to predict by the method of temporal differences, Machine Learning, vol. 3, pp. 9-44, 1988.
    • (1988) Machine Learning , vol.3 , pp. 9-44
    • Sutton, R.S.1
  • 31
    • 85156221438
    • Generalization in reinforcement learning: Successful examples using coarse coding
    • D. S. Touretzky, M. C. Mozer, and M. E. Hasselmo (eds.), MIT Press, Cambridge, MA
    • R. S. Sutton, Generalization in reinforcement learning: successful examples using coarse coding, in D. S. Touretzky, M. C. Mozer, and M. E. Hasselmo (eds.), Advances in Neural Information Processing Systems, Proc. of the 1995 Conference, pp. 1038-1044, MIT Press, Cambridge, MA, 1996.
    • (1996) Advances in Neural Information Processing Systems, Proc. of the 1995 Conference , pp. 1038-1044
    • Sutton, R.S.1
  • 32
    • 0001046225
    • Practical issues in temporal difference learning
    • G. J. Tesauro, Practical issues in temporal difference learning, Machine Learning, vol. 8, pp. 217-257, 1992.
    • (1992) Machine Learning , vol.8 , pp. 217-257
    • Tesauro, G.J.1
  • 33
    • 0000985504
    • TD-Gammon, a self-teaching backgammon program, achieves master-level play
    • G. J. Tesauro, TD-Gammon, a self-teaching backgammon program, achieves master-level play, Neural Computation, vol. 6, pp. 215-219, 1994.
    • (1994) Neural Computation , vol.6 , pp. 215-219
    • Tesauro, G.J.1
  • 34
    • 0029276036
    • Temporal Difference Learning and TD-Gammon
    • G. Tesauro, Temporal Difference Learning and TD-Gammon, Communications of the ACM, vol. 38, pp. 58-68, 1995.
    • (1995) Communications of the ACM , vol.38 , pp. 58-68
    • Tesauro, G.1
  • 36
    • 0002988210
    • Computing machinery and intelligence
    • Reprinted in E. A. Feigenbaum and J. Feldman (eds.), Computers and Thought, pp. 11-15, McGraw-Hill, New York, 1963
    • A. M. Turing, Computing machinery and intelligence, Mind, vol. 59, pp. 433-460, 1950. Reprinted in E. A. Feigenbaum and J. Feldman (eds.), Computers and Thought, pp. 11-15, McGraw-Hill, New York, 1963.
    • (1950) Mind , vol.59 , pp. 433-460
    • Turing, A.M.1
  • 37
    • 1942451973
    • Model-based policy gradient reinforcement learning
    • T. G. Fawcett, N. Mishra (eds.), AAAI Press, Menlo Park, CA
    • X. Wang and T. G. Dietterich, Model-based policy gradient reinforcement learning, in T. G. Fawcett, N. Mishra (eds.), Proc. 20th International Conference on Machine Learning, pp. 776-783, AAAI Press, Menlo Park, CA, 2003.
    • (2003) Proc. 20th International Conference on Machine Learning , pp. 776-783
    • Wang, X.1    Dietterich, T.G.2


* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.