Artificial Intelligence, Volume 100, Issue 1-2, 1998, Pages 177-224

Model-based average reward reinforcement learning

Author keywords

AGV scheduling; Average reward; Bayesian networks; Exploration; Linear regression; Machine learning; Model based; Reinforcement learning

Indexed keywords

COMPUTER SIMULATION; FUNCTIONS; MATHEMATICAL MODELS; OPTIMIZATION; PERFORMANCE; REGRESSION ANALYSIS; STATE SPACE METHODS;

EID: 0032050241     PISSN: 0004-3702     EISSN: None     Source Type: Journal
DOI: 10.1016/S0004-3702(98)00002-2     Document Type: Article
Times cited: 66

References (47)
  • 6. D.P. Bertsekas, Dynamic Programming and Optimal Control, Athena Scientific, Belmont, MA, 1995.
  • 11. R.H. Crites, A.G. Barto, Improving elevator performance using reinforcement learning, in: Advances in Neural Information Processing Systems, Vol. 8, MIT Press, Cambridge, MA, 1996.
  • 12. T. Dean, K. Kanazawa, A model for reasoning about persistence and causation, Computational Intelligence 5 (3) (1989) 142-150.
  • 13. G.W. Gates, The reduced nearest neighbor rule, IEEE Trans. Inform. Theory (1972) 431-433.
  • 14. P.E. Hart, The condensed nearest neighbor rule, IEEE Trans. Inform. Theory 14 (1968) 515-516.
  • 20. S. Koenig, R.G. Simmons, The effect of representation and knowledge on goal-directed exploration with reinforcement-learning algorithms, Machine Learning 22 (1996) 227-250.
  • 21. L.-J. Lin, Self-improving reactive agents based on reinforcement learning, planning, and teaching, Machine Learning 8 (1992) 293-321.
  • 24. S. Mahadevan, Average reward reinforcement learning: foundations, algorithms, and empirical results, Machine Learning 22 (1996) 159-195.
  • 25. S. Mahadevan, Sensitive discount optimality: unifying discounted and average reward reinforcement learning, in: Proceedings International Machine Learning Conference, Bari, Italy, 1996.
  • 26. S. Mahadevan, J. Connell, Automatic programming of behavior-based robots using reinforcement learning, Artificial Intelligence 55 (1992) 311-365.
  • 28. A.W. Moore, C.G. Atkeson, Prioritized sweeping: reinforcement learning with less data and less time, Machine Learning 13 (1993) 103-130.
  • 30. A.E. Nicholson, J.M. Brady, The data association problem when monitoring robot vehicles using dynamic belief networks, in: ECAI 92: 10th European Conference on Artificial Intelligence Proceedings, Vienna, Austria, Wiley, New York, 1992, pp. 689-693.
  • 31. D. Ok, A study of model-based average reward reinforcement learning, Ph.D. Thesis, Technical Report 96-30-2, Department of Computer Science, Oregon State University, Corvallis, OR, 1996.
  • 36. S. Schaal, C. Atkeson, Robot juggling: an implementation of memory-based learning, IEEE Control Systems 14 (1994) 57-71.
  • 37. A. Schwartz, A reinforcement learning method for maximizing undiscounted rewards, in: Proceedings 10th International Conference on Machine Learning, Morgan Kaufmann, San Mateo, CA, 1993.
  • 38. S.P. Singh, Reinforcement learning algorithms for average-payoff Markovian decision processes, in: Proceedings National Conference on Artificial Intelligence (AAAI-94), Seattle, WA, MIT Press, Cambridge, MA, 1994.
  • 39. R.S. Sutton, Learning to predict by the methods of temporal differences, Machine Learning 3 (1988) 9-44.
  • 40. R.S. Sutton, Integrated architectures for learning, planning and reacting based on approximating dynamic programming, in: Proceedings Seventh International Conference on Machine Learning, Austin, TX, 1990.
  • 42. P. Tadepalli, D. Ok, Scaling up average reward reinforcement learning by approximating the domain models and the value function, in: Proceedings 13th International Conference on Machine Learning, 1996.
  • 43. M. Tan, Multi-agent reinforcement learning: independent vs. cooperative agents, in: Proceedings 10th International Conference on Machine Learning, Morgan Kaufmann, San Mateo, CA, 1993.
  • 44. G. Tesauro, Practical issues in temporal difference learning, Machine Learning 8 (3-4) (1992) 257-277.


* This information was extracted and analyzed by KISTI from Elsevier's SCOPUS database.