



Volume 2, Issue 2, 1999, Pages 141-172

Exploration Strategies for Model-based Learning in Multi-agent Systems

Author keywords

Exploration; Model based learning; Multi agent systems


EID: 0033423368     PISSN: 13872532     EISSN: None     Source Type: Journal    
DOI: 10.1023/A:1010007108196     Document Type: Article
Times cited: 67

References (42)
  • 1
    • D. Angluin. "Learning regular sets from queries and counterexamples." Information and Computation, vol. 75, pp. 87-106, 1987.
    • Scopus EID: 0023453626
  • 3
    • E. Ben-Porath. "The complexity of computing a best response automaton in repeated games with mixed strategies." Games and Economic Behavior, vol. 2, pp. 1-12, 1990.
    • Scopus EID: 0011471586
  • 8
    • D. A. Cohn. "Neural network exploration using optimal experimental design," in J. D. Cowan, G. Tesauro, and J. Alspector (Eds.), Advances in Neural Information Processing Systems 6, Morgan Kaufmann, pp. 679-686, 1994.
    • Scopus EID: 0003328374
  • 9
    • P. Dayan and T. J. Sejnowski. "Exploration bonuses and dual control." Machine Learning, vol. 25(1), pp. 5-22, 1996.
    • Scopus EID: 0030260201
  • 14
    • D. Fudenberg and D. Levine. "Steady state learning and Nash equilibrium." Econometrica, vol. 61, pp. 547-574, 1993.
    • Scopus EID: 0001536620
  • 15
    • I. Gilboa and D. Samet. "Bounded versus unbounded rationality: The tyranny of the weak." Games and Economic Behavior, vol. 1, pp. 213-221, 1989.
    • Scopus EID: 38249006045
  • 16
    • I. Gilboa. "The complexity of computing best response automata in repeated games." Journal of Economic Theory, vol. 45, pp. 342-352, 1988.
    • Scopus EID: 38249029225
  • 19
    • J. S. Jordan. "Bayesian learning in normal form games." Games and Economic Behavior, vol. 3, pp. 60-81, 1991.
    • Scopus EID: 0002298153
  • 20
    • J. S. Jordan. "The exponential convergence of Bayesian learning in normal form games." Games and Economic Behavior, vol. 4, pp. 202-217, 1991.
    • Scopus EID: 38249015887
  • 23
    • E. Kalai and E. Lehrer. "Rational learning leads to Nash equilibrium." Econometrica, vol. 61(5), pp. 1019-1045, September 1993.
    • Scopus EID: 0000221289
  • 24
    • E. Kalai. "Bounded rationality and strategic complexity in repeated games," in T. Ichiishi, A. Neyman, and Y. Tauman (Eds.), Game Theory and Applications, Academic Press: San Diego, pp. 131-157, 1990.
    • Scopus EID: 0011473030
  • 28
    • A. W. Moore and C. G. Atkeson. "Prioritized sweeping: Reinforcement learning with less data and less time." Machine Learning, vol. 13(1), 1993.
    • Scopus EID: 0027684215
  • 30
    • J. H. Nachbar. "Optimization and rational learning in games." Econometrica, vol. 65(2), 1997.
    • Scopus EID: 0042914184
  • 32
    • C. H. Papadimitriou. "On players with a bounded number of states." Games and Economic Behavior, vol. 4, pp. 122-131, 1992.
    • Scopus EID: 0000948830
  • 34
    • A. Rubinstein. "Finite automata play the repeated Prisoner's Dilemma." Journal of Economic Theory, vol. 39, pp. 83-96, 1986.
    • Scopus EID: 46149134052
  • 35
    • T. W. Sandholm and R. H. Crites. "Multiagent reinforcement learning and the iterated Prisoner's Dilemma." Biosystems Journal, vol. 37, pp. 147-166, 1995.
    • Scopus EID: 0030050933
  • 36
    • M. Sato, K. Abe, and H. Takeda. "Learning control of finite Markov chains with an explicit trade-off between estimation and control." IEEE Transactions on Systems, Man and Cybernetics, vol. 18(5), September 1991.
    • Scopus EID: 0024079557
  • 38
    • S. Singh, T. Jaakkola, M. L. Littman, and C. Szepesvári. "Convergence results for single-step on-policy reinforcement-learning algorithms." Machine Learning Journal (to appear), 1998.
    • Scopus EID: 0041410934
  • 39
    • R. S. Sutton. "Integrated architectures for learning, planning, and reacting based on approximating dynamic programming," in Proceedings of the Seventh International Conference on Machine Learning, Morgan Kaufmann: San Mateo, CA, pp. 216-224, 1990.
    • Scopus EID: 85132026293
  • 40
    • S. B. Thrun. "The role of exploration in learning control," in David A. White and Donald Sofge (Eds.), Handbook for Intelligent Control, Multiscience Press Inc., 1992.
    • Scopus EID: 0002210775
  • 41
    • C. J. C. H. Watkins and P. Dayan. "Technical note: Q-learning." Machine Learning, vol. 8, pp. 279-292, 1992.
    • Scopus EID: 34249833101


* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.