Volume 59, Issue 1-2, 2005, Pages 31-54

A reinforcement learning scheme for a partially-observable multi-agent game

Author keywords

Card game; Model based; Multi agent system; POMDP; Reinforcement learning

Indexed keywords

ALGORITHMS; COMPUTER SIMULATION; INFORMATION RETRIEVAL; LEARNING SYSTEMS; MARKOV PROCESSES;

EID: 21244489639     PISSN: 08856125     EISSN: None     Source Type: Journal    
DOI: 10.1007/s10994-005-0461-8     Document Type: Article
Times cited : (23)

References (23)
  • 1
    • 0020970738
    • Neuronlike adaptive elements that can solve difficult learning control problems
    • Barto, A. G., Sutton, R. S., & Anderson, C. W. (1983). Neuronlike adaptive elements that can solve difficult learning control problems. IEEE Trans. Syst., Man. & Cybern., 13, 834-846.
    • (1983) IEEE Trans. Syst., Man. & Cybern. , vol.13 , pp. 834-846
    • Barto, A.G.1    Sutton, R.S.2    Anderson, C.W.3
  • 2
    • 0030284259
    • Perfect recall and pruning in games with imperfect information
    • Blair, J. R. S., Mutchler, D., & Lent, M. (1995). Perfect recall and pruning in games with imperfect information. Computational Intelligence, 12, 131-154.
    • (1995) Computational Intelligence , vol.12 , pp. 131-154
    • Blair, J.R.S.1    Mutchler, D.2    Lent, M.3
  • 4
    • 0032208335
    • Elevator group control using multiple reinforcement learning agents
    • Crites, R. H., & Barto, A. G. (1996). Elevator group control using multiple reinforcement learning agents. Machine Learning, 33, 235-262.
    • (1996) Machine Learning , vol.33 , pp. 235-262
    • Crites, R.H.1    Barto, A.G.2
  • 5
    • 0036374294
    • GIB: Imperfect information in a computationally challenging game
    • Ginsberg, M. (2001). GIB: Imperfect information in a computationally challenging game. Journal of Artificial Intelligence Research, 14, 303-358.
    • (2001) Journal of Artificial Intelligence Research , vol.14 , pp. 303-358
    • Ginsberg, M.1
  • 7
    • 0036592028
    • Control of exploitation-exploration meta-parameter in reinforcement learning
    • Ishii, S., Yoshida, W., & Yoshimoto, J. (2002). Control of exploitation-exploration meta-parameter in reinforcement learning. Neural Networks, 15, 665-687.
    • (2002) Neural Networks , vol.15 , pp. 665-687
    • Ishii, S.1    Yoshida, W.2    Yoshimoto, J.3
  • 8
    • 0032073263
    • Planning and acting in partially observable stochastic domains
    • Kaelbling, L. P., Littman, M. L., & Cassandra, A. (1998). Planning and acting in partially observable stochastic domains. Artificial Intelligence, 101, 99-134.
    • (1998) Artificial Intelligence , vol.101 , pp. 99-134
    • Kaelbling, L.P.1    Littman, M.L.2    Cassandra, A.3
  • 9
    • 0012331016
    • Memory approaches to reinforcement learning in non-Markovian domains
    • Lin, L.-J., & Mitchell, T. (1992). Memory approaches to reinforcement learning in non-Markovian domains. Tech. rep., CMU-CS-92-138.
    • (1992) Tech. Rep. , vol.CMU-CS-92-138
    • Lin, L.-J.1    Mitchell, T.2
  • 13
    • 0000672424
    • Fast learning in networks of locally-tuned processing units
    • Moody, J., & Darken, C. J. (1989). Fast learning in networks of locally-tuned processing units. Neural Computation, 1, 281-294.
    • (1989) Neural Computation , vol.1 , pp. 281-294
    • Moody, J.1    Darken, C.J.2
  • 14
    • 0027684215
    • Prioritized sweeping: Reinforcement learning with less data and less real time
    • Moore, A., & Atkeson, C. (1993). Prioritized sweeping: Reinforcement learning with less data and less real time. Machine Learning, 13, 103-130.
    • (1993) Machine Learning , vol.13 , pp. 103-130
    • Moore, A.1    Atkeson, C.2
  • 17
    • 0030050933
    • Multiagent reinforcement learning in the iterated prisoner's dilemma
    • Sandholm, T. W., & Crites, R. H. (1995). Multiagent reinforcement learning in the iterated prisoner's dilemma. Biosystems, 37, 147-166.
    • (1995) Biosystems , vol.37 , pp. 147-166
    • Sandholm, T.W.1    Crites, R.H.2
  • 18
    • 0034131785
    • On-line EM algorithm for the normalized Gaussian network
    • Sato, M., & Ishii, S. (2000). On-line EM algorithm for the normalized Gaussian network. Neural Computation, 12, 407-432.
    • (2000) Neural Computation , vol.12 , pp. 407-432
    • Sato, M.1    Ishii, S.2
  • 22
    • 0000985504
    • TD-Gammon, a self-teaching backgammon program, achieves master-level play
    • Tesauro, G. J. (1994). TD-Gammon, a self-teaching backgammon program, achieves master-level play. Neural Computation, 6, 215-219.
    • (1994) Neural Computation , vol.6 , pp. 215-219
    • Tesauro, G.J.1
  • 23
    • 0029250080
    • Reinforcement learning of non-Markov decision processes
    • Whitehead, S., & Lin, L.-J. (1995). Reinforcement learning of non-Markov decision processes. Artificial Intelligence, 73, 271-306.
    • (1995) Artificial Intelligence , vol.73 , pp. 271-306
    • Whitehead, S.1    Lin, L.-J.2


* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.