Volume 12, 2012, Pages 539-577

Reinforcement learning in games

Author keywords

Coherence; Covariance; Defend; Income; Nash


EID: 84867399396     PISSN: 18674534     EISSN: 18674542     Source Type: Book Series    
DOI: 10.1007/978-3-642-27645-3_17     Document Type: Chapter
Times cited: 52

References (111)
  • 2
    • 33744825172
    • Learning to bid in bridge
    • Amit, A., Markovitch, S.: Learning to bid in bridge. Machine Learning 63(3), 287–327 (2006)
    • (2006) Machine Learning , vol.63 , Issue.3 , pp. 287-327
    • Amit, A.1    Markovitch, S.2
  • 3
    • 84883207406
    • Online adaptation of computer games agents: A reinforcement learning approach
    • Andrade, G., Santana, H., Furtado, A., Leitão, A., Ramalho, G.: Online adaptation of computer games agents: A reinforcement learning approach. Scientia 15(2) (2004)
    • (2004) Scientia , vol.15 , Issue.2
    • Andrade, G.1    Santana, H.2    Furtado, A.3    Leitão, A.4    Ramalho, G.5
  • 4
    • 0036568025
    • Finite-time analysis of the multiarmed bandit problem
    • Auer, P., Cesa-Bianchi, N., Fischer, P.: Finite-time analysis of the multiarmed bandit problem. Machine Learning 47, 235–256 (2002)
    • (2002) Machine Learning , vol.47 , pp. 235-256
    • Auer, P.1    Cesa-Bianchi, N.2    Fischer, P.3
  • 5
    • 76349090438
    • Models of active learning in group-structured state spaces
    • Bartók, G., Szepesvári, C., Zilles, S.: Models of active learning in group-structured state spaces. Information and Computation 208, 364–384 (2010)
    • (2010) Information and Computation , vol.208 , pp. 364-384
    • Bartók, G.1    Szepesvári, C.2    Zilles, S.3
  • 6
    • 0034275416
    • Learning to play chess using temporal-differences
    • Baxter, J., Tridgell, A., Weaver, L.: Learning to play chess using temporal-differences. Machine Learning 40(3), 243–263 (2000)
    • (2000) Machine Learning , vol.40 , Issue.3 , pp. 243-263
    • Baxter, J.1    Tridgell, A.2    Weaver, L.3
  • 8
    • 0004502426
    • Learning piece values using temporal differences
    • Beal, D., Smith, M.C.: Learning piece values using temporal differences. ICCA Journal 20(3), 147–151 (1997)
    • (1997) ICCA Journal , vol.20 , Issue.3 , pp. 147-151
    • Beal, D.1    Smith, M.C.2
  • 10
    • 33745168595
    • Game-Tree Search with Adaptation in Stochastic Imperfect-Information Games
    • van den Herik, H.J., Björnsson, Y., Netanyahu, N.S. (eds.), Springer, Heidelberg
    • Billings, D., Davidson, A., Schauenberg, T., Burch, N., Bowling, M., Holte, R.C., Schaeffer, J., Szafron, D.: Game-Tree Search with Adaptation in Stochastic Imperfect-Information Games. In: van den Herik, H.J., Björnsson, Y., Netanyahu, N.S. (eds.) CG 2004. LNCS, vol. 3846, pp. 21–34. Springer, Heidelberg (2006)
    • (2006) CG 2004. LNCS , vol.3846 , pp. 21-34
    • Billings, D.1    Davidson, A.2    Schauenberg, T.3    Burch, N.4    Bowling, M.5    Holte, R.C.6    Schaeffer, J.7    Szafron, D.8
  • 15
    • 31844436490
    • Convergence and no-regret in multiagent learning
    • Bowling, M.: Convergence and no-regret in multiagent learning. In: Neural Information Processing Systems, pp. 209–216 (2004)
    • (2004) Neural Information Processing Systems , pp. 209-216
    • Bowling, M.1
  • 16
    • 84956863737
    • From simple features to sophisticated evaluation functions
    • Buro, M.: From simple features to sophisticated evaluation functions. In: International Conference on Computers and Games, pp. 126–145 (1998)
    • (1998) International Conference on Computers and Games , pp. 126-145
    • Buro, M.1
  • 17
    • 33744829091
    • RTS games as test-bed for real-time research
    • Buro, M., Furtak, T.: RTS games as test-bed for real-time research. JCIS, 481–484 (2003)
    • (2003) JCIS , pp. 481-484
    • Buro, M.1    Furtak, T.2
  • 18
    • 84898602823
    • The second annual real-time strategy game AI competition
    • Buro, M., Lanctot, M., Orsten, S.: The second annual real-time strategy game AI competition. In: GAME-ON NA (2007)
    • (2007) GAME-ON NA
    • Buro, M.1    Lanctot, M.2    Orsten, S.3
  • 20
    • 77953762833
    • Adding Expert Knowledge and Exploration in Monte-Carlo Tree Search
    • van den Herik, H.J., Spronck, P. (eds.), Springer, Heidelberg
    • Chaslot, G., Fiter, C., Hoock, J.B., Rimmel, A., Teytaud, O.: Adding Expert Knowledge and Exploration in Monte-Carlo Tree Search. In: van den Herik, H.J., Spronck, P. (eds.) ACG 2009. LNCS, vol. 6048, pp. 1–13. Springer, Heidelberg (2010)
    • (2010) ACG 2009. LNCS , vol.6048 , pp. 1-13
    • Chaslot, G.1    Fiter, C.2    Hoock, J.B.3    Rimmel, A.4    Teytaud, O.5
  • 23
    • 38049037928
    • Efficient Selectivity and Backup Operators in Monte-Carlo Tree Search
    • van den Herik, H.J., Ciancarini, P., Donkers, H.H.L.M(J.) (eds.) CG 2006, Springer, Heidelberg
    • Coulom, R.: Efficient Selectivity and Backup Operators in Monte-Carlo Tree Search. In: van den Herik, H.J., Ciancarini, P., Donkers, H.H.L.M(J.) (eds.) CG 2006. LNCS, vol. 4630, pp. 72–83. Springer, Heidelberg (2007)
    • (2007) CG 2006. LNCS , vol.4630 , pp. 72-83
    • Coulom, R.1
  • 24
    • 38849139064
    • Computing Elo ratings of move patterns in the game of go
    • Coulom, R.: Computing Elo ratings of move patterns in the game of go. ICGA Journal 30(4), 198–208 (2007)
    • (2007) ICGA Journal , vol.30 , Issue.4 , pp. 198-208
    • Coulom, R.1
  • 25
    • 24944480025
    • Honte, a Go-playing program using neural nets
    • Nova Science Publishers
    • Dahl, F.A.: Honte, a Go-playing program using neural nets. In: Machines that Learn to Play Games, pp. 205–223. Nova Science Publishers (2001)
    • (2001) Machines that Learn to Play Games , pp. 205-223
    • Dahl, F.A.1
  • 30
    • 0028443409
    • Toward an ideal trainer
    • Epstein, S.L.: Toward an ideal trainer. Machine Learning 15, 251–277 (1994)
    • (1994) Machine Learning , vol.15 , pp. 251-277
    • Epstein, S.L.1
  • 33
    • 0013372262
    • Learning to play chess selectively by acquiring move patterns
    • Finkelstein, L., Markovitch, S.: Learning to play chess selectively by acquiring move patterns. ICCA Journal 21, 100–119 (1998)
    • (1998) ICCA Journal , vol.21 , pp. 100-119
    • Finkelstein, L.1    Markovitch, S.2
  • 35
    • 24544450341
    • Machine learning in games: A survey
    • Nova Science Publishers
    • Fürnkranz, J.: Machine learning in games: a survey. In: Machines that Learn to Play Games, pp. 11–59. Nova Science Publishers (2001)
    • (2001) Machines that Learn to Play Games , pp. 11-59
    • Fürnkranz, J.1
  • 38
    • 57749091602
    • Achieving master-level play in 9x9 computer go
    • Gelly, S., Silver, D.: Achieving master-level play in 9x9 computer go. In: AAAI, pp. 1537–1540 (2008)
    • (2008) AAAI , pp. 1537-1540
    • Gelly, S.1    Silver, D.2
  • 40
    • 5844312285
    • PhD thesis, University of California, San Diego, CA
    • Gherrity, M.: A game-learning machine. PhD thesis, University of California, San Diego, CA (1993)
    • (1993) A Game-Learning Machine
    • Gherrity, M.1
  • 41
    • 34948832502
    • Tech. rep., Department of Computer Science, University of Bristol
    • Ghory, I.: Reinforcement learning in board games. Tech. rep., Department of Computer Science, University of Bristol (2004)
    • (2004) Reinforcement Learning in Board Games
    • Ghory, I.1
  • 42
    • 80052001262
    • Fun game AI design for beginners
    • Charles River Media, Inc
    • Gilgenbach, M.: Fun game AI design for beginners. In: AI Game Programming Wisdom, vol. 3. Charles River Media, Inc. (2006)
    • (2006) AI Game Programming Wisdom , vol.3
    • Gilgenbach, M.1
  • 43
    • 35348940239
    • Lossless abstraction of imperfect information games
    • Gilpin, A., Sandholm, T.: Lossless abstraction of imperfect information games. Journal of the ACM 54(5), 25 (2007)
    • (2007) Journal of the ACM , vol.54 , Issue.5 , pp. 25
    • Gilpin, A.1    Sandholm, T.2
  • 44
    • 84860643163
    • Potential-aware automated abstraction of sequential games, and holistic equilibrium analysis of Texas Hold’em poker
    • Gilpin, A., Sandholm, T., Sørensen, T.B.: Potential-aware automated abstraction of sequential games, and holistic equilibrium analysis of Texas Hold’em poker. In: AAAI, vol. 22, pp. 50–57 (2007)
    • (2007) AAAI , vol.22 , pp. 50-57
    • Gilpin, A.1    Sandholm, T.2    Sørensen, T.B.3
  • 45
    • 0036374294
    • Gib: Imperfect information in a computationally challenging game
    • Ginsberg, M.L.: Gib: Imperfect information in a computationally challenging game. Journal of Artificial Intelligence Research 14, 313–368 (2002)
    • (2002) Journal of Artificial Intelligence Research , vol.14 , pp. 313-368
    • Ginsberg, M.L.1
  • 46
    • 85042906319
    • Tech. Rep. UCSC-CRL-92-10, University of California at Santa Cruz
    • Gould, J., Levinson, R.: Experience-based adaptive search. Tech. Rep. UCSC-CRL-92-10, University of California at Santa Cruz (1992)
    • (1992) Experience-Based Adaptive Search
    • Gould, J.1    Levinson, R.2
  • 54
    • 84986621078
    • On verifying game designs and playing strategies using reinforcement learning
    • Kalles, D., Kanellopoulos, P.: On verifying game designs and playing strategies using reinforcement learning. In: ACM Symposium on Applied Computing, pp. 6–11 (2001)
    • (2001) ACM Symposium on Applied Computing , pp. 6-11
    • Kalles, D.1    Kanellopoulos, P.2
  • 56
    • 33750293964
    • Bandit Based Monte-Carlo Planning
    • Fürnkranz, J., Scheffer, T., Spiliopoulou, M. (eds.), Springer, Heidelberg
    • Kocsis, L., Szepesvári, C.: Bandit Based Monte-Carlo Planning. In: Fürnkranz, J., Scheffer, T., Spiliopoulou, M. (eds.) ECML 2006. LNCS (LNAI), vol. 4212, pp. 282–293. Springer, Heidelberg (2006)
    • (2006) ECML 2006. LNCS (LNAI) , vol.4212 , pp. 282-293
    • Kocsis, L.1    Szepesvári, C.2
  • 57
    • 77049089986
    • RSPSA: Enhanced Parameter Optimization in Games
    • van den Herik, H.J., Hsu, S.-C., Hsu, T.-s., Donkers, H.H.L.M(J.) (eds.), Springer, Heidelberg
    • Kocsis, L., Szepesvári, C., Winands, M.H.M.: RSPSA: Enhanced Parameter Optimization in Games. In: van den Herik, H.J., Hsu, S.-C., Hsu, T.-s., Donkers, H.H.L.M(J.) (eds.) CG 2005. LNCS, vol. 4250, pp. 39–56. Springer, Heidelberg (2006)
    • (2006) CG 2005. LNCS , vol.4250 , pp. 39-56
    • Kocsis, L.1    Szepesvári, C.2    Winands, M.H.M.3
  • 61
    • 35048819671
    • Least-Squares Methods in Reinforcement Learning for Control
    • Vlahavas, I.P., Spyropoulos, C.D. (eds.), Springer, Heidelberg
    • Lagoudakis, M.G., Parr, R., Littman, M.L.: Least-Squares Methods in Reinforcement Learning for Control. In: Vlahavas, I.P., Spyropoulos, C.D. (eds.) SETN 2002. LNCS (LNAI), vol. 2308, pp. 249–260. Springer, Heidelberg (2002)
    • (2002) SETN 2002. LNCS (LNAI) , vol.2308 , pp. 249-260
    • Lagoudakis, M.G.1    Parr, R.2    Littman, M.L.3
  • 63
    • 84898646291
    • Chess Neighborhoods, Function Combination, and Reinforcement Learning
    • Marsland, T., Frank, I. (eds.), Springer, Heidelberg
    • Levinson, R., Weber, R.: Chess Neighborhoods, Function Combination, and Reinforcement Learning. In: Marsland, T., Frank, I. (eds.) CG 2001. LNCS, vol. 2063, pp. 133–150. Springer, Heidelberg (2002)
    • (2002) CG 2001. LNCS , vol.2063 , pp. 133-150
    • Levinson, R.1    Weber, R.2
  • 64
    • 33747193691
    • Beyond Optimal Play in Two-Person-Zerosum Games
    • Albers, S., Radzik, T. (eds.), Springer, Heidelberg
    • Lorenz, U.: Beyond Optimal Play in Two-Person-Zerosum Games. In: Albers, S., Radzik, T. (eds.) ESA 2004. LNCS, vol. 3221, pp. 749–759. Springer, Heidelberg (2004)
    • (2004) ESA 2004. LNCS , vol.3221 , pp. 749-759
    • Lorenz, U.1
  • 67
    • 33646264632
    • Learning of AI players from game observation data
    • McGlinchey, S.J.: Learning of AI players from game observation data. In: GAME-ON, pp. 106–110 (2003)
    • (2003) GAME-ON , pp. 106-110
    • McGlinchey, S.J.1
  • 69
    • 21844502480
    • Discovering complex Othello strategies through evolutionary neural networks
    • Moriarty, D.E., Miikkulainen, R.: Discovering complex Othello strategies through evolutionary neural networks. Connection Science 7, 195–209 (1995)
    • (1995) Connection Science , vol.7 , pp. 195-209
    • Moriarty, D.E.1    Miikkulainen, R.2
  • 70
    • 24944583230
    • Position evaluation in computer go
    • Müller, M.: Position evaluation in computer go. ICGA Journal 25(4), 219–228 (2002)
    • (2002) ICGA Journal , vol.25 , Issue.4 , pp. 219-228
    • Müller, M.1
  • 76
    • 77950871800
    • Abstraction and Generalization in Reinforcement Learning: A Summary and Framework
    • Taylor, M.E., Tuyls, K. (eds.), Springer, Heidelberg
    • Ponsen, M., Taylor, M.E., Tuyls, K.: Abstraction and Generalization in Reinforcement Learning: A Summary and Framework. In: Taylor, M.E., Tuyls, K. (eds.) ALA 2009. LNCS, vol. 5924, pp. 1–33. Springer, Heidelberg (2010)
    • (2010) ALA 2009. LNCS , vol.5924 , pp. 1-33
    • Ponsen, M.1    Taylor, M.E.2    Tuyls, K.3
  • 79
    • 79953207627
    • Computer poker: A review
    • Rubin, J., Watson, I.: Computer poker: A review. Artificial Intelligence 175(5-6), 958–987 (2011)
    • (2011) Artificial Intelligence , vol.175 , Issue.5-6 , pp. 958-987
    • Rubin, J.1    Watson, I.2
  • 80
    • 0000302898
    • The games computers (and people) play
    • Zelkowitz, M. (ed.), Academic Press
    • Schaeffer, J.: The games computers (and people) play. In: Zelkowitz, M. (ed.) Advances in Computers, vol. 50, pp. 89–266. Academic Press (2000)
    • (2000) Advances in Computers , vol.50 , pp. 89-266
    • Schaeffer, J.1
  • 84
    • 33744791590
    • The illusion of intelligence
    • Charles River Media
    • Scott, B.: The illusion of intelligence. In: AI Game Programming Wisdom, pp. 16–20. Charles River Media (2002)
    • (2002) AI Game Programming Wisdom , pp. 16-20
    • Scott, B.1
  • 85
    • 0345014819
    • Learning a Game Strategy Using Pattern-Weights and Self-Play
    • Schaeffer, J., Müller, M., Björnsson, Y. (eds.), Springer, Heidelberg
    • Shapiro, A., Fuchs, G., Levinson, R.: Learning a Game Strategy Using Pattern-Weights and Self-Play. In: Schaeffer, J., Müller, M., Björnsson, Y. (eds.) CG 2002. LNCS, vol. 2883, pp. 42–60. Springer, Heidelberg (2003)
    • (2003) CG 2002. LNCS , vol.2883 , pp. 42-60
    • Shapiro, A.1    Fuchs, G.2    Levinson, R.3
  • 86
    • 84883067875
    • Learning companion behaviors using reinforcement learning in games
    • Sharifi, A.A., Zhao, R., Szafron, D.: Learning companion behaviors using reinforcement learning in games. In: AIIDE (2010)
    • (2010) AIIDE
    • Sharifi, A.A.1    Zhao, R.2    Szafron, D.3
  • 89
    • 56449110907
    • Sample-based learning and search with permanent and transient memories
    • Silver, D., Sutton, R., Mueller, M.: Sample-based learning and search with permanent and transient memories. In: ICML (2008)
    • (2008) ICML
    • Silver, D.1    Sutton, R.2    Mueller, M.3
  • 93
    • 38049011913
    • Feature construction for reinforcement learning in Hearts
    • Sturtevant, N., White, A.: Feature construction for reinforcement learning in Hearts. In: Advances in Computers and Games, pp. 122–134 (2007)
    • (2007) Advances in Computers and Games , pp. 122-134
    • Sturtevant, N.1    White, A.2
  • 95
    • 33845344721
    • Learning Tetris using the noisy cross-entropy method
    • Szita, I., Lörincz, A.: Learning Tetris using the noisy cross-entropy method. Neural Computation 18(12), 2936–2941 (2006a)
    • (2006) Neural Computation , vol.18 , Issue.12 , pp. 2936-2941
    • Szita, I.1    Lörincz, A.2
  • 96
    • 38349162555
    • Learning to play using low-complexity rule-based policies: Illustrations through Ms. Pac-Man
    • Szita, I., Lörincz, A.: Learning to play using low-complexity rule-based policies: Illustrations through Ms. Pac-Man. Journal of Artificial Intelligence Research 30, 659–684 (2006b)
    • (2006) Journal of Artificial Intelligence Research , vol.30 , pp. 659-684
    • Szita, I.1    Lörincz, A.2
  • 98
    • 77953795616
    • Monte-Carlo Tree Search in Settlers of Catan
    • van den Herik, H.J., Spronck, P. (eds.), Springer, Heidelberg
    • Szita, I., Chaslot, G., Spronck, P.: Monte-Carlo Tree Search in Settlers of Catan. In: van den Herik, H.J., Spronck, P. (eds.) ACG 2009. LNCS, vol. 6048, pp. 21–32. Springer, Heidelberg (2010)
    • (2010) ACG 2009. LNCS , vol.6048 , pp. 21-32
    • Szita, I.1    Chaslot, G.2    Spronck, P.3
  • 99
    • 0001046225
    • Practical issues in temporal difference learning
    • Tesauro, G.: Practical issues in temporal difference learning. Machine Learning 8, 257–277 (1992)
    • (1992) Machine Learning , vol.8 , pp. 257-277
    • Tesauro, G.1
  • 100
    • 0029276036
    • Temporal difference learning and TD-gammon
    • Tesauro, G.: Temporal difference learning and TD-gammon. Communications of the ACM 38(3), 58–68 (1995)
    • (1995) Communications of the ACM , vol.38 , Issue.3 , pp. 58-68
    • Tesauro, G.1
  • 101
    • 0032156140
    • Comments on ‘co-evolution in the successful learning of backgammon strategy’
    • Tesauro, G.: Comments on ‘co-evolution in the successful learning of backgammon strategy’. Machine Learning 32(3), 241–243 (1998)
    • (1998) Machine Learning , vol.32 , Issue.3 , pp. 241-243
    • Tesauro, G.1
  • 102
    • 0036147771
    • Programming backgammon using self-teaching neural nets
    • Tesauro, G.: Programming backgammon using self-teaching neural nets. Artificial Intelligence 134(1-2), 181–199 (2002)
    • (2002) Artificial Intelligence , vol.134 , Issue.1-2 , pp. 181-199
    • Tesauro, G.1
  • 103
    • 70350140182
    • Building controllers for Tetris
    • Thiery, C., Scherrer, B.: Building controllers for Tetris. ICGA Journal 32(1), 3–11 (2009)
    • (2009) ICGA Journal , vol.32 , Issue.1 , pp. 3-11
    • Thiery, C.1    Scherrer, B.2
  • 104
    • 85153958149
    • Learning to play the game of chess
    • Thrun, S.: Learning to play the game of chess. In: Neural Information Processing Systems, vol. 7, pp. 1069–1076 (1995)
    • (1995) Neural Information Processing Systems , vol.7 , pp. 1069-1076
    • Thrun, S.1
  • 105
    • 27844446638
    • Feature construction for game playing
    • Fürnkranz, J., Kubat, M. (eds.), Nova Science Publishers
    • Utgoff, P.: Feature construction for game playing. In: Fürnkranz, J., Kubat, M. (eds.) Machines that Learn to Play Games, pp. 131–152. Nova Science Publishers (2001)
    • (2001) Machines that Learn to Play Games , pp. 131-152
    • Utgoff, P.1
  • 109
    • 70349301926
    • Using reinforcement learning for city site selection in the turn-based strategy game Civilization IV
    • Wender, S., Watson, I.: Using reinforcement learning for city site selection in the turn-based strategy game Civilization IV. In: Computational Intelligence and Games, pp. 372–377 (2009)
    • (2009) Computational Intelligence and Games , pp. 372-377
    • Wender, S.1    Watson, I.2
  • 110
    • 82655164054
    • Self-play and using an expert to learn to play backgammon with temporal difference learning
    • Wiering, M.A.: Self-play and using an expert to learn to play backgammon with temporal difference learning. Journal of Intelligent Learning Systems and Applications 2, 57–68 (2010)
    • (2010) Journal of Intelligent Learning Systems and Applications , vol.2 , pp. 57-68
    • Wiering, M.A.1


* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.