



Volume 35, Issue 6, 2008, Pages 1999-2017

Application of reinforcement learning to the game of Othello

Author keywords

Dynamic programming; Game playing; Markov decision processes; Multiagent learning; Neural networks; Othello; Q learning; Reinforcement learning

Indexed keywords

DECISION MAKING; DYNAMIC PROGRAMMING; FUNCTION EVALUATION; MANAGEMENT SCIENCE; MARKOV PROCESSES; NEURAL NETWORKS; OPERATIONS RESEARCH; PROBLEM SOLVING;

EID: 35349027192; PISSN: 03050548; EISSN: None; Source Type: Journal
DOI: 10.1016/j.cor.2006.10.004; Document Type: Article
Times cited: 33

References (31)
  • 1
    • EID: 84974870136
    • White D.J. A survey of applications of Markov decision processes. The Journal of the Operational Research Society 44 11 (1993) 1073-1096
  • 6
    • EID: 0000985504
    • Tesauro G. TD-gammon, a self-teaching backgammon program, achieves master-level play. Neural Computation 6 2 (1994) 215-219
  • 7
    • EID: 0029276036
    • Tesauro G. Temporal difference learning and TD-gammon. Communications of the ACM 38 3 (1995) 58-68
  • 8
    • EID: 0036147771
    • Tesauro G. Programming backgammon using self-teaching neural nets. Artificial Intelligence 134 1-2 (2002) 181-199
  • 9
    • EID: 0036722536
    • Gosavi A., Bandla N., and Das T.K. A reinforcement learning approach to a single leg airline revenue management problem with multiple fare classes and overbooking. IIE Transactions 34 9 (2002) 729-742
  • 10
    • EID: 85156187730
    • Crites R.H., and Barto A.G. Improving elevator performance using reinforcement learning. In: Touretzky D.S., Mozer M.C., and Hasselmo M.E. (Eds). Advances in neural information processing systems vol. 8 (1996), The MIT Press, Cambridge, MA 1017-1023
  • 11
    • EID: 0032208335
    • Crites R.H., and Barto A.G. Elevator group control using multiple reinforcement learning agents. Machine Learning 33 2 (1998) 235-262
  • 12
    • EID: 0242540456
    • Pednault E, Abe N, Zadrozny B. Sequential cost-sensitive decision-making with reinforcement learning. In: Proceedings of the eighth ACM SIGKDD international conference on knowledge discovery and data mining. Edmonton, Alberta, Canada: ACM Press; 2002. p. 259-68.
  • 13
    • EID: 0032643313
    • Das T.K., Gosavi A., Mahadevan S., and Marchalleck N. Solving semi-Markov decision problems using average reward reinforcement learning. Management Science 45 4 (1999) 560-574
  • 14
    • EID: 0742319170
    • Gosavi A. Reinforcement learning for long run average cost. European Journal of Operational Research 144 (2004) 654-674
  • 15
    • EID: 24544450341
    • Fürnkranz J. Machine learning in games: a survey. In: Fürnkranz J., and Kubat M. (Eds). Machines that learn to play games (2001), Nova Science Publishers, Huntington, NY, USA 11-59 [chapter 2]
  • 16
    • EID: 21844502480
    • Moriarty D.E., and Miikkulainen R. Discovering complex Othello strategies through evolutionary neural networks. Connection Science 7 3 (1995) 195-210
  • 17
    • EID: 21044442867
    • Chong S.Y., Tan M.K., and White J.D. Observing the evolution of neural networks learning to play the game of Othello. IEEE Transactions on Evolutionary Computation 9 3 (2005) 240-251
  • 19
    • EID: 0003787146
    • Bellman R.E. Dynamic programming (1957), Princeton University Press, Princeton, NJ, USA
  • 20
    • EID: 35348954680
    • Watkins CJCH. Learning from delayed rewards. PhD thesis, Cambridge University, Cambridge, England; 1989.
  • 22
    • EID: 35348961891
    • Allis LV. Searching for solutions in games and artificial intelligence. PhD thesis, University of Limburg, Maastricht, The Netherlands; 1994.
  • 25
    • EID: 85149834820
    • Littman M.L. Markov games as a framework for multi-agent reinforcement learning. Proceedings of the eleventh international conference on machine learning (1994), Morgan Kaufmann, San Francisco, CA, USA 157-163
  • 26
    • EID: 0001547175
    • Littman M.L. Value-function reinforcement learning in Markov games. Journal of Cognitive Systems Research 2 1 (2001) 55-66
  • 27
  • 28
    • EID: 35349018883
    • le Comte M. Introduction to Othello; 2000.
  • 29
    • EID: 35349017772
    • Rose B. Othello: a minute to learn-a lifetime to master; 2005.
  • 30
    • EID: 35349010953
    • Doucette MJ. Wipeout: the engineering of an Othello program. Project report, Acadia University, Wolfville, NS, Canada; 1998.
  • 31
    • EID: 35349007860
    • Leouski AV, Utgoff PE. What a neural network can learn about Othello. Technical report UM-CS-1996-010, Computer Science Department, Lederle Graduate Research Center, University of Massachusetts, Amherst, MA, USA; 1996.


* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.