



2006, Pages 187-194

Monte-Carlo Go reinforcement learning experiments

Author keywords

Computer Go; Monte Carlo; Reinforcement learning

Indexed keywords

MACHINE LEARNING; REINFORCEMENT LEARNING;

EID: 45149127471     PISSN: None     EISSN: None     Source Type: Conference Proceeding    
DOI: 10.1109/CIG.2006.311699     Document Type: Conference Paper
Times cited: 41

References (49)
  • 2
    • 0029679044
    • Reinforcement learning: A survey
    • L. P. Kaelbling, M. Littman, and A. Moore, "Reinforcement learning: A survey," Journal of Artificial Intelligence Research, vol. 4, pp. 237-285, 1996. [Online]. Available: citeseer.ist.psu.edu/kaelbling96reinforcement.html
    • (1996) Journal of Artificial Intelligence Research , vol.4 , pp. 237-285
    • Kaelbling, L.P.1    Littman, M.2    Moore, A.3
  • 3
    • 33847202724
    • Learning to predict by the method of temporal differences
    • R. Sutton, "Learning to predict by the method of temporal differences," Machine Learning, vol. 3, pp. 9-44, 1988.
    • (1988) Machine Learning , vol.3 , pp. 9-44
    • Sutton, R.1
  • 4
    • 0004049893
    • Learning from delayed rewards
    • C. Watkins, "Learning from delayed rewards," Ph.D. dissertation, Cambridge University, 1989.
    • (1989)
    • Watkins, C.1
  • 5
    • 45149092016
    • Indigo home page
    • B. Bouzy, "Indigo home page," www.math-info.univ-paris5.fr/~bouzy/INDIGO.html, 2005.
    • (2005) Indigo
    • Bouzy, B.1
  • 6
    • 0036145791
    • Games, Computers, and Artificial Intelligence
    • J. Schaeffer and J. van den Herik, "Games, Computers, and Artificial Intelligence," Artificial Intelligence, vol. 134, pp. 1-7, 2002.
    • (2002) Artificial Intelligence , vol.134 , pp. 1-7
    • Schaeffer, J.1    van den Herik, J.2
  • 11
    • 0036148118
    • M. Buro, "Improving heuristic mini-max search by supervised learning," Artificial Intelligence Journal, vol. 134, pp. 85-99, 2002.
  • 12
    • 24944583230
    • Position evaluation in computer go
    • M. Müller, "Position evaluation in computer go," ICGA Journal, vol. 25, no. 4, pp. 219-228, December 2002.
    • (2002) ICGA Journal , vol.25 , Issue.4 , pp. 219-228
    • Müller, M.1
  • 14
    • 0035479281
    • Computer go: An AI oriented survey
    • B. Bouzy and T. Cazenave, "Computer go: an AI oriented survey," Artificial Intelligence, vol. 132, pp. 39-103, 2001.
    • (2001) Artificial Intelligence , vol.132 , pp. 39-103
    • Bouzy, B.1    Cazenave, T.2
  • 16
    • 45149130544
    • M. Reiss, "Go++," www.goplusplus.com/.
    • Go++
    • Reiss, M.1
  • 17
    • 45149134550
    • Gnugo home page
    • D. Bump, "Gnugo home page," www.gnu.org/software/gnugo/devel.html, 2006.
    • (2006) Gnugo
    • Bump, D.1
  • 19
    • 45149085972
    • Explorer
    • M. Müller, "Explorer," web.cs.ualberta.ca/~mmueller/cgo/explorer.html, 2005.
    • (2005)
    • Müller, M.1
  • 20
    • 45149093374
    • T. Cazenave, "Golois," www.ai.univ-paris8.fr/~cazenave/Golois.html.
    • Golois
    • Cazenave, T.1
  • 21
    • 0001798654
    • Some practical techniques for global search in go
    • K. Chen, "Some practical techniques for global search in go," ICGA Journal, vol. 23, no. 2, pp. 67-74, 2000.
    • (2000) ICGA Journal , vol.23 , Issue.2 , pp. 67-74
    • Chen, K.1
  • 22
    • 45149134549
    • M. Enzenberger, "Evaluation in go by a neural network using soft segmentation," in 10th Advances in Computer Games, E. A. H. H. Jaap van den Herik, Hiroyuki Iida, Ed. Graz: Kluwer Academic Publishers, 2003, pp. 97-108.
  • 23
    • 0001580774
    • Decomposition search: A combinatorial games approach to game tree search, with applications to solving go endgame
    • M. Müller, "Decomposition search: A combinatorial games approach to game tree search, with applications to solving go endgame," in IJCAI, 1999, pp. 578-583.
    • (1999) IJCAI , pp. 578-583
    • Müller, M.1
  • 24
    • 84958743851
    • Abstract proof search
    • T. Cazenave, "Abstract proof search," in Computers and Games, ser. Lecture Notes in Computer Science, I. F. T. Marsland, Ed., no. 2063. Springer, 2000, pp. 39-54.
    • (2000) Computers and Games, ser. Lecture Notes in Computer Science , Issue.2063 , pp. 39-54
    • Cazenave, T.1
  • 25
    • 85085780301
    • Learning to score final positions in the game of go
    • E. van der Werf, J. Uiterwijk, and J. van den Herik, "Learning to score final positions in the game of go," in Advances in Computer Games, Many Games, Many Challenges, H. J. van den Herik, H. Iida, and E. A. Heinz, Eds., vol. 10. Kluwer Academic Publishers, 2003, pp. 143-158.
    • (2003) Advances in Computer Games, Many Games, Many Challenges , vol.10 , pp. 143-158
    • van der Werf, E.1    Uiterwijk, J.2    van den Herik, J.3
  • 27
    • 84898992015
    • On-line policy improvement using Monte Carlo search
    • G. Tesauro and G. Galperin, "On-line policy improvement using Monte Carlo search," in Advances in Neural Information Processing Systems. Cambridge MA: MIT Press, 1996, pp. 1068-1074.
    • (1996) Advances in Neural Information Processing Systems , pp. 1068-1074
    • Tesauro, G.1    Galperin, G.2
  • 29
    • 0036146034
    • World-championship-caliber scrabble
    • B. Sheppard, "World-championship-caliber scrabble," Artificial Intelligence, vol. 134, pp. 241-275, 2002.
    • (2002) Artificial Intelligence , vol.134 , pp. 241-275
    • Sheppard, B.1
  • 30
    • 0025386231
    • Expected-outcome: A general model of static evaluation
    • B. Abramson, "Expected-outcome: a general model of static evaluation," IEEE Transactions on PAMI, vol. 12, pp. 182-193, 1990.
    • (1990) IEEE Transactions on PAMI , vol.12 , pp. 182-193
    • Abramson, B.1
  • 33
    • 84902513084
    • B. Bouzy and B. Helmstetter, "Monte Carlo go developments," in 10th Advances in Computer Games, E. A. H. H. Jaap van den Herik, Hiroyuki Iida, Ed. Graz: Kluwer Academic Publishers, 2003, pp. 159-174.
  • 34
    • 0004280606
    • Learning in embedded systems
    • L. P. Kaelbling, "Learning in embedded systems," Ph.D. dissertation, MIT, 1993.
    • (1993)
    • Kaelbling, L.P.1
  • 35
    • 24944572334
    • The move decision process of Indigo
    • B. Bouzy, "The move decision process of Indigo," International Computer Game Association Journal, vol. 26, no. 1, pp. 14-27, March 2003.
    • (2003) International Computer Game Association Journal , vol.26 , Issue.1 , pp. 14-27
    • Bouzy, B.1
  • 36
    • 45149121322
    • B. Bouzy, "Associating shallow and selective global tree search with Monte Carlo for 9×9 go," in Computers and Games: 4th International Conference, CG 2004, ser. Lecture Notes in Computer Science, N. N. J. van den Herik, Y. Björnsson, Ed., vol. 3846 / 2006. Ramat-Gan, Israel: Springer Verlag, July 2004, pp. 67-80.
  • 37
    • 40649089044
    • Vegos home page
    • P. Kaminski, "Vegos home page," www.ideanest.com/vegos/, 2003.
    • (2003) Vegos
    • Kaminski, P.1
  • 38
    • 45149122181
    • Seven year itch
    • J. Hamlen, "Seven year itch," ICGA Journal, vol. 27, no. 4, pp. 255-258, 2004.
    • (2004) ICGA Journal , vol.27 , Issue.4 , pp. 255-258
    • Hamlen, J.1
  • 39
    • 34547971839
    • Efficient selectivity and back-up operators in Monte-Carlo tree search
    • R. Coulom, "Efficient selectivity and back-up operators in Monte-Carlo tree search," in Computers and Games, Torino, Italy, 2006, paper currently submitted.
    • (2006) Computers and Games
    • Coulom, R.1
  • 40
    • 33646238098
    • The go-playing program called go81
    • T. Raiko, "The go-playing program called go81," in Finnish Artificial Intelligence Conference, Helsinki, Finland, September 2004, pp. 197-206.
    • (2004) Finnish Artificial Intelligence Conference , pp. 197-206
    • Raiko, T.1
  • 41
    • 24944478740
    • Associating knowledge and Monte Carlo approaches within a go program
    • B. Bouzy, "Associating knowledge and Monte Carlo approaches within a go program," Information Sciences, vol. 175, no. 4, pp. 247-257, November 2005.
    • (2005) Information Sciences , vol.175 , Issue.4 , pp. 247-257
    • Bouzy, B.1
  • 45
    • 0004370245
    • L. Baird, "Advantage updating," 1993. [Online]. Available: citeseer.ist.psu.edu/baird93advantage.html
    • (1993) Advantage updating
    • Baird, L.1
  • 47
    • 85153938292
    • Reinforcement learning algorithm for partially observable Markov decision problems
    • T. Jaakkola, S. P. Singh, and M. I. Jordan, "Reinforcement learning algorithm for partially observable Markov decision problems," in Advances in Neural Information Processing Systems, G. Tesauro, D. Touretzky, and T. Leen, Eds., vol. 7, The MIT Press, 1995, pp. 345-352. [Online]. Available: citeseer.ist.psu.edu/jaakkola95reinforcement.html
    • (1995) Advances in Neural Information Processing Systems , vol.7 , pp. 345-352
    • Jaakkola, T.1    Singh, S.P.2    Jordan, M.I.3
  • 48
    • 29244474089
    • Co-evolution versus self-play temporal difference learning for acquiring position evaluation in small-board go
    • T. P. Runarsson and S. Lucas, "Co-evolution versus self-play temporal difference learning for acquiring position evaluation in small-board go," IEEE Transactions on Evolutionary Computation, vol. 9, no. 6, pp. 628-640, December 2005.
    • (2005) IEEE Transactions on Evolutionary Computation , vol.9 , Issue.6 , pp. 628-640
    • Runarsson, T.P.1    Lucas, S.2
  • 49
    • 85149834820
    • Markov games as a framework for multi-agent reinforcement learning
    • M. L. Littman, "Markov games as a framework for multi-agent reinforcement learning," in Proceedings of the 11th International Conference on Machine Learning (ML-94). New Brunswick, NJ: Morgan Kaufmann, 1994, pp. 157-163. [Online]. Available: citeseer.ist.psu.edu/littman94markov.html
    • (1994) Proceedings of the 11th International Conference on Machine Learning (ML-94) , pp. 157-163
    • Littman, M.L.1


* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.