Volume 78, Issue 1, 2012, Pages 23-29

Self-teaching adaptive dynamic programming for Gomoku

Author keywords

Adaptive dynamic programming; Gomoku; Neural network; Reinforcement learning; Temporal difference learning

Indexed keywords

ADAPTIVE DYNAMIC PROGRAMMING; COMPARISON RESULT; CRITIC NETWORK; GOMOKU; TEMPORAL DIFFERENCE LEARNING

EID: 82655181840     PISSN: 09252312     EISSN: 18728286     Source Type: Journal    
DOI: 10.1016/j.neucom.2011.05.032     Document Type: Article
Times cited: 37

References (22)
  • 1
    • 0026391196
    • Experience-based learning experiments using Gomoku, in: Proceedings of IEEE International Conference on Systems, Man, and Cybernetics, Charlottesville, Virginia, USA, October 13-16
    • T.K. William, S. Pham, Experience-based learning experiments using Gomoku, in: Proceedings of IEEE International Conference on Systems, Man, and Cybernetics, Charlottesville, Virginia, USA, vol. 2, October 13-16, 1991, pp. 1405-1410.
    • (1991) , vol.2 , pp. 1405-1410
    • William, T.K.1    Pham, S.2
  • 2
    • 0024767179
    • The history heuristic and alpha-beta search enhancements in practice
    • Schaeffer J. The history heuristic and alpha-beta search enhancements in practice. IEEE Trans. Pattern Anal. Mach. Intell. 1989, 11(11):1203-1212.
    • (1989) IEEE Trans. Pattern Anal. Mach. Intell. , vol.11 , Issue.11 , pp. 1203-1212
    • Schaeffer, J.1
  • 3
    • 84855338829
    • Gomoku and Threat-Space Search, doi:.
    • L.V. Allis, H.J. Herik, M.P.H. Huntjens, Gomoku and Threat-Space Search, 2010, doi:. http://10.1.1.96.5836.
    • (2010)
    • Allis, L.V.1    Herik, H.J.2    Huntjens, M.P.H.3
  • 5
    • 0007993990
    • Connectionist learning of expert preferences by comparison training
    • Morgan Kaufmann, San Francisco
    • Tesauro G. Connectionist learning of expert preferences by comparison training. Advances in Neural Information Processing Systems 1989, vol. 1:99-106. Morgan Kaufmann, San Francisco.
    • (1989) Advances in Neural Information Processing Systems , vol.1 , pp. 99-106
    • Tesauro, G.1
  • 6
    • 61849147871
    • Reinforcement-learning agents with different temperature parameters explain the variety of human action-selection behavior in a Markov decision process task
    • Ishida F., Sasaki T., Sakaguchi Y., Shimai H. Reinforcement-learning agents with different temperature parameters explain the variety of human action-selection behavior in a Markov decision process task. Neurocomputing 2009, 72:1979-1984.
    • (2009) Neurocomputing , vol.72 , pp. 1979-1984
    • Ishida, F.1    Sasaki, T.2    Sakaguchi, Y.3    Shimai, H.4
  • 7
    • 0025559238
    • Neurogammon: a neural-network backgammon program
    • Proceedings of International Joint Conference Neural Networks, San Diego, California, USA, June 17-21
    • G. Tesauro, Neurogammon: a neural-network backgammon program, in: Proceedings of International Joint Conference Neural Networks, San Diego, California, USA, June 17-21, 1990, pp. 33-40.
    • (1990) , pp. 33-40
    • Tesauro, G.1
  • 8
    • 0001046225
    • Practical issues in temporal difference learning
    • Tesauro G. Practical issues in temporal difference learning. Mach. Learn. 1992, 8:257-277.
    • (1992) Mach. Learn. , vol.8 , pp. 257-277
    • Tesauro, G.1
  • 9
    • 0000985504
    • TD-Gammon, a self-teaching backgammon program, achieves master-level play
    • Tesauro G. TD-Gammon, a self-teaching backgammon program, achieves master-level play. Neural Comput. 1994, 6:215-219.
    • (1994) Neural Comput. , vol.6 , pp. 215-219
    • Tesauro, G.1
  • 10
    • 82655187342
    • Study and Practice on Machine Self-Learning of Game-Playing. Master Thesis, Guangxi Normal University
    • J.M. Mo, Study and Practice on Machine Self-Learning of Game-Playing. Master Thesis, Guangxi Normal University, 2003.
    • (2003)
    • Mo, J.M.1
  • 11
    • 0034275416
    • Learning to play chess using temporal differences
    • Baxter J., Tridgell A., Weaver L. Learning to play chess using temporal differences. Mach. Learn. 2000, 40:243-263.
    • (2000) Mach. Learn. , vol.40 , pp. 243-263
    • Baxter, J.1    Tridgell, A.2    Weaver, L.3
  • 12
    • 77949562818
    • Knowledge-free and learning-based methods in intelligent game playing
    • Jacek M. Knowledge-free and learning-based methods in intelligent game playing. Stud. Comput. Intell. 2010, 276:71-89.
    • (2010) Stud. Comput. Intell. , vol.276 , pp. 71-89
    • Jacek, M.1
  • 13
    • 1542471417
    • Mini-max initialization for function approximation
    • Zhang X.M., Chen Y.Q., Ansari N., Shi Y.Q. Mini-max initialization for function approximation. Neurocomputing 2004, 57:389-409.
    • (2004) Neurocomputing , vol.57 , pp. 389-409
    • Zhang, X.M.1    Chen, Y.Q.2    Ansari, N.3    Shi, Y.Q.4
  • 14
    • 79952619657
    • Robust high performance reinforcement learning through weighted k-nearest neighbors
    • Martin H J.A., Lope J., Maravall D. Robust high performance reinforcement learning through weighted k-nearest neighbors. Neurocomputing 2011, 74:1251-1259.
    • (2011) Neurocomputing , vol.74 , pp. 1251-1259
    • Martin H, J.A.1    Lope, J.2    Maravall, D.3
  • 15
    • 0020970738
    • Neuronlike adaptive elements that can solve difficult learning control problems
    • Barto A.G., Sutton R.S., Anderson C.W. Neuronlike adaptive elements that can solve difficult learning control problems. IEEE Trans. Syst. Man Cybern. 1983, 13:834-847.
    • (1983) IEEE Trans. Syst. Man Cybern. , vol.13 , pp. 834-847
    • Barto, A.G.1    Sutton, R.S.2    Anderson, C.W.3
  • 16
    • 0002011091
    • A Menu of Designs for Reinforcement Learning Over Time
    • MIT Press, Cambridge
    • Werbos P.J. A Menu of Designs for Reinforcement Learning Over Time. Neural Networks for Control 1990, MIT Press, Cambridge.
    • (1990) Neural Networks for Control
    • Werbos, P.J.1
  • 18
    • 0001192446
    • A neighboring optimal adaptive critic for missile guidance
    • Dalton J., Balakrishnan S.N. A neighboring optimal adaptive critic for missile guidance. Math. Comput. Model. 1996, 23(1):175-188.
    • (1996) Math. Comput. Model. , vol.23 , Issue.1 , pp. 175-188
    • Dalton, J.1    Balakrishnan, S.N.2
  • 19
    • 34548766226
    • Particle swarm optimized adaptive dynamic programming
    • Proceedings of the 2007 IEEE International Symposium on Approximate Dynamic Programming and Reinforcement Learning, Honolulu, Hawaiian Islands, USA, April 1-5
    • D.B. Zhao, J.Q. Yi, D.R. Liu, Particle swarm optimized adaptive dynamic programming, in: Proceedings of the 2007 IEEE International Symposium on Approximate Dynamic Programming and Reinforcement Learning, Honolulu, Hawaiian Islands, USA, April 1-5, 2007, pp. 32-37.
    • (2007) , pp. 32-37
    • Zhao, D.B.1    Yi, J.Q.2    Liu, D.R.3
  • 20
    • 82655164054
    • Self-play and using an expert to learn to play backgammon with temporal difference learning
    • Wiering M.A. Self-play and using an expert to learn to play backgammon with temporal difference learning. J. Intell. Learn. Syst. Appl. 2010, 2:57-68.
    • (2010) J. Intell. Learn. Syst. Appl. , vol.2 , pp. 57-68
    • Wiering, M.A.1
  • 22
    • 84855331502
    • 2010. http://gomocup.wz.cz/gomoku/download.php.
    • (2010)


* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.