SCOPUS Information Retrieval Platform

Neurocomputing

Volume 78, Issue 1, 2012, Pages 23-29

Self-teaching adaptive dynamic programming for Gomoku

(3) Zhao, Dongbin (a); Zhang, Zhen (a); Dai, Yujie (a)

a INSTITUTE OF AUTOMATION (China)

Author keywords

Adaptive dynamic programming; Gomoku; Neural network; Reinforcement learning; Temporal difference learning

Indexed keywords

ADAPTIVE DYNAMIC PROGRAMMING; COMPARISON RESULT; CRITIC NETWORK; GOMOKU; TEMPORAL DIFFERENCE LEARNING;

NEURAL NETWORKS; REINFORCEMENT LEARNING;

DYNAMIC PROGRAMMING;

ADAPTIVE DYNAMIC PROGRAMMING; ALGORITHM; ARCHITECTURE; ARTICLE; ARTIFICIAL NEURAL NETWORK; COMPARATIVE STUDY; CONTROLLED STUDY; ERROR; GAME; GOMOKU; LEARNING; PRIORITY JOURNAL; REINFORCEMENT; REWARD; TEACHING;

EID: 82655181840 PISSN: 09252312 EISSN: 18728286 Source Type: Journal
DOI: 10.1016/j.neucom.2011.05.032 Document Type: Article

Times cited: 37

References (22)

1
- 0026391196
- Experience-based learning experiments using Gomoku, in: Proceedings of IEEE International Conference on Systems, Man, and Cybernetics, Charlottesville, Virginia, USA, October 13-16
- T.K. William, S. Pham, Experience-based learning experiments using Gomoku, in: Proceedings of IEEE International Conference on Systems, Man, and Cybernetics, Charlottesville, Virginia, USA, vol. 2, October 13-16, 1991, pp. 1405-1410.
- (1991) , vol.2 , pp. 1405-1410
- William, T.K.¹ Pham, S.²

2
- 0024767179
- The history heuristic and alpha-beta search enhancements in practice
- Schaeffer J. The history heuristic and alpha-beta search enhancements in practice. IEEE Trans. Pattern Anal. Mach. Intell. 1989, 11(11):1203-1212.
- (1989) IEEE Trans. Pattern Anal. Mach. Intell. , vol.11 , Issue.11 , pp. 1203-1212
- Schaeffer, J.¹

3
- 84855338829
- Gomoku and Threat-Space Search
- L.V. Allis, H.J. Herik, M.P.H. Huntjens, Gomoku and Threat-Space Search, 2010, doi:. http://10.1.1.96.5836.
- (2010)
- Allis, L.V.¹ Herik, H.J.² Huntjens, M.P.H.³

4
- 84944093801
- A neural network that learns to play five-in-a-row
- Freisleben B. A neural network that learns to play five-in-a-row. Second New Zealand International Two-Stream Conference on Artificial Neural Networks and Expert Systems 1995, 87-90.
- (1995) Second New Zealand International Two-Stream Conference on Artificial Neural Networks and Expert Systems , pp. 87-90
- Freisleben, B.¹

5
- 0007993990
- Connectionist learning of expert preferences by comparison training
- Morgan Kaufmann, San Francisco
- Tesauro G. Connectionist learning of expert preferences by comparison training. Advances in Neural Information Processing Systems 1989, vol. 1:99-106. Morgan Kaufmann, San Francisco.
- (1989) Advances in Neural Information Processing Systems , vol.1 , pp. 99-106
- Tesauro, G.¹

6
- 61849147871
- Reinforcement-learning agents with different temperature parameters explain the variety of human action-selection behavior in a Markov decision process task
- Ishida F., Sasaki T., Sakaguchi Y., Shimai H. Reinforcement-learning agents with different temperature parameters explain the variety of human action-selection behavior in a Markov decision process task. Neurocomputing 2009, 72:1979-1984.
- (2009) Neurocomputing , vol.72 , pp. 1979-1984
- Ishida, F.¹ Sasaki, T.² Sakaguchi, Y.³ Shimai, H.⁴

7
- 0025559238
- Neurogammon: a neural-network backgammon program
- Proceedings of International Joint Conference Neural Networks, San Diego, California, USA, June 17-21
- G. Tesauro, Neurogammon: a neural-network backgammon program, in: Proceedings of International Joint Conference Neural Networks, San Diego, California, USA, June 17-21, 1990, pp. 33-40.
- (1990) , pp. 33-40
- Tesauro, G.¹

8
- 0001046225
- Practical issues in temporal difference learning
- Tesauro G. Practical issues in temporal difference learning. Mach. Learn. 1992, 8:257-277.
- (1992) Mach. Learn. , vol.8 , pp. 257-277
- Tesauro, G.¹

9
- 0000985504
- TD-Gammon, A self-teaching backgammon program achieves master-level play
- Tesauro G. TD-Gammon, A self-teaching backgammon program achieves master-level play. Neural Comput. 1994, 6:215-219.
- (1994) Neural Comput. , vol.6 , pp. 215-219
- Tesauro, G.¹

10
- 82655187342
- Study and Practice on Machine Self-Learning of Game-Playing. Master Thesis, Guangxi Normal University
- J.M. Mo, Study and Practice on Machine Self-Learning of Game-Playing. Master Thesis, Guangxi Normal University, 2003.
- (2003)
- Mo, J.M.¹

11
- 0034275416
- Learning to play chess using temporal differences
- Baxter J., Tridgell A., Weaver L. Learning to play chess using temporal differences. Mach. Learn. 2000, 40:243-263.
- (2000) Mach. Learn. , vol.40 , pp. 243-263
- Baxter, J.¹ Tridgell, A.² Weaver, L.³

12
- 77949562818
- Knowledge-free and learning-based methods in intelligent game playing
- Jacek M. Knowledge-free and learning-based methods in intelligent game playing. Stud. Comput. Intell. 2010, 276:71-89.
- (2010) Stud. Comput. Intell. , vol.276 , pp. 71-89
- Jacek, M.¹

13
- 1542471417
- Mini-max initialization for function approximation
- Zhang X.M., Chen Y.Q., Ansari N., Shi Y.Q. Mini-max initialization for function approximation. Neurocomputing 2004, 57:389-409.
- (2004) Neurocomputing , vol.57 , pp. 389-409
- Zhang, X.M.¹ Chen, Y.Q.² Ansari, N.³ Shi, Y.Q.⁴

14
- 79952619657
- Robust high performance reinforcement learning through weighted k-nearest neighbors
- Martin H J.A., Lope J., Maravall D. Robust high performance reinforcement learning through weighted k-nearest neighbors. Neurocomputing 2011, 74:1251-1259.
- (2011) Neurocomputing , vol.74 , pp. 1251-1259
- Martin H, J.A.¹ Lope, J.² Maravall, D.³

15
- 0020970738
- Neuronlike adaptive elements that can solve difficult learning control problems
- Barto A.G., Sutton R.S., Anderson C.W. Neuronlike adaptive elements that can solve difficult learning control problems. IEEE Trans. Syst. Man Cybern. 1983, 13:834-847.
- (1983) IEEE Trans. Syst. Man Cybern. , vol.13 , pp. 834-847
- Barto, A.G.¹ Sutton, R.S.² Anderson, C.W.³

16
- 0002011091
- A Menu of Designs for Reinforcement Learning Over Time
- MIT Press, Cambridge
- Werbos P.J. A Menu of Designs for Reinforcement Learning Over Time. Neural Networks for Control 1990, MIT Press, Cambridge.
- (1990) Neural Networks for Control
- Werbos, P.J.¹

17
- 0003487482
- Athena Scientific, Belmont
- Bertsekas D.P., Tsitsiklis J.N. Neuro-Dynamic Programming 1996, Athena Scientific, Belmont.
- (1996) Neuro-Dynamic Programming
- Bertsekas, D.P.¹ Tsitsiklis, J.N.²

18
- 0001192446
- A neighboring optimal adaptive critic for missile guidance
- Dalton J., Balakrishnan S.N. A neighboring optimal adaptive critic for missile guidance. Math. Comput. Model. 1996, 23(1):175-188.
- (1996) Math. Comput. Model. , vol.23 , Issue.1 , pp. 175-188
- Dalton, J.¹ Balakrishnan, S.N.²

19
- 34548766226
- Particle swarm optimized adaptive dynamic programming
- Proceedings of the 2007 IEEE International Symposium on Approximate Dynamic Programming and Reinforcement Learning, Honolulu, Hawaiian Islands, USA, April 1-5
- D.B. Zhao, J.Q. Yi, D.R. Liu, Particle swarm optimized adaptive dynamic programming, in: Proceedings of the 2007 IEEE International Symposium on Approximate Dynamic Programming and Reinforcement Learning, Honolulu, Hawaiian Islands, USA, April 1-5, 2007, pp. 32-37.
- (2007) , pp. 32-37
- Zhao, D.B.¹ Yi, J.Q.² Liu, D.R.³

20
- 82655164054
- Self-play and using an expert to learn to play backgammon with temporal difference learning
- Wiering M.A. Self-play and using an expert to learn to play backgammon with temporal difference learning. J. Intell. Learn. Syst. Appl. 2010, 2:57-68.
- (2010) J. Intell. Learn. Syst. Appl. , vol.2 , pp. 57-68
- Wiering, M.A.¹

21
- 0004102479
- The MIT Press, Cambridge
- Sutton R.S., Barto A.G. Reinforcement Learning: An Introduction 1998, The MIT Press, Cambridge.
- (1998) Reinforcement Learning: An Introduction
- Sutton, R.S.¹ Barto, A.G.²

22
- 84855331502
- 2010. http://gomocup.wz.cz/gomoku/download.php.
- (2010)

* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.