



Volume 227, 2007, Pages 273-280

Combining online and offline knowledge in UCT

Author keywords

[No Author keywords available]

Indexed keywords

COMPUTER SIMULATION; FUNCTIONAL ANALYSIS; MONTE CARLO METHODS; ONLINE SYSTEMS; RANDOM PROCESSES

EID: 34547990649     PISSN: None     EISSN: None     Source Type: Conference Proceeding    
DOI: 10.1145/1273496.1273531     Document Type: Conference Paper
Times cited: 458

References (16)
  • 1
    • 0036568025
    • Auer, P., Cesa-Bianchi, N., & Fischer, P. (2002). Finite-time analysis of the multi-armed bandit problem. Machine Learning, 47, 235-256.
  • 6
    • 24944563092
    • Enzenberger, M. (2003). Evaluation in Go by a neural network using soft segmentation. 10th Advances in Computer Games Conference (pp. 97-108).
  • 7
    • 34250659969
    • Gelly, S., Wang, Y., Munos, R., & Teytaud, O. (2006). Modification of UCT with patterns in Monte-Carlo Go (Technical Report 6062). INRIA.
  • 10
    • 0000433333
    • Schraudolph, N., Dayan, P., & Sejnowski, T. (1994). Temporal difference learning of position evaluation in the game of Go. Advances in Neural Information Processing Systems 6 (pp. 817-824). San Francisco: Morgan Kaufmann.
  • 12
    • 33847202724
    • Sutton, R. (1988). Learning to predict by the method of temporal differences. Machine Learning, 3, 9-44.
  • 13
    • 85132026293
    • Sutton, R. (1990). Integrated architectures for learning, planning, and reacting based on approximating dynamic programming. 7th International Conference on Machine Learning (pp. 216-224).
  • 14
    • 85156221438
    • Sutton, R. (1996). Generalization in reinforcement learning: Successful examples using sparse coarse coding. Advances in Neural Information Processing Systems 8 (pp. 1038-1044).


* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.