Volume FS-15-06, 2015, Pages 29-37

Deep recurrent Q-learning for partially observable MDPs

Author keywords

[No Author keywords available]

Indexed keywords

COMPLEX NETWORKS; DECISION MAKING; INTELLIGENT AGENTS; REINFORCEMENT LEARNING

EID: 84964687570     PISSN: None     EISSN: None     Source Type: Conference Proceeding    
DOI: None     Document Type: Conference Paper
Times cited: 1150

References (14)
  • 1
    • 84995343329
    • Reinforcement learning with long short-term memory
    • MIT Press
    • Bakker, B. 2001. Reinforcement learning with long short-term memory. In NIPS, 1475-1482. MIT Press.
    • (2001) NIPS , pp. 1475-1482
    • Bakker, B.1
  • 3
    • 84937779024
    • Deep learning for real-time Atari game play using offline Monte-Carlo tree search planning
    • Ghahramani, Z.; Welling, M.; Cortes, C.; Lawrence, N.; and Weinberger, K., eds. Curran Associates, Inc.
    • Guo, X.; Singh, S.; Lee, H.; Lewis, R. L.; and Wang, X. 2014. Deep learning for real-time Atari game play using offline Monte-Carlo tree search planning. In Ghahramani, Z.; Welling, M.; Cortes, C.; Lawrence, N.; and Weinberger, K., eds., Advances in Neural Information Processing Systems 27. Curran Associates, Inc. 3338-3346.
    • (2014) Advances in Neural Information Processing Systems , vol.27 , pp. 3338-3346
    • Guo, X.1    Singh, S.2    Lee, H.3    Lewis, R.L.4    Wang, X.5
  • 4
    • 0031573117
    • Long short-term memory
    • Hochreiter, S., and Schmidhuber, J. 1997. Long short-term memory. Neural Comput. 9(8): 1735-1780.
    • (1997) Neural Comput , vol.9 , Issue.8 , pp. 1735-1780
    • Hochreiter, S.1    Schmidhuber, J.2
  • 10
    • 84893343292
    • Lecture 6.5-RmsProp: Divide the gradient by a running average of its recent magnitude
    • Tieleman, T., and Hinton, G. 2012. Lecture 6.5-RmsProp: Divide the gradient by a running average of its recent magnitude. COURSERA: Neural Networks for Machine Learning.
    • (2012) COURSERA: Neural Networks for Machine Learning
    • Tieleman, T.1    Hinton, G.2
  • 11
    • 0031143730
    • An analysis of temporal-difference learning with function approximation
    • Tsitsiklis, J. N., and Van Roy, B. 1997. An analysis of temporal-difference learning with function approximation. IEEE Transactions on Automatic Control 42(5):674-690.
    • (1997) IEEE Transactions on Automatic Control , vol.42 , Issue.5 , pp. 674-690
    • Tsitsiklis, J.N.1    Van Roy, B.2


* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.