Volume: None, Issue: None, 2016, Pages 1476-1483

Increasing the action gap: New operators for reinforcement learning

Author keywords

[No Author keywords available]

Indexed keywords

ARTIFICIAL INTELLIGENCE; REINFORCEMENT LEARNING

EID: 85007236718     PISSN: None     EISSN: None     Source Type: Conference Proceeding    
DOI: None     Document Type: Conference Paper
Times cited: 168

References (32)
  • 5. Bellman, R. E. 1957. Dynamic Programming. Princeton, NJ: Princeton University Press.
  • 7. Bertsekas, D. P., and Yu, H. 2012. Q-learning and enhanced policy iteration in discounted dynamic programming. Mathematics of Operations Research 37(1):66-94.
  • 8. Bertsekas, D. P. 2011. Approximate policy iteration: A survey and some new methods. Journal of Control Theory and Applications 9(3):310-335.
  • 13. Jiang, D. R., and Powell, W. B. 2015. Optimal hour-ahead bidding in the real-time electricity market with battery storage using approximate dynamic programming. INFORMS Journal on Computing 27(3):525-543.
  • 17
  • 18. Munos, R., and Moore, A. 2002. Variable resolution discretization in optimal control. Machine Learning 49(2-3):291-323.
  • 19. Ormoneit, D., and Sen, S. 2002. Kernel-based reinforcement learning. Machine Learning 49(2-3):161-178.
  • 25. Sutton, R. S. 1988. Learning to predict by the methods of temporal differences. Machine Learning 3(1):9-44.
  • 26. Sutton, R. S. 1996. Generalization in reinforcement learning: Successful examples using sparse coarse coding. In Advances in Neural Information Processing Systems 8, 1038-1044.
  • 27. Tesauro, G. 1995. Temporal difference learning and TD-Gammon. Communications of the ACM 38(3).
  • 32. Watkins, C. 1989. Learning from Delayed Rewards. Ph.D. Dissertation, Cambridge University, Cambridge, England.


* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.