Volume 18, Issue 1, 2009, Pages 83-105

Learning and planning in environments with delayed feedback

Author keywords

Delayed feedback; Markov decision processes; Reinforcement learning

EID: 58049186782     PISSN: 13872532     EISSN: 15737454     Source Type: Journal
DOI: 10.1007/s10458-008-9056-7     Document Type: Article
Times cited: 88

References (27)
  • 3
    • 0032629911
    • Markov decision processes with noise-corrupted and delayed state observations
    • Bander, J.L., & White, C.C., III (1999). Markov decision processes with noise-corrupted and delayed state observations. Journal of the Operational Research Society, 50, 660-668.
    • (1999) Journal of the Operational Research Society, vol.50, pp. 660-668
    • Bander, J.L.; White III, C.C.
  • 6
    • 0041965975
    • R-max-A general polynomial time algorithm for near-optimal reinforcement learning
    • Brafman, R.I., & Tennenholtz, M. (2002). R-max-A general polynomial time algorithm for near-optimal reinforcement learning. Journal of Machine Learning Research, 3, 213-231.
    • (2002) Journal of Machine Learning Research, vol.3, pp. 213-231
    • Brafman, R.I.; Tennenholtz, M.
  • 7
    • 0242628951
    • Markov decision processes with state-information lag
    • Brooks, D.M., & Leondes, C.T. (1972). Markov decision processes with state-information lag. Operations Research, 20(4), 904-907.
    • (1972) Operations Research, vol.20, pp. 904-907
    • Brooks, D.M.; Leondes, C.T.
  • 8
    • 36349002318
    • A reinforcement learning algorithm with polynomial interaction complexity for only-costly-observable MDPs
    • Fox, R., & Tennenholtz, M. (2007). A reinforcement learning algorithm with polynomial interaction complexity for only-costly-observable MDPs. In Proceedings of the 22nd Conference on Artificial Intelligence, pp. 553-558.
    • (2007) Proceedings of the 22nd Conference on Artificial Intelligence, pp. 553-558
    • Fox, R.; Tennenholtz, M.
  • 9
    • 84947403595
    • Probability inequalities for sums of bounded random variables
    • Hoeffding, W. (1963). Probability inequalities for sums of bounded random variables. Journal of the American Statistical Association, 58(301), 13-30.
    • (1963) Journal of the American Statistical Association, vol.58, pp. 13-30
    • Hoeffding, W.
  • 16
    • 0012327484
    • Using eligibility traces to find the best memoryless policy in partially observable Markov decision processes
    • Loch, J., & Singh, S. (1998). Using eligibility traces to find the best memoryless policy in partially observable Markov decision processes. In Proceedings of the 15th International Conference on Machine Learning, pp. 323-331.
    • (1998) Proceedings of the 15th International Conference on Machine Learning, pp. 323-331
    • Loch, J.; Singh, S.
  • 18
    • 0036832956
    • Kernel-based reinforcement learning
    • Ormoneit, D., & Sen, Ś. (2002). Kernel-based reinforcement learning. Machine Learning, 49, 161-178.
    • (2002) Machine Learning, vol.49, pp. 161-178
    • Ormoneit, D.; Sen, Ś.
  • 21
    • 0029753630
    • Reinforcement learning with replacing eligibility traces
    • Singh, S.P., & Sutton, R.S. (1996). Reinforcement learning with replacing eligibility traces. Machine Learning, 22(1-3), 123-158.
    • (1996) Machine Learning, vol.22, pp. 123-158
    • Singh, S.P.; Sutton, R.S.
  • 22
    • 0028497385
    • An upper bound on the loss from approximate optimal-value functions
    • Singh, S.P., & Yee, R.C. (1994). An upper bound on the loss from approximate optimal-value functions. Machine Learning, 16(3), 227-233.
    • (1994) Machine Learning, vol.16, pp. 227-233
    • Singh, S.P.; Yee, R.C.
  • 26
    • 0002891388
    • Locally weighted projection regression: An O(n) algorithm for incremental real time learning in high dimensional space
    • Vijayakumar, S., & Schaal, S. (2000). Locally weighted projection regression: An O(n) algorithm for incremental real time learning in high dimensional space. In Proceedings of the 17th International Conference on Machine Learning, pp. 1079-1086.
    • (2000) Proceedings of the 17th International Conference on Machine Learning, pp. 1079-1086
    • Vijayakumar, S.; Schaal, S.


* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.