Volume 1572, 1999, Pages 11-17

Open theoretical questions in reinforcement learning

Author keywords

[No Author keywords available]

Indexed keywords

[No indexed keywords available]

EID: 84947807317     PISSN: 0302-9743     EISSN: 1611-3349     Source Type: Book Series
DOI: 10.1007/3-540-49097-3_2     Document Type: Conference Paper
Times cited: 37

References (18)
  • 1. Baird, L. C. (1995). Residual algorithms: Reinforcement learning with function approximation. In Proceedings of the Twelfth International Conference on Machine Learning, pp. 30-37. Morgan Kaufmann, San Francisco.
  • 5. Loch, J., and Singh, S. (1998). Using eligibility traces to find the best memoryless policy in partially observable Markov decision processes. In Proceedings of the Fifteenth International Conference on Machine Learning. Morgan Kaufmann, San Francisco.
  • 6. Mahadevan, S. (1996). Average reward reinforcement learning: Foundations, algorithms, and empirical results. Machine Learning, 22: 159-196.
  • 7. Moore, A. W., and Atkeson, C. G. (1993). Prioritized sweeping: Reinforcement learning with less data and less real time. Machine Learning, 13: 103-130.
  • 8. Singh, S. P. (1993). Learning to Solve Markovian Decision Processes. Ph.D. thesis, University of Massachusetts, Amherst. Appeared as CMPSCI Technical Report 93-77.
  • 10. Singh, S., and Dayan, P. (1998). Analytical mean squared error curves for temporal difference learning. Machine Learning.
  • 11. Singh, S. P., and Sutton, R. S. (1996). Reinforcement learning with replacing eligibility traces. Machine Learning, 22: 123-158.
  • 13
  • 15. Tesauro, G. J. (1995). Temporal difference learning and TD-Gammon. Communications of the ACM, 38: 58-68.
  • 16. Tsitsiklis, J. N., and Van Roy, B. (1997). An analysis of temporal-difference learning with function approximation. IEEE Transactions on Automatic Control, 42: 674-690.


* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.