SCOPUS Information Search Platform - View Paper

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

Volume 1572, 1999, Pages 11-17

Open theoretical questions in reinforcement learning

(1) Sutton, Richard S a

a AT&T Labs Research (United States)

Author keywords

[No Author keywords available]

EID: 84947807317 PISSN: 03029743 EISSN: 16113349 Source Type: Book Series
DOI: 10.1007/3-540-49097-3_2 Document Type: Conference Paper

Times cited : (37)

References (18)

1
- 85151728371
- Residual algorithms: Reinforcement learning with function approximation
- Morgan Kaufmann, San Francisco
- Baird, L. C. (1995). Residual algorithms: Reinforcement learning with function approximation. In Proceedings of the Twelfth International Conference on Machine Learning, pp. 30-37. Morgan Kaufmann, San Francisco.
- (1995) Proceedings of the Twelfth International Conference on Machine Learning , pp. 30-37
- Baird, L.C.¹

2
- 0003487482
- Athena Scientific, Belmont, MA
- Bertsekas, D. P., and Tsitsiklis, J. N. (1996). Neuro-Dynamic Programming. Athena Scientific, Belmont, MA.
- (1996) Neuro-Dynamic Programming
- Bertsekas, D.P.¹ Tsitsiklis, J.N.²

3
- 85156187730
- Improving elevator performance using reinforcement learning
- MIT Press, Cambridge, MA
- Crites, R. H., and Barto, A. G. (1996). Improving elevator performance using reinforcement learning. In Advances in Neural Information Processing Systems: Proceedings of the 1995 Conference, pp. 1017-1023. MIT Press, Cambridge, MA.
- (1996) Advances in Neural Information Processing Systems: Proceedings of the 1995 Conference , pp. 1017-1023
- Crites, R.H.¹ Barto, A.G.²

4
- 84947771545
- (in prep.)
- Kearns, M., Mansour, Y., Ng, A. Y. (in prep.). Sparse sampling methods for planning and learning in large and partially observable Markov decision processes.
- Sparse sampling methods for planning and learning in large and partially observable Markov decision processes
- Kearns, M.¹ Mansour, Y.² Ng, A.Y.³

5
- 0012327484
- Using eligibility traces to find the best memoryless policy in partially observable Markov decision processes
- Morgan Kaufmann, San Francisco
- Loch, J., and Singh, S. (1998). Using eligibility traces to find the best memoryless policy in partially observable Markov decision processes. In Proceedings of the Fifteenth International Conference on Machine Learning. Morgan Kaufmann, San Francisco.
- (1998) Proceedings of the Fifteenth International Conference on Machine Learning
- Loch, J.¹ Singh, S.²

6
- 0029752592
- Average reward reinforcement learning: Foundations, algorithms, and empirical results
- Mahadevan, S. (1996). Average reward reinforcement learning: Foundations, algorithms, and empirical results. Machine Learning, 22: 159-196.
- (1996) Machine Learning , vol.22 , pp. 159-196
- Mahadevan, S.¹

7
- 0027684215
- Prioritized sweeping: Reinforcement learning with less data and less real time
- Moore, A. W., and Atkeson, C. G. (1993). Prioritized sweeping: Reinforcement learning with less data and less real time. Machine Learning, 13: 103-130.
- (1993) Machine Learning , vol.13 , pp. 103-130
- Moore, A.W.¹ Atkeson, C.G.²

8
- 0003824303
- Ph. D. thesis, University of Massachusetts, Amherst. Appeared as CMPSCI Technical Report 93-77
- Singh, S. P. (1993). Learning to Solve Markovian Decision Processes. Ph. D. thesis, University of Massachusetts, Amherst. Appeared as CMPSCI Technical Report 93-77.
- (1993) Learning to Solve Markovian Decision Processes
- Singh, S.P.¹

9
- 84898972974
- Reinforcement learning for dynamic channel allocation in cellular telephone systems
- MIT Press, Cambridge, MA
- Singh, S. P., and Bertsekas, D. (1997). Reinforcement learning for dynamic channel allocation in cellular telephone systems. In Advances in Neural Information Processing Systems: Proceedings of the 1996 Conference, pp. 974-980. MIT Press, Cambridge, MA.
- (1997) Advances in Neural Information Processing Systems: Proceedings of the 1996 Conference , pp. 974-980
- Singh, S.P.¹ Bertsekas, D.²

10
- 0032114627
- Analytical mean squared error curves for temporal difference learning
- Singh, S., and Dayan, P. (1998). Analytical mean squared error curves for temporal difference learning. Machine Learning.
- (1998) Machine Learning
- Singh, S.¹ Dayan, P.²

11
- 0029753630
- Reinforcement learning with replacing eligibility traces
- Singh, S. P., and Sutton, R. S. (1996). Reinforcement learning with replacing eligibility traces. Machine Learning, 22: 123-158.
- (1996) Machine Learning , vol.22 , pp. 123-158
- Singh, S.P.¹ Sutton, R.S.²

12
- 0003617454
- Ph. D. thesis, University of Massachusetts, Amherst
- Sutton, R. S. (1984). Temporal Credit Assignment in Reinforcement Learning. Ph. D. thesis, University of Massachusetts, Amherst.
- (1984) Temporal Credit Assignment in Reinforcement Learning
- Sutton, R.S.¹

13
- 85156221438
- Generalization in reinforcement learning: Successful examples using sparse coarse coding
- MIT Press, Cambridge, MA
- Sutton, R. S. (1996). Generalization in reinforcement learning: Successful examples using sparse coarse coding. In Advances in Neural Information Processing Systems: Proceedings of the 1995 Conference, pp. 1038-1044. MIT Press, Cambridge, MA.
- (1996) Advances in Neural Information Processing Systems: Proceedings of the 1995 Conference , pp. 1038-1044
- Sutton, R.S.¹

14
- 0004102479
- MIT Press, Cambridge, MA
- Sutton, R. S., and Barto, A. G. (1998). Reinforcement Learning: An Introduction. MIT Press, Cambridge, MA.
- (1998) Reinforcement Learning: An Introduction
- Sutton, R.S.¹ Barto, A.G.²

15
- 0029276036
- Temporal difference learning and TD-Gammon
- Tesauro, G. J. (1995). Temporal difference learning and TD-Gammon. Communications of the ACM, 38: 58-68.
- (1995) Communications of the ACM , vol.38 , pp. 58-68
- Tesauro, G.J.¹

16
- 0031143730
- An analysis of temporal-difference learning with function approximation
- Tsitsiklis, J. N., and Van Roy, B. (1997). An analysis of temporal-difference learning with function approximation. IEEE Transactions on Automatic Control, 42: 674-690.
- (1997) IEEE Transactions on Automatic Control , vol.42 , pp. 674-690
- Tsitsiklis, J.N.¹ Van Roy, B.²

17
- 0004049893
- Ph. D. thesis, Cambridge University
- Watkins, C. J. C. H. (1989). Learning from Delayed Rewards. Ph. D. thesis, Cambridge University.
- (1989) Learning from Delayed Rewards
- Watkins, C.J.C.H.¹

18
- 34249833101
- Q-learning
- Watkins, C. J. C. H., and Dayan, P. (1992). Q-learning. Machine Learning, 8: 279-292.
- (1992) Machine Learning , vol.8 , pp. 279-292
- Watkins, C.J.C.H.¹ Dayan, P.²

* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.