SCOPUS Information Retrieval Platform

Autonomous Agents and Multi-Agent Systems

Volume 18, Issue 1, 2009, Pages 83-105

Learning and planning in environments with delayed feedback

(4) Walsh, Thomas J.ᵃ Nouri, Aliᵃ Li, Lihongᵃ Littman, Michael L.ᵃ

ᵃ Rutgers University (United States)

Author keywords

Delayed feedback; Markov decision processes; Reinforcement learning

EID: 58049186782 PISSN: 13872532 EISSN: 15737454 Source Type: Journal
DOI: 10.1007/s10458-008-9056-7 Document Type: Article

Times cited: (88)

References (27)

1
- 58049153208
- Closed-loop control with delayed information
- Altman, E., & Nain, P. Closed-loop control with delayed information. In Proceedings of the ACM SIGMETRICS and Performance 1-5, pp. 193-204.
- Proceedings of the ACM SIGMETRICS and Performance , vol.1-5 , pp. 193-204
- Altman, E.¹ Nain, P.²

2
- 0031073475
- Locally weighted learning for control
- 1-5
- Atkeson, C. G., Moore, A. W., & Schaal, S. (1997). Locally weighted learning for control. Artificial Intelligence Review, 11(1-5), 75-113.
- (1997) Artificial Intelligence Review , vol.11 , pp. 75-113
- Atkeson, C.G.¹ Moore, A.W.² Schaal, S.³

3
- 0032629911
- Markov decision processes with noise-corrupted and delayed state observations
- Bander, J. L., & White, C. C., III (1999). Markov decision processes with noise-corrupted and delayed state observations. Journal of the Operational Research Society, 50, 660-668.
- (1999) Journal of the Operational Research Society , vol.50 , pp. 660-668
- Bander, J.L.¹ White III, C.C.²

4
- 0003565783
- Athena Scientific
- Bertsekas, D. P. (2001). Dynamic programming and optimal control (2nd ed., Vol. 1/2). Athena Scientific.
- (2001) Dynamic Programming and Optimal Control (2nd Ed.) , vol.1-2
- Bertsekas, D.P.¹

5
- 85153940465
- Generalization in reinforcement learning: Safely approximating the value function
- Cambridge, MA: MIT Press
- Boyan, J. A., & Moore, A. W. (1995). Generalization in reinforcement learning: Safely approximating the value function. In Advances in neural information processing systems: Proceedings of the 1994 conference (pp. 369-376). Cambridge, MA: MIT Press.
- (1995) Advances in Neural Information Processing Systems: Proceedings of the 1994 Conference , pp. 369-376
- Boyan, J.A.¹ Moore, A.W.²

6
- 0041965975
- R-max-A general polynomial time algorithm for near-optimal reinforcement learning
- Brafman, R. I., & Tennenholtz, M. (2002). R-max-A general polynomial time algorithm for near-optimal reinforcement learning. Journal of Machine Learning Research, 3, 213-231.
- (2002) Journal of Machine Learning Research , vol.3 , pp. 213-231
- Brafman, R.I.¹ Tennenholtz, M.²

7
- 0242628951
- Markov decision processes with state-information lag
- 4
- Brooks, D. M., & Leondes, C. T. (1972). Markov decision processes with state-information lag. Operations Research, 20(4), 904-907.
- (1972) Operations Research , vol.20 , pp. 904-907
- Brooks, D.M.¹ Leondes, C.T.²

8
- 36349002318
- A reinforcement learning algorithm with polynomial interaction complexity for only-costly-observable MDPs
- Fox, R., & Tennenholtz, M. (2007). A reinforcement learning algorithm with polynomial interaction complexity for only-costly-observable MDPs. In Proceedings of the 22nd Conference on Artificial Intelligence, pp. 553-558.
- (2007) Proceedings of the 22nd Conference on Artificial Intelligence , pp. 553-558
- Fox, R.¹ Tennenholtz, M.²

9
- 84947403595
- Probability inequalities for sums of bounded random variables
- 301
- Hoeffding, W. (1963). Probability inequalities for sums of bounded random variables. Journal of the American Statistical Association, 58(301), 13-30.
- (1963) Journal of the American Statistical Association , vol.58 , pp. 13-30
- Hoeffding, W.¹

10
- 38049096465
- Kernel-based models for reinforcement learning
- Jong, N. K., & Stone, P. (2006). Kernel-based models for reinforcement learning. In Proceedings of the 2006 ICML Kernel Machines and Reinforcement Learning Workshop.
- (2006) Proceedings of the 2006 ICML Kernel Machines and Reinforcement Learning Workshop
- Jong, N.K.¹ Stone, P.²

11
- 0032073263
- Planning and acting in partially observable stochastic domains
- 1-2
- Kaelbling, L. P., Littman, M. L., & Cassandra, A. R. (1998). Planning and acting in partially observable stochastic domains. Artificial Intelligence, 101(1-2), 99-134.
- (1998) Artificial Intelligence , vol.101 , pp. 99-134
- Kaelbling, L.P.¹ Littman, M.L.² Cassandra, A.R.³

12
- 23244466805
- PhD thesis, University College London, UK
- Kakade, S. (2003). On the Sample Complexity of Reinforcement Learning. PhD thesis, University College London, UK.
- (2003) On the Sample Complexity of Reinforcement Learning
- Kakade, S.¹

13
- 0037399236
- Markov decision processes with delays and asynchronous cost collection
- Katsikopoulos, K. V., & Engelbrecht, S. E. (2003). Markov decision processes with delays and asynchronous cost collection. IEEE Transactions on Automatic Control, 48, 568-574.
- (2003) IEEE Transactions on Automatic Control , vol.48 , pp. 568-574
- Katsikopoulos, K.V.¹ Engelbrecht, S.E.²

14
- 0003673017
- PhD thesis, Carnegie Mellon University, Pittsburgh, PA
- Lin, L.-J. (1993). Reinforcement Learning for Robots using Neural Networks. PhD thesis, Carnegie Mellon University, Pittsburgh, PA.
- (1993) Reinforcement Learning for Robots Using Neural Networks
- Lin, L.-J.¹

15
- 0003861655
- PhD thesis, Brown University, Providence, RI, 1996
- Littman, M. L. (1996). Algorithms for sequential decision making. PhD thesis, Brown University, Providence, RI, 1996.
- (1996) Algorithms for Sequential Decision Making
- Littman, M.L.¹

16
- 0012327484
- Using eligibility traces to find the best memoryless policy in partially observable Markov decision processes
- Loch, J., & Singh, S. (1998). Using eligibility traces to find the best memoryless policy in partially observable Markov decision processes. In Proceedings of the 15th International Conference on Machine Learning, pp. 323-331.
- (1998) Proceedings of the 15th International Conference on Machine Learning , pp. 323-331
- Loch, J.¹ Singh, S.²

17
- 0008833147
- Rates of convergence for variable resolution schemes in optimal control
- Munos, R., & Moore, A. W. (2000). Rates of convergence for variable resolution schemes in optimal control. In Proceedings of the 17th International Conference on Machine Learning, pp. 647-654.
- (2000) Proceedings of the 17th International Conference on Machine Learning , pp. 647-654
- Munos, R.¹ Moore, A.W.²

18
- 0036832956
- Kernel-based reinforcement learning
- Ormoneit, D., & Sen, Ś. (2002). Kernel-based reinforcement learning. Machine Learning, 49, 161-178.
- (2002) Machine Learning , vol.49 , pp. 161-178
- Ormoneit, D.¹ Sen, Ś.²

19
- 0000977910
- The complexity of Markov decision processes
- 3
- Papadimitriou, C. H., & Tsitsiklis, J. N. (1987). The complexity of Markov decision processes. Mathematics of Operations Research, 12(3), 441-450.
- (1987) Mathematics of Operations Research , vol.12 , pp. 441-450
- Papadimitriou, C.H.¹ Tsitsiklis, J.N.²

20
- 85102627959
- Wiley New York
- Puterman M.L. (1994) Markov decision processes: Discrete stochastic dynamic programming. Wiley, New York
- (1994) Markov Decision Processes: Discrete Stochastic Dynamic Programming
- Puterman, M.L.¹

21
- 0029753630
- Reinforcement learning with replacing eligibility traces
- 1-3
- Singh, S. P., & Sutton, R. S. (1996). Reinforcement learning with replacing eligibility traces. Machine Learning, 22(1-3), 123-158.
- (1996) Machine Learning , vol.22 , pp. 123-158
- Singh, S.P.¹ Sutton, R.S.²

22
- 0028497385
- An upper bound on the loss from approximate optimal-value functions
- 3
- Singh, S. P., & Yee, R. C. (1994). An upper bound on the loss from approximate optimal-value functions. Machine Learning, 16(3), 227-233.
- (1994) Machine Learning , vol.16 , pp. 227-233
- Singh, S.P.¹ Yee, R.C.²

23
- 33749255382
- PAC model-free reinforcement learning
- Strehl, A. L., Li, L., Wiewiora, E., Langford, J., & Littman, M. L. (2006). PAC model-free reinforcement learning. In Proceedings of the 23rd International Conference on Machine Learning, pp. 881-888.
- (2006) Proceedings of the 23rd International Conference on Machine Learning , pp. 881-888
- Strehl, A.L.¹ Li, L.² Wiewiora, E.³ Langford, J.⁴ Littman, M.L.⁵

24
- 85156221438
- Generalization in reinforcement learning: Successful examples using sparse coarse coding
- MIT Press Cambridge, MA
- Sutton R.S. (1996) Generalization in reinforcement learning: Successful examples using sparse coarse coding. In: Touretzky D.S., Mozer M.C., Hasselmo M.E. (Eds) Advances in neural information processing systems 8. MIT Press, Cambridge, MA, pp 1038-1045
- (1996) Advances in Neural Information Processing Systems 8 , pp. 1038-1045
- Sutton, R.S.¹ Touretzky, D.S.² Mozer, M.C.³ Hasselmo, M.E.⁴

25
- 0004102479
- MIT Press Cambridge, MA
- Sutton R.S., Barto A.G. (1998) Reinforcement learning: An introduction. MIT Press, Cambridge, MA
- (1998) Reinforcement Learning: An Introduction
- Sutton, R.S.¹ Barto, A.G.²

26
- 0002891388
- Locally weighted projection regression: An O(n) algorithm for incremental real time learning in high dimensional space
- Vijayakumar, S., & Schaal, S. (2000). Locally weighted projection regression: An O(n) algorithm for incremental real time learning in high dimensional space. In Proceedings of the 17th International Conference on Machine Learning, pp. 1079-1086.
- (2000) Proceedings of the 17th International Conference on Machine Learning , pp. 1079-1086
- Vijayakumar, S.¹ Schaal, S.²

27
- 84867833986
- A POMDP approximation algorithm that anticipates the need to observe
- Zubek, V. B., & Dietterich, T. G. (2000). A POMDP approximation algorithm that anticipates the need to observe. In Proceedings of the Pacific Rim International Conference on Artificial Intelligence, pp. 521-532.
- (2000) Proceedings of the Pacific Rim International Conference on Artificial Intelligence , pp. 521-532
- Zubek, V.B.¹ Dietterich, T.G.²

* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.