Neural Computation, Volume 11, Issue 8, 1999, Pages 2017-2060

A unified analysis of value-function-based reinforcement-learning algorithms

Author keywords

[No Author keywords available]

Indexed keywords

ALGORITHM; ARTICLE; ARTIFICIAL INTELLIGENCE; LEARNING; REINFORCEMENT

EID: 0033570798     PISSN: 0899-7667     EISSN: None     Source Type: Journal
DOI: 10.1162/089976699300016070     Document Type: Article
Times cited: 176

References (46)
  • 1
    • Barto, A. G., Bradtke, S. J., & Singh, S. P. (1995). Learning to act using real-time dynamic programming. Artificial Intelligence, 72(1), 81-138.
  • 2
    • Barto, A. G., Sutton, R. S., & Watkins, C. J. C. H. (1989). Learning and sequential decision making (Tech. Rep. No. 89-95). Amherst, MA: Department of Computer and Information Science, University of Massachusetts.
  • 4
    • Bertsekas, D. P., & Castañon, D. A. (1989). Adaptive aggregation methods for infinite horizon dynamic programming. IEEE Transactions on Automatic Control, 34(6), 589-598.
  • 7
    • Gordon, G. J. (1995). Stable function approximation in dynamic programming. In A. Prieditis & S. Russell (Eds.), Proceedings of the Twelfth International Conference on Machine Learning (pp. 261-268). San Mateo, CA: Morgan Kaufmann.
  • 8
    • Grossberg, S. (1969). Embedding fields: A theory of learning with physiological implications. Journal of Mathematical Psychology, 6, 209-239.
  • 9
    • Gullapalli, V., & Barto, A. G. (1994). Convergence of indirect adaptive asynchronous value iteration algorithms. In J. D. Cowan, G. Tesauro, & J. Alspector (Eds.), Advances in neural information processing systems, 6 (pp. 695-702). San Mateo, CA: Morgan Kaufmann.
  • 12
    • Hu, J., & Wellman, M. P. (1998). Multiagent reinforcement learning: Theoretical framework and an algorithm. In J. Shavlik (Ed.), Proceedings of the Fifteenth International Conference on Machine Learning. San Mateo, CA: Morgan Kaufmann.
  • 13
    • Jaakkola, T., Jordan, M. I., & Singh, S. P. (1994). On the convergence of stochastic iterative dynamic programming algorithms. Neural Computation, 6(6), 1185-1201.
  • 17
    • Korf, R. E. (1990). Real-time heuristic search. Artificial Intelligence, 42, 189-211.
  • 20
  • 23
    • Ljung, L. (1977). Analysis of recursive stochastic algorithms. IEEE Transactions on Automatic Control, 22, 551-575.
  • 24
    • Mahadevan, S. (1996). Average reward reinforcement learning: Foundations, algorithms, and empirical results. Machine Learning, 22(1-3), 159-196.
  • 25
    • Moore, A. W., & Atkeson, C. G. (1993). Prioritized sweeping: Reinforcement learning with less data and less real time. Machine Learning, 13, 103-130.
  • 28
    • Ribeiro, C. (1995). Attentional mechanisms as a strategy for generalisation in the Q-learning algorithm. In Proceedings of ICANN'95 (Vol. 1, pp. 455-460).
  • 31
    • Robbins, H., & Siegmund, D. (1971). A convergence theorem for non-negative almost supermartingales and some applications. In J. Rustagi (Ed.), Optimizing methods in statistics (pp. 235-257). New York: Academic Press.
  • 32
    • Rummery, G. A., & Niranjan, M. (1994). On-line Q-learning using connectionist systems (Tech. Rep. No. CUED/F-INFENG/TR 166). Cambridge: Cambridge University, Engineering Department.
  • 33
    • Schwartz, A. (1993). A reinforcement learning method for maximizing undiscounted rewards. In Proceedings of the Tenth International Conference on Machine Learning (pp. 298-305). San Mateo, CA: Morgan Kaufmann.
  • 34
    • Schweitzer, P. J. (1984). Aggregation methods for large Markov chains. In G. Iazeolla, P. J. Courtois, & A. Hordijk (Eds.), Mathematical computer performance and reliability (pp. 275-302). Amsterdam: Elsevier.
  • 35
    • Singh, S., Jaakkola, T., & Jordan, M. (1995). Reinforcement learning with soft state aggregation. In G. Tesauro, D. S. Touretzky, & T. K. Leen (Eds.), Advances in neural information processing systems, 7 (pp. 361-368). Cambridge, MA: MIT Press.
  • 37
    • Singh, S. P., & Sutton, R. S. (1996). Reinforcement learning with replacing eligibility traces. Machine Learning, 22(1-3), 123-158.
  • 39
    • Szepesvári, C. (1998a). The asymptotic convergence rate of Q-learning. In M. I. Jordan, M. J. Kearns, & S. A. Solla (Eds.), Advances in neural information processing systems, 10. Cambridge, MA: MIT Press.
  • 40
    • Szepesvári, C. (1998b). Static and dynamic aspects of optimal sequential decision making. Unpublished Ph.D. dissertation, Bolyai Institute of Mathematics, "József Attila" University, Szeged, Hungary.
  • 42
    • Tsitsiklis, J. N. (1994). Asynchronous stochastic approximation and Q-learning. Machine Learning, 16(3), 185-202.
  • 43
    • Vrieze, O. J., & Tijs, S. H. (1982). Fictitious play applied to sequences of games and discounted stochastic games. International Journal of Game Theory, 11(2), 71-85.
  • 44
    • Watkins, C. J. C. H. (1989). Learning from delayed rewards. Unpublished Ph.D. dissertation, King's College, Cambridge.


* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.