Neural Computation, Volume 11, Issue 8, 1999, Pages 2017-2060

A unified analysis of value-function-based reinforcement-learning algorithms

Author keywords

[No Author keywords available]

Indexed keywords

ALGORITHM; ARTICLE; ARTIFICIAL INTELLIGENCE; LEARNING; REINFORCEMENT

EID: 0033570798     PISSN: 0899-7667     EISSN: None     Source Type: Journal
DOI: 10.1162/089976699300016070     Document Type: Article
Times cited: 176

References (46)
  • 1
    • Barto, A. G., Bradtke, S. J., & Singh, S. P. (1995). Learning to act using real-time dynamic programming. Artificial Intelligence, 72(1), 81-138.
  • 2
    • Barto, A. G., Sutton, R. S., & Watkins, C. J. C. H. (1989). Learning and sequential decision making (Tech. Rep. No. 89-95). Amherst, MA: Department of Computer and Information Science, University of Massachusetts.
  • 4
    • Bertsekas, D. P., & Castañon, D. A. (1989). Adaptive aggregation methods for infinite horizon dynamic programming. IEEE Transactions on Automatic Control, 34(6), 589-598.
  • 7
    • Gordon, G. J. (1995). Stable function approximation in dynamic programming. In A. Prieditis & S. Russell (Eds.), Proceedings of the Twelfth International Conference on Machine Learning (pp. 261-268). San Mateo, CA: Morgan Kaufmann.
  • 8
    • Grossberg, S. (1969). Embedding fields: A theory of learning with physiological implications. Journal of Mathematical Psychology, 6, 209-239.
  • 9
    • Gullapalli, V., & Barto, A. G. (1994). Convergence of indirect adaptive asynchronous value iteration algorithms. In J. D. Cowan, G. Tesauro, & J. Alspector (Eds.), Advances in neural information processing systems, 6 (pp. 695-702). San Mateo, CA: Morgan Kaufmann.
  • 12
    • Hu, J., & Wellman, M. P. (1998). Multiagent reinforcement learning: Theoretical framework and an algorithm. In J. Shavlik (Ed.), Proceedings of the Fifteenth International Conference on Machine Learning. San Mateo, CA: Morgan Kaufmann.
  • 13
    • Jaakkola, T., Jordan, M. I., & Singh, S. P. (1994). On the convergence of stochastic iterative dynamic programming algorithms. Neural Computation, 6(6), 1185-1201.
  • 17
    • Korf, R. E. (1990). Real-time heuristic search. Artificial Intelligence, 42, 189-211.
  • 20
  • 23
    • Ljung, L. (1977). Analysis of recursive stochastic algorithms. IEEE Transactions on Automatic Control, 22, 551-575.
  • 24
    • Mahadevan, S. (1996). Average reward reinforcement learning: Foundations, algorithms, and empirical results. Machine Learning, 22(1-3), 159-196.
  • 25
    • Moore, A. W., & Atkeson, C. G. (1993). Prioritized sweeping: Reinforcement learning with less data and less real time. Machine Learning, 13, 103-130.
  • 28
    • Ribeiro, C. (1995). Attentional mechanisms as a strategy for generalisation in the Q-learning algorithm. In Proceedings of ICANN'95 (Vol. 1, pp. 455-460).
  • 31
    • Robbins, H., & Siegmund, D. (1971). A convergence theorem for non-negative almost supermartingales and some applications. In J. Rustagi (Ed.), Optimizing methods in statistics (pp. 235-257). New York: Academic Press.
  • 32
    • Rummery, G. A., & Niranjan, M. (1994). On-line Q-learning using connectionist systems (Tech. Rep. No. CUED/F-INFENG/TR 166). Cambridge: Cambridge University, Engineering Department.
  • 33
    • Schwartz, A. (1993). A reinforcement learning method for maximizing undiscounted rewards. In Proceedings of the Tenth International Conference on Machine Learning (pp. 298-305). San Mateo, CA: Morgan Kaufmann.
  • 34
    • Schweitzer, P. J. (1984). Aggregation methods for large Markov chains. In G. Iazeolla, P. J. Courtois, & A. Hordijk (Eds.), Mathematical computer performance and reliability (pp. 275-302). Amsterdam: Elsevier.
  • 35
    • Singh, S., Jaakkola, T., & Jordan, M. (1995). Reinforcement learning with soft state aggregation. In G. Tesauro, D. S. Touretzky, & T. K. Leen (Eds.), Advances in neural information processing systems, 7 (pp. 361-368). Cambridge, MA: MIT Press.
  • 37
    • Singh, S. P., & Sutton, R. S. (1996). Reinforcement learning with replacing eligibility traces. Machine Learning, 22(1-3), 123-158.
  • 39
    • Szepesvári, C. (1998a). The asymptotic convergence rate of Q-learning. In M. I. Jordan, M. J. Kearns, & S. A. Solla (Eds.), Advances in neural information processing systems, 10. Cambridge, MA: MIT Press.
  • 40
    • Szepesvári, C. (1998b). Static and dynamic aspects of optimal sequential decision making. Unpublished Ph.D. dissertation, Bolyai Institute of Mathematics, "József Attila" University, Szeged, Hungary.
  • 42
    • Tsitsiklis, J. N. (1994). Asynchronous stochastic approximation and Q-learning. Machine Learning, 16(3), 185-202.
  • 43
    • Vrieze, O. J., & Tijs, S. H. (1982). Fictitious play applied to sequences of games and discounted stochastic games. International Journal of Game Theory, 11(2), 71-85.
  • 44
    • Watkins, C. J. C. H. (1989). Learning from delayed rewards. Unpublished Ph.D. dissertation, King's College, Cambridge.


* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.