



Adaptive Behavior, Volume 6, Issue 2, 1997, Pages 163-217

Experiments with reinforcement learning in problems with continuous state and action spaces

Author keywords

Continuous domains; Function approximation; Memory based methods; Optimal control; Reinforcement learning; Resource preallocation

EID: 0031231885     PISSN: 1059-7123     EISSN: None     Source Type: Journal
DOI: 10.1177/105971239700600201     Document Type: Article
Times cited: 166

References (26)
  • 1. Albus, J. S. (1975). A new approach to manipulator control: The cerebellar model articulation controller (CMAC). Journal of Dynamic Systems, Measurement, and Control, 97(3), 220-227.
  • 2. Atkeson, C. G. (1991). Memory-based learning control. In Proceedings of the 1991 American Control Conference (Vol. 3). New York: American Automatic Control Council.
  • 4. Bellman, R. (1957). Dynamic programming. Princeton, NJ: Princeton University Press.
  • 6. Cichosz, P. (1996). Truncating temporal differences: On the efficient implementation of TD(λ) for reinforcement learning. Journal of Artificial Intelligence Research, 287-318.
  • 7. Kanerva, P. (1993). Sparse distributed memory and related models. In M. H. Hassoun (Ed.), Associative neural memories: Theory and implementation. New York: Oxford University Press.
  • 8. Kibler, D., & Aha, D. W. (1989). Instance-based prediction of real-valued attributes. Computational Intelligence, 5(2), 51-57.
  • 9. Lin, L. J. (1992). Self-improving reactive agents based on reinforcement learning. Machine Learning, 8(3-4), 293-321.
  • 12. Moore, A. W., & Atkeson, C. G. (1995). The parti-game algorithm for variable resolution reinforcement learning in multidimensional state-spaces. Machine Learning, 21(3), 199-233.
  • 16. Ram, A., & Santamaría, J. C. (1997). Continuous case-based reasoning. Artificial Intelligence, 90(1-2), 25-77.
  • 20. Singh, S. P., & Sutton, R. S. (1996). Reinforcement learning with replacing eligibility traces. Machine Learning, 22, 123-158.
  • 22. Sutton, R. S. (1988). Learning to predict by the methods of temporal differences. Machine Learning, 3, 9-44.
  • 23. Sutton, R. S. (1996). Generalization in reinforcement learning: Successful examples using sparse coarse coding. Advances in Neural Information Processing Systems, 8, 1038-1044.
  • 24. Tham, C. K. (1995). Reinforcement learning of multiple tasks using a hierarchical CMAC architecture. Robotics and Autonomous Systems, 15(4), 247-274.
  • 25. Tsitsiklis, J. N., & Van Roy, B. (1997). An analysis of temporal-difference learning with function approximation. IEEE Transactions on Automatic Control, 42(5), 674-690.
  • 26. Watkins, C. J. C. H. (1989). Learning from delayed rewards. Unpublished doctoral thesis, University of Cambridge, Cambridge, England.


* This information was extracted by KISTI through analysis of Elsevier's SCOPUS database.