Neural Computation, Volume 12, Issue 1, 2000, Pages 219-245

Reinforcement learning in continuous time and space

Author keywords

[No Author keywords available]

Indexed keywords

ALGORITHM; ARTICLE; ARTIFICIAL INTELLIGENCE; LEARNING; PSYCHOLOGICAL MODEL; REINFORCEMENT; REWARD; STATISTICAL MODEL; TIME;

EID: 0033629916     PISSN: 0899-7667     EISSN: None     Source Type: Journal
DOI: 10.1162/089976600300015961     Document Type: Article
Times cited: 854

References (38)
  • 2. Atkeson, C. G. (1994). Using local trajectory optimizers to speed up global optimization in dynamic programming. In J. D. Cowan, G. Tesauro, & J. Alspector (Eds.), Advances in neural information processing systems, 6 (pp. 663-670). San Mateo, CA: Morgan Kaufmann.
  • 3. Baird, L. C. (1993). Advantage updating (Tech. Rep. No. WL-TR-93-1146). Wright Laboratory, Wright-Patterson Air Force Base, OH.
  • 4. Baird, L. C. (1995). Residual algorithms: Reinforcement learning with function approximation. In A. Prieditis & S. Russell (Eds.), Machine learning: Proceedings of the Twelfth International Conference. San Mateo, CA: Morgan Kaufmann.
  • 7. Bradtke, S. J. (1993). Reinforcement learning applied to linear quadratic regulation. In C. L. Giles, S. J. Hanson, & J. D. Cowan (Eds.), Advances in neural information processing systems, 5 (pp. 295-302). San Mateo, CA: Morgan Kaufmann.
  • 8. Bradtke, S. J., & Duff, M. O. (1995). Reinforcement learning methods for continuous-time Markov decision problems. In G. Tesauro, D. S. Touretzky, & T. K. Leen (Eds.), Advances in neural information processing systems, 7 (pp. 393-400). Cambridge, MA: MIT Press.
  • 9. Crites, R. H., & Barto, A. G. (1996). Improving elevator performance using reinforcement learning. In D. S. Touretzky, M. C. Mozer, & M. E. Hasselmo (Eds.), Advances in neural information processing systems, 8 (pp. 1017-1023). Cambridge, MA: MIT Press.
  • 10. Dayan, P., & Singh, S. P. (1996). Improving policies without measuring merits. In D. S. Touretzky, M. C. Mozer, & M. E. Hasselmo (Eds.), Advances in neural information processing systems, 8 (pp. 1059-1065). Cambridge, MA: MIT Press.
  • 11. Doya, K. (1996). Temporal difference learning in continuous time and space. In D. S. Touretzky, M. C. Mozer, & M. E. Hasselmo (Eds.), Advances in neural information processing systems, 8 (pp. 1073-1079). Cambridge, MA: MIT Press.
  • 12. Doya, K. (1997). Efficient nonlinear control with actor-tutor architecture. In M. C. Mozer, M. I. Jordan, & T. P. Petsche (Eds.), Advances in neural information processing systems, 9 (pp. 1012-1018). Cambridge, MA: MIT Press.
  • 14. Gordon, G. J. (1995). Stable function approximation in dynamic programming. In A. Prieditis & S. Russell (Eds.), Machine learning: Proceedings of the Twelfth International Conference. San Mateo, CA: Morgan Kaufmann.
  • 15. Gordon, G. J. (1996). Stable fitted reinforcement learning. In D. S. Touretzky, M. C. Mozer, & M. E. Hasselmo (Eds.), Advances in neural information processing systems, 8 (pp. 1052-1058). Cambridge, MA: MIT Press.
  • 16. Gullapalli, V. (1990). A stochastic reinforcement learning algorithm for learning real-valued functions. Neural Networks, 3, 671-692.
  • 17. Harmon, M. E., Baird, L. C., III, & Klopf, A. H. (1996). Reinforcement learning applied to a differential game. Adaptive Behavior, 4, 3-28.
  • 18. Hopfield, J. J. (1984). Neurons with graded response have collective computational properties like those of two-state neurons. Proceedings of the National Academy of Sciences, 81, 3088-3092.
  • 21. Moore, A. W. (1994). The parti-game algorithm for variable resolution reinforcement learning in multidimensional state-spaces. In J. D. Cowan, G. Tesauro, & J. Alspector (Eds.), Advances in neural information processing systems, 6 (pp. 711-718). San Mateo, CA: Morgan Kaufmann.
  • 23. Munos, R. (1997). A convergent reinforcement learning algorithm in the continuous case based on a finite difference method. In Proceedings of the International Joint Conference on Artificial Intelligence (pp. 826-831).
  • 24. Munos, R., & Bourgine, P. (1998). Reinforcement learning for continuous stochastic control problems. In M. I. Jordan, M. J. Kearns, & S. A. Solla (Eds.), Advances in neural information processing systems, 10 (pp. 1029-1035). Cambridge, MA: MIT Press.
  • 25. Pareigis, S. (1998). Adaptive choice of grid and time in reinforcement learning. In M. I. Jordan, M. J. Kearns, & S. A. Solla (Eds.), Advances in neural information processing systems, 10 (pp. 1036-1042). Cambridge, MA: MIT Press.
  • 26. Peterson, J. K. (1993). On-line estimation of the optimal value function: HJB-estimators. In C. L. Giles, S. J. Hanson, & J. D. Cowan (Eds.), Advances in neural information processing systems, 5 (pp. 319-326). San Mateo, CA: Morgan Kaufmann.
  • 27. Schaal, S. (1997). Learning from demonstration. In M. C. Mozer, M. I. Jordan, & T. Petsche (Eds.), Advances in neural information processing systems, 9 (pp. 1040-1046). Cambridge, MA: MIT Press.
  • 28. Singh, S., & Bertsekas, D. (1997). Reinforcement learning for dynamic channel allocation in cellular telephone systems. In M. C. Mozer, M. I. Jordan, & T. Petsche (Eds.), Advances in neural information processing systems, 9 (pp. 974-980). Cambridge, MA: MIT Press.
  • 29. Singh, S. P., Jaakkola, T., & Jordan, M. I. (1995). Reinforcement learning with soft state aggregation. In G. Tesauro, D. S. Touretzky, & T. K. Leen (Eds.), Advances in neural information processing systems, 7 (pp. 361-368). Cambridge, MA: MIT Press.
  • 30. Sutton, R. S. (1988). Learning to predict by the methods of temporal differences. Machine Learning, 3, 9-44.
  • 32. Sutton, R. S. (1996). Generalization in reinforcement learning: Successful examples using sparse coarse coding. In D. S. Touretzky, M. C. Mozer, & M. E. Hasselmo (Eds.), Advances in neural information processing systems, 8 (pp. 1038-1044). Cambridge, MA: MIT Press.
  • 34. Tesauro, G. (1994). TD-Gammon, a self-teaching backgammon program, achieves master-level play. Neural Computation, 6, 215-219.
  • 35. Tsitsiklis, J. N., & Van Roy, B. (1997). An analysis of temporal-difference learning with function approximation. IEEE Transactions on Automatic Control, 42, 674-690.
  • 36. Watkins, C. J. C. H. (1989). Learning from delayed rewards. Unpublished doctoral dissertation, Cambridge University.
  • 37. Werbos, P. J. (1990). A menu of designs for reinforcement learning over time. In W. T. Miller, R. S. Sutton, & P. J. Werbos (Eds.), Neural networks for control (pp. 67-95). Cambridge, MA: MIT Press.
  • 38. Zhang, W., & Dietterich, T. G. (1996). High-performance job-shop scheduling with a time-delay TD(λ) network. In D. S. Touretzky, M. C. Mozer, & M. E. Hasselmo (Eds.), Advances in neural information processing systems, 8. Cambridge, MA: MIT Press.


* This information was extracted by KISTI through analysis of Elsevier's SCOPUS database.