Neural Computation, Volume 12, Issue 1, 2000, Pages 219-245

Reinforcement learning in continuous time and space

Author keywords

[No Author keywords available]

Indexed keywords

ALGORITHM; ARTICLE; ARTIFICIAL INTELLIGENCE; LEARNING; PSYCHOLOGICAL MODEL; REINFORCEMENT; REWARD; STATISTICAL MODEL; TIME;

EID: 0033629916     PISSN: 0899-7667     EISSN: None     Source Type: Journal
DOI: 10.1162/089976600300015961     Document Type: Article
Times cited: 854

References (38)
  • 2. Atkeson, C. G. (1994). Using local trajectory optimizers to speed up global optimization in dynamic programming. In J. D. Cowan, G. Tesauro, & J. Alspector (Eds.), Advances in neural information processing systems, 6 (pp. 663-670). San Mateo, CA: Morgan Kaufmann.
  • 3. Baird, L. C. (1993). Advantage updating (Tech. Rep. No. WL-TR-93-1146). Wright Laboratory, Wright-Patterson Air Force Base, OH.
  • 4. Baird, L. C. (1995). Residual algorithms: Reinforcement learning with function approximation. In A. Prieditis & S. Russell (Eds.), Machine learning: Proceedings of the Twelfth International Conference. San Mateo, CA: Morgan Kaufmann.
  • 7. Bradtke, S. J. (1993). Reinforcement learning applied to linear quadratic regulation. In C. L. Giles, S. J. Hanson, & J. D. Cowan (Eds.), Advances in neural information processing systems, 5 (pp. 295-302). San Mateo, CA: Morgan Kaufmann.
  • 8. Bradtke, S. J., & Duff, M. O. (1995). Reinforcement learning methods for continuous-time Markov decision problems. In G. Tesauro, D. S. Touretzky, & T. K. Leen (Eds.), Advances in neural information processing systems, 7 (pp. 393-400). Cambridge, MA: MIT Press.
  • 9. Crites, R. H., & Barto, A. G. (1996). Improving elevator performance using reinforcement learning. In D. S. Touretzky, M. C. Mozer, & M. E. Hasselmo (Eds.), Advances in neural information processing systems, 8 (pp. 1017-1023). Cambridge, MA: MIT Press.
  • 10. Dayan, P., & Singh, S. P. (1996). Improving policies without measuring merits. In D. S. Touretzky, M. C. Mozer, & M. E. Hasselmo (Eds.), Advances in neural information processing systems, 8 (pp. 1059-1065). Cambridge, MA: MIT Press.
  • 11. Doya, K. (1996). Temporal difference learning in continuous time and space. In D. S. Touretzky, M. C. Mozer, & M. E. Hasselmo (Eds.), Advances in neural information processing systems, 8 (pp. 1073-1079). Cambridge, MA: MIT Press.
  • 12. Doya, K. (1997). Efficient nonlinear control with actor-tutor architecture. In M. C. Mozer, M. I. Jordan, & T. P. Petsche (Eds.), Advances in neural information processing systems, 9 (pp. 1012-1018). Cambridge, MA: MIT Press.
  • 14. Gordon, G. J. (1995). Stable function approximation in dynamic programming. In A. Prieditis & S. Russell (Eds.), Machine learning: Proceedings of the Twelfth International Conference. San Mateo, CA: Morgan Kaufmann.
  • 15. Gordon, G. J. (1996). Stable fitted reinforcement learning. In D. S. Touretzky, M. C. Mozer, & M. E. Hasselmo (Eds.), Advances in neural information processing systems, 8 (pp. 1052-1058). Cambridge, MA: MIT Press.
  • 16. Gullapalli, V. (1990). A stochastic reinforcement learning algorithm for learning real-valued functions. Neural Networks, 3, 671-692.
  • 17. Harmon, M. E., Baird, L. C., III, & Klopf, A. H. (1996). Reinforcement learning applied to a differential game. Adaptive Behavior, 4, 3-28.
  • 18. Hopfield, J. J. (1984). Neurons with graded response have collective computational properties like those of two-state neurons. Proceedings of the National Academy of Sciences, 81, 3088-3092.
  • 21. Moore, A. W. (1994). The parti-game algorithm for variable resolution reinforcement learning in multidimensional state-spaces. In J. D. Cowan, G. Tesauro, & J. Alspector (Eds.), Advances in neural information processing systems, 6 (pp. 711-718). San Mateo, CA: Morgan Kaufmann.
  • 23. Munos, R. (1997). A convergent reinforcement learning algorithm in the continuous case based on a finite difference method. In Proceedings of the International Joint Conference on Artificial Intelligence (pp. 826-831).
  • 24. Munos, R., & Bourgine, P. (1998). Reinforcement learning for continuous stochastic control problems. In M. I. Jordan, M. J. Kearns, & S. A. Solla (Eds.), Advances in neural information processing systems, 10 (pp. 1029-1035). Cambridge, MA: MIT Press.
  • 25. Pareigis, S. (1998). Adaptive choice of grid and time in reinforcement learning. In M. I. Jordan, M. J. Kearns, & S. A. Solla (Eds.), Advances in neural information processing systems, 10 (pp. 1036-1042). Cambridge, MA: MIT Press.
  • 26. Peterson, J. K. (1993). On-line estimation of the optimal value function: HJB-estimators. In C. L. Giles, S. J. Hanson, & J. D. Cowan (Eds.), Advances in neural information processing systems, 5 (pp. 319-326). San Mateo, CA: Morgan Kaufmann.
  • 27. Schaal, S. (1997). Learning from demonstration. In M. C. Mozer, M. I. Jordan, & T. Petsche (Eds.), Advances in neural information processing systems, 9 (pp. 1040-1046). Cambridge, MA: MIT Press.
  • 28. Singh, S., & Bertsekas, D. (1997). Reinforcement learning for dynamic channel allocation in cellular telephone systems. In M. C. Mozer, M. I. Jordan, & T. Petsche (Eds.), Advances in neural information processing systems, 9 (pp. 974-980). Cambridge, MA: MIT Press.
  • 29. Singh, S. P., Jaakkola, T., & Jordan, M. I. (1995). Reinforcement learning with soft state aggregation. In G. Tesauro, D. S. Touretzky, & T. K. Leen (Eds.), Advances in neural information processing systems, 7 (pp. 361-368). Cambridge, MA: MIT Press.
  • 30. Sutton, R. S. (1988). Learning to predict by the methods of temporal differences. Machine Learning, 3, 9-44.
  • 32. Sutton, R. S. (1996). Generalization in reinforcement learning: Successful examples using sparse coarse coding. In D. S. Touretzky, M. C. Mozer, & M. E. Hasselmo (Eds.), Advances in neural information processing systems, 8 (pp. 1038-1044). Cambridge, MA: MIT Press.
  • 34. Tesauro, G. (1994). TD-Gammon, a self-teaching backgammon program, achieves master-level play. Neural Computation, 6, 215-219.
  • 35. Tsitsiklis, J. N., & Van Roy, B. (1997). An analysis of temporal-difference learning with function approximation. IEEE Transactions on Automatic Control, 42, 674-690.
  • 36. Watkins, C. J. C. H. (1989). Learning from delayed rewards. Unpublished doctoral dissertation, Cambridge University.
  • 37. Werbos, P. J. (1990). A menu of designs for reinforcement learning over time. In W. T. Miller, R. S. Sutton, & P. J. Werbos (Eds.), Neural networks for control (pp. 67-95). Cambridge, MA: MIT Press.
  • 38. Zhang, W., & Dietterich, T. G. (1996). High-performance job-shop scheduling with a time-delay TD(λ) network. In D. S. Touretzky, M. C. Mozer, & M. E. Hasselmo (Eds.), Advances in neural information processing systems, 8. Cambridge, MA: MIT Press.


* This information was extracted by KISTI through analysis of Elsevier's SCOPUS database.