SCOPUS Information Retrieval Platform

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

Volume 7524 LNAI, Issue PART 2, 2012, Pages 211-226

Policy iteration based on a learned transition model

(2) Ramavajjala, Vivek (a); Elkan, Charles (a)

(a) UNIVERSITY OF CALIFORNIA (United States)

Author keywords

[No Author keywords available]

Indexed keywords

BASIS FUNCTIONS; COMMITMENT PROBLEM; HIGH-DIMENSIONAL; INVERTED PENDULUM; LEAST SQUARE; LINEAR APPROXIMATIONS; POLICY ITERATION; REINFORCEMENT LEARNING METHOD; RESOURCE MANAGEMENT; STATE SPACE; SWING-UP; TRANSITION MODEL; VALUE FUNCTIONS;

APPROXIMATION ALGORITHMS; REINFORCEMENT LEARNING;

ITERATIVE METHODS;

EID: 84866843932 PISSN: 03029743 EISSN: 16113349 Source Type: Book Series
DOI: 10.1007/978-3-642-33486-3_14 Document Type: Conference Paper

Times cited: 4

References (19)

1
- 0001130234
- A trust region method based on interior point techniques for nonlinear programming
- Byrd, R.H., Gilbert, J.C., Nocedal, J.: A trust region method based on interior point techniques for nonlinear programming. Mathematical Programming 89, 149-185 (1996)
- (1996) Mathematical Programming , vol.89 , pp. 149-185
- Byrd, R.H.¹ Gilbert, J.C.² Nocedal, J.³

2
- 84861697773
- Reinforcement Learning with a Bilinear Q Function
- Sanner, S., Hutter, M. (eds.) EWRL 2011. Springer, Heidelberg
- Elkan, C.: Reinforcement Learning with a Bilinear Q Function. In: Sanner, S., Hutter, M. (eds.) EWRL 2011. LNCS, vol. 7188, pp. 78-88. Springer, Heidelberg (2012)
- (2012) LNCS , vol.7188 , pp. 78-88
- Elkan, C.¹

3
- 21844465127
- Tree-based batch mode reinforcement learning
- Ernst, D., Geurts, P., Wehenkel, L.: Tree-based batch mode reinforcement learning. Journal of Machine Learning Research 6(1), 503-556 (2005)
- (2005) Journal of Machine Learning Research , vol.6 , Issue.1 , pp. 503-556
- Ernst, D.¹ Geurts, P.² Wehenkel, L.³

4
- 80053442033
- Approximate dynamic programming for storage problems
- Hannah, L., Dunson, D.B.: Approximate dynamic programming for storage problems. In: Proceedings of the 28th International Conference on Machine Learning (ICML), pp. 337-344 (2011)
- (2011) Proceedings of the 28th International Conference on Machine Learning (ICML) , pp. 337-344
- Hannah, L.¹ Dunson, D.B.²

5
- 84866862758
- Hesami, A.: Matlab implementation of inverted pendulum, http://webdocs.cs.ualberta.ca/~sutton/pole.zip
- Matlab Implementation of Inverted Pendulum
- Hesami, A.¹

6
- 0001455103
- Comments on the origin and application of Markov decision processes
- Howard, R.A.: Comments on the origin and application of Markov decision processes. Management Science 14(7), 503-507 (1968)
- (1968) Management Science , vol.14 , Issue.7 , pp. 503-507
- Howard, R.A.¹

7
- 60349084848
- Model-based function approximation in reinforcement learning
- ACM
- Jong, N., Stone, P.: Model-based function approximation in reinforcement learning. In: Proceedings of the Sixth International Conference on Autonomous Agents and Multiagent Systems (AAMAS), pp. 658-665. ACM (2007)
- (2007) Proceedings of the Sixth International Conference on Autonomous Agents and Multiagent Systems (AAMAS) , pp. 658-665
- Jong, N.¹ Stone, P.²

8
- 4644323293
- Least-squares policy iteration
- Lagoudakis, M.G., Parr, R., Bartlett, L.: Least-squares policy iteration. Journal of Machine Learning Research 4, 1107-1149 (2003)
- (2003) Journal of Machine Learning Research , vol.4 , pp. 1107-1149
- Lagoudakis, M.G.¹ Parr, R.² Bartlett, L.³

9
- 35748957806
- Proto-value functions: A Laplacian framework for learning representation and control in Markov decision processes
- Mahadevan, S., Maggioni, M.: Proto-value functions: A Laplacian framework for learning representation and control in Markov decision processes. Journal of Machine Learning Research, 2169-2231 (2007)
- (2007) Journal of Machine Learning Research , pp. 2169-2231
- Mahadevan, S.¹ Maggioni, M.²

10
- 56049095326
- Fitted Natural Actor-Critic: A New Algorithm for Continuous State-Action MDPs
- Daelemans, W., Goethals, B., Morik, K. (eds.) ECML PKDD 2008, Part II. Springer, Heidelberg
- Melo, F.S., Lopes, M.: Fitted Natural Actor-Critic: A New Algorithm for Continuous State-Action MDPs. In: Daelemans, W., Goethals, B., Morik, K. (eds.) ECML PKDD 2008, Part II. LNCS (LNAI), vol. 5212, pp. 66-81. Springer, Heidelberg (2008)
- (2008) LNCS (LNAI) , vol.5212 , pp. 66-81
- Melo, F.S.¹ Lopes, M.²

11
- 17444414191
- Basis function adaptation in temporal difference reinforcement learning
- Menache, I., Mannor, S., Shimkin, N.: Basis function adaptation in temporal difference reinforcement learning. Annals of Operations Research 134(1), 215-238 (2005)
- (2005) Annals of Operations Research , vol.134 , Issue.1 , pp. 215-238
- Menache, I.¹ Mannor, S.² Shimkin, N.³

12
- 56449092660
- An analysis of linear models, linear value-function approximation, and feature selection for reinforcement learning
- Parr, R., Li, L., Taylor, G., Painter-Wakefield, C., Littman, M.: An analysis of linear models, linear value-function approximation, and feature selection for reinforcement learning. In: Proceedings of the 25th International Conference on Machine Learning (ICML), pp. 752-759 (2008)
- (2008) Proceedings of the 25th International Conference on Machine Learning (ICML) , pp. 752-759
- Parr, R.¹ Li, L.² Taylor, G.³ Painter-Wakefield, C.⁴ Littman, M.⁵

13
- 77949361320
- Merging AI and OR to solve high-dimensional stochastic optimization problems using approximate dynamic programming
- Powell, W.B.: Merging AI and OR to solve high-dimensional stochastic optimization problems using approximate dynamic programming. INFORMS Journal on Computing 22(1), 2-17 (2010)
- (2010) INFORMS Journal on Computing , vol.22 , Issue.1 , pp. 2-17
- Powell, W.B.¹

14
- 0001898381
- Practical reinforcement learning in continuous spaces
- Morgan Kaufmann
- Smart, W.D., Kaelbling, L.P.: Practical reinforcement learning in continuous spaces. In: Proceedings of the International Conference on Machine Learning (ICML), pp. 903-910. Morgan Kaufmann (2000)
- (2000) Proceedings of the International Conference on Machine Learning (ICML) , pp. 903-910
- Smart, W.D.¹ Kaelbling, L.P.²

15
- 0011200414
- Reinforcement learning architectures for animats
- MIT Press
- Sutton, R.S.: Reinforcement learning architectures for animats. In: Proceedings of the International Workshop on the Simulation of Adaptive Behavior: From Animals to Animats, pp. 288-296. MIT Press (1991)
- (1991) Proceedings of the International Workshop on the Simulation of Adaptive Behavior: From Animals to Animats , pp. 288-296
- Sutton, R.S.¹

16
- 0004102479
- Cambridge University Press
- Sutton, R.S., Barto, A.G.: Reinforcement Learning: An Introduction. Cambridge University Press (1998)
- (1998) Reinforcement Learning: An Introduction
- Sutton, R.S.¹ Barto, A.G.²

17
- 77949360515
- Commentary - Perspectives on stochastic optimization over time
- Tsitsiklis, J.N.: Commentary - perspectives on stochastic optimization over time. INFORMS Journal on Computing 22(1), 18-19 (2010)
- (2010) INFORMS Journal on Computing , vol.22 , Issue.1 , pp. 18-19
- Tsitsiklis, J.N.¹

18
- 84887002026
- Multilayer perceptrons with radial basis functions as value functions in reinforcement learning
- Uc Cetina, V.: Multilayer perceptrons with radial basis functions as value functions in reinforcement learning. In: Proceedings of the 16th European Symposium on Artificial Neural Networks (ESANN), pp. 161-166 (2008)
- (2008) Proceedings of the 16th European Symposium on Artificial Neural Networks (ESANN) , pp. 161-166
- Uc Cetina, V.¹

19
- 0030082891
- An approach to fuzzy control of nonlinear systems: Stability and design issues
- Wang, H.O., Tanaka, K., Griffin, M.F.: An approach to fuzzy control of nonlinear systems: stability and design issues. IEEE Transactions on Fuzzy Systems 4(1), 14-23 (1996)
- (1996) IEEE Transactions on Fuzzy Systems , vol.4 , Issue.1 , pp. 14-23
- Wang, H.O.¹ Tanaka, K.² Griffin, M.F.³

* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS DB.