SCOPUS Information Retrieval Platform

Proceedings of the 24th Conference on Uncertainty in Artificial Intelligence, UAI 2008

2008, Pages 528-536

Dyna-style planning with linear function approximation and prioritized sweeping

Authors (4): Sutton, Richard S.; Szepesvári, Csaba; Geramifard, Alborz; Bowling, Michael

Author keywords

[No Author keywords available]

Indexed keywords

BACKING-UP; LEAST SQUARE; LIMIT POINTS; LINEAR APPROXIMATIONS; LINEAR FUNCTIONS; MODEL BASED APPROACH; MODEL FREE; NATURAL CONDITIONS; ON-LINE SETTING; OPTIMAL CONTROL POLICY; POLICY EVALUATION; PRIORITIZED SWEEPING; STATE SPACE; STATE TRANSITIONS; VALUE FUNCTIONS; WORLD MODEL;

ARTIFICIAL INTELLIGENCE; LEARNING ALGORITHMS; OPTIMAL CONTROL SYSTEMS; REINFORCEMENT LEARNING;

UNCERTAINTY ANALYSIS;

EID: 80053284668 PISSN: None EISSN: None Source Type: Conference Proceeding
DOI: None Document Type: Conference Paper

Times cited: 111

References (26)

1
- 0039816976
- Atkeson, C. (1993). Using local trajectory optimizers to speed up global optimization in dynamic programming. Advances in Neural Information Processing Systems, 5:663-670.

2
- 85151728371
- Baird, L. C. (1995). Residual algorithms: Reinforcement learning with function approximation. In Proceedings of the Twelfth International Conference on Machine Learning, pp. 30-37.

3
- 0003351108
- Bertsekas, D. P., Tsitsiklis, J. (1996). Neuro-Dynamic Programming. Athena Scientific.

4
- 0034248853
- Boutilier, C., Dearden, R., Goldszmidt, M. (2000). Stochastic dynamic programming with factored representations. Artificial Intelligence, 121:49-107.

5
- 84899910885
- Bowling, M., Geramifard, A., Wingate, D. (2008). Sigma point policy iteration. In Proceedings of the Seventh International Conference on Autonomous Agents and Multiagent Systems.

6
- 0038595396
- Boyan, J. A. (1999). Least-squares temporal difference learning. In Proceedings of the Sixteenth International Conference on Machine Learning, pp. 49-56.

7
- 0036832950
- Boyan, J. A. (2002). Technical update: Least-squares temporal difference learning. Machine Learning, 49(2-3):233-246.
- DOI 10.1023/A:1017936530646

8
- 0001771345
- Bradtke, S. J., Barto, A. G. (1996). Linear least-squares algorithms for temporal difference learning. Machine Learning, 22(1-3):33-57.

9
- 34250766214
- Degris, T., Sigaud, O., Wuillemin, P. (2006). Learning the structure of factored Markov decision processes in reinforcement learning problems. In Proceedings of the 23rd International Conference on Machine Learning.

10
- 0030242092
- Delyon, B. (1996). General results on the convergence of stochastic algorithms. IEEE Transactions on Automatic Control, 41(9):1245-1255.
- PII S0018928696067748

11
- 33750737011
- Geramifard, A., Bowling, M., Sutton, R. S. (2006). Incremental least-squares temporal difference learning. In Proceedings of the 21st National Conference on Artificial Intelligence and the 18th Innovative Applications of Artificial Intelligence Conference (AAAI-06/IAAI-06), vol. 1, pp. 356-361.

12
- 0037631834
- Kuvayev, L., Sutton, R. S. (1996). Model-based reinforcement learning with an approximate, learned model. In Proceedings of the Ninth Yale Workshop on Adaptive and Learning Systems, pp. 101-105, Yale University, New Haven, CT.

13
- 4644323293
- Lagoudakis, M., Parr, R. (2003). Least squares policy iteration. Journal of Machine Learning Research, 4:1107-1149.

14
- 84890267871
- McMahan, H. B., Gordon, G. J. (2005). Fast exact planning in Markov decision processes. In Proceedings of the 15th International Conference on Automated Planning and Scheduling.

15
- 0027684215
- Moore, A. W., Atkeson, C. G. (1993). Prioritized sweeping: Reinforcement learning with less data and less real time. Machine Learning, 13:103-130.

16
- 56449088291
- Paduraru, C. (2007). Planning with Approximate and Learned Models of Markov Decision Processes. MSc thesis, Department of Computing Science, University of Alberta.

17
- 84977063352
- Peng, J., Williams, R. J. (1993). Efficient learning and planning within the Dyna framework. Adaptive Behavior, 1:437-454.

18
- 33646413135
- Peters, J., Vijayakumar, S., Schaal, S. (2005). Natural actor-critic. In Proceedings of the 16th European Conference on Machine Learning, pp. 280-291.

19
- 0038145011
- Schaeffer, J., Hlynka, M., Jussila, V. (2001). Temporal difference learning applied to a high-performance game-playing program. In Proceedings of the International Joint Conference on Artificial Intelligence, pp. 529-534.

20
- 84880900542
- Silver, D., Sutton, R. S., Müller, M. (2007). Reinforcement learning of local shape in the game of Go. In Proceedings of the 20th International Joint Conference on Artificial Intelligence, pp. 1053-1058.

21
- 0026962175
- Singh, S. P. (1992). Reinforcement learning with a hierarchy of abstract models. In Proceedings of the Tenth National Conference on Artificial Intelligence, pp. 202-207.

22
- 33847202724
- Sutton, R. S. (1988). Learning to predict by the method of temporal differences. Machine Learning, 3:9-44.

23
- 85132026293
- Sutton, R. S. (1990). Integrated architectures for learning, planning, and reacting based on approximating dynamic programming. In Proceedings of the Seventh International Conference on Machine Learning, pp. 216-224.

24
- 85156221438
- Sutton, R. S. (1996). Generalization in reinforcement learning: Successful examples using sparse coarse coding. In Advances in Neural Information Processing Systems: Proceedings of the 1995 Conference, pp. 1038-1044. MIT Press.

25
- 0004102479
- Sutton, R. S., Barto, A. G. (1998). Reinforcement Learning: An Introduction. MIT Press.

26
- 21844451909
- Wingate, D., Seppi, K. D. (2005). Prioritization methods for accelerating MDP solvers. Journal of Machine Learning Research, 6:851-881.

* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.