SCOPUS Information Retrieval Platform

Proceedings of the International Joint Conference on Autonomous Agents and Multiagent Systems, AAMAS

Volume 1, 2009, Pages 554-561

An empirical analysis of value function-based and policy search reinforcement learning

(2) Kalyanakrishnan, Shivaram (a); Stone, Peter (a)

a University of Texas at Austin (United States)

Author keywords

Function approximation; Policy search; Reinforcement learning; Temporal difference learning

Indexed keywords

MULTI AGENT SYSTEMS; REINFORCEMENT LEARNING;

EMPIRICAL ANALYSIS; EMPIRICAL STUDIES; FUNCTION APPROXIMATION; PARTIAL OBSERVABILITY; POLICY SEARCH; SEQUENTIAL DECISION MAKING; TEMPORAL DIFFERENCE LEARNING; UNKNOWN ENVIRONMENTS;

AUTONOMOUS AGENTS;

EID: 84899831232 PISSN: 15488403 EISSN: 15582914 Source Type: Conference Proceeding
DOI: None Document Type: Conference Paper

Times cited : (16)

References (23)

1
- 0013535965
- Infinite-horizon policy-gradient estimation
- J. Baxter and P. L. Bartlett. Infinite-horizon policy-gradient estimation. Journal of Artificial Intelligence Research, 15:319-350, 2001.
- (2001) Journal of Artificial Intelligence Research , vol.15 , pp. 319-350
- Baxter, J.¹ Bartlett, P.L.²

2
- 0003787146
- Princeton University Press, Princeton, NJ, June
- R. E. Bellman. Dynamic Programming. Princeton University Press, Princeton, NJ, June 1957.
- (1957) Dynamic Programming
- Bellman, R.E.¹

3
- 85162049326
- Incremental natural actor-critic algorithms
- J. Platt, D. Koller, Y. Singer, and S. Roweis, editors, MIT Press, Cambridge, MA
- S. Bhatnagar, R. Sutton, M. Ghavamzadeh, and M. Lee. Incremental natural actor-critic algorithms. In J. Platt, D. Koller, Y. Singer, and S. Roweis, editors, Advances in Neural Information Processing Systems 20, pages 105-112. MIT Press, Cambridge, MA, 2008.
- (2008) Advances in Neural Information Processing Systems , vol.20 , pp. 105-112
- Bhatnagar, S.¹ Sutton, R.² Ghavamzadeh, M.³ Lee, M.⁴

4
- 0028564629
- Acting optimally in partially observable stochastic domains
- Seattle, Washington, USA, AAAI Press/MIT Press
- A. R. Cassandra, L. P. Kaelbling, and M. L. Littman. Acting optimally in partially observable stochastic domains. In Proceedings of the Twelfth National Conference on Artificial Intelligence, volume 2, pages 1023-1028, Seattle, Washington, USA, 1994. AAAI Press/MIT Press.
- (1994) Proceedings of the Twelfth National Conference on Artificial Intelligence , vol.2 , pp. 1023-1028
- Cassandra, A.R.¹ Kaelbling, L.P.² Littman, M.L.³

5
- 85156187730
- Improving elevator performance using reinforcement learning
- D. S. Touretzky, M. Mozer, and M. E. Hasselmo, editors, NIPS, Denver, CO, November 27-30, 1995, MIT Press
- R. H. Crites and A. G. Barto. Improving elevator performance using reinforcement learning. In D. S. Touretzky, M. Mozer, and M. E. Hasselmo, editors, Advances in Neural Information Processing Systems 8, NIPS, Denver, CO, November 27-30, 1995, pages 1017-1023. MIT Press, 1996.
- (1996) Advances in Neural Information Processing Systems , vol.8 , pp. 1017-1023
- Crites, R.H.¹ Barto, A.G.²

6
- 17444409624
- A tutorial on the cross-entropy method
- P. T. De Boer, D. P. Kroese, S. Mannor, and R. Rubinstein. A tutorial on the cross-entropy method. Annals of Operations Research, 134(1):19-67, 2005.
- (2005) Annals of Operations Research , vol.134 , Issue.1 , pp. 19-67
- De Boer, P.T.¹ Kroese, D.P.² Mannor, S.³ Rubinstein, R.⁴

7
- 33646243319
- A natural policy gradient
- T. G. Dietterich, S. Becker, and Z. Ghahramani, editors, MIT Press
- S. Kakade. A natural policy gradient. In T. G. Dietterich, S. Becker, and Z. Ghahramani, editors, Advances in Neural Information Processing Systems 14, pages 1531-1538. MIT Press, 2001.
- (2001) Advances in Neural Information Processing Systems , vol.14 , pp. 1531-1538
- Kakade, S.¹

8
- 9444275934
- Machine learning for fast quadrupedal locomotion
- July
- N. Kohl and P. Stone. Machine learning for fast quadrupedal locomotion. In The Nineteenth National Conference on Artificial Intelligence, pages 611-616, July 2004.
- (2004) The Nineteenth National Conference on Artificial Intelligence , pp. 611-616
- Kohl, N.¹ Stone, P.²

9
- 4644323293
- Least-squares policy iteration
- M. G. Lagoudakis and R. Parr. Least-squares policy iteration. Journal of Machine Learning Research, 4:1107-1149, 2003.
- (2003) Journal of Machine Learning Research , vol.4 , pp. 1107-1149
- Lagoudakis, M.G.¹ Parr, R.²

10
- 0012327484
- Using eligibility traces to find the best memoryless policy in partially observable Markov decision processes
- Morgan Kaufmann
- J. Loch and S. Singh. Using eligibility traces to find the best memoryless policy in partially observable Markov decision processes. In Proceedings of the Fifteenth International Conference on Machine Learning, pages 323-331. Morgan Kaufmann, 1998.
- (1998) Proceedings of the Fifteenth International Conference on Machine Learning , pp. 323-331
- Loch, J.¹ Singh, S.²

11
- 84898980684
- Autonomous helicopter flight via reinforcement learning
- S. Thrun, L. Saul, and B. Schölkopf, editors, MIT Press, Cambridge, MA
- A. Y. Ng, H. J. Kim, M. I. Jordan, and S. Sastry. Autonomous helicopter flight via reinforcement learning. In S. Thrun, L. Saul, and B. Schölkopf, editors, Advances in Neural Information Processing Systems 16. MIT Press, Cambridge, MA, 2004.
- (2004) Advances in Neural Information Processing Systems , vol.16
- Ng, A.Y.¹ Kim, H.J.² Jordan, M.I.³ Sastry, S.⁴

12
- 84898960655
- A convergent form of approximate policy iteration
- S. Becker, S. Thrun, and K. Obermayer, editors, MIT Press, Cambridge, MA
- T. J. Perkins and D. Precup. A convergent form of approximate policy iteration. In S. Becker, S. Thrun, and K. Obermayer, editors, Advances in Neural Information Processing Systems 15, pages 1595-1602. MIT Press, Cambridge, MA, 2003.
- (2003) Advances in Neural Information Processing Systems , vol.15 , pp. 1595-1602
- Perkins, T.J.¹ Precup, D.²

13
- 40649106649
- Natural actor-critic
- J. Peters and S. Schaal. Natural actor-critic. Neurocomputing, 71 (7-9): 1180-1190, 2008.
- (2008) Neurocomputing , vol.71 , Issue.7-9 , pp. 1180-1190
- Peters, J.¹ Schaal, S.²

14
- 0003636089
- On-line Q-learning using connectionist systems
- G. A. Rummery and M. Niranjan. On-line Q-learning using connectionist systems. Technical Report CUED/F-INFENG/TR 166, Cambridge University Engineering Department, 1994.
- (1994) Technical Report CUED/F-INFENG/TR 166, Cambridge University Engineering Department
- Rummery, G.A.¹ Niranjan, M.²

15
- 27144536042
- Efficient evolution of neural networks through complexification
- August
- K. O. Stanley. Efficient evolution of neural networks through complexification. Technical Report AI-TR-04-314, Department of Computer Sciences, University of Texas at Austin, August 2004.
- (2004) Technical Report AI-TR-04-314, Department of Computer Sciences, University of Texas at Austin
- Stanley, K.O.¹

16
- 27544506565
- Reinforcement learning for RoboCup-soccer keepaway
- P. Stone, R. S. Sutton, and G. Kuhlmann. Reinforcement learning for RoboCup-soccer keepaway. Adaptive Behavior, 13(3):165-188, 2005.
- (2005) Adaptive Behavior , vol.13 , Issue.3 , pp. 165-188
- Stone, P.¹ Sutton, R.S.² Kuhlmann, G.³

17
- 0004102479
- MIT Press, Cambridge, MA
- R. S. Sutton and A. G. Barto. Reinforcement Learning: An Introduction. MIT Press, Cambridge, MA, 1998.
- (1998) Reinforcement Learning: An Introduction
- Sutton, R.S.¹ Barto, A.G.²

18
- 33845344721
- Learning Tetris using the noisy cross-entropy method
- I. Szita and A. Lorincz. Learning Tetris using the noisy cross-entropy method. Neural Computation, 18:2936-2941, 2006.
- (2006) Neural Computation , vol.18 , pp. 2936-2941
- Szita, I.¹ Lorincz, A.²

19
- 33750259111
- Comparing evolutionary and temporal difference methods for reinforcement learning
- July
- M. Taylor, S. Whiteson, and P. Stone. Comparing evolutionary and temporal difference methods for reinforcement learning. In Proceedings of the Genetic and Evolutionary Computation Conference, pages 1321-1328, July 2006.
- (2006) Proceedings of the Genetic and Evolutionary Computation Conference , pp. 1321-1328
- Taylor, M.¹ Whiteson, S.² Stone, P.³

20
- 27544473171
- Behavior transfer for value-function-based reinforcement learning
- F. Dignum, V. Dignum, S. Koenig, S. Kraus, M. P. Singh, and M. Wooldridge, editors, New York, NY, July, ACM Press
- M. E. Taylor and P. Stone. Behavior transfer for value-function-based reinforcement learning. In F. Dignum, V. Dignum, S. Koenig, S. Kraus, M. P. Singh, and M. Wooldridge, editors, The Fourth International Joint Conference on Autonomous Agents and Multiagent Systems, pages 53-59, New York, NY, July 2005. ACM Press.
- (2005) The Fourth International Joint Conference on Autonomous Agents and Multiagent Systems , pp. 53-59
- Taylor, M.E.¹ Stone, P.²

21
- 34548031419
- On the use of hybrid reinforcement learning for autonomic resource allocation
- G. Tesauro, N. K. Jong, R. Das, and M. N. Bennani. On the use of hybrid reinforcement learning for autonomic resource allocation. Cluster Computing, 10(3):287-299, 2007.
- (2007) Cluster Computing , vol.10 , Issue.3 , pp. 287-299
- Tesauro, G.¹ Jong, N.K.² Das, R.³ Bennani, M.N.⁴

22
- 34249833101
- Q-learning
- C. J. C. H. Watkins and P. Dayan. Q-learning. Machine Learning, 8(3-4):279-292, 1992.
- (1992) Machine Learning , vol.8 , Issue.3-4 , pp. 279-292
- Watkins, C.J.C.H.¹ Dayan, P.²

23
- 33646714634
- Evolutionary function approximation for reinforcement learning
- May
- S. Whiteson and P. Stone. Evolutionary function approximation for reinforcement learning. Journal of Machine Learning Research, 7:877-917, May 2006.
- (2006) Journal of Machine Learning Research , vol.7 , pp. 877-917
- Whiteson, S.¹ Stone, P.²

* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.