SCOPUS Information Retrieval Platform

Advances in Neural Information Processing Systems

Volume 4, 2012, Pages 2717-2725

A unifying perspective of parametric policy search methods for Markov Decision Processes

a UNIVERSITY COLLEGE LONDON (United Kingdom)

Author keywords

[No Author keywords available]

Indexed keywords

EXPECTATION-MAXIMISATION; MARKOV DECISION PROCESSES; NATURAL GRADIENT; NEWTON'S METHODS; OPTIMISATION METHOD; OPTIMISATIONS; PARAMETER SPACES; ROBUSTNESS PROPERTIES;

MARKOV PROCESSES; NEWTON-RAPHSON METHOD; OPTIMIZATION;

ALGORITHMS;

EID: 84877731836 PISSN: 10495258 EISSN: None Source Type: Conference Proceeding
DOI: None Document Type: Conference Paper

Times cited : (26)

References (31)

1
- 0000396062
- Natural gradient works efficiently in learning
- S. Amari. Natural Gradient Works Efficiently in Learning. Neural Computation, 10:251-276, 1998.
- (1998) Neural Computation , vol.10 , pp. 251-276
- Amari, S.¹

2
- 84862300689
- Dynamic policy programming with function approximation
- M. Azar, V. Gómez, and H. Kappen. Dynamic policy programming with function approximation. Journal of Machine Learning Research - Proceedings Track, 15:119-127, 2011.
- (2011) Journal of Machine Learning Research - Proceedings Track , vol.15 , pp. 119-127
- Azar, M.¹ Gómez, V.² Kappen, H.³

3
- 84858765598
- Covariant policy search
- J. Bagnell and J. Schneider. Covariant Policy Search. IJCAI, 18:1019-1024, 2003.
- (2003) IJCAI , vol.18 , pp. 1019-1024
- Bagnell, J.¹ Schneider, J.²

4
- 0013535965
- Infinite horizon policy gradient estimation
- J. Baxter and P. Bartlett. Infinite Horizon Policy Gradient Estimation. Journal of Artificial Intelligence Research, 15:319-350, 2001.
- (2001) Journal of Artificial Intelligence Research , vol.15 , pp. 319-350
- Baxter, J.¹ Bartlett, P.²

5
- 0003565783
- Dynamic Programming and Optimal Control
- D. P. Bertsekas. Dynamic Programming and Optimal Control. Athena Scientific, second edition, 2000.
- (2000) Dynamic Programming and Optimal Control
- Bertsekas, D.P.¹

6
- 79953155554
- Approximate policy iteration: A survey and some new methods
- D. P. Bertsekas. Approximate Policy Iteration: A Survey and Some New Methods. Research report, Massachusetts Institute of Technology, 2010.
- (2010) Research Report, Massachusetts Institute of Technology
- Bertsekas, D.P.¹

7
- 4243567726
- Temporal differences-based policy iteration and applications in neuro-dynamic programming
- D. P. Bertsekas and S. Ioffe. Temporal Differences-Based Policy Iteration and Applications in Neuro-Dynamic Programming. Research report, Massachusetts Institute of Technology, 1997.
- (1997) Research Report Massachusetts Institute of Technology
- Bertsekas, D.P.¹ Ioffe, S.²

8
- 70349984547
- Natural actor-critic algorithms
- S. Bhatnagar, R. Sutton, M. Ghavamzadeh, and M. Lee. Natural Actor-Critic Algorithms. Automatica, 45:2471-2482, 2009.
- (2009) Automatica , vol.45 , pp. 2471-2482
- Bhatnagar, S.¹ Sutton, R.² Ghavamzadeh, M.³ Lee, M.⁴

9
- 0004055894
- Convex Optimization
- S. Boyd and L. Vandenberghe. Convex Optimization. Cambridge University Press, 2004.
- (2004) Convex Optimization
- Boyd, S.¹ Vandenberghe, L.²

10
- 0346982426
- Using expectation-maximization for reinforcement learning
- P. Dayan and G. E. Hinton. Using Expectation-Maximization for Reinforcement Learning. Neural Computation, 9:271-278, 1997.
- (1997) Neural Computation , vol.9 , pp. 271-278
- Dayan, P.¹ Hinton, G.E.²

11
- 0002629270
- Maximum likelihood from incomplete data via the EM algorithm
- A. P. Dempster, N. M. Laird, and D. B. Rubin. Maximum Likelihood from Incomplete Data via the EM Algorithm. Journal of the Royal Statistical Society. Series B (Methodological), 39(1):1-38, 1977.
- (1977) Journal of the Royal Statistical Society. Series B (Methodological) , vol.39 , Issue.1 , pp. 1-38
- Dempster, A.P.¹ Laird, N.M.² Rubin, D.B.³

12
- 84877749472
- C. Fahey. Tetris AI, Computers Play Tetris. http://colinfahey.com/tetris/tetris-en.html, 2003.
- (2003) Computers Play Tetris
- Fahey, C.¹

13
- 80053139999
- Efficient inference for Markov control problems
- T. Furmston and D. Barber. Efficient Inference for Markov Control Problems. UAI, 29:221-229, 2011.
- (2011) UAI , vol.29 , pp. 221-229
- Furmston, T.¹ Barber, D.²

14
- 84976859194
- Likelihood ratio gradient estimation for stochastic systems
- P.W. Glynn. Likelihood Ratio Gradient Estimation for Stochastic Systems. Communications of the ACM, 33:75-84, 1990.
- (1990) Communications of the ACM , vol.33 , pp. 75-84
- Glynn, P.W.¹

15
- 84897694817
- Variance reduction techniques for gradient based estimates in reinforcement learning
- E. Greensmith, P. Bartlett, and J. Baxter. Variance Reduction Techniques For Gradient Based Estimates in Reinforcement Learning. Journal of Machine Learning Research, 5:1471-1530, 2004.
- (2004) Journal of Machine Learning Research , vol.5 , pp. 1471-1530
- Greensmith, E.¹ Bartlett, P.² Baxter, J.³

16
- 84898930479
- A natural policy gradient
- S. Kakade. A Natural Policy Gradient. NIPS, 14:1531-1538, 2002.
- (2002) NIPS , vol.14 , pp. 1531-1538
- Kakade, S.¹

17
- 0004178386
- Nonlinear Systems
- H. Khalil. Nonlinear Systems. Prentice Hall, 2001.
- (2001) Nonlinear Systems
- Khalil, H.¹

18
- 78049390740
- Policy search for motor primitives in robotics
- J. Kober and J. Peters. Policy Search for Motor Primitives in Robotics. Machine Learning, 84(1-2):171-203, 2011.
- (2011) Machine Learning , vol.84 , Issue.1-2 , pp. 171-203
- Kober, J.¹ Peters, J.²

19
- 33750293964
- Bandit based Monte-Carlo planning
- L. Kocsis and C. Szepesvári. Bandit Based Monte-Carlo Planning. European Conference on Machine Learning (ECML), 17:282-293, 2006.
- (2006) European Conference on Machine Learning (ECML) , vol.17 , pp. 282-293
- Kocsis, L.¹ Szepesvári, C.²

20
- 4043069840
- On actor-critic algorithms
- V. R. Konda and J. N. Tsitsiklis. On Actor-Critic Algorithms. SIAM J. Control Optim., 42(4):1143-1166, 2003.
- (2003) SIAM J. Control Optim. , vol.42 , Issue.4 , pp. 1143-1166
- Konda, V.R.¹ Tsitsiklis, J.N.²

21
- 0035249254
- Simulation-based optimisation of Markov reward processes
- P. Marbach and J. Tsitsiklis. Simulation-Based Optimisation of Markov Reward Processes. IEEE Transactions on Automatic Control, 46(2):191-209, 2001.
- (2001) IEEE Transactions on Automatic Control , vol.46 , Issue.2 , pp. 191-209
- Marbach, P.¹ Tsitsiklis, J.²

22
- 33646430192
- Learning finite-state controllers for partially observable environments
- N. Meuleau, L. Peshkin, K. Kim, and L. Kaelbling. Learning Finite-State Controllers for Partially Observable Environments. UAI, 15:427-436, 1999.
- (1999) UAI , vol.15 , pp. 427-436
- Meuleau, N.¹ Peshkin, L.² Kim, K.³ Kaelbling, L.⁴

23
- 0003612147
- Numerical Optimisation
- J. Nocedal and S. Wright. Numerical Optimisation. Springer, 2006.
- (2006) Numerical Optimisation
- Nocedal, J.¹ Wright, S.²

24
- 40649106649
- Natural actor-critic
- J. Peters and S. Schaal. Natural Actor-Critic. Neurocomputing, 71(7-9):1180-1190, 2008.
- (2008) Neurocomputing , vol.71 , Issue.7-9 , pp. 1180-1190
- Peters, J.¹ Schaal, S.²

25
- 84877282363
- On stochastic optimal control and reinforcement learning by approximate inference
- K. Rawlik, M. Toussaint, and S. Vijayakumar. On Stochastic Optimal Control and Reinforcement Learning by Approximate Inference. International Conference on Robotics Science and Systems, 2012.
- (2012) International Conference on Robotics Science and Systems
- Rawlik, K.¹ Toussaint, M.² Vijayakumar, S.³

26
- 84864064043
- Natural actor-critic for road traffic optimisation
- S. Richter, D. Aberdeen, and J. Yu. Natural Actor-Critic for Road Traffic Optimisation. NIPS, 19:1169-1176, 2007.
- (2007) NIPS , vol.19 , pp. 1169-1176
- Richter, S.¹ Aberdeen, D.² Yu, J.³

27
- 84898939480
- Policy gradient methods for reinforcement learning with function approximation
- R. Sutton, D. McAllester, S. Singh, and Y. Mansour. Policy Gradient Methods for Reinforcement Learning with Function Approximation. NIPS, 13:1057-1063, 2000.
- (2000) NIPS , vol.13 , pp. 1057-1063
- Sutton, R.¹ McAllester, D.² Singh, S.³ Mansour, Y.⁴

28
- 51349153274
- Probabilistic inference for solving (PO)MDPs
- M. Toussaint, S. Harmeling, and A. Storkey. Probabilistic Inference for Solving (PO)MDPs. Research Report EDI-INF-RR-0934, University of Edinburgh, School of Informatics, 2006.
- (2006) Research Report EDI-INF-RR-0934, University of Edinburgh
- Toussaint, M.¹ Harmeling, S.² Storkey, A.³

29
- 70349327392
- Learning model-free robot control by a Monte Carlo EM algorithm
- N. Vlassis, M. Toussaint, G. Kontes, and S. Piperidis. Learning Model-Free Robot Control by a Monte Carlo EM Algorithm. Autonomous Robots, 27(2):123-130, 2009.
- (2009) Autonomous Robots , vol.27 , Issue.2 , pp. 123-130
- Vlassis, N.¹ Toussaint, M.² Kontes, G.³ Piperidis, S.⁴

30
- 21444437925
- The optimal reward baseline for gradient based reinforcement learning
- L. Weaver and N. Tao. The Optimal Reward Baseline for Gradient Based Reinforcement Learning. UAI, 17(29):538-545, 2001.
- (2001) UAI , vol.17 , Issue.29 , pp. 538-545
- Weaver, L.¹ Tao, N.²

31
- 0000337576
- Simple statistical gradient following algorithms for connectionist reinforcement learning
- R. Williams. Simple Statistical Gradient Following Algorithms for Connectionist Reinforcement Learning. Machine Learning, 8:229-256, 1992.
- (1992) Machine Learning , vol.8 , pp. 229-256
- Williams, R.¹

* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.