SCOPUS Information Search Platform

1
- 85057345756
- arXiv preprint
- Kavosh Asadi, Cameron Allen, Melrose Roderick, Abdel-rahman Mohamed, George Konidaris, and Michael Littman. Mean actor critic. arXiv preprint arXiv:1709.00503, 2017.
- (2017) Mean Actor Critic
- Asadi, K.¹ Allen, C.² Roderick, M.³ Mohamed, A.-R.⁴ Konidaris, G.⁵ Littman, M.⁶

2
- 70449565494
- World Scientific
- Andrew D Barbour and Louis Hsiao Yun Chen. An introduction to Stein’s method, volume 4. World Scientific, 2005.
- (2005) An Introduction to Stein’s Method , vol.4
- Barbour, A.D.¹ Chen, L.H.Y.²

3
- 85015444377
- Greg Brockman, Vicki Cheung, Ludwig Pettersson, Jonas Schneider, John Schulman, Jie Tang, and Wojciech Zaremba. OpenAI Gym, 2016.
- (2016) OpenAI Gym
- Brockman, G.¹ Cheung, V.² Pettersson, L.³ Schneider, J.⁴ Schulman, J.⁵ Tang, J.⁶ Zaremba, W.⁷

4
- 85015918947
- A kernel test of goodness of fit
- Kacper Chwialkowski, Heiko Strathmann, and Arthur Gretton. A kernel test of goodness of fit. In International Conference on Machine Learning, pp. 2606–2615, 2016.
- (2016) International Conference on Machine Learning , pp. 2606-2615
- Chwialkowski, K.¹ Strathmann, H.² Gretton, A.³

5
- 85031125843
- Learning to draw samples with amortized Stein variational gradient descent
- Yihao Feng, Dilin Wang, and Qiang Liu. Learning to draw samples with amortized stein variational gradient descent. Conference on Uncertainty in Artificial Intelligence (UAI), 2017.
- (2017) Conference on Uncertainty in Artificial Intelligence (UAI)
- Feng, Y.¹ Wang, D.² Liu, Q.³

6
- 84965097708
- Measuring sample quality with Stein’s method
- Jackson Gorham and Lester Mackey. Measuring sample quality with stein’s method. In Advances in Neural Information Processing Systems, pp. 226–234, 2015.
- (2015) Advances in Neural Information Processing Systems , pp. 226-234
- Gorham, J.¹ Mackey, L.²

7
- 85083950952
- Backpropagation through the void: Optimizing control variates for black-box gradient estimation
- Will Grathwohl, Dami Choi, Yuhuai Wu, Geoff Roeder, and David Duvenaud. Backpropagation through the void: Optimizing control variates for black-box gradient estimation. In International Conference on Learning Representations, 2018. URL https://openreview.net/forum?id=SyzKd1bCW.
- (2018) International Conference on Learning Representations
- Grathwohl, W.¹ Choi, D.² Wu, Y.³ Roeder, G.⁴ Duvenaud, D.⁵

8
- 84897694817
- Variance reduction techniques for gradient estimates in reinforcement learning
- Evan Greensmith, Peter L Bartlett, and Jonathan Baxter. Variance reduction techniques for gradient estimates in reinforcement learning. Journal of Machine Learning Research, 5(Nov):1471–1530, 2004.
- (2004) Journal of Machine Learning Research , vol.5 , pp. 1471-1530
- Greensmith, E.¹ Bartlett, P.L.² Baxter, J.³

9
- 84979289652
- Continuous deep q-learning with model-based acceleration
- Shixiang Gu, Timothy Lillicrap, Ilya Sutskever, and Sergey Levine. Continuous deep q-learning with model-based acceleration. In International Conference on Machine Learning, pp. 2829–2838, 2016a.
- (2016) International Conference on Machine Learning , pp. 2829-2838
- Gu, S.¹ Lillicrap, T.² Sutskever, I.³ Levine, S.⁴

10
- 85064813907
- Q-prop: Sample-efficient policy gradient with an off-policy critic
- Shixiang Gu, Timothy P. Lillicrap, Zoubin Ghahramani, Richard E. Turner, and Sergey Levine. Q-prop: Sample-efficient policy gradient with an off-policy critic. International Conference on Learning Representations (ICLR), 2016b.
- (2016) International Conference on Learning Representations (ICLR)
- Gu, S.¹ Lillicrap, T.P.² Ghahramani, Z.³ Turner, R.E.⁴ Levine, S.⁵

11
- 85047014445
- Interpolated policy gradient: Merging on-policy and off-policy gradient estimation for deep reinforcement learning
- Shixiang Gu, Timothy Lillicrap, Zoubin Ghahramani, Richard E Turner, Bernhard Schölkopf, and Sergey Levine. Interpolated policy gradient: Merging on-policy and off-policy gradient estimation for deep reinforcement learning. Advances in Neural Information Processing Systems, 2017.
- (2017) Advances in Neural Information Processing Systems
- Gu, S.¹ Lillicrap, T.² Ghahramani, Z.³ Turner, R.E.⁴ Schölkopf, B.⁵ Levine, S.⁶

12
- 85044446086
- arXiv preprint
- Nicolas Heess, Srinivasan Sriram, Jay Lemmon, Josh Merel, Greg Wayne, Yuval Tassa, Tom Erez, Ziyu Wang, Ali Eslami, Martin Riedmiller, et al. Emergence of locomotion behaviours in rich environments. arXiv preprint arXiv:1707.02286, 2017.
- (2017) Emergence of Locomotion Behaviours in Rich Environments
- Heess, N.¹ Sriram, S.² Lemmon, J.³ Merel, J.⁴ Wayne, G.⁵ Tassa, Y.⁶ Erez, T.⁷ Wang, Z.⁸ Eslami, A.⁹ Riedmiller, M.¹⁰

13
- 84898930479
- A natural policy gradient
- Sham M Kakade. A natural policy gradient. In Advances in Neural Information Processing Systems, pp. 1531–1538, 2002.
- (2002) Advances in Neural Information Processing Systems , pp. 1531-1538
- Kakade, S.M.¹

14
- 84941620184
- Adam: A method for stochastic optimization
- Diederik Kingma and Jimmy Ba. Adam: A method for stochastic optimization. Proceedings of the 3rd International Conference on Learning Representations (ICLR), 2014.
- (2014) Proceedings of the 3rd International Conference on Learning Representations (ICLR)
- Kingma, D.¹ Ba, J.²

15
- 85083952489
- Auto-encoding variational bayes
- Diederik P Kingma and Max Welling. Auto-encoding variational bayes. Proceedings of the 2nd International Conference on Learning Representations (ICLR), 2013.
- (2013) Proceedings of the 2nd International Conference on Learning Representations (ICLR)
- Kingma, D.P.¹ Welling, M.²

16
- 85083953657
- Continuous control with deep reinforcement learning
- Timothy P. Lillicrap, Jonathan J. Hunt, Alexander Pritzel, Nicolas Heess, Tom Erez, Yuval Tassa, David Silver, and Daan Wierstra. Continuous control with deep reinforcement learning. Proceedings of the 2nd International Conference on Learning Representations (ICLR), 2015.
- (2015) Proceedings of the 2nd International Conference on Learning Representations (ICLR)
- Lillicrap, T.P.¹ Hunt, J.J.² Pritzel, A.³ Heess, N.⁴ Erez, T.⁵ Tassa, Y.⁶ Silver, D.⁷ Wierstra, D.⁸

17
- 85083938188
- Black-box importance sampling
- Qiang Liu and Jason D Lee. Black-box importance sampling. International Conference on Artificial Intelligence and Statistics, 2017.
- (2017) International Conference on Artificial Intelligence and Statistics
- Liu, Q.¹ Lee, J.D.²

18
- 85018878907
- Stein variational gradient descent: A general purpose Bayesian inference algorithm
- Qiang Liu and Dilin Wang. Stein variational gradient descent: A general purpose bayesian inference algorithm. In Advances in Neural Information Processing Systems, 2016.
- (2016) Advances in Neural Information Processing Systems
- Liu, Q.¹ Wang, D.²

19
- 85045554292
- A kernelized Stein discrepancy for goodness-of-fit tests
- Qiang Liu, Jason Lee, and Michael Jordan. A kernelized stein discrepancy for goodness-of-fit tests. In International Conference on Machine Learning, pp. 276–284, 2016.
- (2016) International Conference on Machine Learning , pp. 276-284
- Liu, Q.¹ Lee, J.² Jordan, M.³

20
- 84904867557
- arXiv preprint
- Volodymyr Mnih, Koray Kavukcuoglu, David Silver, Alex Graves, Ioannis Antonoglou, Daan Wierstra, and Martin Riedmiller. Playing atari with deep reinforcement learning. arXiv preprint arXiv:1312.5602, 2013.
- (2013) Playing Atari with Deep Reinforcement Learning
- Mnih, V.¹ Kavukcuoglu, K.² Silver, D.³ Graves, A.⁴ Antonoglou, I.⁵ Wierstra, D.⁶ Riedmiller, M.⁷

21
- 84971448181
- Asynchronous methods for deep reinforcement learning
- Volodymyr Mnih, Adria Puigdomenech Badia, Mehdi Mirza, Alex Graves, Timothy Lillicrap, Tim Harley, David Silver, and Koray Kavukcuoglu. Asynchronous methods for deep reinforcement learning. In International Conference on Machine Learning, pp. 1928–1937, 2016.
- (2016) International Conference on Machine Learning , pp. 1928-1937
- Mnih, V.¹ Badia, A.P.² Mirza, M.³ Graves, A.⁴ Lillicrap, T.⁵ Harley, T.⁶ Silver, D.⁷ Kavukcuoglu, K.⁸

22
- 85059686468
- Control functionals for quasi-monte carlo integration
- Chris J. Oates and Mark A. Girolami. Control functionals for quasi-monte carlo integration. In International Conference on Artificial Intelligence and Statistics, 2016.
- (2016) International Conference on Artificial Intelligence and Statistics
- Oates, C.J.¹ Girolami, M.A.²

23
- 84997727861
- arXiv preprint
- Chris J Oates, Jon Cockayne, François-Xavier Briol, and Mark Girolami. Convergence rates for a class of estimators based on stein’s identity. arXiv preprint arXiv:1603.03220, 2016.
- (2016) Convergence Rates for a Class of Estimators Based on Stein’s Identity
- Oates, C.J.¹ Cockayne, J.² Briol, F.-X.³ Girolami, M.⁴

24
- 84971325632
- Control functionals for monte carlo integration
- Chris J Oates, Mark Girolami, and Nicolas Chopin. Control functionals for monte carlo integration. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 79(3):695–718, 2017.
- (2017) Journal of the Royal Statistical Society: Series B (Statistical Methodology) , vol.79 , Issue.3 , pp. 695-718
- Oates, C.J.¹ Girolami, M.² Chopin, N.³

25
- 84919796093
- Stochastic backpropagation and approximate inference in deep generative models
- Danilo Jimenez Rezende, Shakir Mohamed, and Daan Wierstra. Stochastic backpropagation and approximate inference in deep generative models. Proceedings of the 31st International Conference on Machine Learning (ICML), 2014.
- (2014) Proceedings of the 31st International Conference on Machine Learning (ICML)
- Rezende, D.J.¹ Mohamed, S.² Wierstra, D.³

26
- 85046997203
- Sticking the landing: An asymptotically zero-variance gradient estimator for variational inference
- Geoffrey Roeder, Yuhuai Wu, and David K. Duvenaud. Sticking the landing: An asymptotically zero-variance gradient estimator for variational inference. Advances in Neural Information Processing Systems, 2017.
- (2017) Advances in Neural Information Processing Systems
- Roeder, G.¹ Wu, Y.² Duvenaud, D.K.³

27
- 84969963490
- Trust region policy optimization
- John Schulman, Sergey Levine, Philipp Moritz, Michael I. Jordan, and Pieter Abbeel. Trust region policy optimization. In International Conference on Machine Learning, 2015.
- (2015) International Conference on Machine Learning
- Schulman, J.¹ Levine, S.² Moritz, P.³ Jordan, M.I.⁴ Abbeel, P.⁵

28
- 85083954383
- High-dimensional continuous control using generalized advantage estimation
- John Schulman, Philipp Moritz, Sergey Levine, Michael Jordan, and Pieter Abbeel. High-dimensional continuous control using generalized advantage estimation. International Conference on Learning Representations (ICLR), 2016.
- (2016) International Conference on Learning Representations (ICLR)
- Schulman, J.¹ Moritz, P.² Levine, S.³ Jordan, M.⁴ Abbeel, P.⁵

29
- 85064820904
- Proximal policy optimization algorithms
- John Schulman, Filip Wolski, Prafulla Dhariwal, Alec Radford, and Oleg Klimov. Proximal policy optimization algorithms. Advances in Neural Information Processing Systems, 2017.
- (2017) Advances in Neural Information Processing Systems
- Schulman, J.¹ Wolski, F.² Dhariwal, P.³ Radford, A.⁴ Klimov, O.⁵

30
- 85048938500
- Provable tensor methods for learning mixtures of generalized linear models
- Hanie Sedghi, Majid Janzamin, and Anima Anandkumar. Provable tensor methods for learning mixtures of generalized linear models. In International Conference on Artificial Intelligence and Statistics, 2016.
- (2016) International Conference on Artificial Intelligence and Statistics
- Sedghi, H.¹ Janzamin, M.² Anandkumar, A.³

31
- 84919793697
- Deterministic policy gradient algorithms
- David Silver, Guy Lever, Nicolas Heess, Thomas Degris, Daan Wierstra, and Martin Riedmiller. Deterministic policy gradient algorithms. In Proceedings of the 31st International Conference on Machine Learning (ICML), pp. 387–395, 2014.
- (2014) Proceedings of the 31st International Conference on Machine Learning (ICML) , pp. 387-395
- Silver, D.¹ Lever, G.² Heess, N.³ Degris, T.⁴ Wierstra, D.⁵ Riedmiller, M.⁶

32
- 84963949906
- Mastering the game of go with deep neural networks and tree search
- David Silver, Aja Huang, Chris J Maddison, Arthur Guez, Laurent Sifre, George Van Den Driessche, Julian Schrittwieser, Ioannis Antonoglou, Veda Panneershelvam, Marc Lanctot, et al. Mastering the game of go with deep neural networks and tree search. Nature, 529(7587):484–489, 2016.
- (2016) Nature , vol.529 , Issue.7587 , pp. 484-489
- Silver, D.¹ Huang, A.² Maddison, C.J.³ Guez, A.⁴ Sifre, L.⁵ Van Den Driessche, G.⁶ Schrittwieser, J.⁷ Antonoglou, I.⁸ Panneershelvam, V.⁹ Lanctot, M.¹⁰

33
- 85031918331
- Mastering the game of go without human knowledge
- David Silver, Julian Schrittwieser, Karen Simonyan, Ioannis Antonoglou, Aja Huang, Arthur Guez, Thomas Hubert, Lucas Baker, Matthew Lai, Adrian Bolton, Yutian Chen, Timothy Lillicrap, Fan Hui, Laurent Sifre, George van den Driessche, Thore Graepel, and Demis Hassabis. Mastering the game of go without human knowledge. Nature, 550(7676):354–359, Oct 2017. ISSN 0028-0836.
- (2017) Nature , vol.550 , Issue.7676 , pp. 354-359
- Silver, D.¹ Schrittwieser, J.² Simonyan, K.³ Antonoglou, I.⁴ Huang, A.⁵ Guez, A.⁶ Hubert, T.⁷ Baker, L.⁸ Lai, M.⁹ Bolton, A.¹⁰ Chen, Y.¹¹ Lillicrap, T.¹² Hui, F.¹³ Sifre, L.¹⁴ Van Den Driessche, G.¹⁵ Graepel, T.¹⁶ Hassabis, D.¹⁷

34
- 0003722779
- Approximate computation of expectations
- Charles Stein. Approximate computation of expectations. Lecture Notes-Monograph Series, 7: i–164, 1986.
- (1986) Lecture Notes-Monograph Series , vol.7 , pp. 164
- Stein, C.¹

35
- 0004102479
- MIT Press, Cambridge, MA, USA, 1st edition
- Richard S. Sutton and Andrew G. Barto. Introduction to Reinforcement Learning. MIT Press, Cambridge, MA, USA, 1st edition, 1998. ISBN 0262193981.
- (1998) Introduction to Reinforcement Learning
- Sutton, R.S.¹ Barto, A.G.²

36
- 85057234218
- arxiv, abs/1706.06643
- Philip S. Thomas and Emma Brunskill. Policy gradient methods for reinforcement learning with function approximation and action-dependent baselines. arxiv, abs/1706.06643, 2017.
- (2017) Policy Gradient Methods for Reinforcement Learning with Function Approximation and Action-Dependent Baselines
- Thomas, P.S.¹ Brunskill, E.²

37
- 84872292044
- MuJoCo: A physics engine for model-based control
- IEEE
- Emanuel Todorov, Tom Erez, and Yuval Tassa. MuJoCo: A physics engine for model-based control. In Intelligent Robots and Systems (IROS), 2012 IEEE/RSJ International Conference on, pp. 5026–5033. IEEE, 2012.
- (2012) Intelligent Robots and Systems (IROS), 2012 IEEE/RSJ International Conference on , pp. 5026-5033
- Todorov, E.¹ Erez, T.² Tassa, Y.³

38
- 85046959617
- REBAR: Low-variance, unbiased gradient estimates for discrete latent variable models
- George Tucker, Andriy Mnih, Chris J Maddison, and Jascha Sohl-Dickstein. Rebar: Low-variance, unbiased gradient estimates for discrete latent variable models. Advances in Neural Information Processing Systems, 2017.
- (2017) Advances in Neural Information Processing Systems
- Tucker, G.¹ Mnih, A.² Maddison, C.J.³ Sohl-Dickstein, J.⁴

39
- 21444437925
- The optimal reward baseline for gradient-based reinforcement learning
- Morgan Kaufmann Publishers Inc
- Lex Weaver and Nigel Tao. The optimal reward baseline for gradient-based reinforcement learning. In Proceedings of the Seventeenth conference on Uncertainty in artificial intelligence, pp. 538–545. Morgan Kaufmann Publishers Inc., 2001.
- (2001) Proceedings of the Seventeenth Conference on Uncertainty in Artificial Intelligence , pp. 538-545
- Weaver, L.¹ Tao, N.²

40
- 0000337576
- Simple statistical gradient-following algorithms for connectionist reinforcement learning
- Ronald J Williams. Simple statistical gradient-following algorithms for connectionist reinforcement learning. Machine learning, 8(3-4):229–256, 1992.
- (1992) Machine Learning , vol.8 , Issue.3-4 , pp. 229-256
- Williams, R.J.¹

41
- 85083951478
- Variance reduction for policy gradient with action-dependent factorized baselines
- Cathy Wu, Aravind Rajeswaran, Yan Duan, Vikash Kumar, Alexandre M Bayen, Sham Kakade, Igor Mordatch, and Pieter Abbeel. Variance reduction for policy gradient with action-dependent factorized baselines. In International Conference on Learning Representations, 2018. URL https://openreview.net/forum?id=H1tSsb-AW.
- (2018) International Conference on Learning Representations
- Wu, C.¹ Rajeswaran, A.² Duan, Y.³ Kumar, V.⁴ Bayen, A.M.⁵ Kakade, S.⁶ Mordatch, I.⁷ Abbeel, P.⁸

* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.