2017

EPOpt: Learning robust neural network policies using model ensembles

Author keywords

[No Author keywords available]

Indexed keywords

Bayesian networks; Deep neural networks; Probability distributions

EID: 85064811489     PISSN: None     EISSN: None     Source Type: Conference Proceeding    
DOI: None     Document Type: Conference Paper
Times cited: 181

References (39)
  • 1. Pieter Abbeel, Morgan Quigley, and Andrew Y. Ng. Using inaccurate models in reinforcement learning. In ICML, 2006.
  • 6. Erick Delage and Shie Mannor. Percentile optimization for Markov decision processes with parameter uncertainty. Operations Research, 58(1):203-213, 2010.
  • 7. Yan Duan, Xi Chen, Rein Houthooft, John Schulman, and Pieter Abbeel. Benchmarking deep reinforcement learning for continuous control. In ICML, 2016.
  • 8. Michael O. Duff. Design for an optimal probe. In ICML, 2003.
  • 12. Sham Kakade. A natural policy gradient. In NIPS, 2001.
  • 14. Sham Kakade and John Langford. Approximately optimal approximate reinforcement learning. In ICML, 2002.
  • 15. Sergey Levine and Vladlen Koltun. Guided policy search. In ICML, 2013.
  • 17. Shiau Hong Lim, Huan Xu, and Shie Mannor. Reinforcement learning in robust Markov decision processes. In NIPS, 2013.
  • 18. Lennart Ljung. System Identification, pp. 163-173. Birkhäuser Boston, Boston, MA, 1998.
  • 19. Volodymyr Mnih et al. Human-level control through deep reinforcement learning. Nature, 518(7540):529-533, Feb 2015.
  • 20. I. Mordatch, K. Lowrey, and E. Todorov. Ensemble-CIO: Full-body dynamic motion planning that transfers to physical humanoids. In IROS, 2015a.
  • 21. Igor Mordatch, Kendall Lowrey, Galen Andrew, Zoran Popovic, and Emanuel V. Todorov. Interactive control of diverse complex characters with neural networks. In NIPS, 2015b.
  • 22. Arnab Nilim and Laurent El Ghaoui. Robust control of Markov decision processes with uncertain transition matrices. Operations Research, 53(5):780-798, 2005.
  • 25. Pascal Poupart, Nikos A. Vlassis, Jesse Hoey, and Kevin Regan. An analytic solution to discrete Bayesian reinforcement learning. In ICML, 2006.
  • 26. S. Ross, B. Chaib-draa, and J. Pineau. Bayesian reinforcement learning in continuous POMDPs with application to robot navigation. In ICRA, 2008.
  • 27. Stephane Ross and Drew Bagnell. Agnostic system identification for model-based reinforcement learning. In ICML, 2012.
  • 29. David Silver et al. Mastering the game of Go with deep neural networks and tree search. Nature, 529(7587):484-489, Jan 2016.
  • 32. Matthew E. Taylor and Peter Stone. Transfer learning for reinforcement learning domains: A survey. Journal of Machine Learning Research, 10:1633-1685, December 2009.
  • 36. Jack M. Wang, David J. Fleet, and Aaron Hertzmann. Optimizing walking controllers for uncertain inputs and environments. ACM Trans. Graph., 2010.
  • 37. Pawel Wawrzynski. Real-time reinforcement learning by sequential actor-critics and experience replay. Neural Networks, 22:1484-1497, 2009.
  • 38. Ronald J. Williams. Simple statistical gradient-following algorithms for connectionist reinforcement learning. Machine Learning, 8(3):229-256, 1992.


* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.