



Volume 8, 2013, Pages 353-360

On stochastic optimal control and reinforcement learning by approximate inference

Author keywords

[No Author keywords available]

EID: 84959314908     PISSN: None     EISSN: 2330-765X     Source Type: Conference Proceeding
DOI: 10.15607/rss.2012.viii.045     Document Type: Conference Paper
Times cited: 29

References (26)
  • 2. D. Braun, M. Howard, and S. Vijayakumar. Exploiting variable stiffness in explosive movement tasks. In R:SS, 2011.
  • 4. P. Dayan and G. E. Hinton. Using EM for reinforcement learning. Neural Computation, 9:271-278, 1997.
  • 5. A. M. Gheshlaghi et al. Dynamic policy programming with function approximation. In AISTATS, 2011.
  • 6. D. Mitrovic et al. Optimal feedback control for anthropomorphic manipulators. In ICRA, 2010.
  • 7. E. A. Theodorou et al. Learning policy improvements with path integrals. In AISTATS, 2010.
  • 9. A. Gunawardana and W. Byrne. Convergence theorems for generalized alternating minimization procedures. J. of Machine Learning Research, 6:2049-2073, 2005.
  • 12. J. Kivinen and M. Warmuth. Exponentiated gradient versus gradient descent for linear predictors. Information and Computation, 132:1-64, 1997.
  • 13. W. Li and E. Todorov. An iterative optimal control and estimation design for nonlinear stochastic system. In CDC, 2006.
  • 14. D. Mitrovic, S. Klanke, and S. Vijayakumar. Adaptive optimal control for redundantly actuated arms. In SAB, 2008.
  • 15. J. Nakanishi, K. Rawlik, and S. Vijayakumar. Stiffness and temporal optimization in periodic movements: An optimal control approach. In IROS, 2011.
  • 17. K. Rawlik, M. Toussaint, and S. Vijayakumar. An approximate inference approach to temporal optimization in optimal control. In NIPS, 2010.
  • 18. M. Riedmiller, J. Peters, and S. Schaal. Evaluation of policy gradient methods and variants on the cart-pole benchmark. In IEEE ADPRL, 2007.
  • 19. P. N. Sabes and M. I. Jordan. Reinforcement learning by probability matching. In NIPS, 1996.
  • 22. E. Todorov. Efficient computation of optimal actions. PNAS, 106:11478-11483, 2009.
  • 23. E. Todorov and M. Jordan. Optimal feedback control as a theory of motor coordination. Nature Neuroscience, 5:1226-1235, 2002.
  • 24. M. Toussaint. Robot trajectory optimization using approximate inference. In ICML, 2009.
  • 25. M. Toussaint and A. Storkey. Probabilistic inference for solving discrete and continuous state Markov decision processes. In ICML, 2006.


* This information was extracted by KISTI from analysis of Elsevier's SCOPUS database.