Volume , Issue , 2009, Pages 1081-1088

Model-free reinforcement learning as mixture learning

Author keywords

[No Author keywords available]

Indexed keywords

CAST MODELS; DESIGN AND ANALYSIS OF ALGORITHMS; EM ALGORITHMS; FINITE HORIZONS; MODEL FREE; NEW TOOLS; POLICY ITERATION ALGORITHMS; PROBABILISTIC MIXTURE MODELS; STOCHASTIC APPROXIMATIONS;

EID: 71249143630     PISSN: None     EISSN: None     Source Type: Conference Proceeding    
DOI: None     Document Type: Conference Paper
Times cited: 29

References (25)
  • 1
    • Abbeel, P., Coates, A., Quigley, M., & Ng, A. Y. (2007). An application of reinforcement learning to aerobatic helicopter flight. In B. Schölkopf, J. Platt, & T. Hoffman (Eds.), Advances in neural information processing systems 19 (pp. 1-8). Cambridge, MA: MIT Press.
  • 3
    • Celeux, G., & Diebolt, J. (1985). The SEM algorithm: A probabilistic teacher algorithm derived from the EM algorithm for the mixture problem. Computational Statistics Quarterly, 2, 73-82.
  • 4
    • Cooper, G. F. (1988). A method for using belief networks as influence diagrams. Proc. 4th Workshop on Uncertainty in Artificial Intelligence (pp. 55-63). Minneapolis, Minnesota, USA.
  • 5
    • Dayan, P., & Hinton, G. E. (1997). Using Expectation-Maximization for reinforcement learning. Neural Computation, 9, 271-278.
  • 7
    • Delyon, B., Lavielle, M., & Moulines, E. (1999). Convergence of a stochastic approximation version of the EM algorithm. The Annals of Statistics, 27, 94-128.
  • 8
    • Gordon, G. (1996). Chattering in Sarsa(λ) (Technical Report). CMU Learning Lab internal report.
  • 10
    • Hoffman, M., Doucet, A., De Freitas, N., & Jasra, A. (2008). Bayesian policy learning with trans-dimensional MCMC. In J. Platt, D. Koller, Y. Singer, & S. Roweis (Eds.), Advances in neural information processing systems 20 (pp. 665-672). Cambridge, MA: MIT Press.
  • 12
    • Kober, J., & Peters, J. (2009). Policy search for motor primitives in robotics. In D. Koller, D. Schuurmans, Y. Bengio, & L. Bottou (Eds.), Advances in neural information processing systems 21 (pp. 849-856).
  • 14
    • Loch, J., & Singh, S. P. (1998). Using eligibility traces to find the best memoryless policy in partially observable Markov decision processes. Proc. 15th Int. Conf. on Machine Learning (pp. 323-331). Madison, Wisconsin, USA.
  • 15
    • Melo, F. S., Meyn, S. P., & Ribeiro, M. I. (2008). An analysis of reinforcement learning with function approximation. Proc. 25th Int. Conf. on Machine Learning (pp. 664-671). Helsinki, Finland.
  • 16
    • Neal, R. M., & Hinton, G. E. (1998). A view of the EM algorithm that justifies incremental, sparse, and other variants. In M. I. Jordan (Ed.), Learning in graphical models (pp. 355-368). Kluwer Academic Publishers.
  • 17
    • Perkins, T. J., & Pendrith, M. D. (2002). On the existence of fixed points for Q-learning and Sarsa in partially observable domains. Proc. 19th Int. Conf. on Machine Learning (pp. 490-497).
  • 18
    • Perkins, T. J., & Precup, D. (2003). A convergent form of approximate policy iteration. In S. Becker, S. Thrun, & K. Obermayer (Eds.), Advances in neural information processing systems 15 (pp. 1595-1602). Cambridge, MA: MIT Press.
  • 23
    • Toussaint, M., & Storkey, A. (2006). Probabilistic inference for solving discrete and continuous state Markov decision processes. Proc. 23rd Int. Conf. on Machine Learning (pp. 945-952). Pittsburgh, Pennsylvania, USA.
  • 24
    • Tsitsiklis, J. N. (2002). On the convergence of optimistic policy iteration. Journal of Machine Learning Research, 3, 59-72.
  • 25
    • Wei, G., & Tanner, M. (1990). A Monte Carlo implementation of the EM algorithm and the poor man's data augmentation algorithm. Journal of the American Statistical Association, 85, 699-704.


* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.