Volume 10, 2009, Pages 1131-1186

Multi-task reinforcement learning in partially observable stochastic environments

Author keywords

Dirichlet processes; Multi task learning; Partially observable Markov decision processes; Regionalized policy representation; Reinforcement learning

Indexed keywords

BATCH ALGORITHMS; BEHAVIOR POLICY; CONDITIONAL DISTRIBUTION; DATA SHARING; DIRICHLET PROCESS; DIRICHLET PROCESSES; MULTI-TASK LEARNING; NON-PARAMETRIC; OVER CURRENT; PARAMETRIC MODELS; PARTIALLY OBSERVABLE MARKOV DECISION PROCESSES; POLICY ITERATION; REGIONALIZED POLICY REPRESENTATION; STOCHASTIC ENVIRONMENT; TARGET CLASSIFICATION

EID: 66849131425; PISSN: 15324435; EISSN: 15337928; Source Type: Journal
DOI: None; Document Type: Article
Times cited: 69

References (61)
  • 2
    • 0000708831
    • Mixtures of Dirichlet processes with applications to Bayesian nonparametric problems
    • C. E. Antoniak. Mixtures of Dirichlet processes with applications to Bayesian nonparametric problems. The Annals of Statistics, 2(6):1152-1174, November 1974.
    • (1974) The Annals of Statistics , vol.2 , Issue.6 , pp. 1152-1174
    • Antoniak, C.E.1
  • 4
    • 0346238931
    • Task clustering and gating for Bayesian multitask learning
    • B. Bakker and T. Heskes. Task clustering and gating for Bayesian multitask learning. Journal of Machine Learning Research, 4:83-99, 2003.
    • (2003) Journal of Machine Learning Research , vol.4 , pp. 83-99
    • Bakker, B.1    Heskes, T.2
  • 8
    • 0001432658
    • Discounted dynamic programming
    • D. Blackwell. Discounted dynamic programming. Ann. Math. Stat., 36:226-235, 1965.
    • (1965) Ann. Math. Stat , vol.36 , pp. 226-235
    • Blackwell, D.1
  • 9
    • 0002617436
    • Ferguson distributions via Polya urn schemes
    • D. Blackwell and J. MacQueen. Ferguson distributions via Polya urn schemes. Annals of Statistics, 1:353-355, 1973.
    • (1973) Annals of Statistics , vol.1 , pp. 353-355
    • Blackwell, D.1    MacQueen, J.2
  • 11
    • 0026998041
    • Reinforcement learning with perceptual aliasing: The perceptual distinctions approach
    • L. Chrisman. Reinforcement learning with perceptual aliasing: The perceptual distinctions approach. In Proceedings of the Tenth National Conference on Artificial Intelligence, pages 183-188. San Jose, California: AAAI Press, 1992.
    • (1992) Proceedings of the Tenth National Conference on Artificial Intelligence , pp. 183-188
    • Chrisman, L.1
  • 15
    • 0001120413
    • A Bayesian analysis of some non-parametric problems
    • T. Ferguson. A Bayesian analysis of some non-parametric problems. The Annals of Statistics, 1:209-230, 1973.
    • (1973) The Annals of Statistics , vol.1 , pp. 209-230
    • Ferguson, T.1
  • 20
    • 0041656866
    • An improved policy iteration algorithm for partially observable MDPs
    • E. A. Hansen. An improved policy iteration algorithm for partially observable MDPs. In Advances in Neural Information Processing Systems, volume 10, 1997.
    • (1997) Advances in Neural Information Processing Systems , vol.10
    • Hansen, E.A.1
  • 21
    • 66849123080
    • G. E. Hinton and T. J. Sejnowski. Learning and relearning in Boltzmann machines. In J. L. McClelland, D. E. Rumelhart, and the PDP Research Group, editors, Parallel Distributed Processing: Explorations in the Microstructure of Cognition, volume 1, pages 282-317. MIT Press, Cambridge, MA, 1986.
  • 22
    • 0000624333
    • Reinforcement learning algorithm for partially observable Markov decision problems
    • T. Jaakkola, S. P. Singh, and M. I. Jordan. Reinforcement learning algorithm for partially observable Markov decision problems. In Advances in Neural Information Processing Systems, volume 7. MIT Press, Cambridge, MA, 1995.
    • (1995) Advances in Neural Information Processing Systems , vol.7
    • Jaakkola, T.1    Singh, S.P.2    Jordan, M.I.3
  • 23
    • 4043084564
    • Tutorial on variational approximation methods
    • T. S. Jaakkola. Tutorial on variational approximation methods. In M. Opper and D. Saad, editors, Advanced Mean Field Methods: Theory and Practice, pages 129-160. MIT Press, 2001.
    • (2001) Advanced Mean Field Methods: Theory and Practice , pp. 129-160
    • Jaakkola, T.S.1
  • 24
    • 0000935895
    • An introduction to variational methods for graphical models
    • M. I. Jordan, Z. Ghahramani, T. S. Jaakkola, and L. K. Saul. An introduction to variational methods for graphical models. In Learning in Graphical Models, pages 105-161, Cambridge, MA, 1999. MIT Press.
    • (1999) Learning in Graphical Models , pp. 105-161
    • Jordan, M.I.1    Ghahramani, Z.2    Jaakkola, T.S.3    Saul, L.K.4
  • 25
    • 0032073263
    • Planning and acting in partially observable stochastic domains
    • L. Kaelbling, M. Littman, and A. Cassandra. Planning and acting in partially observable stochastic domains. Artificial Intelligence, 101:99-134, 1998.
    • (1998) Artificial Intelligence , vol.101 , pp. 99-134
    • Kaelbling, L.1    Littman, M.2    Cassandra, A.3
  • 30
    • 66849088211
    • Regionalized policy representation for reinforcement learning in POMDPs
    • X. Liao, H. Li, R. Parr, and L. Carin. Regionalized policy representation for reinforcement learning in POMDPs. In The Snowbird Learning Workshop, 2007.
    • (2007) The Snowbird Learning Workshop
    • Liao, X.1    Li, H.2    Parr, R.3    Carin, L.4
  • 32
    • 66849100434
    • Q. Liu, X. Liao, and L. Carin. Semi-supervised multitask learning. In J. C. Platt, D. Koller, Y. Singer, and S. Roweis, editors, Advances in Neural Information Processing Systems 20, pages 937-944. MIT Press, Cambridge, MA, 2008.
  • 33
    • 0000494894
    • Computationally feasible bounds for partially observed Markov decision processes
    • W. S. Lovejoy. Computationally feasible bounds for partially observed Markov decision processes. Operations Research, 39(1):162-175, 1991.
    • (1991) Operations Research , vol.39 , Issue.1 , pp. 162-175
    • Lovejoy, W.S.1
  • 36
    • 0031356598
    • Matched pursuits with a wave-based dictionary
    • M. McClure and L. Carin. Matched pursuits with a wave-based dictionary. IEEE Trans. Signal Proc., 45:2912-2927, Dec. 1997.
    • (1997) IEEE Trans. Signal Proc , vol.45 , pp. 2912-2927
    • McClure, M.1    Carin, L.2
  • 38
    • 0007300808
    • Markov chain sampling methods for Dirichlet process mixture models
    • R.M. Neal. Markov chain sampling methods for Dirichlet process mixture models. Technical Report 9815, Dept. of Statistics, University of Toronto, 1998.
    • (1998)
    • Neal, R.M.1
  • 39
    • 84880772945
    • Point-based value iteration: An anytime algorithm for POMDPs
    • J. Pineau, G. Gordon, and S. Thrun. Point-based value iteration: An anytime algorithm for POMDPs. In Proceedings of IJCAI, pages 1025-1032, August 2003.
    • (2003) Proceedings of IJCAI , pp. 1025-1032
    • Pineau, J.1    Gordon, G.2    Thrun, S.3
  • 41
    • 0024610919
    • A tutorial on hidden Markov models and selected applications in speech recognition
    • L. R. Rabiner. A tutorial on hidden Markov models and selected applications in speech recognition. Proceedings of the IEEE, 77(2):257-285, 1989.
    • (1989) Proceedings of the IEEE , vol.77 , Issue.2 , pp. 257-285
    • Rabiner, L.R.1
  • 45
    • 0000720609
    • A constructive definition of Dirichlet priors
    • J. Sethuraman. A constructive definition of Dirichlet priors. Statistica Sinica, 4:639-650, 1994.
    • (1994) Statistica Sinica , vol.4 , pp. 639-650
    • Sethuraman, J.1
  • 46
    • 0015658957
    • The optimal control of partially observable Markov processes over a finite horizon
    • R. D. Smallwood and E. J. Sondik. The optimal control of partially observable Markov processes over a finite horizon. Operations Research, 21:1071-1088, 1973.
    • (1973) Operations Research , vol.21 , pp. 1071-1088
    • Smallwood, R.D.1    Sondik, E.J.2
  • 49
    • 0017943242
    • The optimal control of partially observable Markov processes over the infinite horizon: Discounted costs
    • E. J. Sondik. The optimal control of partially observable Markov processes over the infinite horizon: Discounted costs. Operations Research, 26(2):282-304, Mar. 1978.
    • (1978) Operations Research , vol.26 , Issue.2 , pp. 282-304
    • Sondik, E.J.1
  • 55
    • 0040830539
    • Hyperparameter estimation in Dirichlet process mixture models
    • M. West. Hyperparameter estimation in Dirichlet process mixture models. Technical Report 92-A03, ISDS Discussion Paper, Duke University, 1992.
    • (1992)
    • West, M.1
  • 56
    • 0002612391
    • Hierarchical priors and mixture models, with application in regression and density estimation
    • M. West, P. Muller, and M.D. Escobar. Hierarchical priors and mixture models, with application in regression and density estimation. In A.F.M. Smith and P. Freeman, editors, Aspects of Uncertainty: A Tribute to D. V. Lindley, pages 363-386. New York: Wiley, 1994.
    • (1994) Aspects of Uncertainty: A Tribute to D. V. Lindley , pp. 363-386
    • West, M.1    Muller, P.2    Escobar, M.D.3


* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.