Volume 40, 2011, Pages 95-142

A Monte-Carlo AIXI approximation

Author keywords

[No Author keywords available]

Indexed keywords

CONTEXT TREE WEIGHTING; MONTE CARLO; OPTIMALITY; PRACTICAL ALGORITHMS; REINFORCEMENT LEARNING AGENT; TREE SEARCH ALGORITHM;

EID: 79956344726     PISSN: None     EISSN: 10769757     Source Type: Journal    
DOI: 10.1613/jair.3125     Document Type: Article
Times cited : (138)

References (72)
  • 1
    • 0041966002
    • Using confidence bounds for exploitation-exploration trade-offs
    • Auer, P. (2002). Using confidence bounds for exploitation-exploration trade-offs. Journal of Machine Learning Research, 3, 397-422.
    • (2002) Journal of Machine Learning Research , vol.3 , pp. 397-422
    • Auer, P.1
  • 3
    • 0344672463
    • Rollout algorithms for stochastic scheduling problems
    • Bertsekas, D. P., & Castanon, D. A. (1999). Rollout algorithms for stochastic scheduling problems. Journal of Heuristics, 5(1), 89-108.
    • (1999) Journal of Heuristics , vol.5 , Issue.1 , pp. 89-108
    • Bertsekas, D.P.1    Castanon, D.A.2
  • 4
    • 0032069371
    • Top-down induction of first-order logical decision trees
    • PII S0004370298000344
    • Blockeel, H., & De Raedt, L. (1998). Top-down induction of first-order logical decision trees. Artificial Intelligence, 101(1-2), 285-297. (Pubitemid 128387397)
    • (1998) Artificial Intelligence , vol.101 , Issue.1-2 , pp. 285-297
    • Blockeel, H.1    De Raedt, L.2
  • 6
    • 0041965975
    • R-max - a general polynomial time algorithm for near-optimal reinforcement learning
    • Brafman, R. I., & Tennenholtz, M. (2003). R-max - a general polynomial time algorithm for near-optimal reinforcement learning. Journal of Machine Learning Research, 3, 213-231.
    • (2003) Journal of Machine Learning Research , vol.3 , pp. 213-231
    • Brafman, R.I.1    Tennenholtz, M.2
  • 7
    • 0028564629
    • Acting optimally in partially observable stochastic domains
    • Cassandra, A. R., Kaelbling, L. P., & Littman, M. L. (1994). Acting optimally in partially observable stochastic domains. In AAAI, pp. 1023-1028.
    • (1994) AAAI , pp. 1023-1028
    • Cassandra, A.R.1    Kaelbling, L.P.2    Littman, M.L.3
  • 12
    • 57749181518
    • Simulation-based approach to general game playing
    • Finnsson, H., & Björnsson, Y. (2008). Simulation-based approach to general game playing. In AAAI, pp. 259-264.
    • (2008) AAAI , pp. 259-264
    • Finnsson, H.1    Björnsson, Y.2
  • 15
    • 79956339609
    • Modification of UCT with patterns in Monte-Carlo Go. Tech. rep. 6062, INRIA, France
    • Gelly, S., Wang, Y., Munos, R., & Teytaud, O. (2006). Modification of UCT with patterns in Monte-Carlo Go. Tech. rep. 6062, INRIA, France.
    • (2006)
    • Gelly, S.1    Wang, Y.2    Munos, R.3    Teytaud, O.4
  • 16
    • 29344449759
    • Effective short-term opponent exploitation in simplified poker
    • Proceedings of the 20th National Conference on Artificial Intelligence and the 17th Innovative Applications of Artificial Intelligence Conference, AAAI-05/IAAI-05
    • Hoehn, B., Southey, F., Holte, R. C., & Bulitko, V. (2005). Effective short-term opponent exploitation in simplified poker. In AAAI, pp. 783-788. (Pubitemid 43006704)
    • (2005) Proceedings of the National Conference on Artificial Intelligence , vol.2 , pp. 783-788
    • Hoehn, B.1    Southey, F.2    Holte, R.C.3    Bulitko, V.4
  • 17
    • 34250765690
    • Looping suffix tree-based inference of partially observable hidden state
    • Holmes, M. P., & Isbell Jr., C. L. (2006). Looping suffix tree-based inference of partially observable hidden state. In ICML, pp. 409-416.
    • (2006) ICML , pp. 409-416
    • Holmes, M.P.1    Isbell Jr., C.L.2
  • 18
    • 1642393842
    • The fastest and shortest algorithm for all well-defined problems
    • Hutter, M. (2002a). The fastest and shortest algorithm for all well-defined problems. International Journal of Foundations of Computer Science, 13(3), 431-443.
    • (2002) International Journal of Foundations of Computer Science , vol.13 , Issue.3 , pp. 431-443
    • Hutter, M.1
  • 19
    • 84937417436
    • Self-optimizing and Pareto-optimal policies in general environments based on Bayes-mixtures
    • Lecture Notes in Artificial Intelligence. Springer
    • Hutter, M. (2002b). Self-optimizing and Pareto-optimal policies in general environments based on Bayes-mixtures. In Proceedings of the 15th Annual Conference on Computational Learning Theory (COLT 2002), Lecture Notes in Artificial Intelligence. Springer.
    • (2002) Proceedings of the 15th Annual Conference on Computational Learning Theory (COLT 2002)
    • Hutter, M.1
  • 21
    • 78049319488
    • Universal algorithmic intelligence: A mathematical top→down approach
    • Springer, Berlin
    • Hutter, M. (2007). Universal algorithmic intelligence: A mathematical top→down approach. In Artificial General Intelligence, pp. 227-290. Springer, Berlin.
    • (2007) Artificial General Intelligence , pp. 227-290
    • Hutter, M.1
  • 22
    • 0032073263
    • Planning and acting in partially observable stochastic domains
    • PII S000437029800023X
    • Kaelbling, L. P., Littman, M. L., & Cassandra, A. R. (1998). Planning and acting in partially observable stochastic domains. Artificial Intelligence, 101, 99-134. (Pubitemid 128387390)
    • (1998) Artificial Intelligence , vol.101 , Issue.1-2 , pp. 99-134
    • Kaelbling, L.P.1    Littman, M.L.2    Cassandra, A.R.3
  • 24
    • 0006713882
    • Inducing classification and regression trees in first order logic
    • In Džeroski, S., & Lavrač, N. (Eds.), chap. 6. Springer
    • Kramer, S., & Widmer, G. (2001). Inducing classification and regression trees in first order logic. In Džeroski, S., & Lavrač, N. (Eds.), Relational Data Mining, chap. 6. Springer.
    • (2001) Relational Data Mining
    • Kramer, S.1    Widmer, G.2
  • 27
    • 79956345551
    • Ergodic MDPs admit self-optimising policies. Tech. rep. IDSIA-21-04, Dalle Molle Institute for Artificial Intelligence (IDSIA)
    • Legg, S., & Hutter, M. (2004). Ergodic MDPs admit self-optimising policies. Tech. rep. IDSIA-21-04, Dalle Molle Institute for Artificial Intelligence (IDSIA).
    • (2004)
    • Legg, S.1    Hutter, M.2
  • 28
    • 77956163718
    • Ph.D. thesis, Department of Informatics, University of Lugano
    • Legg, S. (2008). Machine Super Intelligence. Ph.D. thesis, Department of Informatics, University of Lugano.
    • (2008) Machine Super Intelligence
    • Legg, S.1
  • 31
    • 84898982129
    • Predictive representations of state
    • Littman, M., Sutton, R., & Singh, S. (2002). Predictive representations of state. In NIPS, pp. 1555-1561.
    • (2002) NIPS , pp. 1555-1561
    • Littman, M.1    Sutton, R.2    Singh, S.3
  • 34
    • 71149083875
    • Proto-predictive representation of states with simple recurrent temporal-difference networks
    • Makino, T. (2009). Proto-predictive representation of states with simple recurrent temporal-difference networks. In ICML, pp. 697-704.
    • (2009) ICML , pp. 697-704
    • Makino, T.1
  • 37
    • 79956365352
    • A computational approximation to the AIXI model
    • Pankov, S. (2008). A computational approximation to the AIXI model. In AGI, pp. 256-267.
    • (2008) AGI , pp. 256-267
    • Pankov, S.1
  • 39
    • 79956346776
    • Universal learning of repeated matrix games. Tech. rep. 18-05, IDSIA
    • Poland, J., & Hutter, M. (2006). Universal learning of repeated matrix games. Tech. rep. 18-05, IDSIA.
    • (2006)
    • Poland, J.1    Hutter, M.2
  • 40
    • 77950356463
    • Model-based Bayesian reinforcement learning in partially observable domains
    • Poupart, P., & Vlassis, N. (2008). Model-based Bayesian reinforcement learning in partially observable domains. In ISAIM.
    • (2008) ISAIM
    • Poupart, P.1    Vlassis, N.2
  • 43
    • 0030282113
    • The power of amnesia: Learning probabilistic automata with variable memory length
    • Ron, D., Singer, Y., & Tishby, N. (1996). The power of amnesia: Learning probabilistic automata with variable memory length. Machine Learning, 25(2), 117-150.
    • (1996) Machine Learning , vol.25 , Issue.2 , pp. 117-150
    • Ron, D.1    Singer, Y.2    Tishby, N.3
  • 45
    • 85162018872
    • Bayes-adaptive POMDPs
    • In Platt, J., Koller, D., Singer, Y., & Roweis, S. (Eds.), MIT Press, Cambridge, MA
    • Ross, S., Chaib-draa, B., & Pineau, J. (2008). Bayes-adaptive POMDPs. In Platt, J., Koller, D., Singer, Y., & Roweis, S. (Eds.), Advances in Neural Information Processing Systems 20, pp. 1225-1232. MIT Press, Cambridge, MA.
    • (2008) Advances in Neural Information Processing Systems , vol.20 , pp. 1225-1232
    • Ross, S.1    Chaib-draa, B.2    Pineau, J.3
  • 46
    • 0031186687
    • Shifting Inductive Bias with Success-Story Algorithm, Adaptive Levin Search, and Incremental Self-Improvement
    • Schmidhuber, J., Zhao, J., & Wiering, M. A. (1997). Shifting inductive bias with success-story algorithm, adaptive Levin search, and incremental self-improvement. Machine Learning, 28, 105-130. (Pubitemid 127507171)
    • (1997) Machine Learning , vol.28 , Issue.1 , pp. 105-130
    • Schmidhuber, J.1    Zhao, J.2    Wiering, M.3
  • 47
    • 0031194381
    • Discovering neural nets with low Kolmogorov complexity and high generalization capability
    • DOI 10.1016/S0893-6080(96)00127-X, PII S089360809600127X
    • Schmidhuber, J. (1997). Discovering neural nets with low Kolmogorov complexity and high generalization capability. Neural Networks, 10(5), 857-873. (Pubitemid 27315721)
    • (1997) Neural Networks , vol.10 , Issue.5 , pp. 857-873
    • Schmidhuber, J.1
  • 48
    • 84937439050
    • The speed prior: A new simplicity measure yielding near-optimal computable predictions
    • Schmidhuber, J. (2002). The speed prior: A new simplicity measure yielding near-optimal computable predictions. In Proc. 15th Annual Conf. on Computational Learning Theory, pp. 216-228.
    • (2002) Proc. 15th Annual Conf. on Computational Learning Theory , pp. 216-228
    • Schmidhuber, J.1
  • 50
    • 1642328943
    • Optimal ordered problem solver
    • Schmidhuber, J. (2004). Optimal ordered problem solver. Machine Learning, 54, 211-254.
    • (2004) Machine Learning , vol.54 , pp. 211-254
    • Schmidhuber, J.1
  • 52
    • 33646434962
    • Resolving perceptual aliasing in the presence of noisy sensors
    • Shani, G., & Brafman, R. (2004). Resolving perceptual aliasing in the presence of noisy sensors. In NIPS.
    • (2004) NIPS
    • Shani, G.1    Brafman, R.2
  • 55
    • 33749263456
    • Predictive state representations: A new theory for modeling dynamical systems
    • Singh, S., James, M., & Rudary, M. (2004). Predictive state representations: A new theory for modeling dynamical systems. In UAI, pp. 512-519.
    • (2004) UAI , pp. 512-519
    • Singh, S.1    James, M.2    Rudary, M.3
  • 56
    • 4544279425
    • A formal theory of inductive inference: Parts 1 and 2
    • Solomonoff, R. J. (1964). A formal theory of inductive inference: Parts 1 and 2. Information and Control, 7, 1-22 and 224-254.
    • (1964) Information and Control , vol.7 , pp. 1-22, 224-254
    • Solomonoff, R.J.1
  • 59
    • 14344258433
    • A Bayesian framework for reinforcement learning
    • Strens, M. (2000). A Bayesian framework for reinforcement learning. In ICML, pp. 943-950.
    • (2000) ICML , pp. 943-950
    • Strens, M.1
  • 60
    • 77958523564
    • A reinforcement learning algorithm in partially observable environments using short-term memory
    • Suematsu, N., & Hayashi, A. (1999). A reinforcement learning algorithm in partially observable environments using short-term memory. In NIPS, pp. 1059-1065.
    • (1999) NIPS , pp. 1059-1065
    • Suematsu, N.1    Hayashi, A.2
  • 61
    • 79956373203
    • A Bayesian approach to model learning in non-Markovian environment
    • Suematsu, N., Hayashi, A., & Li, S. (1997). A Bayesian approach to model learning in non-Markovian environment. In ICML, pp. 349-357.
    • (1997) ICML , pp. 349-357
    • Suematsu, N.1    Hayashi, A.2    Li, S.3
  • 63
    • 31844431936
    • Temporal-difference networks
    • Sutton, R. S., & Tanner, B. (2004). Temporal-difference networks. In NIPS.
    • (2004) NIPS
    • Sutton, R.S.1    Tanner, B.2
  • 67
    • 31844436266
    • Bayesian sparse sampling for on-line reward optimization
    • Wang, T., Lizotte, D. J., Bowling, M. H., & Schuurmans, D. (2005). Bayesian sparse sampling for on-line reward optimization. In ICML, pp. 956-963.
    • (2005) ICML , pp. 956-963
    • Wang, T.1    Lizotte, D.J.2    Bowling, M.H.3    Schuurmans, D.4
  • 70
    • 0032022518
    • The context-tree weighting method: Extensions
    • PII S0018944898006543
    • Willems, F. M. J. (1998). The context-tree weighting method: Extensions. IEEE Transactions on Information Theory, 44, 792-798. (Pubitemid 128737641)
    • (1998) IEEE Transactions on Information Theory , vol.44 , Issue.2 , pp. 792-798
    • Willems, F.M.J.1


* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.