Machine Learning, Volume 90, Issue 3, 2013, Pages 385-429

TEXPLORE: Real-time sample-efficient reinforcement learning for robots

Author keywords

MDP; Real time; Reinforcement learning; Robotics

Indexed keywords

ALGORITHM LEARNING; AUTONOMOUS VEHICLES; CONTINUOUS STATE; MARKOV DECISION PROCESSES; MDP; RANDOM FORESTS; REAL-TIME; ROBOTIC CONTROLS; SENSOR/ACTUATOR; SEQUENTIAL DECISION MAKING;

EID: 84874698101     PISSN: 08856125     EISSN: 15730565     Source Type: Journal    
DOI: 10.1007/s10994-012-5322-7     Document Type: Article
Times cited: 120

References (64)
  • 1
    Albus, J. S. (1975). A new approach to manipulator control: The cerebellar model articulation controller. Journal of Dynamic Systems, Measurement, and Control, 97(3), 220-227. doi:10.1115/1.3426922
  • 3
    Auer, P., Cesa-Bianchi, N., & Fischer, P. (2002). Finite-time analysis of the multiarmed bandit problem. Machine Learning, 47(2), 235-256. doi:10.1023/A:1013689704352
  • 4
    Barto, A. G., Bradtke, S. J., & Singh, S. P. (1995). Learning to act using real-time dynamic programming. Artificial Intelligence, 72(1-2), 81-138. doi:10.1016/0004-3702(94)00011-O
  • 7
    Breiman, L. (2001). Random forests. Machine Learning, 45(1), 5-32. doi:10.1023/A:1010933404324
  • 26
    Katsikopoulos, K., & Engelbrecht, S. (2003). Markov decision processes with delays and asynchronous cost collection. IEEE Transactions on Automatic Control, 48(4), 568-574. doi:10.1109/TAC.2003.809799
  • 28
    Kober, J., & Peters, J. (2011). Policy search for motor primitives in robotics. Machine Learning, 84(1-2), 171-203. doi:10.1007/s10994-010-5223-6
  • 31
    Kolobov, A., Mausam, & Weld, D. (2012). LRTDP versus UCT for online probabilistic planning. In AAAI conference on artificial intelligence. https://www.aaai.org/ocs/index.php/AAAI/AAAI12/paper/view/4961/5334
  • 37
    Méhat, J., & Cazenave, T. (2011). A parallel general game player. KI. Künstliche Intelligenz, 25(1), 43-47. doi:10.1007/s13218-010-0083-6
  • 38
    Munos, R., & Moore, A. (2002). Variable resolution discretization in optimal control. Machine Learning, 49, 291-323. doi:10.1023/A:1017992615625
  • 41
  • 44
    Quinlan, R. (1986). Induction of decision trees. Machine Learning, 1, 81-106.
  • 53
    Sutton, R. (1990). Integrated architectures for learning, planning, and reacting based on approximating dynamic programming. In Proceedings of the seventh international conference on machine learning (ICML) (pp. 216-224).
  • 56
    Tanner, B., & White, A. (2009). RL-Glue: Language-independent software for reinforcement-learning experiments. Journal of Machine Learning Research, 10, 2133-2136.


* This information was extracted by KISTI through analysis of Elsevier's SCOPUS database.