



Volume 22, Issue 2, 2014, Pages 146-160

Multi-timescale nexting in a reinforcement learning robot

Author keywords

predictive knowledge; reinforcement learning; robotics; temporal difference learning



EID: 84896357393     PISSN: 1059-7123     EISSN: 1741-2633     Source Type: Journal
DOI: 10.1177/1059712313511648     Document Type: Article
Times cited: 73

References (54)
  • 1. Becker, J. D. (1973). In R. C. Schank & K. M. Colby (Eds.), Computer Models of Thought and Language (pp. 396-434). San Francisco, CA: W. H. Freeman and Company. Scopus ID: 0346859314.
  • 8. Clark, A. (2013). Whatever next? Predictive brains, situated agents, and the future of cognitive science. Behavioral and Brain Sciences, 36(3), 181-204. Scopus ID: 84872566721.
  • 11. Degris, T., Pilarski, P. M., & Sutton, R. S. (2012). Model-free reinforcement learning with continuous action in practice. Proceedings of the American Control Conference, p. 2177. Scopus ID: 84869424969.
  • 14. Grush, R. (2004). The emulation theory of representation: Motor control, imagery, and perception. Behavioral and Brain Sciences, 27, 377-442. Scopus ID: 6344257187.
  • 17. Kaelbling, L. (1993). Learning to achieve goals. Proceedings of the International Joint Conference on Artificial Intelligence, p. 1094. Scopus ID: 0042545768.
  • 18. LaValle, S. M. (2006). Planning Algorithms. Cambridge: Cambridge University Press. Scopus ID: 77952010176.
  • 23. Maei, H., & Sutton, R. S. (2010). GQ(λ): A general gradient algorithm for temporal-difference prediction learning with eligibility traces. Proceedings of the Third Conference on Artificial General Intelligence, p. 91. Scopus ID: 77954101982.
  • 24. Markoff, J. (2010). Google cars drive themselves, in traffic. The New York Times, p. A1. Scopus ID: 80051891791.
  • 25. Modayil, J., White, A., & Sutton, R. S. (2012). Multi-timescale nexting in a reinforcement learning robot. From Animals to Animats 12: 12th International Conference on Simulation of Adaptive Behavior, p. 299. Scopus ID: 84866006400.
  • 26. Oates, T., Schmill, M. D., & Cohen, P. R. (2000). A method for clustering the experiences of a mobile robot that accords with human judgments. Proceedings of the Seventeenth Conference of the Association for the Advancement of Artificial Intelligence, p. 846. Scopus ID: 0342721206.
  • 29. Peters, J., & Schaal, S. (2008). Natural actor-critic. Neurocomputing, 71(7), 1180-1190. Scopus ID: 40649106649.
  • 30. Pezzulo, G. (2008). Coordinating with the future: The anticipatory nature of representation. Minds and Machines, 18(2), 179-225. Scopus ID: 44349151557.
  • 31. Pierce, D., & Kuipers, B. J. (1997). Map learning with uninterpreted sensors and effectors. Artificial Intelligence, 92(1), 169-227. Scopus ID: 0031147214.
  • 33. Singh, S. (1992). Reinforcement learning with a hierarchy of abstract models. Proceedings of the Conference of the Association for the Advancement of Artificial Intelligence, p. 202. Scopus ID: 0026962175.
  • 34. Singh, S., James, M. R., & Rudary, M. R. (2004). Predictive state representations: A new theory for modeling dynamical systems. Proceedings of the 20th Conference on Uncertainty in Artificial Intelligence, p. 512. Scopus ID: 31844457132.
  • 35. Sutton, R. S. (1988). Learning to predict by the method of temporal differences. Machine Learning, 3, 9-44. Scopus ID: 33847202724.
  • 36. Sutton, R. S. (1990). Integrated architectures for learning, planning, and reacting based on approximating dynamic programming. Proceedings of the Seventh International Conference on Machine Learning, p. 216. Scopus ID: 85132026293.
  • 37. Sutton, R. S. (1995). TD models: Modeling the world at a mixture of time scales. Proceedings of the International Conference on Machine Learning, p. 531. Scopus ID: 84896385986.
  • 38. Sutton, R. S. (2009). The grand challenge of predictive empirical abstract knowledge. Working Notes of the IJCAI-09 Workshop on Grand Challenges for Reasoning from Experiences. Scopus ID: 84896334746.
  • 39. Sutton, R. S. (2012). Beyond reward: The problem of knowledge and data. Proceedings of the 21st International Conference on Inductive Logic Programming, p. 2. Scopus ID: 84864841464.
  • 42. Sutton, R. S., Modayil, J., Delp, M., Degris, T., Pilarski, P. M., White, A., & Precup, D. (2011). Horde: A scalable real-time architecture for learning knowledge from unsupervised sensorimotor interaction. Proceedings of the 10th International Conference on Autonomous Agents and Multiagent Systems, p. 761. Scopus ID: 84864885776.
  • 43. Sutton, R. S., Precup, D., & Singh, S. (1999). Between MDPs and semi-MDPs: A framework for temporal abstraction in reinforcement learning. Artificial Intelligence, 112, 181-211. Scopus ID: 0033170372.
  • 45. Sutton, R. S., Maei, H. R., Precup, D., Bhatnagar, S., Silver, D., Szepesvári, Cs., & Wiewiora, E. (2009). Fast gradient-descent methods for temporal-difference learning with linear function approximation. Proceedings of the 26th International Conference on Machine Learning, Montreal, Canada, p. 993. Scopus ID: 71149099079.
  • 47. Tedrake, R., Zhang, T., & Seung, H. (2005). Stochastic policy gradient reinforcement learning on a simple 3D biped. Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems, p. 2849. Scopus ID: 14044262287.
  • 48. Thrun, S., Montemerlo, M., et al. (2006). Stanley: The robot that won the DARPA grand challenge. Journal of Field Robotics, 23(9), 661-692. Scopus ID: 33750024797.
  • 50. Wang, C. C., Thorpe, C., & Thrun, S. (2003). Online simultaneous localization and mapping with detection and tracking of moving objects: Theory and results from a ground vehicle in crowded urban areas. Proceedings of the IEEE International Conference on Robotics and Automation, p. 842. Scopus ID: 0344876542.
  • 51. Welch, G., & Bishop, G. (1995). An Introduction to the Kalman Filter. Chapel Hill, NC: Computer Science Department, University of North Carolina. Scopus ID: 79955750805.
  • 52. White, A., Modayil, J., & Sutton, R. S. (2012). Scaling life-long off-policy learning. Proceedings of the Second Joint IEEE International Conference on Development and Learning and on Epigenetic Robotics. Scopus ID: 84872849054.
  • 53. Wolpert, D., Ghahramani, Z., & Jordan, M. (1995). An internal model for sensori-motor integration. Science, 269(5232), 1880-1882. Scopus ID: 0028799979.
  • 54. Yamashita, Y., & Tani, J. (2008). Emergence of functional hierarchy in a multiple timescale neural network model: A humanoid robot experiment. PLoS Computational Biology, 4(11), 1-18. Scopus ID: 57149090913.


* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.