Adaptive Behavior, Volume 13, Issue 3, 2005, Pages 165-188

Reinforcement learning for RoboCup soccer keepaway

Author keywords

Machine learning; Multiagent learning; Multiagent systems; Reinforcement learning; Robot soccer

EID: 27544506565     PISSN: 1059-7123     EISSN: None     Source Type: Journal
DOI: 10.1177/105971230501300301     Document Type: Article
Times cited: 318

References (52)
  • 2
    • Andou, T. (1998). Refinement of soccer agents' positions using reinforcement learning. In H. Kitano (Ed.), RoboCup-97: Robot soccer world cup I (pp. 373-388). Berlin: Springer.
  • 3
    • Andre, D., & Russell, S. J. (2001). Programmable reinforcement learning agents. In T. K. Leen, T. G. Dietterich, & V. Tresp (Eds.), Advances in neural information processing systems (Vol. 13, pp. 1019-1025). Cambridge, MA: MIT Press.
  • 4
    • Andre, D., & Russell, S. J. (2002). State abstraction for programmable reinforcement learning agents. In R. Dechter, M. Kearns, & R. S. Sutton (Eds.), Proceedings of the 18th National Conference on Artificial Intelligence (pp. 119-125). Menlo Park, CA: AAAI Press.
  • 5
    • Andre, D., & Teller, A. (1999). Evolving team Darwin United. In M. Asada & H. Kitano (Eds.), RoboCup-98: Robot soccer world cup II (pp. 346-351). Berlin: Springer.
  • 6
    • Bagnell, J. A., & Schneider, J. (2001). Autonomous helicopter control using reinforcement learning policy search methods. In International Conference on Robotics and Automation (pp. 1615-1620). IEEE.
  • 7
    • Baird, L. C., & Moore, A. W. (1999). Gradient descent for general reinforcement learning. In M. J. Kearns, S. A. Solla, & D. A. Cohn (Eds.), Advances in neural information processing systems (Vol. 11, pp. 968-974). Cambridge, MA: MIT Press.
  • 10
    • Boutilier, C., Dean, T., & Hanks, S. (1999). Decision-theoretic planning: Structural assumptions and computational leverage. Journal of Artificial Intelligence Research, 11, 1-94.
  • 11
    • Bradtke, S. J., & Duff, M. O. (1995). Reinforcement learning methods for continuous-time Markov decision problems. In G. Tesauro, D. Touretzky, & T. Leen (Eds.), Advances in neural information processing systems (Vol. 7, pp. 393-400). San Mateo, CA: Morgan Kaufmann.
  • 13
    • Crites, R. H., & Barto, A. G. (1996). Improving elevator performance using reinforcement learning. In D. S. Touretzky, M. C. Mozer, & M. E. Hasselmo (Eds.), Advances in neural information processing systems (Vol. 8, pp. 1017-1023). Cambridge, MA: MIT Press.
  • 14
    • Dean, T., Basye, K., & Shewchuk, J. (1992). Reinforcement learning for planning and control. In S. Minton (Ed.), Machine learning methods for planning and scheduling (pp. 67-92). San Mateo, CA: Morgan Kaufmann.
  • 15
    • Dietterich, T. G. (2000). Hierarchical reinforcement learning with the MAXQ value function decomposition. Journal of Artificial Intelligence Research, 13, 227-303.
  • 16
    • Gordon, G. (2001). Reinforcement learning with function approximation converges to a region. In T. K. Leen, T. G. Dietterich, & V. Tresp (Eds.), Advances in neural information processing systems (Vol. 13, pp. 1040-1046). Cambridge, MA: MIT Press.
  • 17
    • Guestrin, C., Koller, D., & Parr, R. (2002). Multiagent planning with factored MDPs. In T. G. Dietterich, S. Becker, & Z. Ghahramani (Eds.), Advances in neural information processing systems (Vol. 14, pp. 1523-1530). Cambridge, MA: MIT Press.
  • 18
    • Hsu, W. H., & Gustafson, S. M. (2002). Genetic programming and multi-agent layered learning by reinforcements. In W. B. Langdon et al. (Eds.), Genetic and Evolutionary Computation Conference (New York) (pp. 764-771). San Mateo, CA: Morgan Kaufmann.
  • 21
    • Lin, C.-S., & Kim, H. (1991). CMAC-based adaptive critic self-learning control. IEEE Transactions on Neural Networks, 2, 530-533.
  • 22
    • Luke, S., Hohn, C., Farris, J., Jackson, G., & Hendler, J. (1998). Co-evolving soccer softbot team coordination with genetic programming. In H. Kitano (Ed.), RoboCup-97: Robot soccer world cup I (pp. 398-411). Berlin: Springer.
  • 24
    • McAllester, D., & Stone, P. (2001). Keeping the ball from CMUnited-99. In P. Stone, T. Balch, & G. Kraetzschmar (Eds.), RoboCup-2000: Robot soccer world cup IV (pp. 333-338). Berlin: Springer.
  • 27
    • Perkins, T. J., & Precup, D. (2003). A convergent form of approximate policy iteration. In S. Becker, S. Thrun, & K. Obermayer (Eds.), Advances in neural information processing systems (Vol. 15, pp. 1595-1602). Cambridge, MA: MIT Press.
  • 31
    • Riedmiller, M., Merke, A., Meier, D., Hoffman, A., Sinner, A., Thate, O., & Ehrmann, R. (2001). Karlsruhe Brainstormers - A reinforcement learning approach to robotic soccer. In P. Stone, T. Balch, & G. Kraetzschmar (Eds.), RoboCup-2000: Robot soccer world cup IV (pp. 367-372). Berlin: Springer.
  • 33
    • Rummery, G. A., & Niranjan, M. (1994). On-line Q-learning using connectionist systems. Technical Report CUED/F-INFENG/TR 166, Cambridge University Engineering Department.
  • 35
    • Stone, P., & McAllester, D. (2001). An architecture for action selection in robotic soccer. In E. Andre, S. Sen, C. Frasson, & J. P. Muller (Eds.), Proceedings of the Fifth International Conference on Autonomous Agents (pp. 316-323). New York, NY: ACM Press.
  • 36
    • Stone, P., & Sutton, R. S. (2001). Scaling reinforcement learning toward RoboCup soccer. In C. E. Brodley & A. P. Danyluk (Eds.), Proceedings of the Eighteenth International Conference on Machine Learning (pp. 537-544). San Francisco, CA: Morgan Kaufmann.
  • 37
    • Stone, P., & Sutton, R. S. (2002). Keepaway soccer: A machine learning testbed. In A. Birk, S. Coradeschi, & S. Tadokoro (Eds.), RoboCup-2001: Robot soccer world cup V (pp. 214-223). Berlin: Springer.
  • 38
    • Stone, P., Sutton, R. S., & Singh, S. (2001). Reinforcement learning for 3 vs. 2 keepaway. In P. Stone, T. Balch, & G. Kraetzschmar (Eds.), RoboCup-2000: Robot soccer world cup IV (pp. 249-258). Berlin: Springer.
  • 39
    • Stone, P., & Veloso, M. (1999). Team-partitioned, opaque-transition reinforcement learning. In M. Asada & H. Kitano (Eds.), RoboCup-98: Robot soccer world cup II (pp. 261-272). Berlin: Springer.
  • 40
    • Sutton, R. S. (1996). Generalization in reinforcement learning: Successful examples using sparse coarse coding. In D. S. Touretzky, M. C. Mozer, & M. E. Hasselmo (Eds.), Advances in neural information processing systems (Vol. 8, pp. 1038-1044). Cambridge, MA: MIT Press.
  • 42
    • Sutton, R., McAllester, D., Singh, S., & Mansour, Y. (2000). Policy gradient methods for reinforcement learning with function approximation. In S. A. Solla, T. K. Leen, & K. R. Muller (Eds.), Advances in neural information processing systems (Vol. 12, pp. 1057-1063). Cambridge, MA: MIT Press.
  • 43
    • Sutton, R., Precup, D., & Singh, S. (1999). Between MDPs and semi-MDPs: A framework for temporal abstraction in reinforcement learning. Artificial Intelligence, 112, 181-211.
  • 44
    • Tan, M. (1993). Multi-agent reinforcement learning: Independent vs. cooperative agents. In Proceedings of the Tenth International Conference on Machine Learning (pp. 330-337). Morgan Kaufmann.
  • 45
    • Taylor, M. E., & Stone, P. (2005). Behavior transfer for value-function-based reinforcement learning. In V. Dignum, S. Koenig, S. Kraus, M. P. Singh, & M. Wooldridge (Eds.), The Fourth International Joint Conference on Autonomous Agents and Multiagent Systems (pp. 53-59). New York, NY: ACM Press.
  • 46
    • Tesauro, G. (1994). TD-Gammon, a self-teaching backgammon program, achieves master-level play. Neural Computation, 6(1), 215-219.
  • 47
    • Tsitsiklis, J. N., & Van Roy, B. (1997). An analysis of temporal-difference learning with function approximation. IEEE Transactions on Automatic Control, 42, 674-690.
  • 49
    • Uchibe, E., Yanase, M., & Asada, M. (2001). Evolution for behavior selection accelerated by activation/termination constraints. In H. Beyer, E. Cantú-Paz, D. Goldberg, Parmee, L. Spector, & D. Whitley (Eds.), Proceedings of the Genetic and Evolutionary Computation Conference (pp. 1122-1129). Morgan Kaufmann.
  • 50
    • Veloso, M., Stone, P., & Bowling, M. (1999). Anticipation as a key for collaboration in a team of agents: A case study in robotic soccer. In P. S. Schenker & G. T. McKee (Eds.), Proceedings of SPIE Sensor Fusion and Decentralized Control in Robotic Systems II (Vol. 3839) (Boston, MA). Bellingham, WA: SPIE.


* This information was extracted by KISTI through analysis of Elsevier's SCOPUS database.