Volume 23, Issue 2, 2008, Pages 213-245

Two steps reinforcement learning

Author keywords

[No Author keywords available]

Indexed keywords

ALGORITHMS; APPROXIMATION THEORY; DECISION TREES; INTELLIGENT SYSTEMS; STATE SPACE METHODS;

EID: 38949129339     PISSN: 08848173     EISSN: 1098111X     Source Type: Journal    
DOI: 10.1002/int.20255     Document Type: Conference Paper
Times cited: 25

References (43)
  • 4
    • Bellman R. Dynamic programming. Princeton, NJ: Princeton University Press; 1957.
  • 5
    • Boyan JA, Moore AW. Generalization in reinforcement learning: Safely approximating the value function. Adv Neural Inform Process Syst 1995;7.
  • 6
    • Munos R, Moore A. Variable resolution discretization in optimal control. Machine Learning 2002;49(2/3):291-323.
  • 7
    • Santamaría JC, Sutton RS, Ram A. Experiments with reinforcement learning in problems with continuous state and action spaces. Adaptive Behavior 1998;6(2):163-218.
  • 8
    • Munos R. A study of reinforcement learning in the continuous case by the means of viscosity solutions. Machine Learning 1999;40:265-299.
  • 9
    • Fernández F, Borrajo D. On determinism handling while learning reduced state space representations. In: Proc Eur Conf on Artificial Intelligence (ECAI 2002); Lyon, France; July 2002. pp 280-284.
  • 10
    • Moore AW. Variable resolution dynamic programming: Efficiently learning action maps in multivariate real-valued spaces. In: Proc Eighth Int Machine Learning Workshop; 1991.
  • 13
    • Moore AW, Atkeson CG. The parti game algorithm for variable resolution reinforcement learning in multidimensional state-spaces. Machine Learning 1995;21(3):199-233.
  • 16
    • Tsitsiklis JN, Van Roy B. Feature-based methods for large scale dynamic programming. Machine Learning 1996;22:59-94.
  • 17
    • Baird LC. Gradient descent for general reinforcement learning. Neural Inform Process Syst 1998;11.
  • 18
    • Tesauro G. Practical issues in temporal difference learning. Machine Learning 1992;8:257-277.
  • 19
    • Aha D. Lazy learning. Boston, MA: Kluwer Academic Publishers; 1997.
  • 21
    • Touzet C. Neural reinforcement learning for behaviour synthesis. Robot Auton Syst 1997;22:251-281.
  • 22
    • Fernández F, Borrajo D. VQQL. Applying vector quantization to reinforcement learning. In: RoboCup-99: Robot Soccer World Cup III, Lecture Notes in Artificial Intelligence, vol 1856. Berlin: Springer Verlag; 2000. pp 292-303.
  • 23
    • Fernández F, Parker L. Learning in large cooperative multi-robot domains. Int J Robot Automation 2001;16(4):217-226.
  • 24
    • Lloyd SP. Least squares quantization in PCM. IEEE Trans Inform Theory 1982;28:127-135.
  • 27
    • Parker LE. Distributed algorithms for multi-robot observation of multiple moving targets. Auton Robots 2002;12(3):231-255.
  • 28
    • Parker LE, Touzet C, Fernández F. Techniques for learning in multi-robot teams. In: Robot Teams: From Diversity to Polymorphism. A. K. Peters Publishers; 2002. pp 191-236.
  • 29
    • Fernández F, Borrajo D, Parker L. A reinforcement learning algorithm in cooperative multirobot domains. J Intel Robot Syst 2005;43(2-4):161-174.
  • 30
    • Ernst D. Tree-based batch mode reinforcement learning. J Machine Learning Res 2005;6:503-556.
  • 32
    • Modha DS, Spangler WS. Feature weighting in k-means clustering. Machine Learning 2003;52:217-237.
  • 33
    • Smart WD. Making reinforcement learning work on real robots. PhD thesis, Department of Computer Science, Brown University, Providence, RI; May 2002.
  • 34
    • Quinlan JR. Induction of decision trees. Machine Learning 1986;1(1):81-106.
  • 37
    • Fernández F, Isasi P. Automatic finding of good classifiers following a biologically inspired metaphor. Comput Inform 2002;21(3):205-220.
  • 38
    • Fernández F, Isasi P. Evolutionary design of nearest prototype classifiers. J Heuristics 2004;10(4):431-454.
  • 39
    • Reynolds SI. Adaptive resolution model-free reinforcement learning: Decision boundary partitioning. In: Proc Int Conf Machine Learning; 2000. pp 783-790.
  • 40
    • Anderson C, Crawford-Hines S. Multigrid Q-learning. Technical report, Colorado State University, Boulder, CO; 1994.
  • 41
    • Kushner HJ. Numerical methods for stochastic control problems in continuous time. SIAM J Control Optim 1990;28.
  • 42
    • Shibata T, Kato T, Wad T. K-d decision tree: An accelerated and memory efficient nearest neighbor classifier. In: Proc Third IEEE Int Conf on Data Mining; 2003. pp 641-644.


* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.