1. Baddeley, B.: Reinforcement learning in continuous time and space: Interference and not ill conditioning is the main problem when using distributed function approximators. IEEE Transactions on Systems, Man, and Cybernetics-Part B: Cybernetics 38(4), 950-956 (2008)
3. Barto, A.G., Sutton, R.S., Anderson, C.W.: Neuronlike adaptive elements that can solve difficult learning control problems. IEEE Transactions on Systems, Man, and Cybernetics 13(5), 833-846 (1983)
5. Berenji, H.R., Khedkar, P.: Learning and tuning fuzzy logic controllers through reinforcements. IEEE Transactions on Neural Networks 3(5), 724-740 (1992)
6. Berenji, H.R., Vengerov, D.: A convergent actor-critic-based FRL algorithm with application to power management of wireless transmitters. IEEE Transactions on Fuzzy Systems 11(4), 478-485 (2003)
7. Bertsekas, D.P.: Adaptive aggregation methods for infinite horizon dynamic programming. IEEE Transactions on Automatic Control 34(6), 589-598 (1989)
8. Bertsekas, D.P.: Dynamic programming and suboptimal control: A survey from ADP to MPC. European Journal of Control 11(4-5) (2005); special issue for the CDC-ECC-05 in Seville, Spain
12. Borkar, V.: An actor-critic algorithm for constrained Markov decision processes. Systems & Control Letters 54, 207-213 (2005)
13. Buşoniu, L., Babuška, R., De Schutter, B.: A comprehensive survey of multi-agent reinforcement learning. IEEE Transactions on Systems, Man, and Cybernetics 38(2), 156-172 (2008)
14. Buşoniu, L., Ernst, D., De Schutter, B., Babuška, R.: Consistency of fuzzy model-based reinforcement learning. In: Proceedings 2008 IEEE International Conference on Fuzzy Systems (FUZZ-IEEE 2008), Hong Kong, pp. 518-524 (2008)
15. Buşoniu, L., Ernst, D., De Schutter, B., Babuška, R.: Fuzzy partition optimization for approximate fuzzy Q-iteration. In: Proceedings 17th IFAC World Congress (IFAC 2008), Seoul, Korea, pp. 5629-5634 (2008)
16. Buşoniu, L., Ernst, D., De Schutter, B., Babuška, R.: Policy search with cross-entropy optimization of basis functions. In: Proceedings 2009 IEEE International Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL 2009), Nashville, US, pp. 153-160 (2009)
17. Chang, H.S., Fu, M.C., Hu, J., Marcus, S.I.: Simulation-Based Algorithms for Markov Decision Processes. Springer, Heidelberg (2007)
18. Chin, H.H., Jafari, A.A.: Genetic algorithm methods for solving the best stationary policy of finite Markov decision processes. In: Proceedings 30th Southeastern Symposium on System Theory, Morgantown, US, pp. 538-543 (1998)
19. Chow, C.S., Tsitsiklis, J.N.: An optimal one-way multigrid algorithm for discrete-time stochastic control. IEEE Transactions on Automatic Control 36(8), 898-914 (1991)
20. Coulom, R.: Feedforward neural networks in reinforcement learning applied to high-dimensional motor control. In: Cesa-Bianchi, N., Numao, M., Reischuk, R. (eds.) ALT 2002. LNCS (LNAI), vol. 2533, pp. 403-413. Springer, Heidelberg (2002)
21. Ernst, D., Geurts, P., Wehenkel, L.: Tree-based batch mode reinforcement learning. Journal of Machine Learning Research 6, 503-556 (2005)
22. Ernst, D., Glavic, M., Capitanescu, F., Wehenkel, L.: Reinforcement learning versus model predictive control: A comparison on a power system problem. IEEE Transactions on Systems, Man, and Cybernetics-Part B: Cybernetics 39(2), 517-529 (2009)
24. Gomez, F.J., Schmidhuber, J., Miikkulainen, R.: Efficient non-linear control through neuroevolution. In: Fürnkranz, J., Scheffer, T., Spiliopoulou, M. (eds.) ECML 2006. LNCS (LNAI), vol. 4212, pp. 654-662. Springer, Heidelberg (2006)
25. Gonzalez, R.L., Rofman, E.: On deterministic control problems: An approximation procedure for the optimal cost I. The stationary problem. SIAM Journal on Control and Optimization 23(2), 242-266 (1985)
27. Grüne, L.: Error estimation and adaptive discretization for the discrete stochastic Hamilton-Jacobi-Bellman equation. Numerische Mathematik 99, 85-112 (2004)
28. Horiuchi, T., Fujino, A., Katai, O., Sawaragi, T.: Fuzzy interpolation-based Q-learning with continuous states and actions. In: Proceedings 5th IEEE International Conference on Fuzzy Systems (FUZZ-IEEE 1996), New Orleans, US, pp. 594-600 (1996)
29. Jaakkola, T., Jordan, M.I., Singh, S.P.: On the convergence of stochastic iterative dynamic programming algorithms. Neural Computation 6(6), 1185-1201 (1994)
31. Jung, T., Polani, D.: Least squares SVM for least squares TD learning. In: Proceedings 17th European Conference on Artificial Intelligence (ECAI 2006), Riva del Garda, Italy, pp. 499-503 (2006)
32. Jung, T., Polani, D.: Kernelizing LSPE(λ). In: Proceedings 2007 IEEE Symposium on Approximate Dynamic Programming and Reinforcement Learning (ADPRL 2007), Honolulu, US, pp. 338-345 (2007)
33. Jung, T., Uthmann, T.: Experiments in value function approximation with sparse support vector regression. In: Proceedings 15th European Conference on Machine Learning (ECML 2004), Pisa, Italy, pp. 180-191 (2004)
34. Kaelbling, L.P., Littman, M.L., Cassandra, A.R.: Planning and acting in partially observable stochastic domains. Artificial Intelligence 101, 99-134 (1998)
35. Kaelbling, L.P., Littman, M.L., Moore, A.W.: Reinforcement learning: A survey. Journal of Artificial Intelligence Research 4, 237-285 (1996)
36. Konda, V.: Actor-critic algorithms. Ph.D. thesis, Massachusetts Institute of Technology, Cambridge, US (2002)
37. Konda, V.R., Tsitsiklis, J.N.: Actor-critic algorithms. In: Solla, S.A., Leen, T.K., Müller, K.R. (eds.) Advances in Neural Information Processing Systems, vol. 12, pp. 1008-1014. MIT Press, Cambridge (2000)
39. Lagoudakis, M., Parr, R., Littman, M.: Least-squares methods in reinforcement learning for control. In: Vlahavas, I.P., Spyropoulos, C.D. (eds.) SETN 2002. LNCS (LNAI), vol. 2308, pp. 249-260. Springer, Heidelberg (2002)
41. Lagoudakis, M.G., Parr, R.: Reinforcement learning as classification: Leveraging modern classifiers. In: Proceedings 20th International Conference on Machine Learning (ICML 2003), Washington, US, pp. 424-431 (2003)
42. Lewis, R.M., Torczon, V.: Pattern search algorithms for bound constrained minimization. SIAM Journal on Optimization 9(4), 1082-1099 (1999)
43. Lin, L.J.: Self-improving reactive agents based on reinforcement learning, planning and teaching. Machine Learning 8(3-4), 293-321 (1992); special issue on reinforcement learning
44. Liu, D., Javaherian, H., Kovalenko, O., Huang, T.: Adaptive critic learning techniques for engine torque and air-fuel ratio control. IEEE Transactions on Systems, Man, and Cybernetics-Part B: Cybernetics 38(4), 988-993 (2008)
47. Mahadevan, S., Maggioni, M.: Proto-value functions: A Laplacian framework for learning representation and control in Markov decision processes. Journal of Machine Learning Research 8, 2169-2231 (2007)
48. Mannor, S., Rubinstein, R.Y., Gat, Y.: The cross-entropy method for fast policy search. In: Proceedings 20th International Conference on Machine Learning (ICML 2003), Washington, US, pp. 512-519 (2003)
51. Menache, I., Mannor, S., Shimkin, N.: Basis function adaptation in temporal difference reinforcement learning. Annals of Operations Research 134, 215-238 (2005)
52. Millán, J.d.R., Posenato, D., Dedieu, E.: Continuous-action Q-learning. Machine Learning 49(2-3), 247-265 (2002)
53. Munos, R.: Finite-element methods with local triangulation refinement for continuous reinforcement learning problems. In: van Someren, M., Widmer, G. (eds.) ECML 1997. LNCS, vol. 1224, pp. 170-182. Springer, Heidelberg (1997)
55. Munos, R., Moore, A.: Variable-resolution discretization in optimal control. Machine Learning 49(2-3), 291-323 (2002)
56. Nakamura, Y., Mori, T., Sato, M., Ishii, S.: Reinforcement learning for a biped robot based on a CPG-actor-critic method. Neural Networks 20, 723-735 (2007)
57. Nedić, A., Bertsekas, D.P.: Least-squares policy evaluation algorithms with linear function approximation. Discrete Event Dynamic Systems 13, 79-110 (2003)
58. Ng, A.Y., Harada, D., Russell, S.: Policy invariance under reward transformations: Theory and application to reward shaping. In: Proceedings 16th International Conference on Machine Learning (ICML 1999), Bled, Slovenia, pp. 278-287 (1999)
59. Ng, A.Y., Jordan, M.I.: PEGASUS: A policy search method for large MDPs and POMDPs. In: Proceedings 16th Conference on Uncertainty in Artificial Intelligence (UAI 2000), Palo Alto, US, pp. 406-415 (2000)
60. Ormoneit, D., Sen, S.: Kernel-based reinforcement learning. Machine Learning 49(2-3), 161-178 (2002)
61. Pérez-Uribe, A.: Using a time-delay actor-critic neural architecture with dopamine-like reinforcement signal for learning in autonomous robots. In: Wermter, S., Austin, J., Willshaw, D.J. (eds.) Emergent Neural Computational Architectures Based on Neuroscience. LNCS (LNAI), vol. 2036, pp. 522-533. Springer, Heidelberg (2001)
62. Peters, J., Schaal, S.: Natural actor-critic. Neurocomputing 71(7-9), 1180-1190 (2008)
63. Porta, J.M., Vlassis, N., Spaan, M.T., Poupart, P.: Point-based value iteration for continuous POMDPs. Journal of Machine Learning Research 7, 2329-2367 (2006)
65. Ratitch, B., Precup, D.: Sparse distributed memories for on-line value-based reinforcement learning. In: Boulicaut, J.-F., Esposito, F., Giannotti, F., Pedreschi, D. (eds.) ECML 2004. LNCS (LNAI), vol. 3201, pp. 347-358. Springer, Heidelberg (2004)
66. Reynolds, S.I.: Adaptive resolution model-free reinforcement learning: Decision boundary partitioning. In: Proceedings 17th International Conference on Machine Learning (ICML 2000), Stanford University, US, pp. 783-790 (2000)
67. Riedmiller, M.: Neural fitted Q-iteration - first experiences with a data efficient neural reinforcement learning method. In: Gama, J., Camacho, R., Brazdil, P.B., Jorge, A.M., Torgo, L. (eds.) ECML 2005. LNCS (LNAI), vol. 3720, pp. 317-328. Springer, Heidelberg (2005)
68. Riedmiller, M., Peters, J., Schaal, S.: Evaluation of policy gradient methods and variants on the cart-pole benchmark. In: Proceedings 2007 IEEE Symposium on Approximate Dynamic Programming and Reinforcement Learning (ADPRL 2007), Honolulu, US, pp. 254-261 (2007)
69. Rummery, G.A., Niranjan, M.: On-line Q-learning using connectionist systems. Tech. Rep. CUED/F-INFENG/TR166, Engineering Department, Cambridge University, UK (1994)
70. Santos, M.S., Vigo-Aguiar, J.: Analysis of a numerical dynamic programming algorithm applied to economic models. Econometrica 66(2), 409-426 (1998)
71. Singh, S.P., Jaakkola, T., Jordan, M.I.: Reinforcement learning with soft state aggregation. In: Tesauro, G., Touretzky, D.S., Leen, T.K. (eds.) Advances in Neural Information Processing Systems, vol. 7, pp. 361-368 (1995)
72. Sutton, R.S.: Learning to predict by the method of temporal differences. Machine Learning 3, 9-44 (1988)
73. Sutton, R.S.: Integrated architectures for learning, planning, and reacting based on approximating dynamic programming. In: Proceedings 7th International Conference on Machine Learning (ICML 1990), Austin, US, pp. 216-224 (1990)
75. Sutton, R.S., Barto, A.G., Williams, R.J.: Reinforcement learning is adaptive optimal control. IEEE Control Systems Magazine 12(2), 19-22 (1992)
76. Sutton, R.S., McAllester, D.A., Singh, S.P., Mansour, Y.: Policy gradient methods for reinforcement learning with function approximation. In: Solla, S.A., Leen, T.K., Müller, K.R. (eds.) Advances in Neural Information Processing Systems, vol. 12, pp. 1057-1063. MIT Press, Cambridge (2000)
77. Szepesvári, C., Smart, W.D.: Interpolation-based Q-learning. In: Proceedings 21st International Conference on Machine Learning (ICML 2004), Banff, Canada, pp. 791-798 (2004)
78. Torczon, V.: On the convergence of pattern search algorithms. SIAM Journal on Optimization 7(1), 1-25 (1997)
79. Touzet, C.F.: Neural reinforcement learning for behaviour synthesis. Robotics and Autonomous Systems 22(3-4), 251-281 (1997)
80. Tsitsiklis, J.N., Van Roy, B.: Feature-based methods for large scale dynamic programming. Machine Learning 22(1-3), 59-94 (1996)
81. Tsitsiklis, J.N., Van Roy, B.: An analysis of temporal difference learning with function approximation. IEEE Transactions on Automatic Control 42(5), 674-690 (1997)
82. Uther, W.T.B., Veloso, M.M.: Tree based discretization for continuous state space reinforcement learning. In: Proceedings 15th National Conference on Artificial Intelligence and 10th Innovative Applications of Artificial Intelligence Conference (AAAI 1998/IAAI 1998), Madison, US, pp. 769-774 (1998)
83. Vrabie, D., Pastravanu, O., Abu-Khalaf, M., Lewis, F.: Adaptive optimal control for continuous-time linear systems based on policy iteration. Automatica 45(2), 477-484 (2009)
85. Wang, X., Tian, X., Cheng, Y.: Value approximation with least squares support vector machine in reinforcement learning system. Journal of Computational and Theoretical Nanoscience 4(7-8), 1290-1294 (2007)
88. Wiering, M.: Convergence and divergence in standard and averaging reinforcement learning. In: Boulicaut, J.-F., Esposito, F., Giannotti, F., Pedreschi, D. (eds.) ECML 2004. LNCS (LNAI), vol. 3201, pp. 477-488. Springer, Heidelberg (2004)
89. Williams, R.J., Baird, L.C.: Tight performance bounds on greedy policies based on imperfect value functions. In: Proceedings 8th Yale Workshop on Adaptive and Learning Systems, New Haven, US, pp. 108-113 (1994)
90. Xu, X., Hu, D., Lu, X.: Kernel-based least-squares policy iteration for reinforcement learning. IEEE Transactions on Neural Networks 18(4), 973-992 (2007)
91. Yu, H., Bertsekas, D.P.: Convergence results for some temporal difference methods based on least-squares. Tech. Rep. LIDS 2697, Massachusetts Institute of Technology, Cambridge, US (2006)