Volume 281, 2010, Pages 3-44

Approximate dynamic programming and reinforcement learning

EID: 77950350393     ISSN: 1860-949X     Source Type: Book Series
DOI: 10.1007/978-3-642-11688-9_1     Document Type: Article
Times cited: 16

References (91)
  • 1. Baddeley, B.: Reinforcement learning in continuous time and space: Interference and not ill conditioning is the main problem when using distributed function approximators. IEEE Transactions on Systems, Man, and Cybernetics-Part B: Cybernetics 38(4), 950-956 (2008)
  • 5. Berenji, H.R., Khedkar, P.: Learning and tuning fuzzy logic controllers through reinforcements. IEEE Transactions on Neural Networks 3(5), 724-740 (1992)
  • 6. Berenji, H.R., Vengerov, D.: A convergent actor-critic-based FRL algorithm with application to power management of wireless transmitters. IEEE Transactions on Fuzzy Systems 11(4), 478-485 (2003)
  • 7. Bertsekas, D.P.: Adaptive aggregation methods for infinite horizon dynamic programming. IEEE Transactions on Automatic Control 34(6), 589-598 (1989)
  • 8. Bertsekas, D.P.: Dynamic programming and suboptimal control: A survey from ADP to MPC. European Journal of Control 11(4-5) (2005); special issue for the CDC-ECC-05 in Seville, Spain
  • 12. Borkar, V.: An actor-critic algorithm for constrained Markov decision processes. Systems & Control Letters 54, 207-213 (2005)
  • 18. Chin, H.H., Jafari, A.A.: Genetic algorithm methods for solving the best stationary policy of finite Markov decision processes. In: Proceedings 30th Southeastern Symposium on System Theory, Morgantown, US, pp. 538-543 (1998)
  • 19. Chow, C.S., Tsitsiklis, J.N.: An optimal one-way multigrid algorithm for discrete-time stochastic control. IEEE Transactions on Automatic Control 36(8), 898-914 (1991)
  • 20. Coulom, R.: Feedforward neural networks in reinforcement learning applied to high-dimensional motor control. In: Cesa-Bianchi, N., Numao, M., Reischuk, R. (eds.) ALT 2002. LNCS (LNAI), vol. 2533, pp. 403-413. Springer, Heidelberg (2002)
  • 24. Gomez, F.J., Schmidhuber, J., Miikkulainen, R.: Efficient non-linear control through neuroevolution. In: Fürnkranz, J., Scheffer, T., Spiliopoulou, M. (eds.) ECML 2006. LNCS (LNAI), vol. 4212, pp. 654-662. Springer, Heidelberg (2006)
  • 25. Gonzalez, R.L., Rofman, E.: On deterministic control problems: An approximation procedure for the optimal cost I. The stationary problem. SIAM Journal on Control and Optimization 23(2), 242-266 (1985)
  • 27. Grüne, L.: Error estimation and adaptive discretization for the discrete stochastic Hamilton-Jacobi-Bellman equation. Numerische Mathematik 99, 85-112 (2004)
  • 29. Jaakkola, T., Jordan, M.I., Singh, S.P.: On the convergence of stochastic iterative dynamic programming algorithms. Neural Computation 6(6), 1185-1201 (1994)
  • 36. Konda, V.: Actor-critic algorithms. Ph.D. thesis, Massachusetts Institute of Technology, Cambridge, US (2002)
  • 37. Konda, V.R., Tsitsiklis, J.N.: Actor-critic algorithms. In: Solla, S.A., Leen, T.K., Müller, K.R. (eds.) Advances in Neural Information Processing Systems, vol. 12, pp. 1008-1014. MIT Press, Cambridge (2000)
  • 39. Lagoudakis, M., Parr, R., Littman, M.: Least-squares methods in reinforcement learning for control. In: Vlahavas, I.P., Spyropoulos, C.D. (eds.) SETN 2002. LNCS (LNAI), vol. 2308, pp. 249-260. Springer, Heidelberg (2002)
  • 42. Lewis, R.M., Torczon, V.: Pattern search algorithms for bound constrained minimization. SIAM Journal on Optimization 9(4), 1082-1099 (1999)
  • 43. Lin, L.J.: Self-improving reactive agents based on reinforcement learning, planning and teaching. Machine Learning 8(3-4), 293-321 (1992); special issue on reinforcement learning
  • 47. Mahadevan, S., Maggioni, M.: Proto-value functions: A Laplacian framework for learning representation and control in Markov decision processes. Journal of Machine Learning Research 8, 2169-2231 (2007)
  • 51. Menache, I., Mannor, S., Shimkin, N.: Basis function adaptation in temporal difference reinforcement learning. Annals of Operations Research 134, 215-238 (2005)
  • 53. Munos, R.: Finite-element methods with local triangulation refinement for continuous reinforcement learning problems. In: van Someren, M., Widmer, G. (eds.) ECML 1997. LNCS, vol. 1224, pp. 170-182. Springer, Heidelberg (1997)
  • 55. Munos, R., Moore, A.: Variable-resolution discretization in optimal control. Machine Learning 49(2-3), 291-323 (2002)
  • 56. Nakamura, Y., Mori, T., Sato, M., Ishii, S.: Reinforcement learning for a biped robot based on a CPG-actor-critic method. Neural Networks 20, 723-735 (2007)
  • 57. Nedić, A., Bertsekas, D.P.: Least-squares policy evaluation algorithms with linear function approximation. Discrete Event Dynamic Systems 13, 79-110 (2003)
  • 60. Ormoneit, D., Sen, S.: Kernel-based reinforcement learning. Machine Learning 49(2-3), 161-178 (2002)
  • 61. Pérez-Uribe, A.: Using a time-delay actor-critic neural architecture with dopamine-like reinforcement signal for learning in autonomous robots. In: Wermter, S., Austin, J., Willshaw, D.J. (eds.) Emergent Neural Computational Architectures Based on Neuroscience. LNCS (LNAI), vol. 2036, pp. 522-533. Springer, Heidelberg (2001)
  • 62. Peters, J., Schaal, S.: Natural actor-critic. Neurocomputing 71(7-9), 1180-1190 (2008)
  • 65. Ratitch, B., Precup, D.: Sparse distributed memories for on-line value-based reinforcement learning. In: Boulicaut, J.-F., Esposito, F., Giannotti, F., Pedreschi, D. (eds.) ECML 2004. LNCS (LNAI), vol. 3201, pp. 347-358. Springer, Heidelberg (2004)
  • 66. Reynolds, S.I.: Adaptive resolution model-free reinforcement learning: Decision boundary partitioning. In: Proceedings 17th International Conference on Machine Learning (ICML 2000), Stanford University, US, pp. 783-790 (2000)
  • 67. Riedmiller, M.: Neural fitted Q-iteration - first experiences with a data efficient neural reinforcement learning method. In: Gama, J., Camacho, R., Brazdil, P.B., Jorge, A.M., Torgo, L. (eds.) ECML 2005. LNCS (LNAI), vol. 3720, pp. 317-328. Springer, Heidelberg (2005)
  • 69. Rummery, G.A., Niranjan, M.: On-line Q-learning using connectionist systems. Tech. Rep. CUED/F-INFENG/TR166, Engineering Department, Cambridge University, UK (1994)
  • 70. Santos, M.S., Vigo-Aguiar, J.: Analysis of a numerical dynamic programming algorithm applied to economic models. Econometrica 66(2), 409-426 (1998)
  • 71. Singh, S.P., Jaakkola, T., Jordan, M.I.: Reinforcement learning with soft state aggregation. In: Tesauro, G., Touretzky, D.S., Leen, T.K. (eds.) Advances in Neural Information Processing Systems, vol. 7, pp. 361-368 (1995)
  • 72. Sutton, R.S.: Learning to predict by the method of temporal differences. Machine Learning 3, 9-44 (1988)
  • 73. Sutton, R.S.: Integrated architectures for learning, planning, and reacting based on approximating dynamic programming. In: Proceedings 7th International Conference on Machine Learning (ICML 1990), Austin, US, pp. 216-224 (1990)
  • 76. Sutton, R.S., McAllester, D.A., Singh, S.P., Mansour, Y.: Policy gradient methods for reinforcement learning with function approximation. In: Solla, S.A., Leen, T.K., Müller, K.R. (eds.) Advances in Neural Information Processing Systems, vol. 12, pp. 1057-1063. MIT Press, Cambridge (2000)
  • 78. Torczon, V.: On the convergence of pattern search algorithms. SIAM Journal on Optimization 7(1), 1-25 (1997)
  • 79. Touzet, C.F.: Neural reinforcement learning for behaviour synthesis. Robotics and Autonomous Systems 22(3-4), 251-281 (1997)
  • 80. Tsitsiklis, J.N., Van Roy, B.: Feature-based methods for large scale dynamic programming. Machine Learning 22(1-3), 59-94 (1996)
  • 81. Tsitsiklis, J.N., Van Roy, B.: An analysis of temporal difference learning with function approximation. IEEE Transactions on Automatic Control 42(5), 674-690 (1997)
  • 83. Vrabie, D., Pastravanu, O., Abu-Khalaf, M., Lewis, F.: Adaptive optimal control for continuous-time linear systems based on policy iteration. Automatica 45(2), 477-484 (2009)
  • 85. Wang, X., Tian, X., Cheng, Y.: Value approximation with least squares support vector machine in reinforcement learning system. Journal of Computational and Theoretical Nanoscience 4(7-8), 1290-1294 (2007)
  • 88. Wiering, M.: Convergence and divergence in standard and averaging reinforcement learning. In: Boulicaut, J.-F., Esposito, F., Giannotti, F., Pedreschi, D. (eds.) ECML 2004. LNCS (LNAI), vol. 3201, pp. 477-488. Springer, Heidelberg (2004)
  • 90. Xu, X., Hu, D., Lu, X.: Kernel-based least-squares policy iteration for reinforcement learning. IEEE Transactions on Neural Networks 18(4), 973-992 (2007)
  • 91. Yu, H., Bertsekas, D.P.: Convergence results for some temporal difference methods based on least-squares. Tech. Rep. LIDS 2697, Massachusetts Institute of Technology, Cambridge, US (2006)


* This information was extracted by KISTI through analysis of Elsevier's SCOPUS database.