International Journal of Control, Automation, and Systems, Volume 2, Issue 3, 2004, Pages 263-278

Approximate dynamic programming strategies and their applicability for process control: A review and future directions

Author keywords

Approximate dynamic programming; Function approximation; Neuro dynamic programming; Optimal control; Reinforcement learning

Indexed keywords

APPROXIMATION THEORY; AUTOMATION; FUNCTIONS; MARKOV PROCESSES; OPTIMAL CONTROL SYSTEMS; PROBLEM SOLVING; PROCESS CONTROL; SET THEORY; STRATEGIC PLANNING;

EID: 4544319442     PISSN: 1598-6446     EISSN: None     Source Type: Journal
DOI: None     Document Type: Review
Times cited: 91

References (107)
  • 1. T. P. I. Ahamed, P. S. N. Rao, and P. S. Sastry, "A reinforcement learning approach to automatic generation control," Electric Power Systems Research, vol. 63, no. 1, pp. 9-26, 2002.
  • 3. J. S. Albus, "A new approach to manipulator control: The cerebellar model articulation controller (CMAC)," Journal of Dynamic Systems, Measurement and Control, pp. 220-227, 1975.
  • 4. C. W. Anderson, "Learning to control an inverted pendulum using neural networks," IEEE Control Systems Magazine, vol. 9, no. 3, pp. 31-37, 1989.
  • 5. C. W. Anderson, D. C. Hittle, A. D. Katz, and R. M. Kretchmar, "Synthesis of reinforcement learning, neural networks and PI control applied to a simulated heating coil," Artificial Intelligence in Engineering, vol. 11, no. 4, pp. 421-429, 1997.
  • 6. M. Asada, S. Noda, S. Tawaratsumida, and K. Hosoda, "Purposive behavior acquisition for a real robot by vision-based reinforcement learning," Machine Learning, vol. 23, pp. 279-303, 1996.
  • 7. K. J. Åström and A. Helmersson, "Dual control of an integrator with unknown gain," Computers & Mathematics with Applications, vol. 12A, pp. 653-662, 1986.
  • 10. A. G. Barto, S. J. Bradtke, and S. P. Singh, "Learning to act using real-time dynamic programming," Artificial Intelligence, vol. 72, no. 1, pp. 81-138, 1995.
  • 11. A. G. Barto, R. S. Sutton, and C. W. Anderson, "Neuronlike adaptive elements that can solve difficult learning control problems," IEEE Trans. on Systems, Man, and Cybernetics, vol. 13, no. 5, pp. 834-846, 1983.
  • 12. R. E. Bellman, Dynamic Programming, Princeton University Press, Princeton, NJ, 1957.
  • 15. D. P. Bertsekas and D. A. Castanon, "Adaptive aggregation for infinite horizon dynamic programming," IEEE Trans. on Automatic Control, vol. 34, no. 6, pp. 589-598, 1989.
  • 16. D. P. Bertsekas and R. G. Gallager, Data Networks, 2nd edition, Prentice-Hall, Englewood Cliffs, NJ, 1992.
  • 19. V. Borkar, "A convex analytic approach to Markov decision processes," Probability Theory and Related Fields, vol. 78, pp. 583-602, 1988.
  • 20. J. A. Boyan and A. W. Moore, "Generalization in reinforcement learning: Safely approximating the value function," in G. Tesauro and D. Touretzky, editors, Advances in Neural Information Processing Systems, vol. 7, Morgan Kaufmann, 1995.
  • 21. S. J. Bradtke, "Reinforcement learning applied to linear quadratic regulation," in S. J. Hanson, J. Cowan, and C. L. Giles, editors, Advances in Neural Information Processing Systems, vol. 5, Morgan Kaufmann, 1993.
  • 22. R. Crites and A. G. Barto, "Improving elevator performance using reinforcement learning," in D. S. Touretzky, M. C. Mozer, and M. E. Hasselmo, editors, Advances in Neural Information Processing Systems, vol. 8, MIT Press, Cambridge, MA, 1996.
  • 23. R. Crites and A. G. Barto, "Elevator group control using multiple reinforcement learning agents," Machine Learning, vol. 33, pp. 235-262, 1998.
  • 24. P. Dayan, "The convergence of TD(λ) for general λ," Machine Learning, vol. 8, pp. 341-362, 1992.
  • 25. D. P. de Farias and B. Van Roy, "The linear programming approach to approximate dynamic programming," Operations Research, vol. 51, no. 6, pp. 850-865, 2003.
  • 26. E. V. Denardo, "On linear programming in a Markov decision problem," Management Science, vol. 16, pp. 282-288, 1970.
  • 30. A. Hordijk and L. C. M. Kallenberg, "Linear programming and Markov decision chains," Management Science, vol. 25, pp. 352-362, 1979.
  • 31. J. C. Hoskins and D. M. Himmelblau, "Process control via artificial neural networks and reinforcement learning," Computers & Chemical Engineering, vol. 16, no. 4, pp. 241-251, 1992.
  • 33. T. Jaakkola, M. I. Jordan, and S. P. Singh, "On the convergence of stochastic iterative dynamic programming algorithms," Neural Computation, vol. 6, no. 6, pp. 1185-1201, 1994.
  • 35. N. S. Kaisare, J. M. Lee, and J. H. Lee, "Simulation based strategy for nonlinear optimal control: Application to a microbial cell reactor," International Journal of Robust and Nonlinear Control, vol. 13, no. 3-4, pp. 347-363, 2002.
  • 40. J. M. Lee, N. S. Kaisare, and J. H. Lee, "Simulation-based dynamic programming strategy for improvement of control policies," AIChE Annual Meeting, San Francisco, CA, paper 438c, 2003.
  • 41. J. M. Lee and J. H. Lee, "Neuro-dynamic programming approach to dual control problem," AIChE Annual Meeting, Reno, NV, paper 276e, 2001.
  • 42. J. M. Lee and J. H. Lee, "Approximate dynamic programming based approaches for input-output data-driven control of nonlinear processes," Automatica, 2004 (submitted).
  • 43. J. M. Lee and J. H. Lee, "Simulation-based learning of cost-to-go for control of nonlinear processes," Korean J. Chem. Eng., vol. 21, no. 2, pp. 338-344, 2004.
  • 44. J. A. Leonard, M. A. Kramer, and L. H. Ungar, "A neural network architecture that computes its own reliability," Computers & Chemical Engineering, vol. 16, pp. 819-835, 1992.
  • 45. L.-J. Lin, "Self-improving reactive agents based on reinforcement learning, planning and teaching," Machine Learning, vol. 8, pp. 293-321, 1992.
  • 46. S. Mahadevan and J. Connell, "Automatic programming of behavior-based robots using reinforcement learning," Artificial Intelligence, vol. 55, no. 2-3, pp. 311-365, 1992.
  • 48. A. S. Manne, "Linear programming and sequential decisions," Management Science, vol. 6, no. 3, pp. 259-267, 1960.
  • 49. P. Marbach and J. N. Tsitsiklis, "Simulation-based optimization of Markov reward processes," IEEE Trans. on Automatic Control, vol. 46, no. 2, pp. 191-209, 2001.
  • 50. E. C. Martinez, "Batch process modeling for optimization using reinforcement learning," Computers & Chemical Engineering, vol. 24, pp. 1187-1193, 2000.
  • 51. S. Miller and R. J. Williams, "Temporal difference learning: A chemical process control application," in A. F. Murray, editor, Applications of Artificial Neural Networks, Kluwer, Norwell, MA, 1995.
  • 52. A. Moore and C. Atkeson, "The parti-game algorithm for variable resolution reinforcement learning in multidimensional state spaces," Machine Learning, vol. 21, no. 3, pp. 199-233, 1995.
  • 54. A. W. Moore and C. G. Atkeson, "Prioritized sweeping: Reinforcement learning with less data and less time," Machine Learning, vol. 13, pp. 103-130, 1993.
  • 55. M. Morari and J. H. Lee, "Model predictive control: Past, present and future," Computers & Chemical Engineering, vol. 23, pp. 667-682, 1999.
  • 57. R. Munos, "A study of reinforcement learning in the continuous case by means of viscosity solutions," Machine Learning, vol. 40, pp. 265-299, 2000.
  • 58. R. Neuneier, "Enhancing Q-learning for optimal asset allocation," in M. Jordan, M. Kearns, and S. Solla, editors, Advances in Neural Information Processing Systems, vol. 10, 1997.
  • 59. D. Ormoneit and P. W. Glynn, "Kernel-based reinforcement learning in average-cost problems," IEEE Trans. on Automatic Control, vol. 47, no. 10, pp. 1624-1636, 2002.
  • 60. D. Ormoneit and S. Sen, "Kernel-based reinforcement learning," Machine Learning, vol. 49, pp. 161-178, 2002.
  • 61. E. Parzen, "On estimation of a probability density function and mode," Ann. Math. Statist., vol. 33, pp. 1065-1076, 1962.
  • 63. J. Peng and R. J. Williams, "Efficient learning and planning within the Dyna framework," Adaptive Behavior, vol. 1, no. 4, pp. 437-454, 1993.
  • 66. S. J. Qin and T. A. Badgwell, "A survey of industrial model predictive control technology," Control Engineering Practice, vol. 11, no. 7, pp. 733-764, 2003.
  • 70. A. L. Samuel, "Some studies in machine learning using the game of checkers," IBM J. Res. Develop., pp. 210-229, 1959.
  • 71. A. L. Samuel, "Some studies in machine learning using the game of checkers II - Recent progress," IBM J. Res. Develop., pp. 601-617, 1967.
  • 72. J. C. Santamaria, R. S. Sutton, and A. Ram, "Experiments with reinforcement learning in problems with continuous state and action spaces," Adaptive Behavior, vol. 6, no. 2, pp. 163-217, 1997.
  • 73. S. Schaal, "Learning from demonstration," in M. C. Mozer, M. Jordan, and T. Petsche, editors, Advances in Neural Information Processing Systems, vol. 9, pp. 1040-1046, 1997.
  • 74. S. Schaal and C. Atkeson, "Robot juggling: An implementation of memory-based learning," IEEE Control Systems, vol. 14, no. 1, pp. 57-71, 1994.
  • 75. N. N. Schraudolph, P. Dayan, and T. J. Sejnowski, "Temporal difference learning of position evaluation in the game of Go," in J. D. Cowan, G. Tesauro, and J. Alspector, editors, Advances in Neural Information Processing Systems, vol. 6, pp. 817-824, 1994.
  • 76. S. Singh and D. Bertsekas, "Reinforcement learning for dynamic channel allocation in cellular telephone systems," in M. C. Mozer, M. I. Jordan, and T. Petsche, editors, Advances in Neural Information Processing Systems, vol. 9, pp. 974-980, 1997.
  • 77. S. P. Singh and R. S. Sutton, "Reinforcement learning with replacing eligibility traces," Machine Learning, vol. 22, pp. 123-158, 1996.
  • 79. R. Sutton, D. McAllester, S. Singh, and Y. Mansour, "Policy gradient methods for reinforcement learning with function approximation," in S. A. Solla, T. K. Leen, and K.-R. Muller, editors, Advances in Neural Information Processing Systems, vol. 12, pp. 1057-1063, 2000.
  • 81. R. S. Sutton, "Learning to predict by the method of temporal differences," Machine Learning, vol. 3, no. 1, pp. 9-44, 1988.
  • 82. R. S. Sutton, "Integrated architectures for learning, planning, and reacting based on approximating dynamic programming," Proc. of the Seventh International Conference on Machine Learning, Austin, TX, 1990.
  • 84. R. S. Sutton, "Generalization in reinforcement learning: Successful examples using sparse coarse coding," in D. S. Touretzky, M. C. Mozer, and M. E. Hasselmo, editors, Advances in Neural Information Processing Systems, vol. 8, pp. 1038-1044, 1996.
  • 85. R. S. Sutton and A. G. Barto, "Toward a modern theory of adaptive networks: Expectation and prediction," Psychol. Rev., vol. 88, no. 2, pp. 135-170, 1981.
  • 87. M. Takeda, T. Nakamura, M. Imai, T. Ogasawara, and M. Asada, "Enhanced continuous valued Q-learning for real autonomous robots," Advanced Robotics, vol. 14, no. 5, pp. 439-442, 2000.
  • 88. G. Tesauro, "Practical issues in temporal difference learning," Machine Learning, vol. 8, pp. 257-277, 1992.
  • 89. G. Tesauro, "TD-Gammon, a self-teaching backgammon program, achieves master-level play," Neural Computation, vol. 6, no. 2, pp. 215-219, 1994.
  • 90. G. Tesauro, "Temporal difference learning and TD-Gammon," Communications of the ACM, vol. 38, no. 3, pp. 58-67, 1995.
  • 91. S. Thrun, "Learning to play the game of chess," in G. Tesauro, D. S. Touretzky, and T. K. Leen, editors, Advances in Neural Information Processing Systems, vol. 7, 1995.
  • 93. J. N. Tsitsiklis, "Asynchronous stochastic approximation and Q-learning," Machine Learning, vol. 16, pp. 185-202, 1994.
  • 94. J. N. Tsitsiklis and B. Van Roy, "An analysis of temporal-difference learning with function approximation," IEEE Trans. on Automatic Control, vol. 42, no. 5, pp. 674-690, 1997.
  • 95. B. Van Roy, "Neuro-dynamic programming: Overview and recent trends," in E. Feinberg and A. Shwartz, editors, Handbook of Markov Decision Processes: Methods and Applications, Kluwer, Boston, MA, 2001.
  • 98. P. J. Werbos, "Advanced forecasting methods for global crisis warning and models of intelligence," General Systems Yearbook, vol. 22, pp. 25-38, 1977.
  • 99. P. J. Werbos, "Approximate dynamic programming for real-time control and neural modeling," in D. A. White and D. A. Sofge, editors, Handbook of Intelligent Control: Neural, Fuzzy, and Adaptive Approaches, Van Nostrand Reinhold, New York, pp. 493-525, 1992.
  • 101. R. J. Williams and L. C. Baird III, "Analysis of some incremental variants of policy iteration: First steps toward understanding actor-critic learning systems," Technical Report NU-CCS-93-14, Northeastern University, College of Computer Science, Boston, MA, 1993.
  • 102. J. A. Wilson and E. C. Martinez, "Neuro-fuzzy modeling and control of a batch process involving simultaneous reaction and distillation," Computers & Chemical Engineering, vol. 21S, pp. S1233-S1238, 1997.
  • 103. M. Wonham, "Stochastic control problems," in B. Friedland, editor, Stochastic Problems in Control, ASME, New York, 1968.
  • 105. W. Zhang, Reinforcement Learning for Job-Shop Scheduling, PhD thesis, Oregon State University, 1996. Also available as Technical Report CS-96-30-1.
  • 107. W. Zhang and T. G. Dietterich, "High-performance job-shop scheduling with a time-delay TD(λ) network," in D. S. Touretzky, M. C. Mozer, and M. E. Hasselmo, editors, Advances in Neural Information Processing Systems, vol. 8, 1996.


* This information was extracted by KISTI through analysis of Elsevier's SCOPUS database.