International Journal of Control, Automation, and Systems, Volume 2, Issue 3, 2004, Pages 263-278

Approximate dynamic programming strategies and their applicability for process control: A review and future directions

Author keywords

Approximate dynamic programming; Function approximation; Neuro dynamic programming; Optimal control; Reinforcement learning

Indexed keywords

APPROXIMATION THEORY; AUTOMATION; FUNCTIONS; MARKOV PROCESSES; OPTIMAL CONTROL SYSTEMS; PROBLEM SOLVING; PROCESS CONTROL; SET THEORY; STRATEGIC PLANNING;

EID: 4544319442     PISSN: 1598-6446     EISSN: None     Source Type: Journal
DOI: None     Document Type: Review
Times cited: 91

References (107)
  • 1. T. P. I. Ahamed, P. S. N. Rao, and P. S. Sastry, "A reinforcement learning approach to automatic generation control," Electric Power Systems Research, vol. 63, no. 1, pp. 9-26, 2002.
  • 3. J. S. Albus, "A new approach to manipulator control: The cerebellar model articulation controller (CMAC)," Journal of Dynamic Systems, Measurement and Control, pp. 220-227, 1975.
  • 4. C. W. Anderson, "Learning to control an inverted pendulum using neural networks," IEEE Control Systems Magazine, vol. 9, no. 3, pp. 31-37, 1989.
  • 5. C. W. Anderson, D. C. Hittle, A. D. Katz, and R. M. Kretchmar, "Synthesis of reinforcement learning, neural networks and PI control applied to a simulated heating coil," Artificial Intelligence in Engineering, vol. 11, no. 4, pp. 421-429, 1997.
  • 6. M. Asada, S. Noda, S. Tawaratsumida, and K. Hosoda, "Purposive behavior acquisition for a real robot by vision-based reinforcement learning," Machine Learning, vol. 23, pp. 279-303, 1996.
  • 7. K. J. Åström and A. Helmersson, "Dual control of an integrator with unknown gain," Computers & Mathematics with Applications, vol. 12A, pp. 653-662, 1986.
  • 10. A. G. Barto, S. J. Bradtke, and S. P. Singh, "Learning to act using real-time dynamic programming," Artificial Intelligence, vol. 72, no. 1, pp. 81-138, 1995.
  • 11. A. G. Barto, R. S. Sutton, and C. W. Anderson, "Neuronlike adaptive elements that can solve difficult learning control problems," IEEE Trans. on Systems, Man, and Cybernetics, vol. 13, no. 5, pp. 834-846, 1983.
  • 12. R. E. Bellman, Dynamic Programming, Princeton University Press, Princeton, NJ, 1957.
  • 15. D. P. Bertsekas and D. A. Castanon, "Adaptive aggregation for infinite horizon dynamic programming," IEEE Trans. on Automatic Control, vol. 34, no. 6, pp. 589-598, 1989.
  • 16. D. P. Bertsekas and R. G. Gallager, Data Networks, 2nd edition, Prentice-Hall, Englewood Cliffs, NJ, 1992.
  • 19. V. Borkar, "A convex analytic approach to Markov decision processes," Probability Theory and Related Fields, vol. 78, pp. 583-602, 1988.
  • 20. J. A. Boyan and A. W. Moore, "Generalization in reinforcement learning: Safely approximating the value function," in G. Tesauro and D. Touretzky, editors, Advances in Neural Information Processing Systems, vol. 7, Morgan Kaufmann, 1995.
  • 21. S. J. Bradtke, "Reinforcement learning applied to linear quadratic regulation," in S. J. Hanson, J. Cowan, and C. L. Giles, editors, Advances in Neural Information Processing Systems, vol. 5, Morgan Kaufmann, 1993.
  • 22. R. Crites and A. G. Barto, "Improving elevator performance using reinforcement learning," in D. S. Touretzky, M. C. Mozer, and M. E. Hasselmo, editors, Advances in Neural Information Processing Systems, vol. 8, MIT Press, Cambridge, MA, 1996.
  • 23. R. Crites and A. G. Barto, "Elevator group control using multiple reinforcement learning agents," Machine Learning, vol. 33, pp. 235-262, 1998.
  • 24. P. Dayan, "The convergence of TD(λ) for general λ," Machine Learning, vol. 8, pp. 341-362, 1992.
  • 25. D. P. de Farias and B. Van Roy, "The linear programming approach to approximate dynamic programming," Operations Research, vol. 51, no. 6, pp. 850-865, 2003.
  • 26. E. V. Denardo, "On linear programming in a Markov decision problem," Management Science, vol. 16, pp. 282-288, 1970.
  • 30. A. Hordijk and L. C. M. Kallenberg, "Linear programming and Markov decision chains," Management Science, vol. 25, pp. 352-362, 1979.
  • 31. J. C. Hoskins and D. M. Himmelblau, "Process control via artificial neural networks and reinforcement learning," Computers & Chemical Engineering, vol. 16, no. 4, pp. 241-251, 1992.
  • 33. T. Jaakkola, M. I. Jordan, and S. P. Singh, "On the convergence of stochastic iterative dynamic programming algorithms," Neural Computation, vol. 6, no. 6, pp. 1185-1201, 1994.
  • 35. N. S. Kaisare, J. M. Lee, and J. H. Lee, "Simulation based strategy for nonlinear optimal control: Application to a microbial cell reactor," International Journal of Robust and Nonlinear Control, vol. 13, no. 3-4, pp. 347-363, 2002.
  • 40. J. M. Lee, N. S. Kaisare, and J. H. Lee, "Simulation-based dynamic programming strategy for improvement of control policies," AIChE Annual Meeting, San Francisco, CA, paper 438c, 2003.
  • 41. J. M. Lee and J. H. Lee, "Neuro-dynamic programming approach to dual control problem," AIChE Annual Meeting, Reno, NV, paper 276e, 2001.
  • 42. J. M. Lee and J. H. Lee, "Approximate dynamic programming based approaches for input-output data-driven control of nonlinear processes," Automatica, 2004 (submitted).
  • 43. J. M. Lee and J. H. Lee, "Simulation-based learning of cost-to-go for control of nonlinear processes," Korean J. Chem. Eng., vol. 21, no. 2, pp. 338-344, 2004.
  • 44. J. A. Leonard, M. A. Kramer, and L. H. Ungar, "A neural network architecture that computes its own reliability," Computers & Chemical Engineering, vol. 16, pp. 819-835, 1992.
  • 45. L.-J. Lin, "Self-improving reactive agents based on reinforcement learning, planning and teaching," Machine Learning, vol. 8, pp. 293-321, 1992.
  • 46. S. Mahadevan and J. Connell, "Automatic programming of behavior-based robots using reinforcement learning," Artificial Intelligence, vol. 55, no. 2-3, pp. 311-365, 1992.
  • 48. A. S. Manne, "Linear programming and sequential decisions," Management Science, vol. 6, no. 3, pp. 259-267, 1960.
  • 49. P. Marbach and J. N. Tsitsiklis, "Simulation-based optimization of Markov reward processes," IEEE Trans. on Automatic Control, vol. 46, no. 2, pp. 191-209, 2001.
  • 50. E. C. Martinez, "Batch process modeling for optimization using reinforcement learning," Computers & Chemical Engineering, vol. 24, pp. 1187-1193, 2000.
  • 51. S. Miller and R. J. Williams, "Temporal difference learning: A chemical process control application," in A. F. Murray, editor, Applications of Artificial Neural Networks, Kluwer, Norwell, MA, 1995.
  • 52. A. Moore and C. Atkeson, "The parti-game algorithm for variable resolution reinforcement learning in multidimensional state spaces," Machine Learning, vol. 21, no. 3, pp. 199-233, 1995.
  • 54. A. W. Moore and C. G. Atkeson, "Prioritized sweeping: Reinforcement learning with less data and less time," Machine Learning, vol. 13, pp. 103-130, 1993.
  • 55. M. Morari and J. H. Lee, "Model predictive control: Past, present and future," Computers & Chemical Engineering, vol. 23, pp. 667-682, 1999.
  • 57. R. Munos, "A study of reinforcement learning in the continuous case by means of viscosity solutions," Machine Learning, vol. 40, pp. 265-299, 2000.
  • 58. R. Neuneier, "Enhancing Q-learning for optimal asset allocation," in M. Jordan, M. Kearns, and S. Solla, editors, Advances in Neural Information Processing Systems, vol. 10, 1997.
  • 59. D. Ormoneit and P. W. Glynn, "Kernel-based reinforcement learning in average-cost problems," IEEE Trans. on Automatic Control, vol. 47, no. 10, pp. 1624-1636, 2002.
  • 60. D. Ormoneit and S. Sen, "Kernel-based reinforcement learning," Machine Learning, vol. 49, pp. 161-178, 2002.
  • 61. E. Parzen, "On estimation of a probability density function and mode," Ann. Math. Statist., vol. 33, pp. 1065-1076, 1962.
  • 63. J. Peng and R. J. Williams, "Efficient learning and planning within the Dyna framework," Adaptive Behavior, vol. 1, no. 4, pp. 437-454, 1993.
  • 66. S. J. Qin and T. A. Badgwell, "A survey of industrial model predictive control technology," Control Engineering Practice, vol. 11, no. 7, pp. 733-764, 2003.
  • 70. A. L. Samuel, "Some studies in machine learning using the game of checkers," IBM J. Res. Develop., pp. 210-229, 1959.
  • 71. A. L. Samuel, "Some studies in machine learning using the game of checkers II - Recent progress," IBM J. Res. Develop., pp. 601-617, 1967.
  • 72. J. C. Santamaria, R. S. Sutton, and A. Ram, "Experiments with reinforcement learning in problems with continuous state and action spaces," Adaptive Behavior, vol. 6, no. 2, pp. 163-217, 1997.
  • 73. S. Schaal, "Learning from demonstration," in M. C. Mozer, M. Jordan, and T. Petsche, editors, Advances in Neural Information Processing Systems, vol. 9, pp. 1040-1046, 1997.
  • 74. S. Schaal and C. Atkeson, "Robot juggling: An implementation of memory-based learning," IEEE Control Systems, vol. 14, no. 1, pp. 57-71, 1994.
  • 75. N. N. Schraudolph, P. Dayan, and T. J. Sejnowski, "Temporal difference learning of position evaluation in the game of Go," in J. D. Cowan, G. Tesauro, and J. Alspector, editors, Advances in Neural Information Processing Systems, vol. 6, pp. 817-824, 1994.
  • 76. S. Singh and D. Bertsekas, "Reinforcement learning for dynamic channel allocation in cellular telephone systems," in M. C. Mozer, M. I. Jordan, and T. Petsche, editors, Advances in Neural Information Processing Systems, vol. 9, pp. 974-980, 1997.
  • 77. S. P. Singh and R. S. Sutton, "Reinforcement learning with replacing eligibility traces," Machine Learning, vol. 22, pp. 123-158, 1996.
  • 79. R. Sutton, D. McAllester, S. Singh, and Y. Mansour, "Policy gradient methods for reinforcement learning with function approximation," in S. A. Solla, T. K. Leen, and K.-R. Muller, editors, Advances in Neural Information Processing Systems, vol. 12, pp. 1057-1063, 2000.
  • 81. R. S. Sutton, "Learning to predict by the method of temporal differences," Machine Learning, vol. 3, no. 1, pp. 9-44, 1988.
  • 82. R. S. Sutton, "Integrated architectures for learning, planning, and reacting based on approximating dynamic programming," Proc. of the Seventh International Conference on Machine Learning, Austin, TX, 1990.
  • 84. R. S. Sutton, "Generalization in reinforcement learning: Successful examples using sparse coarse coding," in D. S. Touretzky, M. C. Mozer, and M. E. Hasselmo, editors, Advances in Neural Information Processing Systems, vol. 8, pp. 1038-1044, 1996.
  • 85. R. S. Sutton and A. G. Barto, "Toward a modern theory of adaptive networks: Expectation and prediction," Psychol. Rev., vol. 88, no. 2, pp. 135-170, 1981.
  • 87. M. Takeda, T. Nakamura, M. Imai, T. Ogasawara, and M. Asada, "Enhanced continuous valued Q-learning for real autonomous robots," Advanced Robotics, vol. 14, no. 5, pp. 439-442, 2000.
  • 88. G. Tesauro, "Practical issues in temporal difference learning," Machine Learning, vol. 8, pp. 257-277, 1992.
  • 89. G. Tesauro, "TD-Gammon, a self-teaching backgammon program, achieves master-level play," Neural Computation, vol. 6, no. 2, pp. 215-219, 1994.
  • 90. G. Tesauro, "Temporal difference learning and TD-Gammon," Communications of the ACM, vol. 38, no. 3, pp. 58-67, 1995.
  • 91. S. Thrun, "Learning to play the game of chess," in G. Tesauro, D. S. Touretzky, and T. K. Leen, editors, Advances in Neural Information Processing Systems, vol. 7, 1995.
  • 93. J. N. Tsitsiklis, "Asynchronous stochastic approximation and Q-learning," Machine Learning, vol. 16, pp. 185-202, 1994.
  • 94. J. N. Tsitsiklis and B. Van Roy, "An analysis of temporal-difference learning with function approximation," IEEE Trans. on Automatic Control, vol. 42, no. 5, pp. 674-690, 1997.
  • 95. B. Van Roy, "Neuro-dynamic programming: Overview and recent trends," in E. Feinberg and A. Shwartz, editors, Handbook of Markov Decision Processes: Methods and Applications, Kluwer, Boston, MA, 2001.
  • 98. P. J. Werbos, "Advanced forecasting methods for global crisis warning and models of intelligence," General Systems Yearbook, vol. 22, pp. 25-38, 1977.
  • 99. P. J. Werbos, "Approximate dynamic programming for real-time control and neural modeling," in D. A. White and D. A. Sofge, editors, Handbook of Intelligent Control: Neural, Fuzzy, and Adaptive Approaches, Van Nostrand Reinhold, New York, pp. 493-525, 1992.
  • 101. R. J. Williams and L. C. Baird III, "Analysis of some incremental variants of policy iteration: First steps toward understanding actor-critic learning systems," Technical Report NU-CCS-93-14, Northeastern University, College of Computer Science, Boston, MA, 1993.
  • 102. J. A. Wilson and E. C. Martinez, "Neuro-fuzzy modeling and control of a batch process involving simultaneous reaction and distillation," Computers & Chemical Engineering, vol. 21S, pp. S1233-S1238, 1997.
  • 103. M. Wonham, "Stochastic control problems," in B. Friedland, editor, Stochastic Problems in Control, ASME, New York, 1968.
  • 105. W. Zhang, Reinforcement Learning for Job-Shop Scheduling, PhD thesis, Oregon State University, 1996. Also available as Technical Report CS-96-30-1.
  • 107. W. Zhang and T. G. Dietterich, "High-performance job-shop scheduling with a time-delay TD(λ) network," in D. S. Touretzky, M. C. Mozer, and M. E. Hasselmo, editors, Advances in Neural Information Processing Systems, vol. 8, 1996.


* This information was extracted by KISTI through analysis of Elsevier's SCOPUS database.