1. A. Arapostathis, V. S. Borkar, E. Fernandez-Gaucherand, M. K. Ghosh, and S. I. Marcus, Discrete-time controlled Markov processes with average cost criterion: A survey, SIAM Journal on Control and Optimization, vol. 31, pp. 282-344, 1993.
3. J. Baxter, P. L. Bartlett, and L. Weaver, Experiments with infinite-horizon policy-gradient estimation, Journal of Artificial Intelligence Research, vol. 15, pp. 351-381, 2001.
4. D. P. Bertsekas, Dynamic Programming and Optimal Control, Volumes I and II, Athena Scientific, Belmont, MA, 1995.
6. L. Breiman, Probability, Addison-Wesley, Reading, MA, 1968; Springer-Verlag, New York, 1994.
7. P. Bremaud, Markov Chains: Gibbs Fields, Monte Carlo Simulation, and Queues, Springer-Verlag, New York, 1998.
8. X. R. Cao, Convergence of parameter sensitivity estimates in a stochastic experiment, IEEE Trans. Automatic Control, vol. AC-30, pp. 834-843, 1985.
9. X. R. Cao, Sensitivity estimates based on one realization of a stochastic system, Journal of Statistical Computation and Simulation, vol. 27, pp. 211-232, 1987.
11. X. R. Cao, X. M. Yuan, and L. Qiu, A single sample path-based performance sensitivity formula for Markov chains, IEEE Trans. Automatic Control, vol. 41, pp. 1814-1817, 1996.
12. X. R. Cao, The relation among potentials, perturbation analysis, Markov decision processes, and other topics, Journal of Discrete Event Dynamic Systems, vol. 8, pp. 71-87, 1998.
13. X. R. Cao, Single sample path based optimization of Markov chains, Journal of Optimization Theory and Applications, vol. 100, no. 3, pp. 527-548, 1999.
15. X. R. Cao and H. F. Chen, Perturbation realization, potentials and sensitivity analysis of Markov processes, IEEE Trans. Automatic Control, vol. 42, pp. 1382-1393, 1997.
16. X. R. Cao, A unified approach to Markov decision problems and performance sensitivity analysis, Automatica, vol. 36, pp. 771-774, 2000.
17. X. R. Cao, Constructing performance sensitivities for Markov systems with potentials as building blocks, Proc. of the 42nd IEEE Conference on Decision and Control, Maui, Hawaii, 2003.
20. X. R. Cao and Y. W. Wan, Algorithms for sensitivity analysis of Markov systems through potentials and perturbation realization, IEEE Trans. Control Systems Technology, vol. 6, pp. 482-494, 1998.
22. E. K. P. Chong and P. J. Ramadge, Stochastic optimization of regenerative systems using infinitesimal perturbation analysis, IEEE Trans. Automatic Control, vol. 39, pp. 1400-1410, 1994.
24. H.-T. Fang and X. R. Cao, Potential-based on-line policy iteration algorithms for Markov decision processes, IEEE Trans. Automatic Control, vol. 49, no. 4, pp. 493-505, 2004.
25. P. W. Glynn, Likelihood ratio gradient estimation: An overview, in A. Thesen, H. Grant, and K. D. Kelton (eds.), Proc. of the 1987 Winter Simulation Conference, pp. 366-375, Society for Computer Simulation, San Diego, CA, 1988.
26. P. W. Glynn, Optimization of stochastic systems via simulation, in A. Thesen, H. Grant, and K. D. Kelton (eds.), Proc. of the 1987 Winter Simulation Conference, pp. 90-105, Society for Computer Simulation, San Diego, CA, 1988.
27. Y. C. Ho and X. R. Cao, Perturbation analysis and optimization of queueing networks, Journal of Optimization Theory and Applications, vol. 40, no. 4, pp. 559-582, 1983.
29. L. H. Lee, E. T. K. Lau, and Y. C. Ho, Explanation of goal softening in ordinal optimization, IEEE Trans. Automatic Control, vol. 44, pp. 94-99, 1999.
30. T. Jaakkola, S. P. Singh, and M. I. Jordan, Reinforcement learning algorithm for partially observable Markov decision problems, in Advances in Neural Information Processing Systems, vol. 7, pp. 345-352, Morgan Kaufmann, San Francisco, CA, 1995.
32. V. R. Konda and V. S. Borkar, Actor-critic-type learning algorithms for Markov decision processes, SIAM Journal on Control and Optimization, vol. 38, pp. 94-123, 1999.
33. P. Marbach and J. N. Tsitsiklis, Simulation-based optimization of Markov reward processes, IEEE Trans. Automatic Control, vol. 46, pp. 191-209, 2001.
34. P. Marbach and J. N. Tsitsiklis, Approximate gradient methods in policy-space optimization of Markov reward processes, Journal of Discrete Event Dynamic Systems, vol. 13, no. 1, pp. 111-148, 2003.
36. E. L. Plambeck, B. R. Fu, S. M. Robinson, and R. Suri, Sample-path optimization of convex stochastic performance functions, Math. Program. B, vol. 75, pp. 137-176, 1996.
38. M. I. Reiman and A. Weiss, Sensitivity analysis via likelihood ratio, Operations Research, vol. 37, pp. 830-844, 1989.
40. R. Y. Rubinstein, Monte Carlo Optimization, Simulation, and Sensitivity Analysis of Queueing Networks, Wiley, New York, 1986.
41. R. Suri and Y. T. Leung, Single run optimization of discrete event simulations: An empirical study using the M/M/1 queue, IIE Transactions, vol. 21, pp. 35-49, 1989.
42. R. S. Sutton, Learning to predict by the methods of temporal differences, Machine Learning, vol. 3, pp. 9-44, 1988.
44. J. N. Tsitsiklis and V. R. Konda, Actor-critic algorithms, Tech. Rep., Laboratory for Information and Decision Systems, MIT, Cambridge, MA, 2001, preprint.
45. J. N. Tsitsiklis and B. Van Roy, Feature-based methods for large-scale dynamic programming, Machine Learning, vol. 22, pp. 59-94, 1994.
46. J. N. Tsitsiklis and B. Van Roy, An analysis of temporal-difference learning with function approximation, IEEE Trans. Automatic Control, vol. 42, pp. 674-690, 1997.
47. J. N. Tsitsiklis and B. Van Roy, Average cost temporal-difference learning, Automatica, vol. 35, pp. 1799-1808, 1999.
48. C. Watkins, Learning from Delayed Rewards, Ph.D. Thesis, Cambridge University, Cambridge, UK, 1989.