2004, Pages 311-335

Learning and optimization from a system theoretic perspective

Author keywords

Algorithm design and analysis; Estimation; Markov processes; Optimization; Sensitivity; Steady state; Stochastic systems

Indexed keywords

ESTIMATION; MARKOV PROCESSES; OPERATIONS RESEARCH; OPTIMIZATION; REINFORCEMENT LEARNING; STOCHASTIC CONTROL SYSTEMS; STOCHASTIC SYSTEMS;

EID: 84986024692     PISSN: None     EISSN: None     Source Type: Book    
DOI: 10.1109/9780470544785.ch12     Document Type: Chapter
Times cited: 4

References (49)
  • 6
    • 0004256573
    • Addison-Wesley, Reading, MA, 1968. Springer-Verlag, New York
    • L. Breiman, Probability, Addison-Wesley, Reading, MA, 1968. Springer-Verlag, New York, 1994.
    • (1994) Probability
    • Breiman, L.1
  • 8
    • 0022117237
    • Convergence of parameter sensitivity estimates in a stochastic experiment
    • X. R. Cao, Convergence of parameter sensitivity estimates in a stochastic experiment, IEEE Trans. Automatic Control, vol. AC-30, pp. 834-843, 1985.
    • (1985) IEEE Trans. Automatic Control , vol.AC-30 , pp. 834-843
    • Cao, X.R.1
  • 9
    • 0042585486
    • Sensitivity estimates based on one realization of a stochastic system
    • X. R. Cao, Sensitivity estimates based on one realization of a stochastic system, Journal of Statistical Computation and Simulation, vol. 27, pp. 211-232, 1987.
    • (1987) Journal of Statistical Computation and Simulation , vol.27 , pp. 211-232
    • Cao, X.R.1
  • 11
    • 0030409198
    • A single sample path-based performance sensitivity formula for Markov chains
    • X. R. Cao, X. M. Yuan, and L. Qiu, A single sample path-based performance sensitivity formula for Markov chains, IEEE Trans. Automatic Control, vol. 41, pp. 1814-1817, 1996.
    • (1996) IEEE Trans. Automatic Control , vol.41 , pp. 1814-1817
    • Cao, X.R.1    Yuan, X.M.2    Qiu, L.3
  • 12
    • 0032027940
    • The relation among potentials, perturbation analysis, Markov decision processes, and other topics
    • X. R. Cao, The relation among potentials, perturbation analysis, Markov decision processes, and other topics, Journal of Discrete Event Dynamic Systems, vol. 8, pp. 71-87, 1998.
    • (1998) Journal of Discrete Event Dynamic Systems , vol.8 , pp. 71-87
    • Cao, X.R.1
  • 13
    • 0033247533
    • Single sample path based optimization of Markov chains
    • X. R. Cao, Single sample path based optimization of Markov chains, Journal of Optimization Theory and Applications, vol. 100, no. 3, pp. 527-548, 1999.
    • (1999) Journal of Optimization Theory and Applications , vol.100 , Issue.3 , pp. 527-548
    • Cao, X.R.1
  • 15
    • 0031258478
    • Perturbation realization, potentials and sensitivity analysis of Markov processes
    • X. R. Cao and H. F. Chen, Perturbation realization, potentials and sensitivity analysis of Markov processes, IEEE Trans. Automatic Control, vol. 42, pp. 1382-1393, 1997.
    • (1997) IEEE Trans. Automatic Control , vol.42 , pp. 1382-1393
    • Cao, X.R.1    Chen, H.F.2
  • 16
    • 0033884215
    • A unified approach to Markov decision problems and performance sensitivity analysis
    • X. R. Cao, A unified approach to Markov decision problems and performance sensitivity analysis, Automatica, vol. 36, pp. 771-774, 2000.
    • (2000) Automatica , vol.36 , pp. 771-774
    • Cao, X.R.1
  • 17
    • 1542350287
    • Constructing performance sensitivities for Markov systems with potentials as building blocks
    • Maui, Hawaii
    • X. R. Cao, Constructing performance sensitivities for Markov systems with potentials as building blocks, Proc. of the 42nd IEEE Conference on Decision and Control, Maui, Hawaii, 2003.
    • (2003) Proc. of the 42nd IEEE Conference on Decision and Control
    • Cao, X.R.1
  • 20
    • 0032122986
    • Algorithms for sensitivity analysis of Markov systems through potentials and perturbation realization
    • X. R. Cao and Y. W. Wan, Algorithms for sensitivity analysis of Markov systems through potentials and perturbation realization, IEEE Trans. Control System Tech, vol. 6, pp. 482-494, 1998.
    • (1998) IEEE Trans. Control System Tech , vol.6 , pp. 482-494
    • Cao, X.R.1    Wan, Y.W.2
  • 22
    • 0028466316
    • Stochastic optimization of regenerative systems using infinitesimal perturbation analysis
    • E. K. P. Chong and P. J. Ramadge, Stochastic optimization of regenerative systems using infinitesimal perturbation analysis, IEEE Trans. Automatic Control, vol. 39, pp. 1400-1410, 1994.
    • (1994) IEEE Trans. Automatic Control , vol.39 , pp. 1400-1410
    • Chong, E.K.P.1    Ramadge, P.J.2
  • 24
    • 2442614974
    • Potential-based on-line policy iteration algorithms for Markov decision processes
    • H.-T. Fang and X. R. Cao, Potential-based on-line policy iteration algorithms for Markov decision processes, IEEE Trans. Automatic Control, vol. 49, no. 4, pp. 493-505, 2004.
    • (2004) IEEE Trans. Automatic Control , vol.49 , Issue.4 , pp. 493-505
    • Fang, H.-T.1    Cao, X.R.2
  • 25
    • 0023543886
    • Likelihood ratio gradient estimation: An overview
    • A. Thesen, H. Grant, and K. D. Kelton, (eds.), Society for Computer Simulation, San Diego, CA
    • P. W. Glynn, Likelihood ratio gradient estimation: An overview, in A. Thesen, H. Grant, and K. D. Kelton, (eds.), Proc. of 1987 Winter Simulation Conference, pp. 366-375, Society for Computer Simulation, San Diego, CA, 1988.
    • (1988) Proc. of 1987 Winter Simulation Conference , pp. 366-375
    • Glynn, P.W.1
  • 26
    • 0024932244
    • Optimization of stochastic systems via simulation
    • A. Thesen, H. Grant, and K. D. Kelton, (eds.), Society for Computer Simulation, San Diego, CA
    • P. W. Glynn, Optimization of stochastic systems via simulation, in A. Thesen, H. Grant, and K. D. Kelton, (eds.), Proc. of 1987 Winter Simulation Conference, pp. 90-105, Society for Computer Simulation, San Diego, CA, 1988.
    • (1988) Proc. of 1987 Winter Simulation Conference , pp. 90-105
    • Glynn, P.W.1
  • 27
    • 0020802518
    • Perturbation analysis and optimization of queueing networks
    • Y. C. Ho and X. R. Cao, Perturbation analysis and optimization of queueing networks, Journal of Optimization Theory and Applications, vol. 40, no. 4, pp. 559-582, 1983.
    • (1983) Journal of Optimization Theory and Applications , vol.40 , Issue.4 , pp. 559-582
    • Ho, Y.C.1    Cao, X.R.2
  • 29
    • 0032653557
    • Explanation of goal softening in ordinal optimization
    • L. H. Lee, E. T. K. Lau and Y. C. Ho, Explanation of goal softening in ordinal optimization, IEEE Trans. Automatic Control, vol. 44, pp. 94-99, 1999.
    • (1999) IEEE Trans. Automatic Control , vol.44 , pp. 94-99
    • Lee, L.H.1    Lau, E.T.K.2    Ho, Y.C.3
  • 30
    • 85153938292
    • Reinforcement learning algorithm for partially observable Markov decision problems
    • Morgan Kaufmann, San Francisco, CA
    • T. Jaakkola, S. P. Singh, and M. I. Jordan, Reinforcement learning algorithm for partially observable Markov decision problems, Advances in Neural Information Processing Systems, vol. 7, pp. 345-352, Morgan Kaufmann, San Francisco, CA, 1995.
    • (1995) Advances in Neural Information Processing Systems , vol.7 , pp. 345-352
    • Jaakkola, T.1    Singh, S.P.2    Jordan, M.I.3
  • 32
    • 0343893613
    • Actor-critic-type learning algorithms for Markov decision processes
    • V. R. Konda and V. S. Borkar, Actor-critic-type learning algorithms for Markov decision processes, SIAM Journal on Control and Optimization, vol. 38, pp. 94-123, 1999.
    • (1999) SIAM Journal on Control and Optimization , vol.38 , pp. 94-123
    • Konda, V.R.1    Borkar, V.S.2
  • 33
    • 0035249254
    • Simulation-based optimization of Markov reward processes
    • P. Marbach and J. N. Tsitsiklis, Simulation-based optimization of Markov reward processes, IEEE Trans. Automatic Control, vol. 46, pp. 191-209, 2001.
    • (2001) IEEE Trans. Automatic Control , vol.46 , pp. 191-209
    • Marbach, P.1    Tsitsiklis, J.N.2
  • 34
    • 0037288469
    • Approximate gradient methods in policy-space optimization of Markov reward processes
    • P. Marbach and J. N. Tsitsiklis, Approximate gradient methods in policy-space optimization of Markov reward processes, Journal of Discrete Event Dynamic Systems, vol. 13, no. 1, pp. 111-148, 2003.
    • (2003) Journal of Discrete Event Dynamic Systems , vol.13 , Issue.1 , pp. 111-148
    • Marbach, P.1    Tsitsiklis, J.N.2
  • 36
    • 0001621211
    • Sample-path optimization of convex stochastic performance functions
    • E. L. Plambeck, B. R. Fu, S. M. Robinson, and R. Suri, Sample-path optimization of convex stochastic performance functions, Math. Program. B, vol. 75, pp. 137-176, 1996.
    • (1996) Math. Program. B , vol.75 , pp. 137-176
    • Plambeck, E.L.1    Fu, B.R.2    Robinson, S.M.3    Suri, R.4
  • 38
    • 0024735795
    • Sensitivity analysis via likelihood ratio
    • M. I. Reiman and A. Weiss, Sensitivity analysis via likelihood ratio, Operations Research, vol. 37, pp. 830-844, 1989.
    • (1989) Operations Research , vol.37 , pp. 830-844
    • Reiman, M.I.1    Weiss, A.2
  • 41
    • 0024621270
    • Single run optimization of discrete event simulations-An empirical study using the M/M/1 queue
    • R. Suri and Y. T. Leung, Single run optimization of discrete event simulations-An empirical study using the M/M/1 queue, IIE Trans., vol. 21, pp. 35-49, 1989.
    • (1989) IIE Trans. , vol.21 , pp. 35-49
    • Suri, R.1    Leung, Y.T.2
  • 42
    • 33847202724
    • Learning to predict by the methods of temporal differences
    • R. S. Sutton, Learning to predict by the methods of temporal differences, Machine Learning, vol. 3, pp. 835-846, 1988.
    • (1988) Machine Learning , vol.3 , pp. 835-846
    • Sutton, R.S.1
  • 45
    • 0029752470
    • Feature-based methods for large-scale dynamic programming
    • J. N. Tsitsiklis and B. Van Roy, Feature-based methods for large-scale dynamic programming, Machine Learning, vol. 22, pp. 59-94, 1994.
    • (1994) Machine Learning , vol.22 , pp. 59-94
    • Tsitsiklis, J.N.1    Van Roy, B.2
  • 46
    • 0031143730
    • An analysis of temporal-difference learning with function approximation
    • J. N. Tsitsiklis and B. Van Roy, An analysis of temporal-difference learning with function approximation, IEEE Trans. Automatic Control, vol. 42, pp. 674-690, 1997.
    • (1997) IEEE Trans. Automatic Control , vol.42 , pp. 674-690
    • Tsitsiklis, J.N.1    Van Roy, B.2
  • 47
    • 0033221519
    • Average cost temporal-difference learning
    • J. N. Tsitsiklis and B. Van Roy, Average cost temporal-difference learning, Automatica, vol. 35, pp. 1799-1808, 1999.
    • (1999) Automatica , vol.35 , pp. 1799-1808
    • Tsitsiklis, J.N.1    Van Roy, B.2
  • 48
    • 0004049893
    • Ph.D. Thesis, Cambridge University, Cambridge, UK
    • C. Watkins, Learning from Delayed Rewards, Ph.D. Thesis, Cambridge University, Cambridge, UK, 1989.
    • (1989) Learning from Delayed Rewards
    • Watkins, C.1


* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS DB.