Volume 60, Issue 3, 2015, Pages 743-758

A new optimal stepsize for approximate dynamic programming

Author keywords

Approximate dynamic programming (ADP); Kalman filter; simulation based optimization; stochastic approximation

Indexed keywords

APPROXIMATION ALGORITHMS; COMPUTER SYSTEMS PROGRAMMING; KALMAN FILTERS; OPERATIONS RESEARCH; OPTIMIZATION; STOCHASTIC SYSTEMS

EID: 84923616759     PISSN: 00189286     EISSN: None     Source Type: Journal    
DOI: 10.1109/TAC.2014.2357134     Document Type: Article
Times cited : (10)

References (45)
  • 1
    • 84862009362
    • Computing near-optimal policies in generalized joint replenishment
    • D. Adelman and D. Klabjan, "Computing near-optimal policies in generalized joint replenishment," INFORMS J. Comp., vol. 24, no. 1, pp. 148-164, 2012.
    • (2012) INFORMS J. Comp. , vol.24 , Issue.1 , pp. 148-164
    • Adelman, D.1    Klabjan, D.2
  • 2
    • 77952074893
    • Approximate dynamic programming for ambulance redeployment
    • M. S. Maxwell, M. Restrepo, S. G. Henderson, and H. Topaloglu, "Approximate dynamic programming for ambulance redeployment," INFORMS J. Comp., vol. 22, no. 2, pp. 266-281, 2010.
    • (2010) INFORMS J. Comp. , vol.22 , Issue.2 , pp. 266-281
    • Maxwell, M.S.1    Restrepo, M.2    Henderson, S.G.3    Topaloglu, H.4
  • 3
    • 84862297978
    • Approximate dynamic programming algorithms for optimal dosage decisions in controlled ovarian hyperstimulation
    • M. He, L. Zhao, and W. B. Powell, "Approximate dynamic programming algorithms for optimal dosage decisions in controlled ovarian hyperstimulation," Eur. J. Oper. Res., vol. 222, no. 2, pp. 328-340, 2012.
    • (2012) Eur. J. Oper. Res. , vol.222 , Issue.2 , pp. 328-340
    • He, M.1    Zhao, L.2    Powell, W.B.3
  • 4
    • 77953561812
    • An approximate dynamic programming approach to benchmark practice-based heuristics for natural gas storage valuation
    • G. Lai, F. Margot, and N. Secomandi, "An approximate dynamic programming approach to benchmark practice-based heuristics for natural gas storage valuation," Oper. Res., vol. 58, no. 3, pp. 564-582, 2010.
    • (2010) Oper. Res. , vol.58 , Issue.3 , pp. 564-582
    • Lai, G.1    Margot, F.2    Secomandi, N.3
  • 5
    • 79952360158
    • Optimal day-ahead trading and storage of renewable energies - an approximate dynamic programming approach
    • N. Löhndorf and S. Minner, "Optimal day-ahead trading and storage of renewable energies - an approximate dynamic programming approach," Energy Syst., vol. 1, no. 1, pp. 61-77, 2010.
    • (2010) Energy Syst , vol.1 , Issue.1 , pp. 61-77
    • Löhndorf, N.1    Minner, S.2
  • 6
    • 77949559841
    • Optimal commodity trading with a capacitated storage asset
    • N. Secomandi, "Optimal commodity trading with a capacitated storage asset," Manag. Sci., vol. 56, no. 3, pp. 449-467, 2010.
    • (2010) Manag. Sci. , vol.56 , Issue.3 , pp. 449-467
    • Secomandi, N.1
  • 7
    • 70449631674
    • An approximate dynamic programming approach to network revenue management with customer choice
    • D. Zhang and D. Adelman, "An approximate dynamic programming approach to network revenue management with customer choice," Transportation Sci., vol. 43, no. 3, pp. 381-394, 2009.
    • (2009) Transportation Sci , vol.43 , Issue.3 , pp. 381-394
    • Zhang, D.1    Adelman, D.2
  • 9
    • 63449141834
    • An approximate dynamic programming algorithm for large-scale fleet management: A case application
    • H. P. Simão, J. Day, A. P. George, T. Gifford, J. Nienow, and W. B. Powell, "An approximate dynamic programming algorithm for large-scale fleet management: A case application," Transportation Sci., vol. 43, no. 2, pp. 178-197, 2009.
    • (2009) Transportation Sci , vol.43 , Issue.2 , pp. 178-197
    • Simão, H.P.1    Day, J.2    George, A.P.3    Gifford, T.4    Nienow, J.5    Powell, W.B.6
  • 10
    • 84873152819
    • SMART: A stochastic multiscale model for the analysis of energy resources, technology and policy
    • W. B. Powell, A. George, A. Lamont, J. Stewart, and W. R. Scott, "SMART: A stochastic multiscale model for the analysis of energy resources, technology and policy," INFORMS J. Comp., vol. 24, no. 4, pp. 665-682, 2012.
    • (2012) INFORMS J. Comp. , vol.24 , Issue.4 , pp. 665-682
    • Powell, W.B.1    George, A.2    Lamont, A.3    Stewart, J.4    Scott, W.R.5
  • 13
    • 84968519017
    • Functional approximations and dynamic programming
    • R. Bellman and S. Dreyfus, "Functional approximations and dynamic programming," Math. Tables Aids Comp., vol. 13, pp. 247-251, 1959.
    • (1959) Math. Tables Aids Comp. , vol.13 , pp. 247-251
    • Bellman, R.1    Dreyfus, S.2
  • 19
    • 34249833101
    • Q-learning
    • C. Watkins and P. Dayan, "Q-learning," Machine Learning, vol. 8, no. 3, pp. 279-292, 1992.
    • (1992) Machine Learning , vol.8 , Issue.3 , pp. 279-292
    • Watkins, C.1    Dayan, P.2
  • 20
    • 33645566756
    • Dynamic programming approximations for stochastic, time-staged integer multicommodity flow problems
    • H. Topaloglu and W. B. Powell, "Dynamic programming approximations for stochastic, time-staged integer multicommodity flow problems," INFORMS J. Comp., vol. 18, no. 1, pp. 31-42, 2006.
    • (2006) INFORMS J. Comp. , vol.18 , Issue.1 , pp. 31-42
    • Topaloglu, H.1    Powell, W.B.2
  • 21
    • 0004093909
    • Cambridge, U.K.: Cambridge Univ. Press
    • M. Wasan, Stochastic Approximation. Cambridge, U.K.: Cambridge Univ. Press, 1969.
    • (1969) Stochastic Approximation
    • Wasan, M.1
  • 23
    • 0028497630
    • Asynchronous stochastic approximation and Q-learning
    • J. Tsitsiklis, "Asynchronous stochastic approximation and Q-learning," Machine Learning, vol. 16, pp. 185-202, 1994.
    • (1994) Machine Learning , vol.16 , pp. 185-202
    • Tsitsiklis, J.1
  • 24
    • 4243385070
    • Convergence of Stochastic Iterative Dynamic Programming algorithms
    • J. Cowan, G. Tesauro, and J. Alspector, Eds. San Francisco, CA, USA: Morgan Kaufmann Publishers
    • T. Jaakkola, M. Jordan, and S. Singh, "Convergence of stochastic iterative dynamic programming algorithms," in Advances in Neural Information Processing Systems, vol. 6, J. Cowan, G. Tesauro, and J. Alspector, Eds. San Francisco, CA, USA: Morgan Kaufmann Publishers, 1994, pp. 703-710.
    • (1994) Advances in Neural Information Processing Systems , vol.6 , pp. 703-710
    • Jaakkola, T.1    Jordan, M.2    Singh, S.3
  • 26
    • 84898998140
    • The Asymptotic Convergence-rate of Q-learning
    • M. Jordan, M. Kearns, and S. Solla, Eds. Cambridge, MA, USA: MIT Press
    • C. Szepesvári, "The asymptotic convergence-rate of Q-learning," in Advances in Neural Information Processing Systems, vol. 10, M. Jordan, M. Kearns, and S. Solla, Eds. Cambridge, MA, USA: MIT Press, 1997, pp. 1064-1070.
    • (1997) Advances in Neural Information Processing Systems , vol.10 , pp. 1064-1070
    • Szepesvári, C.1
  • 28
    • 60749124483
    • On Step Sizes, Stochastic Shortest Paths, Survival Probabilities in Reinforcement learning
    • S. J. Mason, R. R. Hill, L. Mönch, O. Rose, T. Jefferson, and J. W. Fowler, Eds.
    • A. Gosavi, "On step sizes, stochastic shortest paths, survival probabilities in reinforcement learning," in Proc. Winter Simul. Conf., S. J. Mason, R. R. Hill, L. Mönch, O. Rose, T. Jefferson, and J. W. Fowler, Eds., 2008, pp. 525-531.
    • (2008) Proc. Winter Simul. Conf. , pp. 525-531
    • Gosavi, A.1
  • 29
    • 84993077818
    • Approximate dynamic programming for management of high-value spare parts
    • H. P. Simão and W. B. Powell, "Approximate dynamic programming for management of high-value spare parts," J. Manufact. Technol. Manag., vol. 20, no. 2, pp. 147-160, 2009.
    • (2009) J. Manufact. Technol. Manag. , vol.20 , Issue.2 , pp. 147-160
    • Simão, H.P.1    Powell, W.B.2
  • 31
    • 80052250414
    • Adaptive subgradient methods for online learning and stochastic optimization
    • J. Duchi, E. Hazan, and Y. Singer, "Adaptive subgradient methods for online learning and stochastic optimization," J. Machine Learning Res., vol. 12, pp. 2121-2159, 2011.
    • (2011) J. Machine Learning Res. , vol.12 , pp. 2121-2159
    • Duchi, J.1    Hazan, E.2    Singer, Y.3
  • 36
    • 33646435300
    • A generalized Kalman filter for fixed point approximation and efficient temporal-difference learning
    • D. P. Choi and B. Van Roy, "A generalized Kalman filter for fixed point approximation and efficient temporal-difference learning," Discrete Event Dyn. Syst., vol. 16, pp. 207-239, 2006.
    • (2006) Discrete Event Dyn. Syst. , vol.16 , pp. 207-239
    • Choi, D.P.1    Van Roy, B.2
  • 37
    • 33748998787
    • Adaptive stepsizes for recursive estimation with applications in approximate dynamic programming
    • A. George and W. B. Powell, "Adaptive stepsizes for recursive estimation with applications in approximate dynamic programming," Machine Learning, vol. 65, no. 1, pp. 167-198, 2006.
    • (2006) Machine Learning , vol.65 , Issue.1 , pp. 167-198
    • George, A.1    Powell, W.B.2
  • 38
    • 85053849310
    • Temporal Difference Updating without a Learning rate
    • J. C. Platt, D. Koller, Y. Singer, and S. Roweis, Eds. Cambridge, MA, USA: MIT Press
    • M. Hutter and S. Legg, "Temporal difference updating without a learning rate," in Advances in Neural Information Processing Systems, vol. 20, J. C. Platt, D. Koller, Y. Singer, and S. Roweis, Eds. Cambridge, MA, USA: MIT Press, 2007, pp. 705-712.
    • (2007) Advances in Neural Information Processing Systems , vol.20 , pp. 705-712
    • Hutter, M.1    Legg, S.2
  • 39
    • 77956513316
    • A convergent O(n) algorithm for off-policy temporal-difference learning with linear function approximation
    • R. Sutton, C. Szepesvári, and H. Maei, "A convergent O(n) algorithm for off-policy temporal-difference learning with linear function approximation," Adv. Neural Inform. Processing Syst., vol. 21, pp. 1609-1616, 2008.
    • (2008) Adv. Neural Inform. Processing Syst. , vol.21 , pp. 1609-1616
    • Sutton, R.1    Szepesvári, C.2    Maei, H.3
  • 41
    • 84923605303
    • A new optimal stepsize for approximate dynamic programming
    • I. O. Ryzhov, P. I. Frazier, and W. B. Powell, "A new optimal stepsize for approximate dynamic programming," IEEE Trans. Autom. Control. [Online]. Available: http://arxiv.org/abs/1407.2676
    • IEEE Trans. Autom. Control
    • Ryzhov, I.O.1    Frazier, P.I.2    Powell, W.B.3
  • 43
    • 81455141800
    • General bounds and finite-time improvement for the Kiefer-Wolfowitz stochastic approximation algorithm
    • M. Broadie, D. Cicek, and A. Zeevi, "General bounds and finite-time improvement for the Kiefer-Wolfowitz stochastic approximation algorithm," Oper. Res., vol. 59, no. 5, pp. 1211-1224, 2011.
    • (2011) Oper. Res. , vol.59 , Issue.5 , pp. 1211-1224
    • Broadie, M.1    Cicek, D.2    Zeevi, A.3
  • 45
    • 56349109509
    • Value function approximation using multiple aggregation for multiattribute resource management
    • A. George, W. B. Powell, and S. R. Kulkarni, "Value function approximation using multiple aggregation for multiattribute resource management," J. Machine Learning Res., vol. 9, pp. 2079-2111, 2008.
    • (2008) J. Machine Learning Res. , vol.9 , pp. 2079-2111
    • George, A.1    Powell, W.B.2    Kulkarni, S.R.3


* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.