SCOPUS Information Search Platform

IEEE Transactions on Automatic Control

Volume 60, Issue 3, 2015, Pages 743-758

A new optimal stepsize for approximate dynamic programming

(3) Ryzhov, Ilya O. (a); Frazier, Peter I. (b); Powell, Warren B. (c)

a University of Maryland (United States)

b School of Operations Research and Industrial Engineering (United States)

c Princeton University (United States)

Author keywords

Approximate dynamic programming (ADP); Kalman filter; simulation based optimization; stochastic approximation

Indexed keywords

APPROXIMATION ALGORITHMS; COMPUTER SYSTEMS PROGRAMMING; KALMAN FILTERS; OPERATIONS RESEARCH; OPTIMIZATION; STOCHASTIC SYSTEMS;

APPROXIMATE DYNAMIC PROGRAMMING; FASTER CONVERGENCE; LARGE-SCALE TRANSPORTATION; NUMERICAL EXPERIMENTS; RESEARCH APPLICATIONS; SIMULATION-BASED OPTIMIZATIONS; STOCHASTIC APPROXIMATIONS; VALUE FUNCTION APPROXIMATION;

DYNAMIC PROGRAMMING;

EID: 84923616759 PISSN: 00189286 EISSN: None Source Type: Journal
DOI: 10.1109/TAC.2014.2357134 Document Type: Article

Times cited : (10)

References (45)

1
- 84862009362
- Computing near-optimal policies in generalized joint replenishment
- D. Adelman and D. Klabjan, "Computing near-optimal policies in generalized joint replenishment," INFORMS J. Comp., vol. 24, no. 1, pp. 148-164, 2012.
- (2012) INFORMS J. Comp. , vol.24 , Issue.1 , pp. 148-164
- Adelman, D.¹ Klabjan, D.²

2
- 77952074893
- Approximate dynamic programming for ambulance redeployment
- M. S. Maxwell, M. Restrepo, S. G. Henderson, and H. Topaloglu, "Approximate dynamic programming for ambulance redeployment," INFORMS J. Comp., vol. 22, no. 2, pp. 266-281, 2010.
- (2010) INFORMS J. Comp. , vol.22 , Issue.2 , pp. 266-281
- Maxwell, M.S.¹ Restrepo, M.² Henderson, S.G.³ Topaloglu, H.⁴

3
- 84862297978
- Approximate dynamic programming algorithms for optimal dosage decisions in controlled ovarian hyperstimulation
- M. He, L. Zhao, and W. B. Powell, "Approximate dynamic programming algorithms for optimal dosage decisions in controlled ovarian hyperstimulation," Eur. J. Oper. Res., vol. 222, no. 2, pp. 328-340, 2012.
- (2012) Eur. J. Oper. Res. , vol.222 , Issue.2 , pp. 328-340
- He, M.¹ Zhao, L.² Powell, W.B.³

4
- 77953561812
- An approximate dynamic programming approach to benchmark practice-based heuristics for natural gas storage valuation
- G. Lai, F. Margot, and N. Secomandi, "An approximate dynamic programming approach to benchmark practice-based heuristics for natural gas storage valuation," Oper. Res., vol. 58, no. 3, pp. 564-582, 2010.
- (2010) Oper. Res. , vol.58 , Issue.3 , pp. 564-582
- Lai, G.¹ Margot, F.² Secomandi, N.³

5
- 79952360158
- Optimal day-ahead trading and storage of renewable energies - an approximate dynamic programming approach
- N. Löhndorf and S. Minner, "Optimal day-ahead trading and storage of renewable energies - an approximate dynamic programming approach," Energy Syst., vol. 1, no. 1, pp. 61-77, 2010.
- (2010) Energy Syst. , vol.1 , Issue.1 , pp. 61-77
- Löhndorf, N.¹ Minner, S.²

6
- 77949559841
- Optimal commodity trading with a capacitated storage asset
- N. Secomandi, "Optimal commodity trading with a capacitated storage asset," Manag. Sci., vol. 56, no. 3, pp. 449-467, 2010.
- (2010) Manag. Sci. , vol.56 , Issue.3 , pp. 449-467
- Secomandi, N.¹

7
- 70449631674
- An approximate dynamic programming approach to network revenue management with customer choice
- D. Zhang and D. Adelman, "An approximate dynamic programming approach to network revenue management with customer choice," Transportation Sci., vol. 43, no. 3, pp. 381-394, 2009.
- (2009) Transportation Sci. , vol.43 , Issue.3 , pp. 381-394
- Zhang, D.¹ Adelman, D.²

8
- 0031343837
- Approximate dynamic programming for sensor management
- D. A. Castanon, "Approximate dynamic programming for sensor management," in Proc. 36th IEEE Conf. Decision Control, 1997, vol. 2, pp. 1202-1207.
- (1997) Proc. 36th IEEE Conf. Decision Control , vol.2 , pp. 1202-1207
- Castanon, D.A.¹

9
- 63449141834
- An approximate dynamic programming algorithm for large-scale fleet management: A case application
- H. P. Simão, J. Day, A. P. George, T. Gifford, J. Nienow, and W. B. Powell, "An approximate dynamic programming algorithm for large-scale fleet management: A case application," Transportation Sci., vol. 43, no. 2, pp. 178-197, 2009.
- (2009) Transportation Sci. , vol.43 , Issue.2 , pp. 178-197
- Simão, H.P.¹ Day, J.² George, A.P.³ Gifford, T.⁴ Nienow, J.⁵ Powell, W.B.⁶

10
- 84873152819
- SMART: A stochastic multiscale model for the analysis of energy resources, technology and policy
- W. B. Powell, A. George, A. Lamont, J. Stewart, and W. R. Scott, "SMART: A stochastic multiscale model for the analysis of energy resources, technology and policy," INFORMS J. Comp., vol. 24, no. 4, pp. 665-682, 2012.
- (2012) INFORMS J. Comp. , vol.24 , Issue.4 , pp. 665-682
- Powell, W.B.¹ George, A.² Lamont, A.³ Stewart, J.⁴ Scott, W.R.⁵

11
- 0003871605
- New York, NY, USA: Wiley
- R. Howard, Dynamic Probabilistic Systems, Volume II: Semimarkov and Decision Processes. New York, NY, USA: Wiley, 1971.
- (1971) Dynamic Probabilistic Systems, Volume II: Semimarkov and Decision Processes
- Howard, R.¹

12
- 0003998452
- New York, NY, USA: Wiley
- M. L. Puterman, Markov Decision Processes. New York, NY, USA: Wiley, 1994.
- (1994) Markov Decision Processes
- Puterman, M.L.¹

13
- 84968519017
- Functional approximations and dynamic programming
- R. Bellman and S. Dreyfus, "Functional approximations and dynamic programming," Math. Tables Aids Comp., vol. 13, pp. 247-251, 1959.
- (1959) Math. Tables Aids Comp. , vol.13 , pp. 247-251
- Bellman, R.¹ Dreyfus, S.²

14
- 0003487482
- Belmont, MA, USA: Athena Scientific
- D. Bertsekas and J. Tsitsiklis, Neuro-Dynamic Programming. Belmont, MA, USA: Athena Scientific, 1996.
- (1996) Neuro-Dynamic Programming.
- Bertsekas, D.¹ Tsitsiklis, J.²

15
- 0004007508
- Cambridge, MA, USA: The MIT Press
- R. Sutton and A. Barto, Reinforcement Learning. Cambridge, MA, USA: The MIT Press, 1998.
- (1998) Reinforcement Learning
- Sutton, R.¹ Barto, A.²

16
- 84921399937
- New York, NY, USA: IEEE Press
- J. Si, A. G. Barto, W. B. Powell, and D. Wunsch, Eds., Handbook of Learning and Approximate Dynamic Programming. New York, NY, USA: IEEE Press, 2004.
- (2004) Handbook of Learning and Approximate Dynamic Programming
- Si, J.¹ Barto, A.G.² Powell, W.B.³ Wunsch, D.⁴

17
- 84949764394
- New York, NY, USA: Wiley
- W. B. Powell, Approximate Dynamic Programming: Solving the Curses of Dimensionality (2nd ed.). New York, NY, USA: Wiley, 2011.
- (2011) Approximate Dynamic Programming: Solving the Curses of Dimensionality (2nd ed.)
- Powell, W.B.¹

18
- 0031388983
- A neuro-dynamic programming approach to retailer inventory management
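- B. Van Roy, D. Bertsekas, Y. Lee, and J. Tsitsiklis, "A neuro-dynamic programming approach to retailer inventory management," in Proc. 36th IEEE Conf. Decision Control, 1997, vol. 4, pp. 4052-4057.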
- (1997) Proc. 36th IEEE Conf. Decision Control , vol.4 , pp. 4052-4057
- Van Roy, B.¹ Bertsekas, D.² Lee, Y.³ Tsitsiklis, J.⁴

19
- 34249833101
- Q-learning
- C. Watkins and P. Dayan, "Q-learning," Machine Learning, vol. 8, no. 3, pp. 279-292, 1992.
- (1992) Machine Learning , vol.8 , Issue.3 , pp. 279-292
- Watkins, C.¹ Dayan, P.²

20
- 33645566756
- Dynamic programming approximations for stochastic, time-staged integer multicommodity flow problems
- H. Topaloglu and W. B. Powell, "Dynamic programming approximations for stochastic, time-staged integer multicommodity flow problems," INFORMS J. Comp., vol. 18, no. 1, pp. 31-42, 2006.
- (2006) INFORMS J. Comp. , vol.18 , Issue.1 , pp. 31-42
- Topaloglu, H.¹ Powell, W.B.²

21
- 0004093909
- Cambridge, U.K.: Cambridge Univ. Press
- M. Wasan, Stochastic Approximation. Cambridge, U.K.: Cambridge Univ. Press, 1969.
- (1969) Stochastic Approximation
- Wasan, M.¹

22
- 0004066022
- New York, NY, USA: Springer-Verlag
- H. J. Kushner and G. G. Yin, Stochastic Approximation Algorithms and Applications. New York, NY, USA: Springer-Verlag, 1997.
- (1997) Stochastic Approximation Algorithms and Applications
- Kushner, H.J.¹ Yin, G.G.²

23
- 0028497630
- Asynchronous stochastic approximation and Q-learning
- J. Tsitsiklis, "Asynchronous stochastic approximation and Q-learning," Machine Learning, vol. 16, pp. 185-202, 1994.
- (1994) Machine Learning , vol.16 , pp. 185-202
- Tsitsiklis, J.¹

24
- 4243385070
- Convergence of Stochastic Iterative Dynamic Programming algorithms
- J. Cowan, G. Tesauro, and J. Alspector, Eds. San Francisco, CA, USA: Morgan Kaufmann Publishers
- T. Jaakkola, M. Jordan, and S. Singh, "Convergence of stochastic iterative dynamic programming algorithms," in Advances in Neural Information Processing Systems, vol. 6, J. Cowan, G. Tesauro, and J. Alspector, Eds. San Francisco, CA, USA: Morgan Kaufmann Publishers, 1994, pp. 703-710.
- (1994) Advances in Neural Information Processing Systems , vol.6 , pp. 703-710
- Jaakkola, T.¹ Jordan, M.² Singh, S.³

25
- 85162416897
- Speedy Q-learning
- M. G. Azar, R. Munos, M. Ghavamzadeh, and H. J. Kappen, "Speedy Q-learning," Adv. Neural Inform. Processing Syst., vol. 24, pp. 2411-2419, 2011.
- (2011) Adv. Neural Inform. Processing Syst. , vol.24 , pp. 2411-2419
- Azar, M.G.¹ Munos, R.² Ghavamzadeh, M.³ Kappen, H.J.⁴

26
- 84898998140
- The Asymptotic Convergence-rate of Q-learning
- M. Jordan, M. Kearns, and S. Solla, Eds. Cambridge, MA, USA: MIT Press
- C. Szepesvári, "The asymptotic convergence-rate of Q-learning," in Advances in Neural Information Processing Systems, vol. 10, M. Jordan, M. Kearns, and S. Solla, Eds. Cambridge, MA, USA: MIT Press, 1997, pp. 1064-1070.
- (1997) Advances in Neural Information Processing Systems , vol.10 , pp. 1064-1070
- Szepesvári, C.¹

27
- 14344266002
- Learning rates for Q-learning
- E. Even-Dar and Y. Mansour, "Learning rates for Q-learning," J. Machine Learning Res., vol. 5, pp. 1-25, 2003.
- (2003) J. Machine Learning Res. , vol.5 , pp. 1-25
- Even-Dar, E.¹ Mansour, Y.²

28
- 60749124483
- On Step Sizes, Stochastic Shortest Paths, Survival Probabilities in Reinforcement learning
- S. J. Mason, R. R. Hill, L. Mönch, O. Rose, T. Jefferson, and J. W. Fowler, Eds.
- A. Gosavi, "On step sizes, stochastic shortest paths, survival probabilities in reinforcement learning," in Proc. Winter Simul. Conf., S. J. Mason, R. R. Hill, L. Mönch, O. Rose, T. Jefferson, and J. W. Fowler, Eds., 2008, pp. 525-531.
- (2008) Proc. Winter Simul. Conf. , pp. 525-531
- Gosavi, A.¹

29
- 84993077818
- Approximate dynamic programming for management of high-value spare parts
- H. P. Simão and W. B. Powell, "Approximate dynamic programming for management of high-value spare parts," J. Manufact. Technol. Manag., vol. 20, no. 2, pp. 147-160, 2009.
- (2009) J. Manufact. Technol. Manag. , vol.20 , Issue.2 , pp. 147-160
- Simão, H.P.¹ Powell, W.B.²

30
- 0003778897
- New York, NY, USA: Springer-Verlag
- A. Benveniste, M. Metivier, and P. Priouret, Adaptive Algorithms and Stochastic Approximations. New York, NY, USA: Springer-Verlag, 1990.
- (1990) Adaptive Algorithms and Stochastic Approximations
- Benveniste, A.¹ Metivier, M.² Priouret, P.³

31
- 80052250414
- Adaptive subgradient methods for online learning and stochastic optimization
- J. Duchi, E. Hazan, and Y. Singer, "Adaptive subgradient methods for online learning and stochastic optimization," J. Machine Learning Res., vol. 12, pp. 2121-2159, 2011.
- (2011) J. Machine Learning Res. , vol.12 , pp. 2121-2159
- Duchi, J.¹ Hazan, E.² Singer, Y.³

32
- 84897487847
- No more pesky learning rates
- T. Schaul, S. Zhang, and Y. LeCun, "No more pesky learning rates," in Proc. 30th Int. Conf. Machine Learning, 2013, pp. 343-351.
- (2013) Proc. 30th Int. Conf. Machine Learning , pp. 343-351
- Schaul, T.¹ Zhang, S.² LeCun, Y.³

33
- 0026971570
- Adapting bias by gradient descent: An incremental version of delta-bar-delta
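- R. Sutton, "Adapting bias by gradient descent: An incremental version of delta-bar-delta," in Proc. 10th Nat. Conf. Artif. Intell., 1992, pp. 171-176.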
- (1992) Proc. 10th Nat. Conf. Artif. Intell. , pp. 171-176
- Sutton, R.¹

34
- 84867615954
- Tuning-free stepsize adaptation
- A. R. Mahmood, R. S. Sutton, T. Degris, and P. M. Pilarski, "Tuning-free stepsize adaptation," in Proc. IEEE Int. Conf. Acous., Speech, Signal Processing, 2012, pp. 2121-2124.
- (2012) Proc. IEEE Int. Conf. Acous., Speech, Signal Processing , pp. 2121-2124
- Mahmood, A.R.¹ Sutton, R.S.² Degris, T.³ Pilarski, P.M.⁴

35
- 0004294973
- New York, NY, USA: Dover Publications
- R. Stengel, Optimal Control and Estimation. New York, NY, USA: Dover Publications, 1994.
- (1994) Optimal Control and Estimation
- Stengel, R.¹

36
- 33646435300
- A generalized Kalman filter for fixed point approximation and efficient temporal-difference learning
- D. P. Choi and B. Van Roy, "A generalized Kalman filter for fixed point approximation and efficient temporal-difference learning," Discrete Event Dyn. Syst., vol. 16, pp. 207-239, 2006.
- (2006) Discrete Event Dyn. Syst. , vol.16 , pp. 207-239
- Choi, D.P.¹ Van Roy, B.²

37
- 33748998787
- Adaptive stepsizes for recursive estimation with applications in approximate dynamic programming
- A. George and W. B. Powell, "Adaptive stepsizes for recursive estimation with applications in approximate dynamic programming," Machine Learning, vol. 65, no. 1, pp. 167-198, 2006.
- (2006) Machine Learning , vol.65 , Issue.1 , pp. 167-198
- George, A.¹ Powell, W.B.²

38
- 85053849310
- Temporal Difference Updating without a Learning rate
- J. C. Platt, D. Koller, Y. Singer, and S. Roweis, Eds. Cambridge, MA, USA: MIT Press
- M. Hutter and S. Legg, "Temporal difference updating without a learning rate," in Advances in Neural Information Processing Systems, vol. 20, J. C. Platt, D. Koller, Y. Singer, and S. Roweis, Eds. Cambridge, MA, USA: MIT Press, 2007, pp. 705-712.
- (2007) Advances in Neural Information Processing Systems , vol.20 , pp. 705-712
- Hutter, M.¹ Legg, S.²

39
- 77956513316
- A convergent O(n) algorithm for off-policy temporal-difference learning with linear function approximation
- R. Sutton, C. Szepesvári, and H. Maei, "A convergent O(n) algorithm for off-policy temporal-difference learning with linear function approximation," Adv. Neural Inform. Processing Syst., vol. 21, pp. 1609-1616, 2008.
- (2008) Adv. Neural Inform. Processing Syst. , vol.21 , pp. 1609-1616
- Sutton, R.¹ Szepesvári, C.² Maei, H.³

40
- 84897527953
- Concurrent reinforcement learning from customer interactions
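- D. Silver, L. Newnham, D. Barker, S. Weller, and J. McFall, "Concurrent reinforcement learning from customer interactions," in Proc. 30th Int. Conf. Machine Learning, 2013, pp. 924-932.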
- (2013) Proc. 30th Int. Conf. Machine Learning , pp. 924-932
- Silver, D.¹ Newnham, L.² Barker, D.³ Weller, S.⁴ McFall, J.⁵

41
- 84923605303
- A new optimal stepsize for approximate dynamic programming
- I. O. Ryzhov, P. I. Frazier, and W. B. Powell, "A new optimal stepsize for approximate dynamic programming," IEEE Trans. Autom. Control. [Online]. Available: http://arxiv.org/abs/1407.2676
- IEEE Trans. Autom. Control
- Ryzhov, I.O.¹ Frazier, P.I.² Powell, W.B.³

42
- 0003684449
- New York, NY, USA: Springer ser. Statistics
- T. Hastie, R. Tibshirani, and J. Friedman, The Elements of Statistical Learning. New York, NY, USA: Springer, 2001, ser. Statistics.
- (2001) The Elements of Statistical Learning
- Hastie, T.¹ Tibshirani, R.² Friedman, J.³

43
- 81455141800
- General bounds and finite-time improvement for the Kiefer-Wolfowitz stochastic approximation algorithm
- M. Broadie, D. Cicek, and A. Zeevi, "General bounds and finite-time improvement for the Kiefer-Wolfowitz stochastic approximation algorithm," Oper. Res., vol. 59, no. 5, pp. 1211-1224, 2011.
- (2011) Oper. Res. , vol.59 , Issue.5 , pp. 1211-1224
- Broadie, M.¹ Cicek, D.² Zeevi, A.³

44
- 0004000490
- Upper Saddle River, NJ, USA: Prentice Hall
- P. Bickel and K. Doksum, Mathematical Statistics - Basic Ideas and Selected Topics Volume 1. Upper Saddle River, NJ, USA: Prentice Hall, 2001.
- (2001) Mathematical Statistics - Basic Ideas and Selected Topics Volume 1
- Bickel, P.¹ Doksum, K.²

45
- 56349109509
- Value function approximation using multiple aggregation for multiattribute resource management
- A. George, W. B. Powell, and S. R. Kulkarni, "Value function approximation using multiple aggregation for multiattribute resource management," J. Machine Learning Res., vol. 9, pp. 2079-2111, 2008.
- (2008) J. Machine Learning Res. , vol.9 , pp. 2079-2111
- George, A.¹ Powell, W.B.² Kulkarni, S.R.³

* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.