Volume 46, Issue 7, 2016, Pages 1628-1639

F-Discrepancy for Efficient Sampling in Approximate Dynamic Programming

Author keywords

Approximate dynamic programming (ADP); F-discrepancy; Markovian decision problem (MDP); state sampling; value function approximation

Indexed keywords

ALGORITHMS; IMPORTANCE SAMPLING; PROBABILITY DISTRIBUTIONS;

EID: 84938514337     PISSN: 21682267     EISSN: None     Source Type: Journal    
DOI: 10.1109/TCYB.2015.2453123     Document Type: Article
Times cited: 9

References (39)
  • 4
    • EID: 79551685808
    • F. L. Lewis and K. G. Vamvoudakis, "Reinforcement learning for partially observable dynamic processes: Adaptive dynamic programming using measured output data," IEEE Trans. Syst., Man, Cybern. B, Cybern., vol. 41, no. 1, pp. 14-25, Feb. 2011.
  • 5
    • EID: 80052899788
    • T. Mori and S. Ishii, "Incremental state aggregation for value function estimation in reinforcement learning," IEEE Trans. Syst., Man, Cybern. B, Cybern., vol. 41, no. 5, pp. 1407-1416, Oct. 2011.
  • 6
    • EID: 84912135349
    • K. Senda, S. Hattori, T. Hishinuma, and T. Kohda, "Acceleration of reinforcement learning by policy evaluation using nonstationary iterative method," IEEE Trans. Cybern., vol. 44, no. 12, pp. 2696-2705, Dec. 2014.
  • 7
    • EID: 84912071084
    • X. Xu, Z. Huang, D. Graves, and W. Pedrycz, "A clustering-based graph Laplacian framework for value function approximation in reinforcement learning," IEEE Trans. Cybern., vol. 44, no. 12, pp. 2613-2625, Dec. 2014.
  • 8
    • EID: 84919769592
    • M. L. Koga, V. Freire, and A. H. R. Costa, "Stochastic abstract policies: Generalizing knowledge to improve reinforcement learning," IEEE Trans. Cybern., vol. 45, no. 1, pp. 77-88, Jan. 2015.
  • 9
    • EID: 84912026937
    • A. Heydari, "Revisiting approximate dynamic programming and its convergence," IEEE Trans. Cybern., vol. 44, no. 12, pp. 2733-2743, Dec. 2014.
  • 10
    • EID: 84921377021
    • M. Palanisamy, H. Modares, F. L. Lewis, and M. Aurangzeb, "Continuous-time Q-learning for infinite-horizon discounted cost linear quadratic regulator problems," IEEE Trans. Cybern., vol. 45, no. 2, pp. 165-176, Feb. 2015.
  • 11
    • EID: 84912122528
    • Q. Wei, F.-Y. Wang, D. Liu, and X. Yang, "Finite-approximation-error-based discrete-time iterative adaptive dynamic programming," IEEE Trans. Cybern., vol. 44, no. 12, pp. 2820-2833, Dec. 2014.
  • 12
    • EID: 84960101128
    • B. Kiumarsi, F. L. Lewis, M.-B. Naghibi-Sistani, and A. Karimpour, "Optimal tracking control of unknown discrete-time linear systems using input-output measured data," IEEE Trans. Cybern., DOI: 10.1109/TCYB.2014.2384016.
  • 13
    • EID: 84880065287
    • A. Heydari and S. N. Balakrishnan, "Finite-horizon control-constrained nonlinear optimal control using single network adaptive critics," IEEE Trans. Neural Netw. Learn. Syst., vol. 24, no. 1, pp. 145-157, Jan. 2013.
  • 14
    • EID: 84864491417
    • K. G. Vamvoudakis, F. L. Lewis, and G. R. Hudas, "Multi-agent differential graphical games: Online adaptive learning solution for synchronization with optimality," Automatica, vol. 48, no. 8, pp. 1598-1611, 2012.
  • 15
    • EID: 84904389431
    • H. Zhang, C. Qin, and Y. Luo, "Neural-network-based constrained optimal control scheme for discrete-time switched nonlinear system using dual heuristic programming," IEEE Trans. Autom. Sci. Eng., vol. 11, no. 3, pp. 839-849, Jul. 2014.
  • 16
    • EID: 84906778934
    • Q. Wei and D. Liu, "Adaptive dynamic programming for optimal tracking control of unknown nonlinear systems with application to coal gasification," IEEE Trans. Autom. Sci. Eng., vol. 11, no. 4, pp. 1020-1036, Oct. 2014.
  • 17
    • EID: 84968468700
    • R. Bellman, R. Kalaba, and B. Kotkin, "Polynomial approximation-A new computational technique in dynamic programming allocation processes," Math. Comput., vol. 17, no. 82, pp. 155-161, 1963.
  • 18
    • EID: 0027601994
    • S. A. Johnson, J. Stedinger, C. A. Shoemaker, Y. Li, and J. A. Tejada-Guibert, "Numerical solution of continuous-state dynamic programs using linear and spline interpolation," Oper. Res., vol. 41, no. 3, pp. 484-500, 1993.
  • 19
    • EID: 0001820934
    • V. Chen, D. Ruppert, and C. A. Shoemaker, "Applying experimental design and regression splines to high-dimensional continuous-state stochastic dynamic programming," Oper. Res., vol. 47, no. 1, pp. 38-53, 1999.
  • 20
    • EID: 84884969734
    • C. Cervellera, M. Gaggero, and D. Macciò, "Low-discrepancy sampling for approximate dynamic programming with local approximators," Comput. Oper. Res., vol. 43, pp. 108-115, Mar. 2014.
  • 21
    • EID: 84961378056
    • H. Zhang, J. Zhang, G.-H. Yang, and Y. Luo, "Leader-based optimal coordination control for the consensus problem of multiagent differential games via fuzzy adaptive dynamic programming," IEEE Trans. Fuzzy Syst., vol. 23, no. 1, pp. 152-163, Feb. 2015.
  • 23
    • EID: 77956759955
    • M. Baglietto, C. Cervellera, M. Sanguineti, and R. Zoppoli, "Management of water resources systems in the presence of uncertainties by nonlinear approximators and deterministic sampling techniques," Comput. Optim. Appl., vol. 47, no. 2, pp. 349-376, 2010.
  • 24
    • EID: 0036013020
    • V. C. P. Chen, "Measuring the goodness of orthogonal array discretizations for stochastic programming and stochastic dynamic programming," SIAM J. Optim., vol. 12, no. 2, pp. 322-344, 2001.
  • 25
    • EID: 33746257756
    • C. Cervellera, V. Chen, and A. Wen, "Neural network and regression spline value function approximations for stochastic dynamic programming," Comput. Oper. Res., vol. 34, no. 1, pp. 70-90, 2006.
  • 26
    • EID: 78249259323
    • C. Cervellera and D. Macciò, "A comparison of global and semi-local approximation in T-stage stochastic optimization," Eur. J. Oper. Res., vol. 208, no. 2, pp. 109-118, 2011.
  • 27
    • EID: 36148965498
    • C. Cervellera and M. Muselli, "Efficient sampling in approximate dynamic programming algorithms," Comput. Optim. Appl., vol. 38, no. 3, pp. 417-443, 2007.
  • 28
    • EID: 84871395855
    • H. Fan, P. K. Tarun, and V. Chen, "Adaptive value function approximation for continuous-state stochastic dynamic programming," Comput. Oper. Res., vol. 40, no. 4, pp. 1076-1084, 2013.
  • 29
    • EID: 68349126329
    • J. Hartinger and R. Kainhofer, "Non-uniform low-discrepancy sequence generation and integration of singular integrands," in Monte Carlo and Quasi-Monte Carlo Methods 2004, H. Niederreiter and D. Talay, Eds. Berlin, Germany: Springer, 2006, pp. 163-179.
  • 30
    • EID: 84908469557
    • C. Cervellera, M. Gaggero, and D. Macciò, "An analysis based on F-discrepancy for sampling in regression tree learning," in Proc. Int. Joint Conf. Neural Netw. (IJCNN), Beijing, China, Jul. 2014, pp. 1115-1121.
  • 33
    • EID: 84875913709
    • J. Dick, F. Y. Kuo, and I. H. Sloan, "High dimensional integration-The quasi-Monte Carlo way," Acta Numer., vol. 22, pp. 133-288, May 2013.
  • 34
    • EID: 0035649406
    • S. Heinrich, E. Novak, G. Wasilkowski, and H. Woźniakowski, "The inverse of the star-discrepancy depends linearly on the dimension," Acta Arith., vol. 96, no. 3, pp. 279-302, 2001.
  • 35
    • EID: 84861366753
    • M. Gnewuch, M. Wahlström, and C. Winzen, "A new randomized algorithm to approximate the star discrepancy based on threshold accepting," SIAM J. Numer. Anal., vol. 50, no. 2, pp. 781-807, 2012.
  • 36
    • EID: 0027599793
    • A. Barron, "Universal approximation bounds for superpositions of a sigmoidal function," IEEE Trans. Inf. Theory, vol. 39, no. 3, pp. 930-945, May 1993.
  • 37
    • EID: 0001219859
    • F. Girosi, M. Jones, and T. Poggio, "Regularization theory and neural networks architectures," Neural Comput., vol. 7, no. 2, pp. 219-269, 1995.
  • 38
    • EID: 0000796112
    • L. K. Jones, "A simple lemma on greedy approximation in Hilbert space and convergence rates for projection pursuit regression and neural network training," Ann. Stat., vol. 20, no. 1, pp. 608-613, 1992.
  • 39
    • EID: 0028543366
    • M. Hagan and M. Menhaj, "Training feedforward networks with the Marquardt algorithm," IEEE Trans. Neural Netw., vol. 5, no. 6, pp. 989-993, Nov. 1994.


* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS DB.