



Volume , Issue , 2010, Pages 243-248

Two-time-scale online actor-critic paradigm driven by POMDP

Author keywords

[No Author keywords available]

Indexed keywords

ACTOR CRITIC; ACTOR-CRITIC ALGORITHM; ADAPTIVE DYNAMIC PROGRAMMING; ANALYSIS AND SIMULATION; GREEDY ALGORITHMS; HIDDEN STATE; NONLINEAR FUNCTIONS; PARTIALLY OBSERVABLE MARKOV DECISION PROCESS; STATE ESTIMATORS; STOCHASTIC GRADIENT; TEMPORAL DIFFERENCES; TWO TIME SCALE;

EID: 77953105620     PISSN: None     EISSN: None     Source Type: Conference Proceeding    
DOI: 10.1109/ICNSC.2010.546149     Document Type: Conference Paper
Times cited: 4

References (16)
  • 5
    • 0347502949
    • Reinforcement learning in non-Markov environments
    • S. D. Whitehead and L. J. Lin, "Reinforcement learning in non-Markov environments," Artificial Intelligence, vol. 8, pp. 3-4, 1992.
    • (1992) Artificial Intelligence , vol.8 , pp. 3-4
    • Whitehead, S.D.1    Lin, L.J.2
  • 7
    • 0001770240
    • Value-function approximations for partially observable Markov decision processes
    • M. Hauskrecht, "Value-function approximations for partially observable Markov decision processes," Journal of Artificial Intelligence Research, vol. 13, pp. 33-94, 2000.
    • (2000) Journal of Artificial Intelligence Research , vol.13 , pp. 33-94
    • Hauskrecht, M.1
  • 9
    • 0035273403
    • Online learning control by association and reinforcement
    • J. Si and Y.-T. Wang, "Online learning control by association and reinforcement," IEEE Transactions on Neural Networks, vol. 12, no. 2, pp. 264-276, Mar 2001.
    • (2001) IEEE Transactions on Neural Networks , vol.12 , Issue.2 , pp. 264-276
    • Si, J.1    Wang, Y.-T.2
  • 10
    • 0000439891
    • Convergence of stochastic iterative dynamic programming algorithms
    • T. Jaakkola, M. I. Jordan, and S. P. Singh, "Convergence of stochastic iterative dynamic programming algorithms," Neural Computation, vol. 6, pp. 1185-1201, 1994.
    • (1994) Neural Computation , vol.6 , pp. 1185-1201
    • Jaakkola, T.1    Jordan, M.I.2    Singh, S.P.3
  • 11
    • 0022783899
    • Distributed asynchronous deterministic and stochastic gradient optimization algorithms
    • J. Tsitsiklis, D. Bertsekas, and M. Athans, "Distributed asynchronous deterministic and stochastic gradient optimization algorithms," IEEE Transactions on Automatic Control, vol. 31, no. 9, pp. 803-812, Sep 1986.
    • (1986) IEEE Transactions on Automatic Control , vol.31 , Issue.9 , pp. 803-812
    • Tsitsiklis, J.1    Bertsekas, D.2    Athans, M.3
  • 14
    • 0021376040
    • Convergence of an adaptive linear estimation algorithm
    • E. Eweda and O. Macchi, "Convergence of an adaptive linear estimation algorithm," IEEE Transactions on Automatic Control, vol. 29, no. 2, pp. 119-127, Feb 1984.
    • (1984) IEEE Transactions on Automatic Control , vol.29 , Issue.2 , pp. 119-127
    • Eweda, E.1    Macchi, O.2
  • 15
    • 0041510534
    • Linear stochastic approximation driven by slowly varying Markov chains
    • V. R. Konda and J. N. Tsitsiklis, "Linear stochastic approximation driven by slowly varying Markov chains," Systems and Control Letters, vol. 50, pp. 95-102, 2003.
    • (2003) Systems and Control Letters , vol.50 , pp. 95-102
    • Konda, V.R.1    Tsitsiklis, J.N.2


* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS DB.