Volume 24, Issue 5, 2013, Pages 776-788

Policy improvement by a model-free Dyna architecture

Author keywords

Critic actor structure; Dyna style reinforcement learning; POMDP; Temporal difference

Indexed keywords

ACTOR-CRITIC ARCHITECTURES; ADAPTIVE HEURISTIC CRITICS; DESIRED TRAJECTORIES; INDIRECT LEARNING; POMDP; REINFORCEMENT SIGNAL; TEMPORAL DIFFERENCE METHODS; TEMPORAL DIFFERENCES;

EID: 84884963190     PISSN: 2162-237X     EISSN: 2162-2388     Source Type: Journal
DOI: 10.1109/TNNLS.2013.2244100     Document Type: Article
Times cited: 7
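
For orientation, the sketch below illustrates the "Dyna style reinforcement learning" and "Temporal difference" keywords: classic tabular Dyna-Q in the spirit of Sutton's Dyna architecture (reference 13 below), where the same TD update is applied both to real transitions and to transitions replayed from a learned one-step model. This is a minimal sketch only, not the paper's model-free Dyna architecture; the toy chain environment, constants, and all names are illustrative.

    # Minimal tabular Dyna-Q sketch (cf. Sutton, 1991; reference 13 below).
    # Illustrative only -- not the model-free Dyna architecture of this article.
    import random

    N_STATES, ACTIONS = 5, (-1, +1)   # deterministic chain; reward on reaching the last state
    ALPHA, GAMMA, EPS, PLAN_STEPS = 0.1, 0.95, 0.1, 10

    Q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
    model = {}                        # (s, a) -> (r, s'): learned one-step model used for planning

    def step(s, a):
        s2 = min(max(s + a, 0), N_STATES - 1)
        return (1.0 if s2 == N_STATES - 1 else 0.0), s2

    def greedy(s):
        return max(ACTIONS, key=lambda a: Q[(s, a)])

    for episode in range(50):
        s = 0
        while s != N_STATES - 1:
            a = random.choice(ACTIONS) if random.random() < EPS else greedy(s)
            r, s2 = step(s, a)                 # direct RL: TD (Q-learning) update on real experience
            Q[(s, a)] += ALPHA * (r + GAMMA * Q[(s2, greedy(s2))] - Q[(s, a)])
            model[(s, a)] = (s2, r)[::-1] if False else (r, s2)  # model learning: record last outcome
            for _ in range(PLAN_STEPS):        # planning: replay simulated experience from the model
                ps, pa = random.choice(list(model))
                pr, ps2 = model[(ps, pa)]
                Q[(ps, pa)] += ALPHA * (pr + GAMMA * Q[(ps2, greedy(ps2))] - Q[(ps, pa)])
            s = s2

    print({s: greedy(s) for s in range(N_STATES - 1)})  # learned policy should move right (+1)

The point of the Dyna loop is that each unit of real experience is amortized over PLAN_STEPS extra TD updates on remembered transitions, which is what distinguishes Dyna-style methods from plain temporal-difference learning.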

References (18)
• 1. R. S. Sutton, D. McAllester, S. Singh, and Y. Mansour, "Policy-gradient methods for reinforcement learning with function approximation," Adv. Neural Inf. Process. Syst., vol. 12, pp. 1057-1063, 2000.
• 2. K. S. Hwang and C. S. Lin, "Smoothing trajectory tracking of three-link robot: A self-organizing CMAC approach," IEEE Trans. Syst., Man, Cybern.-Part B, Cybern., vol. 28, no. 5, pp. 680-692, Oct. 1998.
• 5. G. G. Lendaris, "Higher level application of ADP: A next phase for the control field?" IEEE Trans. Syst., Man, Cybern.-Part B, Cybern., vol. 38, no. 4, pp. 901-912, Aug. 2008.
• 7. B. Baddeley, "Reinforcement learning in continuous time and space: Interference and not ill conditioning is the main problem when using distributed function approximators," IEEE Trans. Syst., Man, Cybern.-Part B, Cybern., vol. 38, no. 4, pp. 950-956, Aug. 2008.
• 8. S. Dries and M. A. Wiering, "Neural-fitted TD-leaf learning for playing Othello with structured neural networks," IEEE Trans. Neural Netw. Learn. Syst., vol. 23, no. 11, pp. 1701-1713, Nov. 2012.
• 9. H.-N. Wu and B. Luo, "Neural network based online simultaneous policy update algorithm for solving the HJI equation in nonlinear H∞ control," IEEE Trans. Neural Netw. Learn. Syst., vol. 23, no. 12, pp. 1884-1895, Dec. 2012.
• 10. S. P. Singh, "Reinforcement learning algorithms for average-payoff Markovian decision processes," in Proc. 12th Amer. Assoc. Artif. Intell., 1994, pp. 700-705.
• 12. R. Zajdel, "Epoch-incremental queue-Dyna algorithm," in Proc. Lect. Notes Artif. Intell., 2008, pp. 1160-1170.
• 13. R. Sutton, "Dyna, an integrated architecture for learning, planning, and reacting," Special Interest Group Artif. Intell. (SIGART) Bulletin, vol. 2, no. 4, pp. 160-163, Aug. 1991.
• 14. J. N. Tsitsiklis and B. Van Roy, "On average versus discounted reward temporal-difference learning," Mach. Learn., vol. 49, no. 2-3, pp. 179-191, 2002, DOI: 10.1023/A:1017980312899.
• 15. N. D. Daw and D. S. Touretzky, "Behavioral considerations suggest an average reward TD model of the dopamine system," Neurocomputing, pp. 679-684, 2000.
• 16. V. Gullapalli, "A stochastic reinforcement learning algorithm for learning real-valued functions," Neural Netw., vol. 3, no. 6, pp. 671-692, 1990.
• 18. J. G. Ziegler and N. B. Nichols, "Optimum settings for automatic controllers," Trans. ASME, vol. 64, no. 11, pp. 759-768, 1942.


* This information was extracted by KISTI through analysis of Elsevier's SCOPUS database.