SCOPUS Information Search Platform

IEEE Transactions on Neural Networks and Learning Systems

Volume 24, Issue 5, 2013, Pages 776-788

Policy improvement by a model-free Dyna architecture

Authors (2): Hwang, Kao Shing (a); Lo, Chia Yue (b)

a NATIONAL SUN YAT SEN UNIVERSITY (Taiwan)

b NATIONAL CHUNG CHENG UNIVERSITY (Taiwan)

Author keywords

Critic actor structure; Dyna style reinforcement learning; POMDP; Temporal difference

Indexed keywords

ACTOR-CRITIC ARCHITECTURES; ADAPTIVE HEURISTIC CRITICS; DESIRED TRAJECTORIES; INDIRECT LEARNING; POMDP; REINFORCEMENT SIGNAL; TEMPORAL DIFFERENCE METHODS; TEMPORAL DIFFERENCES

HEURISTIC METHODS; PENDULUMS; REINFORCEMENT LEARNING

COMPUTER SIMULATION

EID: 84884963190; PISSN: 2162-237X; EISSN: 2162-2388; Source Type: Journal
DOI: 10.1109/TNNLS.2013.2244100; Document Type: Article

Times cited: 7

References (18)

[1] R. S. Sutton, D. McAllester, S. Singh, and Y. Mansour, "Policy-gradient methods for reinforcement learning with function approximation," Adv. Neural Inf. Process. Syst., vol. 12, no. 22, pp. 1057-1063, 2000.

[2] K. S. Hwang and C. S. Lin, "Smoothing trajectory tracking of three-link robot: A self-organizing CMAC approach," IEEE Trans. Syst., Man, Cybern.-Part B, Cybern., vol. 28, no. 5, pp. 680-692, Oct. 1998.

[3] R. S. Sutton and A. G. Barto, Reinforcement Learning: An Introduction. Cambridge, MA, USA: MIT Press, 1998.

[4] M. A. Wiering and H. V. Hasselt, "Ensemble algorithms in reinforcement learning," IEEE Trans. Syst., Man, Cybern.-Part B, Cybern., vol. 38, no. 4, pp. 930-936, Aug. 2008.

[5] G. G. Lendaris, "Higher level application of ADP: A next phase for the control field?" IEEE Trans. Syst., Man, Cybern.-Part B, Cybern., vol. 38, no. 4, pp. 901-912, Aug. 2008.

[6] D. Dong, C. Chen, H. Li, and T. Tarn, "Quantum reinforcement learning," IEEE Trans. Syst., Man, Cybern.-Part B, Cybern., vol. 38, no. 5, pp. 1207-1220, Oct. 2008.

[7] B. Baddeley, "Reinforcement learning in continuous time and space: Interference and not ill conditioning is the main problem when using distributed function approximators," IEEE Trans. Syst., Man, Cybern.-Part B, Cybern., vol. 38, no. 4, pp. 950-956, Aug. 2008.

[8] S. Dries and M. A. Wiering, "Neural-fitted TD-leaf learning for playing Othello with structured neural networks," IEEE Trans. Neural Netw. Learn. Syst., vol. 23, no. 11, pp. 1701-1713, Nov. 2012.

[9] H.-N. Wu and B. Luo, "Neural network based online simultaneous policy update algorithm for solving the HJI equation in nonlinear H∞ control," IEEE Trans. Neural Netw. Learn. Syst., vol. 23, no. 12, pp. 1884-1895, Dec. 2012.

[10] S. P. Singh, "Reinforcement learning algorithms for average-payoff Markovian decision processes," in Proc. 12th Amer. Assoc. Artif. Intell., 1994, pp. 700-705.

[11] J. J. Govindhasamy, S. F. McLoone, and G. W. Irwin, "Second-order training of adaptive critics for online process control," IEEE Trans. Syst., Man, Cybern.-Part B, Cybern., vol. 35, no. 2, pp. 381-385, Apr. 2005.

[12] R. Zajdel, "Epoch-incremental Queue-Dyna algorithm," in Proc. Lect. Notes Artif. Intell., pp. 1160-1170, 2008.

[13] R. Sutton, "Dyna, an integrated architecture for learning, planning, and reacting," Special Interest Group Artif. Intell. Bulletin, vol. 2, no. 4, pp. 160-163, Aug. 1991.

[14] J. N. Tsitsiklis and B. Van Roy, "On average versus discounted reward temporal-difference learning," Mach. Learn., vol. 49, nos. 2-3, pp. 179-191, 2002. DOI: 10.1023/A:1017980312899

[15] N. D. Daw and D. S. Touretzky, "Behavioral considerations suggest an average reward TD model of the dopamine system," Neurocomputing, pp. 679-684, 2000.

[16] V. Gullapalli, "A stochastic reinforcement learning algorithm for learning real-valued functions," Neural Netw., vol. 3, no. 6, pp. 671-692, 1990.

[17] H. D. Patino and D. Liu, "Neural network-based model reference adaptive control system," IEEE Trans. Syst., Man, Cybern.-Part B, Cybern., vol. 30, no. 1, pp. 198-204, Feb. 2000. DOI: 10.1109/3477.826961

[18] J. G. Ziegler and N. B. Nichols, "Optimum settings for automatic controllers," Trans. of the ASME, vol. 64, no. 11, pp. 759-768, 1942.

* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.