SCOPUS Information Search Platform

Advances in Neural Information Processing Systems 21 - Proceedings of the 2008 Conference

Volume , Issue , 2009, Pages 385-392

Temporal difference based actor critic learning - Convergence and neural implementation

(3) Di Castro, Dotanᵃ Volkinshtein, Dmitryᵃ Meir, Ronᵃ

ᵃ TECHNION ISRAEL INSTITUTE OF TECHNOLOGY (Israel)

Author keywords

[No Author keywords available]

Indexed keywords

AMINES; BIOINFORMATICS; LEARNING ALGORITHMS; NEUROPHYSIOLOGY;

ACTOR CRITIC; ACTOR-CRITIC ALGORITHM; ACTOR-CRITIC LEARNING; CONVERGENCE PROPERTIES; DOPAMINE; FUNCTIONS APPROXIMATIONS; LEARNING CONVERGENCE; NEURAL IMPLEMENTATIONS; REINFORCEMENT LEARNINGS; TEMPORAL DIFFERENCES;

REINFORCEMENT LEARNING;

EID: 79959855306 PISSN: None EISSN: None Source Type: Conference Proceeding
DOI: None Document Type: Conference Paper

Times cited : (14)

References (24)

1
- 34548049545
- Reinforcement learning, spike time dependent plasticity and the BCM rule
- D. Baras and R. Meir. Reinforcement learning, spike time dependent plasticity and the BCM rule. Neural Comput., 19(8):2245-2279, 2007
- (2007) Neural Comput. , vol.19 , Issue.8 , pp. 2245-2279
- Baras, D.¹ Meir, R.²

2
- 2542506169
- (Technical rep.). Canberra: Research School of Information Sciences and Engineering Australian National University
- J. Baxter and P.L. Bartlett. Hebbian synaptic modifications in spiking neurons that learn. (Technical rep.). Canberra: Research School of Information Sciences and Engineering, Australian National University, 1999.
- (1999) Hebbian Synaptic Modifications in Spiking Neurons That Learn
- Baxter, J.¹ Bartlett, P.L.²

3
- 0013535965
- Infinite-horizon policy-gradient estimation
- J. Baxter and P.L. Bartlett. Infinite-Horizon Policy-Gradient Estimation. J. of Artificial Intelligence Research, 15:319-350, 2001.
- (2001) J. of Artificial Intelligence Research , vol.15 , pp. 319-350
- Baxter, J.¹ Bartlett, P.L.²

4
- 85162016901
- 3rd Ed. Athena Scientific
- D.P. Bertsekas. Dynamic Programming and Optimal Control, Vol I., 3rd Ed. Athena Scientific, 2006.
- (2006) Dynamic Programming and Optimal Control , vol.1
- Bertsekas, D.P.¹

5
- 85162049326
- Incremental natural actor-critic algorithms
- J.C. Platt, D. Koller, Y. Singer, and S. Roweis, editors. MIT Press, Cambridge, MA
- S. Bhatnagar, R. Sutton, M. Ghavamzadeh, and M. Lee. Incremental natural actor-critic algorithms. In J.C. Platt, D. Koller, Y. Singer, and S. Roweis, editors, Advances in Neural Information Processing Systems 20, pages 105-112. MIT Press, Cambridge, MA, 2008.
- (2008) Advances in Neural Information Processing Systems , vol.20 , pp. 105-112
- Bhatnagar, S.¹ Sutton, R.² Ghavamzadeh, M.³ Lee, M.⁴

6
- 67650346847
- Natural actor-critic algorithms
- To appear
- S. Bhatnagar, R.S. Sutton, M. Ghavamzadeh, and M. Lee. Natural actor-critic algorithms. Automatica, To appear, 2008.
- (2008) Automatica
- Bhatnagar, S.¹ Sutton, R.S.² Ghavamzadeh, M.³ Lee, M.⁴

7
- 0031076413
- Stochastic approximation with two time scales
- V.S. Borkar. Stochastic approximation with two time scales. Syst. Control Lett., 29(5):291-294, 1997.
- (1997) Syst. Control Lett. , vol.29 , Issue.5 , pp. 291-294
- Borkar, V.S.¹

8
- 0003618624
- Springer
- P. Bremaud. Markov Chains: Gibbs Fields, Monte Carlo Simulation, and Queues. Springer, 1999.
- (1999) Markov Chains: Gibbs Fields, Monte Carlo Simulation, and Queues
- Bremaud, P.¹

9
- 34249708388
- Reinforcement learning through modulation of spike-timing-dependent synaptic plasticity
- R.V. Florian. Reinforcement learning through modulation of spike-timing-dependent synaptic plasticity. Neural Computation, 19:1468-1502, 2007.
- (2007) Neural Computation , vol.19 , pp. 1468-1502
- Florian, R.V.¹

10
- 0004169893
- Kluwer Academic Publishers
- R.G. Gallager. Discrete Stochastic Processes. Kluwer Academic Publishers, 1995.
- (1995) Discrete Stochastic Processes
- Gallager, R.G.¹

11
- 0004017463
- Cambridge University Press, Cambridge
- W. Gerstner and W.M. Kistler. Spiking Neuron Models. Cambridge University Press, Cambridge, 2002.
- (2002) Spiking Neuron Models
- Gerstner, W.¹ Kistler, W.M.²

12
- 34948906745
- Solving the distal reward problem through linkage of STDP and dopamine signaling
- DOI 10.1093/cercor/bhl152
- E.M. Izhikevich. Solving the Distal Reward Problem through Linkage of STDP and Dopamine Signaling. Cerebral Cortex, 17(10):2443-52, 2007. (Pubitemid 47517479)
- (2007) Cerebral Cortex , vol.17 , Issue.10 , pp. 2443-2452
- Izhikevich, E.M.¹

13
- 4043069840
- On actor critic algorithms
- V.R. Konda and J. Tsitsiklis. On actor critic algorithms. SIAM J. Control Optim., 42(4):1143-1166, 2003.
- (2003) SIAM J. Control Optim. , vol.42 , Issue.4 , pp. 1143-1166
- Konda, V.R.¹ Tsitsiklis, J.²

14
- 9944258743
- Springer
- H.J. Kushner and G.G. Yin. Stochastic Approximation Algorithms and Applications. Springer, 1997.
- (1997) Stochastic Approximation Algorithms and Applications
- Kushner, H.J.¹ Yin, G.G.²

15
- 0035249254
- Simulation-based optimization of Markov reward processes
- P. Marbach and J. Tsitsiklis. Simulation-Based Optimization of Markov Reward Processes. IEEE. Trans. Auto. Cont., 46:191-209, 1998.
- (1998) IEEE. Trans. Auto. Cont. , vol.46 , pp. 191-209
- Marbach, P.¹ Tsitsiklis, J.²

16
- 0029981543
- A framework for mesencephalic dopamine systems based on predictive Hebbian learning
- P.R. Montague, P. Dayan, and T.J. Sejnowski. A framework for mesencephalic dopamine systems based on predictive Hebbian learning. Journal of Neuroscience, 16:1936-1947, 1996.
- (1996) Journal of Neuroscience , vol.16 , pp. 1936-1947
- Montague, P.R.¹ Dayan, P.² Sejnowski, T.J.³

17
- 1942520195
- Dissociable roles of ventral and dorsal striatum in instrumental conditioning
- J. O'Doherty, P. Dayan, J. Schultz, R. Deichmann, K. Friston, and R.J. Dolan. Dissociable roles of ventral and dorsal striatum in instrumental conditioning. Science, 304:452-454, 2004.
- (2004) Science , vol.304 , pp. 452-454
- O'Doherty, J.¹ Dayan, P.² Schultz, J.³ Deichmann, R.⁴ Friston, K.⁵ Dolan, R.J.⁶

18
- 0036592025
- Dopamine-dependent plasticity of corticostriatal synapses
- J.N.J. Reynolds and J.R. Wickens. Dopamine-dependent plasticity of corticostriatal synapses. Neural Networks, 15(4-6):507-521, 2002.
- (2002) Neural Networks , vol.15 , Issue.4-6 , pp. 507-521
- Reynolds, J.N.J.¹ Wickens, J.R.²

19
- 0036121949
- Development, learning and memory in large random networks of cortical neurons: Lessons beyond anatomy
- S. Marom and G. Shahaf. Development, learning and memory in large random networks of cortical neurons: lessons beyond anatomy. Quarterly Reviews of Biophysics, 35:63-87, 2002.
- (2002) Quarterly Reviews of Biophysics , vol.35 , pp. 63-87
- Marom, S.¹ Shahaf, G.²

20
- 0034576323
- Multiple reward signals in the brain
- Dec
- W. Schultz. Multiple reward signals in the brain. Nature Reviews Neuroscience, 1:199-207, Dec. 2000.
- (2000) Nature Reviews Neuroscience , vol.1 , pp. 199-207
- Schultz, W.¹

21
- 0032114627
- Analytical mean squared error curves for temporal difference learning
- S. Singh and P. Dayan. Analytical mean squared error curves for temporal difference learning. Machine Learning, 32:5-40, 1998.
- (1998) Machine Learning , vol.32 , pp. 5-40
- Singh, S.¹ Dayan, P.²

22
- 0004102479
- MIT Press
- R. S. Sutton and A. G. Barto. Reinforcement Learning. MIT Press, 1998.
- (1998) Reinforcement Learning
- Sutton, R.S.¹ Barto, A.G.²

23
- 84898939480
- Policy-gradient methods for reinforcement learning with function approximation
- R. Sutton, D. McAllester, S. Singh and Y. Mansour. Policy-Gradient Methods for Reinforcement Learning with Function Approximation. Advances in Neural Information Processing Systems, 12:1057-1063, 2000.
- (2000) Advances in Neural Information Processing Systems , vol.12 , pp. 1057-1063
- Sutton, R.¹ McAllester, D.² Singh, S.³ Mansour, Y.⁴

24
- 1642534402
- Modulation of caudate activity by action contingency
- E.M. Tricomi, M.R. Delgado, and J.A. Fiez. Modulation of caudate activity by action contingency. Neuron, 41(2):281-292, 2004.
- (2004) Neuron , vol.41 , Issue.2 , pp. 281-292
- Tricomi, E.M.¹ Delgado, M.R.² Fiez, J.A.³

* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.