Volume 16, Issue 2-3, 2013

Combining correlation-based and reward-based learning in neural control for policy improvement

Author keywords

associative learning; classical conditioning; goal-directed behavior; operant conditioning; pole balancing; reinforcement learning

EID: 84880376650     PISSN: 0219-5259     Source Type: Journal
DOI: 10.1142/S021952591350015X     Document Type: Article
Times cited: 10

References (70)
  • 1
    • Alonso, E., Mondragon, E. and Kjäll-Ohlsson, N., Pavlovian and instrumental Q-learning: A Rescorla and Wagner-based approach to generalization in Q-learning, in Proc. Adaptation in Artificial and Biological Systems (2006), pp. 23-29.
  • 3
    • Bakker, B. and Schmidhuber, J., Hierarchical reinforcement learning based on subgoal discovery and subpolicy specialization, in Proc. 8th Conf. Intelligent Autonomous Systems (2004), pp. 438-445.
  • 4
    • Banerjee, B. and Kraemer, L., Action discovery for single and multi-agent reinforcement learning, Adv. Complex Syst. 14 (2011) 279-305.
  • 5
    • Barnard, C., Animal Behavior: Mechanism, Development, Function, and Evolution (Pearson Education, 2004).
  • 6
    • Barto, A. G. and Mahadevan, S., Recent advances in hierarchical reinforcement learning, Discrete Event Dyn. Syst. 13 (2003) 41-77.
  • 9
    • Botvinick, M. M., Niv, Y. and Barto, A. C., Hierarchically organized behavior and its neural foundations: A reinforcement learning perspective, Cognition 113 (2009) 262-280.
  • 11
    • Brembs, B. and Heisenberg, M., The operant and the classical in conditioned orientation of Drosophila melanogaster at the flight simulator, Learn. Memory 7 (2000) 104-115. DOI: 10.1101/lm.7.2.104.
  • 13
    • Chang, C. and Gaudiano, P., Application of biological learning theories to mobile robot avoidance and approach behaviors, Adv. Complex Syst. 1 (1998) 79-114.
  • 15
    • Dayan, P. and Balleine, B., Reward, motivation, and reinforcement learning, Neuron 36 (2002) 285-298.
  • 16
    • Devlin, S., Kudenko, D. and Grzes, M., An empirical study of potential-based reward shaping and advice in complex, multi-agent systems, Adv. Complex Syst. 14 (2011) 251-278.
  • 17
    • Dietterich, T. G., Hierarchical reinforcement learning with the MAXQ value function decomposition, J. Artif. Intell. Res. 13 (2000) 227-303.
  • 18
    • Doya, K., Efficient nonlinear control with actor-tutor architecture, in Advances in Neural Information Processing Systems (1997), pp. 1012-1018.
  • 19
    • Doya, K., Reinforcement learning in continuous time and space, Neural Comput. 12 (2000) 219-245.
  • 20
    • Endo, G., Morimoto, J., Matsubara, T., Nakanishi, J. and Cheng, G., Learning CPG-based biped locomotion with a policy gradient method: Application to a humanoid robot, Int. J. Robot. Res. 27 (2008) 213-228. DOI: 10.1177/0278364907084980.
  • 21
    • Fischer, J., A Modulatory Learning Rule for Neural Learning and Metalearning in Real World Robots with Many Degrees of Freedom (Shaker Verlag GmbH, 2003).
  • 22
    • Gomez, F., Schmidhuber, J. and Miikkulainen, R., Accelerated neural evolution through cooperatively coevolved synapses, J. Mach. Learn. Res. 9 (2008) 937-965.
  • 23
    • Gomi, H. and Kawato, M., Neural network control for a closed-loop system using feedback-error-learning, Neural Netw. 6 (1993) 933-946.
  • 24
    • Gullapalli, V., A stochastic reinforcement learning algorithm for learning real-valued functions, Neural Netw. 3 (1990) 671-692.
  • 27
    • Humeau, Y., Shaban, H., Bissiere, S. and Lüthi, A., Presynaptic induction of heterosynaptic associative plasticity in the mammalian brain, Nature 426 (2003) 841-845. DOI: 10.1038/nature02194.
  • 31
    • Kawarai, N. and Kobayashi, Y., Learning of whole arm manipulation with constraint of contact mode maintaining, J. Robot. Mechatron. 22 (2010) 542-550.
  • 32
    • Klopf, A. H., A neuronal model of classical conditioning, Psychobiology 16 (1988) 85-123.
  • 34
    • Kondo, T. and Ito, K., A reinforcement learning with adaptive state space recruitment strategy for real autonomous mobile robots, in Proc. IEEE/RSJ Int. Conf. Intelligent Robots and Systems (2002), pp. 897-902.
  • 36
    • Koppejan, R. and Whiteson, S., Neuroevolutionary reinforcement learning for generalized helicopter control, in GECCO 2009: Proc. Genetic and Evolutionary Computation Conf. (2009), pp. 145-152.
  • 38
    • Lee, H., Shen, Y., Yu, C., Singh, G. and Ng, A., Quadruped robot obstacle negotiation via reinforcement learning, in Proc. IEEE Int. Conf. Robotics and Automation (2006), pp. 3003-3010. DOI: 10.1109/ROBOT.2006.1642158.
  • 39
    • Lovibond, P. F., Facilitation of instrumental behavior by a Pavlovian appetitive conditioned stimulus, J. Exp. Psychol. Anim. B 9 (1983) 225-247. DOI: 10.1037//0097-7403.9.3.225.
  • 41
    • Manoonpong, P., Parlitz, U. and Wörgötter, F., Neural control and adaptive neural forward models for insect-like, energy-efficient, and adaptable locomotion of walking machines, Front. Neural Circuits 7 (2013). DOI: 10.3389/fncir.2013.00012.
  • 42
    • Manoonpong, P. and Wörgötter, F., Adaptive sensor-driven neural control for learning in walking machines, in Neural Information Processing, LNCS (2009), pp. 47-55.
  • 43
    • Manoonpong, P., Wörgötter, F. and Morimoto, J., Extraction of reward-related feature space using correlation-based and reward-based learning methods, in Neural Information Processing, LNCS (2010), pp. 414-421.
  • 46
    • Morimoto, J. and Doya, K., Acquisition of stand-up behavior by a real robot using hierarchical reinforcement learning, Robot. Auton. Syst. 36 (2001) 37-51. DOI: 10.1016/S0921-8890(01)00113-0.
  • 48
    • Ng, A. Y., Harada, D. and Russell, S. J., Policy invariance under reward transformations: Theory and application to reward shaping, in Proc. 16th Int. Conf. Machine Learning (1999), pp. 278-287.
  • 49
    • Pasemann, F., Evolving neurocontrollers for balancing an inverted pendulum, Network: Computation in Neural Systems 9 (1998) 495-511.
  • 50
    • Pavlov, I., Conditioned Reflexes (Oxford University Press, Oxford, UK, 1927).
  • 51
    • Phon-Amnuaisuk, S., Learning cooperative behaviours in multiagent reinforcement learning, in Neural Information Processing, LNCS (2009), pp. 570-579.
  • 52
    • Porr, B. and Wörgötter, F., Strongly improved stability and faster convergence of temporal sequence learning by using input correlations only, Neural Comput. 18 (2006) 1380-1412. DOI: 10.1162/neco.2006.18.6.1380.
  • 53
    • Porr, B. and Wörgötter, F., Fast heterosynaptic learning in a robot food retrieval task inspired by the limbic system, Biosystems 89 (2007) 294-299. DOI: 10.1016/j.biosystems.2006.04.026.
  • 54
  • 55
    • Rescorla, R. and Solomon, R., Two process learning theory: Relationship between Pavlovian conditioning and instrumental learning, Psychol. Rev. 88 (1967) 151-182.
  • 56
    • Rescorla, R. and Wagner, A., A theory of Pavlovian conditioning: Variations in the effectiveness of reinforcement and nonreinforcement, in Classical Conditioning II: Current Research and Theory (1972), pp. 64-99.
  • 58
    • Shibata, K., Nishino, T. and Okabe, Y., Active perception and recognition learning system based on Actor-Q architecture, Syst. Comput. Jpn. 33 (2002) 12-22. DOI: 10.1002/scj.10207.
  • 61
    • Suri, R. E., Bargas, J. and Arbib, M. A., Modeling functions of striatal dopamine modulation in learning and planning, Neuroscience 103 (2001) 65-85. DOI: 10.1016/S0306-4522(00)00554-6.
  • 63
    • Thorndike, E., Animal intelligence: An experimental study of the associative process in animals, Psychol. Rev. Monogr. Suppl. 8 (1898) 68-72.
  • 64
    • Tokic, M., Adaptive ε-greedy exploration in reinforcement learning based on value differences, in Proc. KI 2010: Advances in Artificial Intelligence (2010), pp. 203-210.
  • 68
    • Williams, D. and Williams, H., Auto-maintenance in the pigeon: Sustained pecking despite contingent non-reinforcement, J. Exp. Anal. Behav. 12 (1969) 511-520.
  • 70
    • Wörgötter, F. and Porr, B., Temporal sequence learning, prediction and control - A review of different models and their relation to biological mechanisms, Neural Comput. 17 (2005) 245-319. DOI: 10.1162/0899766053011555.


* This information was extracted by KISTI through analysis of Elsevier's SCOPUS database.