Volume 36, Issue 2, 2012, Pages 333-358

When, What, and How Much to Reward in Reinforcement Learning-Based Models of Cognition

Author keywords

Adaptive behavior; Choice; Cognitive architecture; Expected utility; Expected value; Reinforcement learning; Skill acquisition and learning; Strategy selection

Indexed keywords

ADAPTIVE BEHAVIOR; ARTICLE; COGNITION; HUMAN; PSYCHOLOGICAL MODEL; REINFORCEMENT;

EID: 84857974114     PISSN: 03640213     EISSN: None     Source Type: Journal    
DOI: 10.1111/j.1551-6709.2011.01222.x     Document Type: Article
Times cited: 27

References (52)
  • 1
    • 42549101410
    • Comparison of decision learning models using the generalization criterion method
    • Ahn, W.-Y., Busemeyer, J. R., Wagenmakers, E., & Stout, J. C. (2008). Comparison of decision learning models using the generalization criterion method. Cognitive Science, 32, 1376-1402.
    • (2008) Cognitive Science , vol.32 , pp. 1376-1402
    • Ahn, W.-Y.1    Busemeyer, J.R.2    Wagenmakers, E.3    Stout, J.C.4
  • 2
    • 0004102787
    • Hillsdale, NJ: Lawrence Erlbaum Associates
    • Anderson, J. R. (1993). Rules of the mind. Hillsdale, NJ: Lawrence Erlbaum Associates.
    • (1993) Rules of the mind
    • Anderson, J.R.1
  • 6
    • 84970188424
    • Reflections of the environment in memory
    • Anderson, J. R., & Schooler, L. J. (1991). Reflections of the environment in memory. Psychological Science, 2, 396-408.
    • (1991) Psychological Science , vol.2 , pp. 396-408
    • Anderson, J.R.1    Schooler, L.J.2
  • 9
    • 84857929186
    • On the role of embodiment in modeling natural behaviors
    • W. D. Gray (Ed.). New York: Oxford University Press
    • Ballard, D. H., & Sprague, N. (2007). On the role of embodiment in modeling natural behaviors. In W. D. Gray (Ed.), Integrated models of cognitive systems (pp. 283-296). New York: Oxford University Press.
    • (2007) Integrated models of cognitive systems , pp. 283-296
    • Ballard, D.H.1    Sprague, N.2
  • 11
    • 84857938466
    • ACT-R 6 reference manual
    • Bothell, D. (Producer). (2008). ACT-R 6 reference manual. Accessed June 2008.
    • (2008)
    • Bothell, D.1
  • 12
    • 70350566799
    • Hierarchically organized behavior and its neural foundations: A reinforcement learning perspective
    • Botvinick, M., Niv, Y., & Barto, A. G. (2009). Hierarchically organized behavior and its neural foundations: A reinforcement learning perspective. Cognition, 113, 262-280.
    • (2009) Cognition , vol.113 , pp. 262-280
    • Botvinick, M.1    Niv, Y.2    Barto, A.G.3
  • 13
    • 52249095975
    • Neurocomputational mechanisms of reinforcement-guided learning in humans: A review
    • Cohen, M. X. (2008). Neurocomputational mechanisms of reinforcement-guided learning in humans: A review. Cognitive, Affective & Behavioral Neuroscience, 8, 113-125.
    • (2008) Cognitive, Affective & Behavioral Neuroscience , vol.8 , pp. 113-125
    • Cohen, M.X.1
  • 15
    • 70350566659
    • Reinforcement learning and higher level cognition: Introduction to special issue
    • Daw, N. D., & Frank, M. J. (2009). Reinforcement learning and higher level cognition: Introduction to special issue. Cognition, 113, 259-261.
    • (2009) Cognition , vol.113 , pp. 259-261
    • Daw, N.D.1    Frank, M.J.2
  • 16
    • 0002251031
    • Optimal strategies for seeking information: Models for statistics, choice reaction times, and human information processing
    • Edwards, W. (1965). Optimal strategies for seeking information: Models for statistics, choice reaction times, and human information processing. Journal of Mathematical Psychology, 2, 312-329.
    • (1965) Journal of Mathematical Psychology , vol.2 , pp. 312-329
    • Edwards, W.1
  • 17
    • 26844491600
    • On adaptation, maximization, and reinforcement learning among cognitive strategies
    • Erev, I., & Barron, G. (2005). On adaptation, maximization, and reinforcement learning among cognitive strategies. Psychological Review, 112, 912-931.
    • (2005) Psychological Review , vol.112 , pp. 912-931
    • Erev, I.1    Barron, G.2
  • 18
    • 46649087024
    • The information capacity of the human motor system in controlling the amplitude of movement
    • Fitts, P. M. (1954). The information capacity of the human motor system in controlling the amplitude of movement. Journal of Experimental Psychology, 47, 381-391.
    • (1954) Journal of Experimental Psychology , vol.47 , pp. 381-391
    • Fitts, P.M.1
  • 19
    • 8844219940
    • Extending the computational abilities of the procedural learning mechanism in ACT-R
    • K. D. Forbus, D. Gentner, & T. Regier (Eds.). Austin, TX: Cognitive Science Society
    • Fu, W. T., & Anderson, J. R. (2004). Extending the computational abilities of the procedural learning mechanism in ACT-R. In K. D. Forbus, D. Gentner, & T. Regier (Eds.), Proceedings of the 26th Annual Meeting of the Cognitive Science Society (pp. 416-421). Austin, TX: Cognitive Science Society.
    • (2004) Proceedings of the 26th Annual Meeting of the Cognitive Science Society , pp. 416-421
    • Fu, W.T.1    Anderson, J.R.2
  • 20
    • 33745108748
    • From recurrent choice to skill learning: A reinforcement-learning model
    • Fu, W. T., & Anderson, J. R. (2006). From recurrent choice to skill learning: A reinforcement-learning model. Journal of Experimental Psychology: General, 135, 184-206.
    • (2006) Journal of Experimental Psychology: General , vol.135 , pp. 184-206
    • Fu, W.T.1    Anderson, J.R.2
  • 21
    • 33645093715
    • Suboptimal tradeoffs in information seeking
    • Fu, W. T., & Gray, W. D. (2006). Suboptimal tradeoffs in information seeking. Cognitive Psychology, 52, 195-242.
    • (2006) Cognitive Psychology , vol.52 , pp. 195-242
    • Fu, W.T.1    Gray, W.D.2
  • 22
    • 0034569209
    • Milliseconds matter: An introduction to microstrategies and to their use in describing and predicting interactive behavior
    • Gray, W. D., & Boehm-Davis, D. A. (2000). Milliseconds matter: An introduction to microstrategies and to their use in describing and predicting interactive behavior. Journal of Experimental Psychology: Applied, 6, 322-335.
    • (2000) Journal of Experimental Psychology: Applied , vol.6 , pp. 322-335
    • Gray, W.D.1    Boehm-Davis, D.A.2
  • 23
    • 2442656733
    • Soft constraints in interactive behavior: The case of ignoring perfect knowledge in-the-world for imperfect knowledge in-the-head
    • Gray, W. D., & Fu, W. T. (2004). Soft constraints in interactive behavior: The case of ignoring perfect knowledge in-the-world for imperfect knowledge in-the-head. Cognitive Science, 28, 359-382.
    • (2004) Cognitive Science , vol.28 , pp. 359-382
    • Gray, W.D.1    Fu, W.T.2
  • 24
    • 8844241372
    • Adapting to the task environment: Explorations in expected value
    • Gray, W. D., Schoelles, M. J., & Sims, C. R. (2005). Adapting to the task environment: Explorations in expected value. Cognitive Systems Research, 6, 27-40.
    • (2005) Cognitive Systems Research , vol.6 , pp. 27-40
    • Gray, W.D.1    Schoelles, M.J.2    Sims, C.R.3
  • 25
    • 33746347681
    • The soft constraints hypothesis: A rational analysis approach to resource allocation for interactive behavior
    • Gray, W. D., Sims, C. R., Fu, W. T., & Schoelles, M. J. (2006). The soft constraints hypothesis: A rational analysis approach to resource allocation for interactive behavior. Psychological Review, 113, 461-482.
    • (2006) Psychological Review , vol.113 , pp. 461-482
    • Gray, W.D.1    Sims, C.R.2    Fu, W.T.3    Schoelles, M.J.4
  • 26
    • 70350572378
    • Short-term gains, long-term pains: How cues about state aid learning in dynamic environments
    • Gureckis, T. M., & Love, B. C. (2009). Short-term gains, long-term pains: How cues about state aid learning in dynamic environments. Cognition, 113, 293-313.
    • (2009) Cognition , vol.113 , pp. 293-313
    • Gureckis, T.M.1    Love, B.C.2
  • 27
    • 84970199822
    • Behavior, reinforcement and utility
    • Herrnstein, R. J. (1990). Behavior, reinforcement and utility. Psychological Science, 1, 217-224.
    • (1990) Psychological Science , vol.1 , pp. 217-224
    • Herrnstein, R.J.1
  • 28
    • 85047670409
    • The neural basis of human error processing: Reinforcement learning, dopamine, and the error-related negativity
    • Holroyd, C. B., & Coles, M. G. H. (2002). The neural basis of human error processing: Reinforcement learning, dopamine, and the error-related negativity. Psychological Review, 109, 679-709.
    • (2002) Psychological Review , vol.109 , pp. 679-709
    • Holroyd, C.B.1    Coles, M.G.H.2
  • 29
    • 70350440484
    • Rational adaptation under task and processing constraints: Implications for testing theories of cognition and action
    • Howes, A., Lewis, R. L., & Vera, A. (2009). Rational adaptation under task and processing constraints: Implications for testing theories of cognition and action. Psychological Review, 116, 717-751.
    • (2009) Psychological Review , vol.116 , pp. 717-751
    • Howes, A.1    Lewis, R.L.2    Vera, A.3
  • 30
    • 84994223687
    • Strategic adaptation to performance objectives in a dual-task setting
    • Janssen, C. P., & Brumby, D. P. (2010). Strategic adaptation to performance objectives in a dual-task setting. Cognitive Science, 34, 1548-1560.
    • (2010) Cognitive Science , vol.34 , pp. 1548-1560
    • Janssen, C.P.1    Brumby, D.P.2
  • 31
    • 79957967824
    • Identifying optimum performance trade-offs using a cognitively bounded rational analysis model of discretionary task interleaving
    • Janssen, C. P., Brumby, D. P., Dowell, J., Chater, N., & Howes, A. (2011). Identifying optimum performance trade-offs using a cognitively bounded rational analysis model of discretionary task interleaving. Topics in Cognitive Science, 3, 123-139.
    • (2011) Topics in Cognitive Science , vol.3 , pp. 123-139
    • Janssen, C.P.1    Brumby, D.P.2    Dowell, J.3    Chater, N.4    Howes, A.5
  • 32
    • 0041020772
    • Choice
    • J. R. Anderson & C. Lebiere (Eds.). Mahwah, NJ: Erlbaum
    • Lovett, M. C. (1998). Choice. In J. R. Anderson & C. Lebiere (Eds.), The atomic components of thought (pp. 255-296). Mahwah, NJ: Erlbaum.
    • (1998) The atomic components of thought , pp. 255-296
    • Lovett, M.C.1
  • 33
    • 0029981543
    • A framework for mesencephalic dopamine systems based on predictive Hebbian learning
    • Montague, P. R., Dayan, P., & Sejnowski, T. J. (1996). A framework for mesencephalic dopamine systems based on predictive Hebbian learning. Journal of Neuroscience, 16, 1936-1947.
    • (1996) Journal of Neuroscience , vol.16 , pp. 1936-1947
    • Montague, P.R.1    Dayan, P.2    Sejnowski, T.J.3
  • 35
    • 84857932876
    • Rewards and punishments in iterated decision making: An explanation for the frequency of the contingent event effect
    • D. D. Salvucci & G. Gunzelmann (Eds.). Philadelphia, PA: Drexel University
    • Napoli, A., & Fum, D. (2010). Rewards and punishments in iterated decision making: An explanation for the frequency of the contingent event effect. In D. D. Salvucci & G. Gunzelmann (Eds.), Proceedings of the 10th International Conference on Cognitive Modeling (pp. 175-180). Philadelphia, PA: Drexel University.
    • (2010) Proceedings of the 10th International Conference on Cognitive Modeling , pp. 175-180
    • Napoli, A.1    Fum, D.2
  • 36
    • 8844283285
    • SOAR-RL: Integrating reinforcement learning with SOAR
    • Nason, S., & Laird, J. E. (2005). SOAR-RL: Integrating reinforcement learning with SOAR. Cognitive Systems Research, 6, 51-59.
    • (2005) Cognitive Systems Research , vol.6 , pp. 51-59
    • Nason, S.1    Laird, J.E.2
  • 37
    • 52049117895
    • Feedback design for the control of a dynamic multitasking system: Dissociating outcome feedback from control feedback
    • Neth, H., Khemlani, S. S., & Gray, W. D. (2008). Feedback design for the control of a dynamic multitasking system: Dissociating outcome feedback from control feedback. Human Factors, 50, 643-651.
    • (2008) Human Factors , vol.50 , pp. 643-651
    • Neth, H.1    Khemlani, S.S.2    Gray, W.D.3
  • 38
    • 67349169259
    • Melioration dominates maximization: Stable suboptimal performance despite global feedback
    • R. Sun (Ed.). Austin, TX: Cognitive Science Society
    • Neth, H., Sims, C. R., & Gray, W. D. (2006). Melioration dominates maximization: Stable suboptimal performance despite global feedback. In R. Sun (Ed.), Proceedings of the 28th Annual Meeting of the Cognitive Science Society (pp. 627-632). Austin, TX: Cognitive Science Society.
    • (2006) Proceedings of the 28th Annual Meeting of the Cognitive Science Society , pp. 627-632
    • Neth, H.1    Sims, C.R.2    Gray, W.D.3
  • 40
    • 32444439058
    • Behavioral theories and the neurophysiology of reward
    • Schultz, W. (2006). Behavioral theories and the neurophysiology of reward. Annual Review of Psychology, 57, 87-115.
    • (2006) Annual Review of Psychology , vol.57 , pp. 87-115
    • Schultz, W.1
  • 41
    • 0030896968
    • A neural substrate of prediction and reward
    • Schultz, W., Dayan, P., & Montague, P. R. (1997). A neural substrate of prediction and reward. Science, 275, 1593-1599.
    • (1997) Science , vol.275 , pp. 1593-1599
    • Schultz, W.1    Dayan, P.2    Montague, P.R.3
  • 43
    • 8844287788
    • Episodic versus semantic memory: An exploration of models of memory decay in the serial attention paradigm
    • M. Lovett, C. Schunn, & C. Lebiere (Eds.). Mahwah, NJ: Lawrence Erlbaum Associates
    • Sims, C. R., & Gray, W. D. (2004). Episodic versus semantic memory: An exploration of models of memory decay in the serial attention paradigm. In M. Lovett, C. Schunn, & C. Lebiere (Eds.), Proceedings of the 6th International Conference on Cognitive Modeling (pp. 279-284). Mahwah, NJ: Lawrence Erlbaum Associates.
    • (2004) Proceedings of the 6th International Conference on Cognitive Modeling , pp. 279-284
    • Sims, C.R.1    Gray, W.D.2
  • 45
    • 10044264448
    • Towards a standard for pointing device evaluation, perspectives on 27 years of Fitts' law research in HCI
    • Soukoreff, R. W., & MacKenzie, I. S. (2004). Towards a standard for pointing device evaluation, perspectives on 27 years of Fitts' law research in HCI. International Journal of Human-Computer Studies, 61, 751-789.
    • (2004) International Journal of Human-Computer Studies , vol.61 , pp. 751-789
    • Soukoreff, R.W.1    MacKenzie, I.S.2
  • 46
    • 0035740527
    • From implicit skills to explicit knowledge: A bottom-up model of skill learning
    • Sun, R., Merrill, E., & Peterson, T. (2001). From implicit skills to explicit knowledge: A bottom-up model of skill learning. Cognitive Science, 25, 203-244.
    • (2001) Cognitive Science , vol.25 , pp. 203-244
    • Sun, R.1    Merrill, E.2    Peterson, T.3
  • 48
    • 35648956750
    • Influencing cognitive strategy by manipulating information access
    • Waldron, S. M., Patrick, J., Morgan, P. L., & King, S. (2007). Influencing cognitive strategy by manipulating information access. The Computer Journal, 50, 694-702.
    • (2007) The Computer Journal , vol.50 , pp. 694-702
    • Waldron, S.M.1    Patrick, J.2    Morgan, P.L.3    King, S.4
  • 49
    • 63149124215
    • The strategic nature of changing your mind
    • Walsh, M. M., & Anderson, J. R. (2009). The strategic nature of changing your mind. Cognitive Psychology, 58, 416-440.
    • (2009) Cognitive Psychology , vol.58 , pp. 416-440
    • Walsh, M.M.1    Anderson, J.R.2
  • 50
    • 36749074427
    • Acquisition and transfer of attention allocation strategies in a multiple-task work environment
    • Wang, D. D., Proctor, R. W., & Pick, D. F. (2007). Acquisition and transfer of attention allocation strategies in a multiple-task work environment. Human Factors, 49, 995-1004.
    • (2007) Human Factors , vol.49 , pp. 995-1004
    • Wang, D.D.1    Proctor, R.W.2    Pick, D.F.3
  • 52
    • 25644450322
    • Comparison of basic assumptions embedded in learning models for experience-based decision making
    • Yechiam, E., & Busemeyer, J. R. (2005). Comparison of basic assumptions embedded in learning models for experience-based decision making. Psychonomic Bulletin & Review, 12, 387-402.
    • (2005) Psychonomic Bulletin & Review , vol.12 , pp. 387-402
    • Yechiam, E.1    Busemeyer, J.R.2


* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.