Volume 102, Issue 4, 2014, Pages 544-571

Modeling human decision making in generalized Gaussian multiarmed bandits

Author keywords

Adaptive control; human decision making; machine learning; multiarmed bandit

Indexed keywords

BAYESIAN NETWORKS; BEHAVIORAL RESEARCH; INFERENCE ENGINES; LEARNING SYSTEMS; PROBABILITY; STOCHASTIC SYSTEMS

EID: 84897532572 | PISSN: 00189219 | EISSN: None | Source Type: Journal
DOI: 10.1109/JPROC.2014.2307024 | Document Type: Article
Times cited: 86

References (52)
  • 1. F. L. Lewis, D. Vrabie, and K. G. Vamvoudakis, "Reinforcement learning and feedback control: Using natural decision methods to design optimal adaptive controllers," IEEE Control Syst. Mag., vol. 32, no. 6, pp. 76-105, Dec. 2012.
  • 3. R. Bellman, Dynamic Programming. Princeton, NJ, USA: Princeton Univ. Press, 1957.
  • 5. C. J. C. H. Watkins and P. Dayan, "Q-learning," Mach. Learn., vol. 8, no. 3-4, pp. 279-292, 1992.
  • 6. P. Auer and R. Ortner, "Logarithmic online regret bounds for undiscounted reinforcement learning," in Advances in Neural Information Processing Systems 19, B. Schölkopf, J. Platt, and T. Hoffman, Eds. Cambridge, MA, USA: MIT Press, 2007, pp. 49-56.
  • 7. J. D. Cohen, S. M. McClure, and A. J. Yu, "Should I stay or should I go? How the human brain manages the trade-off between exploitation and exploration," Philosoph. Trans. Roy. Soc. B, Biol. Sci., vol. 362, no. 1481, pp. 933-942, 2007.
  • 9. J. C. Gittins, "Bandit processes and dynamic allocation indices," J. Roy. Stat. Soc. B (Methodological), vol. 41, no. 2, pp. 148-177, 1979.
  • 10. T. L. Lai and H. Robbins, "Asymptotically efficient adaptive allocation rules," Adv. Appl. Math., vol. 6, no. 1, pp. 4-22, 1985.
  • 11. W. R. Thompson, "On the likelihood that one unknown probability exceeds another in view of the evidence of two samples," Biometrika, vol. 25, no. 3/4, pp. 285-294, 1933.
  • 12. H. Robbins, "Some aspects of the sequential design of experiments," Bull. Amer. Math. Soc., vol. 58, pp. 527-535, 1952.
  • 13
  • 14
  • 15. J. Le Ny, M. Dahleh, and E. Feron, "Multi-UAV dynamic routing with partial observations using restless bandit allocation indices," in Proc. Amer. Control Conf., Seattle, WA, USA, Jun. 2008, pp. 4220-4225.
  • 16. B. P. McCall and J. J. McCall, "A sequential study of migration and job search," J. Labor Econ., vol. 5, no. 4, pp. 452-476, 1987.
  • 17. M. Y. Cheung, J. Leighton, and F. S. Hover, "Autonomous mobile acoustic relay positioning as a multi-armed bandit with switching costs," in Proc. IEEE/RSJ Int. Conf. Intell. Robots Syst., Tokyo, Japan, Nov. 2013, pp. 3368-3373.
  • 18. J. R. Krebs, A. Kacelnik, and P. Taylor, "Test of optimal sampling by foraging great tits," Nature, vol. 275, no. 5675, pp. 27-31, 1978.
  • 19. R. Agrawal, "Sample mean based index policies with O(log n) regret for the multi-armed bandit problem," Adv. Appl. Probab., vol. 27, no. 4, pp. 1054-1078, 1995.
  • 20. P. Auer, N. Cesa-Bianchi, and P. Fischer, "Finite-time analysis of the multiarmed bandit problem," Mach. Learn., vol. 47, no. 2, pp. 235-256, 2002.
  • 21. S. Bubeck and N. Cesa-Bianchi, "Regret analysis of stochastic and nonstochastic multi-armed bandit problems," Found. Trends Mach. Learn., vol. 5, no. 1, pp. 1-122, 2012.
  • 22. J.-Y. Audibert, R. Munos, and C. Szepesvári, "Exploration-exploitation tradeoff using variance estimates in multi-armed bandits," Theor. Comput. Sci., vol. 410, no. 19, pp. 1876-1902, 2009.
  • 23. N. Cesa-Bianchi and P. Fischer, "Finite-time regret bounds for the multiarmed bandit problem," in Proc. 15th Int. Conf. Mach. Learn., Madison, WI, USA, Jul. 1998, pp. 100-108.
  • 24. A. Garivier and O. Cappé, "The KL-UCB algorithm for bounded stochastic bandits and beyond," in Proc. Conf. Comput. Learn. Theory, 2011, pp. 359-376.
  • 26. N. Srinivas, A. Krause, S. M. Kakade, and M. Seeger, "Information-theoretic regret bounds for Gaussian process optimization in the bandit setting," IEEE Trans. Inf. Theory, vol. 58, no. 5, pp. 3250-3265, May 2012.
  • 27. S. Agrawal and N. Goyal, "Analysis of Thompson sampling for the multi-armed bandit problem," in Proc. 25th Annu. Conf. Learn. Theory, S. Mannor, N. Srebro, and R. C. Williamson, Eds., 2012, pp. 39.1-39.26.
  • 28. E. Kaufmann, O. Cappé, and A. Garivier, "On Bayesian upper confidence bounds for bandit problems," in Proc. Int. Conf. Artif. Intell. Stat., La Palma, Canary Islands, Spain, Apr. 2012, pp. 592-600.
  • 29. R. Agrawal, M. V. Hegde, and D. Teneketzis, "Asymptotically efficient adaptive allocation rules for the multi-armed bandit problem with switching cost," IEEE Trans. Autom. Control, vol. 33, no. 10, pp. 899-906, Oct. 1988.
  • 31. M. Asawa and D. Teneketzis, "Multi-armed bandits with switching penalties," IEEE Trans. Autom. Control, vol. 41, no. 3, pp. 328-348, Mar. 1996.
  • 32. T. Jun, "A survey on the bandit problem with switching costs," De Economist, vol. 152, no. 4, pp. 513-541, 2004.
  • 33. R. Kleinberg, A. Niculescu-Mizil, and Y. Sharma, "Regret bounds for sleeping experts and bandits," Mach. Learn., vol. 80, no. 2-3, pp. 245-272, 2010.
  • 34. D. Acuña and P. Schrater, "Bayesian modeling of human sequential decision-making on the multi-armed bandit problem," in Proc. 30th Annu. Conf. Cogn. Sci. Soc., B. C. Love, K. McRae, and V. M. Sloutsky, Eds., Washington, DC, USA, Jul. 2008, pp. 2065-2070.
  • 35. D. E. Acuña and P. Schrater, "Structure learning in human sequential decision-making," PLoS Comput. Biol., vol. 6, no. 12, 2010, e1001003.
  • 36. M. Steyvers, M. D. Lee, and E. Wagenmakers, "A Bayesian analysis of human decision-making on bandit problems," J. Math. Psychol., vol. 53, no. 3, pp. 168-179, 2009.
  • 37. M. D. Lee, S. Zhang, M. Munro, and M. Steyvers, "Psychological models of human and optimal performance in bandit problems," Cogn. Syst. Res., vol. 12, no. 2, pp. 164-174, 2011.
  • 38. S. Zhang and A. J. Yu, "Cheap but clever: Human active learning in a bandit setting," in Proc. 35th Annu. Conf. Cogn. Sci. Soc., Berlin, Germany, Aug. 2013, pp. 1647-1652.
  • 39. R. C. Wilson, A. Geana, J. M. White, E. A. Ludvig, and J. D. Cohen, "Why the grass is greener on the other side: Behavioral evidence for an ambiguity bonus in human exploratory decision-making," in Proc. Neurosci. Abstr., Washington, DC, USA, Nov. 2011, Program No. 830.10.
  • 40. D. Tomlin, A. Nedic, R. C. Wilson, P. Holmes, and J. D. Cohen, "Group foraging task reveals separable influences of individual experience and social information," in Proc. Neurosci. Abstr., New Orleans, LA, USA, Oct. 2012, Program No. 596.12.
  • 41. P. Reverdy, R. C. Wilson, P. Holmes, and N. E. Leonard, "Towards optimization of a human-inspired heuristic for solving explore-exploit problems," in Proc. IEEE Conf. Decision Control, Maui, HI, USA, Dec. 2012, pp. 2820-2825.
  • 43. A. Krause and C. E. Guestrin, "Near-optimal nonmyopic value of information in graphical models," in Proc. 21st Conf. Uncertainty Artif. Intell., Edinburgh, Scotland, Jul. 2005, pp. 324-331.
  • 45. D. Bertsimas and J. N. Tsitsiklis, "Simulated annealing," Stat. Sci., vol. 8, no. 1, pp. 10-15, 1993.
  • 46. D. Mitra, F. Romeo, and A. Sangiovanni-Vincentelli, "Convergence and finite-time behavior of simulated annealing," Adv. Appl. Probab., vol. 18, no. 3, pp. 747-771, 1986.
  • 47. S. Kirkpatrick, C. D. Gelatt, Jr., and M. P. Vecchi, "Optimization by simulated annealing," Science, vol. 220, no. 4598, pp. 671-680, 1983.
  • 48. S. M. Kay, Fundamentals of Statistical Signal Processing, Volume I: Estimation Theory. Englewood Cliffs, NJ, USA: Prentice-Hall, 1993.
  • 49. J. Sherman and W. J. Morrison, "Adjustment of an inverse matrix corresponding to a change in one element of a given matrix," Ann. Math. Stat., vol. 21, no. 1, pp. 124-127, 1950.
  • 50. M. Buhrmester, T. Kwang, and S. D. Gosling, "Amazon's Mechanical Turk: A new source of inexpensive, yet high-quality, data?" Perspectives Psychol. Sci., vol. 6, no. 1, pp. 3-5, 2011, DOI: 10.1177/1745691610393980.


* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.