-
1
-
-
0004870746
-
A problem in the sequential design of experiments
-
Bellman RE (1956) A problem in the sequential design of experiments. Sankhyā 16: 221-229.
-
(1956)
Sankhyā
, vol.16
, pp. 221-229
-
-
Bellman, R.E.1
-
3
-
-
0001043843
-
Restless bandits: Activity allocation in a changing world
-
Whittle P (1988) Restless bandits: activity allocation in a changing world. J Appl Probab 25: 287-298.
-
(1988)
J Appl Probab
, vol.25
, pp. 287-298
-
-
Whittle, P.1
-
4
-
-
33745223257
-
Cortical substrates for exploratory decisions in humans
-
Daw ND, O'Doherty JP, Dayan P, Seymour B, Dolan RJ (2006) Cortical substrates for exploratory decisions in humans. Nature 441: 876-879.
-
(2006)
Nature
, vol.441
, pp. 876-879
-
-
Daw, N.D.1
O'Doherty, J.P.2
Dayan, P.3
Seymour, B.4
Dolan, R.J.5
-
5
-
-
84864921697
-
Modeling human performance in restless bandits with particle filters
-
Yi MS, Steyvers M, Lee M (2009) Modeling human performance in restless bandits with particle filters. The Journal of Problem Solving 2. Available: http://docs.lib.purdue.edu/jps/vol2/iss2/5/.
-
(2009)
The Journal of Problem Solving 2
-
-
Yi, M.S.1
Steyvers, M.2
Lee, M.3
-
7
-
-
57049112212
-
When does reward maximization lead to matching law?
-
Sakai Y, Fukai T (2008) When does reward maximization lead to matching law? PLoS One 3: e3795.
-
(2008)
PLoS One
, vol.3
-
-
Sakai, Y.1
Fukai, T.2
-
8
-
-
37749023538
-
The actor-critic learning is behind the matching law: Matching vs. optimal behaviors
-
Sakai Y, Fukai T (2008) The actor-critic learning is behind the matching law: Matching vs. optimal behaviors. Neural Comput 20: 227-251.
-
(2008)
Neural Comput
, vol.20
, pp. 227-251
-
-
Sakai, Y.1
Fukai, T.2
-
9
-
-
0032073263
-
Planning and acting in partially observable stochastic domains
-
Kaelbling L, Littman M, Cassandra A (1998) Planning and acting in partially observable stochastic domains. Artif Intell 101: 99-134.
-
(1998)
Artif Intell
, vol.101
, pp. 99-134
-
-
Kaelbling, L.1
Littman, M.2
Cassandra, A.3
-
12
-
-
33749251297
-
An analytic solution to discrete Bayesian reinforcement learning
-
Pittsburgh, Penn
-
Poupart P, Vlassis N, Hoey J, Regan K (2006) An analytic solution to discrete Bayesian reinforcement learning. In: 23rd International Conference on Machine Learning. Pittsburgh, Penn. pp 697-704.
-
(2006)
23rd International Conference on Machine Learning
, pp. 697-704
-
-
Poupart, P.1
Vlassis, N.2
Hoey, J.3
Regan, K.4
-
14
-
-
34249761849
-
Learning Bayesian networks: The combination of knowledge and statistical data
-
Heckerman D, Geiger D, Chickering DM (1995) Learning Bayesian networks: The combination of knowledge and statistical data. Mach Learn 20: 197-243.
-
(1995)
Mach Learn
, vol.20
, pp. 197-243
-
-
Heckerman, D.1
Geiger, D.2
Chickering, D.M.3
-
16
-
-
33746260413
-
Theory-based Bayesian models of inductive learning and reasoning
-
Tenenbaum JB, Griffiths TL, Kemp C (2006) Theory-based Bayesian models of inductive learning and reasoning. Trends Cogn Sci 10: 309-318.
-
(2006)
Trends Cogn Sci
, vol.10
, pp. 309-318
-
-
Tenenbaum, J.B.1
Griffiths, T.L.2
Kemp, C.3
-
17
-
-
0003787146
-
-
Princeton: Princeton University Press
-
Bellman RE (1957) Dynamic programming. Princeton: Princeton University Press.
-
(1957)
Dynamic Programming
-
-
Bellman, R.E.1
-
18
-
-
0002955623
-
A dynamic allocation index for the sequential design of experiments
-
In: Gani J, Sarkadi K, Vincze I, eds., Amsterdam: North-Holland Pub. Co
-
Gittins JC, Jones DM (1974) A dynamic allocation index for the sequential design of experiments. In: Gani J, Sarkadi K, Vincze I, eds. Progress in statistics. Amsterdam: North-Holland Pub. Co. pp 241-266.
-
(1974)
Progress in Statistics
, pp. 241-266
-
-
Gittins, J.C.1
Jones, D.M.2
-
19
-
-
34249833101
-
Technical note: Q-learning
-
Watkins C, Dayan P (1992) Technical note: Q-learning. Mach Learn 8: 279-292.
-
(1992)
Mach Learn
, vol.8
, pp. 279-292
-
-
Watkins, C.1
Dayan, P.2
-
21
-
-
0030896968
-
A neural substrate of prediction and reward
-
Schultz W, Dayan P, Montague P (1997) A neural substrate of prediction and reward. Science 275: 1593-1599.
-
(1997)
Science
, vol.275
, pp. 1593-1599
-
-
Schultz, W.1
Dayan, P.2
Montague, P.3
-
22
-
-
0031867046
-
Predictive reward signal of dopamine neurons
-
Schultz W (1998) Predictive reward signal of dopamine neurons. J Neurophysiol 80: 1-27.
-
(1998)
J Neurophysiol
, vol.80
, pp. 1-27
-
-
Schultz, W.1
-
24
-
-
0008803714
-
Sequential choice under ambiguity: Intuitive solutions to the armed-bandit problem
-
Meyer RJ, Shi Y (1995) Sequential choice under ambiguity: Intuitive solutions to the armed-bandit problem. Manage Sci 41: 817-834.
-
(1995)
Manage Sci
, vol.41
, pp. 817-834
-
-
Meyer, R.J.1
Shi, Y.2
-
25
-
-
0031287072
-
An experimental analysis of the bandit problem
-
Banks J, Olson M, Porter D (1997) An experimental analysis of the bandit problem. Econ Theory 10: 55-77.
-
(1997)
Econ Theory
, vol.10
, pp. 55-77
-
-
Banks, J.1
Olson, M.2
Porter, D.3
-
27
-
-
61549113484
-
Simple models of discrete choice and their performance in bandit experiments
-
Gans N, Knox G, Croson R (2007) Simple models of discrete choice and their performance in bandit experiments. Manuf Serv Oper Manag 9: 383-408.
-
(2007)
Manuf Serv Oper Manag
, vol.9
, pp. 383-408
-
-
Gans, N.1
Knox, G.2
Croson, R.3
-
28
-
-
0010186317
-
Reward probability, amount, and information as determiners of sequential two-alternative decisions
-
Edwards W (1956) Reward probability, amount, and information as determiners of sequential two-alternative decisions. J Exp Psychol 52: 177-188.
-
(1956)
J Exp Psychol
, vol.52
, pp. 177-188
-
-
Edwards, W.1
-
29
-
-
0001515225
-
Probability learning in 1000 trials
-
Edwards W (1961) Probability learning in 1000 trials. J Exp Psychol 62: 385-394.
-
(1961)
J Exp Psychol
, vol.62
, pp. 385-394
-
-
Edwards, W.1
-
30
-
-
0342748193
-
Supplementary report: The utility of correctly predicting infrequent events
-
Brackbill Y, Bravos A (1962) Supplementary report: The utility of correctly predicting infrequent events. J Exp Psychol 64: 648-649.
-
(1962)
J Exp Psychol
, vol.64
, pp. 648-649
-
-
Brackbill, Y.1
Bravos, A.2
-
32
-
-
77952541839
-
Learning latent structure: Carving nature at its joints
-
Gershman SJ, Niv Y (2010) Learning latent structure: carving nature at its joints. Curr Opin Neurobiol 20: 251-256.
-
(2010)
Curr Opin Neurobiol
, vol.20
, pp. 251-256
-
-
Gershman, S.J.1
Niv, Y.2
-
33
-
-
84898993037
-
Model uncertainty in classical conditioning
-
Cambridge, MA: MIT Press
-
Courville AC, Daw ND, Gordon GJ, Touretzky DS (2004) Model uncertainty in classical conditioning. In: Advances in Neural Information Processing Systems 16. Cambridge, MA: MIT Press. pp 977-986.
-
(2004)
Advances in Neural Information Processing Systems 16
, pp. 977-986
-
-
Courville, A.C.1
Daw, N.D.2
Gordon, G.J.3
Touretzky, D.S.4
-
35
-
-
77951576301
-
Bayesian modeling of human sequential decision-making on the multi-armed bandit problem
-
In: Sloutsky V, Love B, McRae K, eds., Austin, TX: Cognitive Science Society
-
Acuna D, Schrater P (2008) Bayesian modeling of human sequential decision-making on the multi-armed bandit problem. In: Sloutsky V, Love B, McRae K, eds. 30th Annual Conference of the Cognitive Science Society. Austin, TX: Cognitive Science Society. pp 2065-2070.
-
(2008)
30th Annual Conference of the Cognitive Science Society
, pp. 2065-2070
-
-
Acuna, D.1
Schrater, P.2
-
36
-
-
33745910265
-
A hierarchical Bayesian model of human decision-making on an optimal stopping problem
-
Lee MD (2006) A hierarchical Bayesian model of human decision-making on an optimal stopping problem. Cogn Sci 30: 1-26.
-
(2006)
Cogn Sci
, vol.30
, pp. 1-26
-
-
Lee, M.D.1
-
38
-
-
84864032307
-
Prediction and change detection
-
Steyvers M, Brown S (2006) Prediction and change detection. In: NIPS 2006. pp 1281-1288.
-
(2006)
NIPS 2006
, pp. 1281-1288
-
-
Steyvers, M.1
Brown, S.2
-
40
-
-
0038829878
-
Predicting how people play games: Reinforcement learning in experimental games with unique, mixed strategy equilibria
-
Erev I, Roth AE (1998) Predicting how people play games: Reinforcement learning in experimental games with unique, mixed strategy equilibria. Am Econ Rev 88: 848-881.
-
(1998)
Am Econ Rev
, vol.88
, pp. 848-881
-
-
Erev, I.1
Roth, A.E.2
-
41
-
-
33646230819
-
Dopamine, prediction error and associative learning: A model-based account
-
Smith A, Li M, Becker S, Kapur S (2006) Dopamine, prediction error and associative learning: A model-based account. Network 17: 61-84.
-
(2006)
Network
, vol.17
, pp. 61-84
-
-
Smith, A.1
Li, M.2
Becker, S.3
Kapur, S.4
-
43
-
-
67349268975
-
A Bayesian analysis of human decision-making on bandit problems
-
Steyvers M, Lee MD, Wagenmakers E (2009) A Bayesian analysis of human decision-making on bandit problems. J Math Psychol 53: 168-179.
-
(2009)
J Math Psychol
, vol.53
, pp. 168-179
-
-
Steyvers, M.1
Lee, M.D.2
Wagenmakers, E.3