SCOPUS Information Search Platform

Columbia Law Review

Volume 109, Issue 8, 2009, Pages 2081-2105

A practical solution to the reference class problem

(1) Cheng, Edward K a,b

a BROOKLYN LAW SCHOOL (United States)

b Och Spine at New York Presbyterian Hospitals (United States)

Author keywords

[No Author keywords available]

Indexed keywords

EID: 74349105256 PISSN: 00101958 EISSN: None Source Type: Journal
DOI: None Document Type: Review

Times cited : (34)

References (114)

1
- 74349125938
- 895 F. Supp. 460 (E.D.N.Y. 1995).

2
- 74349098048
- See id. at 464.

3
- 74349106888
- Id. at 467.

4
- 74349123923
- Id. at 466 ("Based on evidence at the trial and at sentencing, the judge found that the defendant had made a total of eight smuggling trips to Nigeria between September 1, 1990 and December 10, 1991.").

5
- 74349094154
- The specific type of judicial factfinding practiced in Shonubi is now surely unconstitutional in the wake of United States v. Booker, 543 U.S. 220 (2005), which requires that such sentencing enhancements be found by the jury. Nevertheless, the fundamental statistical problem remains, irrespective of the context or decisionmaker.

6
- 74349096325
- Peter Tillers, If Wishes Were Horses: Discursive Comments on Attempts to Prevent Individuals from Being Unfairly Burdened by Their Reference Classes, 4 Law, Probability & Risk 33, 33 n.-J, 36 n.12 (2005).

7
- 74349121848
- Shonubi, 895 F. Supp. at 499-504 (describing customs service data).

8
- 74349128930
- Id. at 521-23 (describing model).

9
- 74349129938
- Id. at 530.

10
- 0035610914
- See Mark Colyvan, Helen M. Regan & Scott Ferson, Is It a Crime to Belong to a Reference Class?, 9 J. Pol. Phil. 168, 172-73 (2001) [hereinafter Colyvan et al., Is It a Crime] (discussing possibility of other classifications);

11
- 74349129445
- Paul Roberts, From Theory into Practice: Introducing the Reference Class Problem, 11 Int'l J. Evidence & Proof 243, 247 (2007) (noting Judge Weinstein's court-appointed expert panel of statistician David Schum and law professor Peter Tillers challenged statistics offered by prosecution because of reference class problem).

12
- 74349129805
- This basic multiplicative solution is actually what Judge Weinstein applied as an initial matter, United States v. Shonubi, 802 F. Supp. 859, 860-61, 864 (E.D.N.Y. 1992), until he was reversed by the Second Circuit, United States v. Shonubi, 998 F.2d 84 (2d Cir. 1993);

13
- 74349110223
- see also Shonubi, 895 F. Supp. at 466-68 (detailing procedural history).

14
- 74349121849
- Shonubi, 895 F. Supp. at 465;

15
- 74349113444
- Colyvan et al., Is It a Crime, supra note 8, at 172.

16
- 74349126528
- Colyvan et al., Is It a Crime, supra note 8, at 172.

17
- 34047273943
- See Ronald J. Allen & Michael S. Pardo, The Problematic Value of Mathematical Models of Evidence, 36 J. Legal Stud. 107, 109 (2007) (questioning usefulness of statistical models because of reference class problem);

18
- 74349117689
- see also Dale A. Nance, The Reference Class Problem and Mathematical Models of Inference, 11 Int'l J. Evidence & Proof 259, 267 (2007) (noting it would be "catastrophic for any system of trials ... [to require] that every judgment explicitly or implicitly adopting a reference class ... be explored and debated").

19
- 74349101102
- Allen & Pardo, supra note 12, at 109 (submitting that while "application of the probability theory to juridical proof ... is interesting, instructive, and insightful[,] ... it also suffers from the deep conceptual problem . . . of reference classes");

20
- 74349128929
- Mark Colyvan & Helen M. Regan, Legal Decisions and the Reference Class Problem, 11 Int'l J. Evidence & Proof 274, 276 (2007) [hereinafter Colyvan & Regan, Legal Decisions] ("[G]iven that the different reference classes provide different answers to the probability assignment in question, there is considerable uncertainty about the probability assignment itself.");

21
- 74349117792
- Tillers, supra note 4, at 48 ("If statistical reasoning based on reference classes is ever to work, such reasoning can work only if resort is made to reference classes that consist at least in part of events that are not generated by the choices and behaviour of the individual about whom inferences are under consideration.").

22
- 74349124206
- Allen & Pardo, supra note 12, at 135.

23
- 74349110994
- Symposium, Special Issue on the Reference Class Problem, 11 Int'l J. Evidence & Proof 243 (2007);

24
- 74349090113
- see also Roberts, supra note 8 (introducing symposium).

25
- 74349115678
- See, e.g., Nance, supra note 12, at 272 (concluding "more work needs to be done on the theory of reference class selection, on how people do and ought to select reference classes for the purposes of assessing probabilities and drawing inferences").

26
- 74349087509
- Colyvan et al., Is it a Crime, supra note 8, at 172 ("We are not claiming that there is no solution to the reference-class problem, just that there is no straightforward solution . . . .");

27
- 74349103865
- Tillers, supra note 4, at 38 n.21 (discussing what "a 'solution' to 'the' reference class problem would do").

28
- 74349109713
- Dale Nance seems to be more ambivalent on the possibility of a solution. Compare Nance, supra note 12, at 272 ("[M]ore work needs to be done ... on how people do and ought to select reference classes."), with id. (remarking that the proposition that "different advocates can always argue for different classes that generate different frequencies ... is probably true").

29
- 74349091097
- Allen & Pardo, supra note 12, at 112 ("[N]othing in the natural world privileges or picks out one of the classes as the right one; rather, our interests in the various inferences they generate pick out certain classes as more or less relevant.").

30
- 74349125128
- Id. at 113 ("There is no a priori correct answer [to the question of which reference class to use]; it depends on the interests at stake.");

31
- 74349114204
- Colyvan & Regan, Legal Decisions, supra note 13, at 275 ("[T]here is no principled way to establish the relevance of a reference class.").

32
- 34249712324
- See generally Alan Hájek, The Reference Class Problem Is Your Problem Too, 156 Synthese 563, 564 (2007) (discussing the problem writ large).

33
- 84868069644
- According to Alan Hájek, the problem was probably first noted by John Venn, who is most famous for Venn diagrams. The term "reference class problem," however, is attributed to Hans Reichenbach in 1949. Id. at 564.

34
- 38349040198
- Dan Hurley, Divorce Rate: It's Not as High as You Think, N.Y. Times, Apr. 19, 2005, at F7.

35
- 74349097797
- Id.

36
- 74349128040
- David Popenoe, The Future of Marriage in America, in Barbara Dafoe Whitehead & David Popenoe, The State of Our Unions: The Social Health of Marriage in America 18 (2007), available at http://marriage.rutgers.edu/Publications/SOOU/SOOU2007.pdf (on file with the Columbia Law Review); Hurley, supra note 22, at F7.

37
- 74349110739
- Branden Fitelson, Comments on James Franklin's 'The Representation of Context: Ideas from Artificial Intelligence' (Or, More Remarks on the Contextuality of Probability), 2 Law, Probability & Risk 201, 203 (2003) (noting that you cannot reduce

38
- 74349096560
- reference class to single person because then probability is either 0 or 1, which would correspond to the truth, which is precisely what is unknown).

39
- 74349084149
- Roberts, supra note 8, at 245 (arguing that because "[e]very factual generalisation implies a reference class, . . . this in turn entails that the reference class problem is an inescapable concomitant of inferential reasoning and fact-finding in legal proceedings").

40
- 74349119929
- See, e.g., Allen & Pardo, supra note 12, at 113 (discussing reference class problem in determining error rates for eyewitness identifications).

41
- 74349126516
- Although technically outside the evidentiary context, but no less important, Rob Rhee argues that assessment of case values for settlement purposes falls prey to the reference class problem, because these case valuations "can be framed from the reference point of the judge, the court and forum, the attorneys, the parties (if repeat players), the type of action, the type of injury, the legal framework, and-not the least of which-the evidentiary assessment." Robert J. Rhee, Probability, Policy and the Problem of the Reference Class, 11 Int'l J. Evidence & Proof 286, 289 (2007).

42
- 74349095563
- E.g., Adcock v. Miss. Transp. Comm'n, 981 So. 2d 942, 947-48 (Miss. 2008) (using comparisons to value property taken by eminent domain).

43
- 74349105892
- E.g., Engquist v. Wash. County Assessor, No. TC-MD 030303F, 2003 WL 23883581, at *1 (Or. T.C. Magis. Div. July 29, 2003) (illustrating problem in tax assessment context).

44
- 74349090601
- E.g., United States v. 819.98 Acres of Land, 78 F.3d 1468, 1472 (10th Cir. 1996) ("A dissimilarity between sales of property proffered as comparable sales and the property involved in the condemnation action goes to the weight, rather than to the admissibility of the evidence of comparable sales.").

45
- 74349130449
- 387 F. Supp. 2d 794, 812 (N.D. Ill. 2005) (invoking time-honored admonition in case involving comparison groups for calculating lost profits).

46
- 74349115676
- E.g., In re Silicone Gel Breast Implants Prods. Liab. Litig., 318 F. Supp. 2d 879, 893-94 (C.D. Cal. 2004) (discussing how doubling of background risk provides evidence that substance caused specific plaintiff's disease).

47
- 74349130847
- See, e.g., Colyvan & Regan, Legal Decisions, supra note 13, at 275 (using probability of contracting lung cancer as example of reference class). In an entertaining article, evolutionary theorist Stephen Jay Gould wrote about his diagnosis of abdominal mesothelioma in 1982.

48
- 74349092609
- See Stephen Jay Gould, The Median Isn't the Message, Discover, June 1985, at 40-42. His initial shock at the short median lifespan, eight months, quickly faded away as he sharpened his reference class, learning that he "possessed every one of the characteristics conferring a probability of longer life": youth, early diagnosis, good medical care, and a healthy outlook on life.

49
- 74349092123
- Id. Gould lived another twenty years before his death in 2002.

50
- 74349113948
- Carol Kaesuk Yoon, Stephen Jay Gould, 60, Is Dead; Enlivened Evolutionary Theory, N.Y. Times, May 21, 2002, at A1.

51
- 74349119530
- Many thanks to Richard Nagareda for suggesting this example.

52
- 74349112540
- Fed. R. Civ. P. 23 (permitting court to certify class "only if . . . there are questions of law or fact common to the class" and common questions "predominate over any questions affecting only individual members");

53
- 66349086456
- see Richard A. Nagareda, Class Certification in the Age of Aggregate Proof, 84 N.Y.U. L. Rev. 97, 102-03 (2009) (discussing difficulties in obtaining class certification).

54
- 74349118288
- Class certification questions are arguably more complex than other reference class questions. For one thing, they are not strictly factual, but are instead highly interwoven with issues of administrative efficiency and substantive policy. As such, reference class issues in this context are largely beyond the scope of the Essay.

55
- 74349128676
- 509 F.3d 1168, 1195 (9th Cir. 2007).

56
- 74349090856
- Id. at 1180 & n.5.

57
- 74349103133
- Id. at 1181 (noting Wal-Mart's expert apparently looked at the "sub-store level by comparing departments to analyze the pay differential").

58
- 74349098047
- 808 So. 2d 145 (Fla. 2002).

59
- 74349112934
- Id. at 159.

60
- 74349113688
- The Florida Supreme Court largely evaded the problem by arguing that any error associated with use of the African American database was negligible. See id. David Kaye has suggested that the proper reference class was men living in Orlando, while acknowledging that if there had been a Bahamian community in Orlando, the class may have required further narrowing. D.H. Kaye, Logical Relevance: Problems with the Reference Population and DNA Mixtures in People v. Pizarro, 3 Law, Probability & Risk 211, 212 & n.15 (2004).

61
- 74349105421
- 61 F.3d 45 (1st Cir. 1995).

62
- 74349109975
- Id. at 49-50.

63
- 74349115428
- Id. at 50 & n.6.

64
- 74349099260
- Id. at 50 & nn.6-7.

65
- 74349087243
- Id. at 64-65 (Torruella, C.J., dissenting).

66
- 74349101863
- Id. at 66-67.

67
- 74349112013
- Colyvan et al., Is It a Crime, supra note 8, at 172-74 (disclaiming that reference class problem is unsolvable, but showing that obvious solutions fail).

68
- 74349131083
- Nance, supra note 12, at 263 n.9 ("[P]eople drawing inferences routinely order reference classes as better or worse relative to their inferential task. Presumably, they do so with some success.").

69
- 74349107415
- See Colyvan & Regan, Legal Decisions, supra note 13, at 275 (showing intractability of reference class problem where multiple classes appear plausible);

70
- 84868053778
- Hájek, supra note 20, at 565 (same).

71
- 74349094671
- Allen & Pardo, supra note 12, at 115.

72
- 0001935527
- See generally Walter Zucchini, An Introduction to Model Selection, 44 J. Mathematical Psychol. 41 (2000) (offering short and less technical introduction to concepts in model selection).

73
- 0000177088
- Determining exactly which line best "fits" the data points presents another issue of inference, but this problem is reasonably well understood. A common method is least-squares estimation, in which the sum of the squared distances between the line and each of the points is minimized. See Malcolm R. Forster, Key Concepts in Model Selection: Performance and Generalizability, 44 J. Mathematical Psychol. 205, 210 (2000) [hereinafter Forster, Key Concepts] (discussing link between least-squares method and maximum-likelihood method often favored by statisticians).

74
- 74349115427
- Mathematically speaking, for n data points, an nth order polynomial is all that is required for the curve to pass through all the points. However, one can certainly fit higher order polynomials, which will simply allow for more-for lack of a better term-squiggles between the data points.
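
The interpolation claim in this note can be checked numerically. The sketch below is illustrative only and is not taken from the cited Essay: it uses Lagrange interpolation, and the (HOURS, GPA) values are invented. It assumes the x-values are distinct, in which case a polynomial of degree at most n-1 already passes through all n points, so an nth order polynomial is more than sufficient.

```python
def lagrange_interpolate(xs, ys, x):
    """Evaluate at x the polynomial of degree <= n-1 through the n points (xs[i], ys[i]).

    Assumes all xs are distinct; otherwise a division by zero occurs.
    """
    total = 0.0
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        term = yi
        for j, xj in enumerate(xs):
            if j != i:
                # Basis factor: equals 1 at xi and 0 at every other data x-value.
                term *= (x - xj) / (xi - xj)
        total += term
    return total


# Hypothetical (HOURS, GPA) observations, invented for illustration.
hours = [5.0, 10.0, 15.0, 20.0]
gpa = [1.4, 2.3, 2.2, 3.1]

# The interpolating curve reproduces every observed point exactly (up to rounding).
for h, g in zip(hours, gpa):
    assert abs(lagrange_interpolate(hours, gpa, h) - g) < 1e-9

# Between the data points the curve is free to "squiggle".
print(lagrange_interpolate(hours, gpa, 12.5))
```

A perfect fit of this kind says nothing about how well the curve would predict a new observation, which is the concern the model selection notes go on to address.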

75
- 74349108403
- See, e.g., Lewis S. Feuer, The Principle of Simplicity, 24 Phil. Sci. 109, 109 (1957) (stating Occam's Razor: "Entities are not to be multiplied unnecessarily" (emphasis omitted)). Technically, Occam's Razor excludes needlessly complex models, which means that it may only exclude models that have variables beyond those necessary to pass the fit line through all of the data points. See supra note 55 (discussing fitting a line to pass through all points). Occam's Razor does not necessarily select simpler models on the assumption that some of the variation is due to random error. Nevertheless, the spirit of Occam's Razor is toward simpler models, and researchers often invoke it in this broader vein.

76
- 84868053779
- The generating function for the datapoints in Figures 1 and 2 is GPA = 1 + 0.1*HOURS + ε, where ε ~ N(0, 0.5).
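
As a rough illustration of this generating function, the sketch below simulates data from GPA = 1 + 0.1*HOURS + ε and recovers the line by ordinary least squares. Several details are assumptions rather than facts from the source: the 0.5 in N(0, 0.5) is treated as a standard deviation (it could equally denote a variance), the range of HOURS and the seed are arbitrary, and the function names are hypothetical.

```python
import random


def simulate_gpa(hours, sd=0.5, seed=0):
    """Draw GPA = 1 + 0.1*HOURS + eps, with eps normal with mean 0 and standard deviation sd."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    return [1.0 + 0.1 * h + rng.gauss(0.0, sd) for h in hours]


def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x (closed form)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Assumed design: one observation per integer value of HOURS from 0 to 30.
hours = [float(h) for h in range(0, 31)]
gpa = simulate_gpa(hours)
intercept, slope = fit_line(hours, gpa)
print(f"fitted GPA = {intercept:.3f} + {slope:.3f} * HOURS")
```

With noise-free data the fit returns the generating coefficients exactly; with noise, the estimates vary from sample to sample around 1 and 0.1.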

77
- 74349103864
- In constructing a predictive model, the given dataset is only a sample of the population. Thus, by constructing a model that tracks the current data too closely, we are likely to make inferences that are too strong, hampering the model's ability to accommodate future data.

78
- 74349114939
- This result can also be considered from a slightly different perspective. Given the current data, a model can "attribute" the variation in the response variable to either a (deterministic) predictor variable or a (stochastic) error term. Since the given dataset is only a sample of the population, constructing a model that is too deterministic-i.e., one that tracks the current data too closely-will cause it to lack the flexibility needed to handle future observations. At the same time, constructing a model that is too stochastic-i.e., one that just blames chance for everything-will fail to use all of the available information. The key question is whether the structural information in the data justifies use of a predictor variable over the error term. See Kenneth P. Burnham & David R. Anderson, Model Selection and Multimodel Inference 31-33 (2d ed. 2002) (discussing balance between overfitting and underfitting).

79
- 74349094419
- The formula for AIC is AIC = -2L + 2p, where L represents the maximum log-likelihood for the model, and p is the number of predictors or parameters in the model. E.g., W.N. Venables & B.D. Ripley, Modern Applied Statistics with S 173-74 (4th ed. 2002). The maximum log-likelihood (L) term measures how well the model fits the observed data, while the number of predictors (p) measures its complexity.
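
The AIC formula in this note, AIC = -2L + 2p, can be sketched in a few lines. The example assumes a model with independent Gaussian errors, for which the maximized log-likelihood has a closed form in the residuals; that assumption, the residual values, and the function names are illustrative and do not come from the source. Whether the error variance is counted in p is a convention that varies between texts; here p is simply passed in by the caller.

```python
import math


def gaussian_max_log_likelihood(residuals):
    """Maximized log-likelihood L for a model with i.i.d. normal errors.

    Uses the maximum-likelihood variance estimate sigma2 = RSS / n.
    """
    n = len(residuals)
    sigma2 = sum(r * r for r in residuals) / n
    return -0.5 * n * (math.log(2.0 * math.pi * sigma2) + 1.0)


def aic(max_log_likelihood, p):
    """AIC = -2L + 2p; lower values indicate a better fit/complexity tradeoff."""
    return -2.0 * max_log_likelihood + 2.0 * p


# Hypothetical residuals from a simpler and a more complex model on the same data.
simple_residuals = [0.30, -0.40, 0.50, -0.20, 0.10, -0.30]
complex_residuals = [0.28, -0.39, 0.47, -0.20, 0.09, -0.29]

print("simple :", aic(gaussian_max_log_likelihood(simple_residuals), p=2))
print("complex:", aic(gaussian_max_log_likelihood(complex_residuals), p=5))
```

The extra parameters in the complex model shrink the residuals slightly, but each one adds 2 to the score, so a small gain in fit may not justify the added complexity.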

80
- 0016355478
- For technical discussions of AIC, see Burnham & Anderson, supra note 59, at 353-71, as well as the original paper, Hirotugu Akaike, A New Look at Statistical Model Identification, 19 IEEE Transactions on Automatic Control 716 (1974).

81
- 52949086982
- For a terrific discussion of its philosophical implications, see generally Malcolm R. Forster & Elliott Sober, How to Tell When Simpler, More Unified or Less Ad Hoc Theories Will Provide More Accurate Predictions, 45 Brit. J. Phil. Sci. 1 (1994).

82
- 74349127280
- As Elliott Sober notes, AIC makes three major assumptions. First, it takes Kullback-Leibler distances (or relative entropy) as the measure between two probability distributions. Second, it makes the "Humean 'uniformity of nature' assumption" that the data are drawn from a relatively stable world and that the mechanism that links the predictor and outcome variables stays constant. Third, AIC makes a normality assumption, which is that "repeated estimates of each parameter are normally distributed." Elliott Sober, Instrumentalism, Parsimony, and the Akaike Framework, 69 Phil. Sci. S112, S116 (2002) [hereinafter Sober, Instrumentalism].

83
- 74349094670
- BIC rests on entirely different theoretical foundations from AIC, but arrives at a strikingly similar tradeoff. BIC is defined as: $\mathrm{BIC} = -2\ell + p \log n$, where $\ell$ is the maximum log-likelihood of the model, $p$ is the number of parameters, and $n$ is the number of observations in the dataset.
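To make the similarity between the two criteria concrete, the following sketch places the BIC definition above next to the standard AIC formula (-2 times the maximum log-likelihood plus 2 times the number of parameters). The function names and sample sizes are hypothetical, and the logarithm is assumed to be the natural log.

```python
import math


def aic(max_log_likelihood, num_params):
    """AIC = -2 * l + 2 * p."""
    return -2.0 * max_log_likelihood + 2.0 * num_params


def bic(max_log_likelihood, num_params, num_obs):
    """BIC = -2 * l + p * log(n), using the natural log (assumed)."""
    return -2.0 * max_log_likelihood + num_params * math.log(num_obs)


# Both criteria share the same fit term and differ only in the penalty
# charged per parameter: a constant 2 for AIC versus log(n) for BIC.
for n in (5, 8, 100, 10_000):
    print(f"n = {n:>6}: AIC penalty per parameter = 2.000, "
          f"BIC penalty per parameter = {math.log(n):.3f}")
```

Because log(n) exceeds 2 once n reaches 8, BIC penalizes additional parameters more heavily than AIC for all but very small datasets.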

84
- 74349098998
- Venables & Ripley, supra note 60, at 276;

85
- 74349089614
- see also Forster, Key Concepts, supra note 54, at 220-24 (discussing when different model selection methods are better than others).

86
- 0036435040
- See David J. Spiegelhalter et al., Bayesian Measures of Model Complexity and Fit, 64 J. Royal Stat. Soc'y: Series B (Stat. Methodology) 583, 602-05 (2002) (proposing the Deviance Information Criterion and a method for assessing models).

87
- 84868053780
- In theory, one could dispense with the various model selection criteria and estimate the accuracy of each proposed model directly through a technique called cross-validation. Cross-validation roughly proceeds along the following lines: Randomly divide the available data into two (not necessarily equal) parts. Use the first part, known as the "training set," to fit the model. Then use the fitted model to make predictions on the second part (the "testing set") and to determine the resulting error. Perform this procedure repeatedly to obtain an average "cross-validation" error for the model. By comparing the cross-validation errors of one proposed model to another, one can estimate which one is likely to be more accurate. Cross-validation for model selection is a well-established area of study. For more information, see the citations given in Burnham & Anderson, supra note 59, at 36.
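The procedure outlined in this note can be sketched roughly as follows. The dataset, the two candidate models (an overall mean versus per-class means), the 70/30 split, the number of repetitions, and the use of squared error are all hypothetical choices made for illustration, not details from the article.

```python
import random


def fit_mean(train):
    """Candidate model 1: predict the overall training mean for everyone."""
    mean = sum(y for _, y in train) / len(train)
    return lambda g: mean


def fit_group_means(train):
    """Candidate model 2: predict the training mean of the item's class,
    falling back to the overall mean for a class unseen in training."""
    overall = sum(y for _, y in train) / len(train)
    totals = {}
    for g, y in train:
        s, c = totals.get(g, (0.0, 0))
        totals[g] = (s + y, c + 1)
    means = {g: s / c for g, (s, c) in totals.items()}
    return lambda g: means.get(g, overall)


def cv_error(data, fit, n_repeats=200, train_fraction=0.7, seed=0):
    """Average squared prediction error over repeated random splits."""
    rng = random.Random(seed)
    n_train = max(1, int(len(data) * train_fraction))
    total = 0.0
    for _ in range(n_repeats):
        shuffled = data[:]
        rng.shuffle(shuffled)
        train, test = shuffled[:n_train], shuffled[n_train:]
        predict = fit(train)  # fit on the training set only
        total += sum((y - predict(g)) ** 2 for g, y in test) / len(test)
    return total / n_repeats


# Hypothetical (class label, outcome) pairs.
data = [(0, 10), (0, 11), (0, 9), (0, 10), (1, 20), (1, 21), (1, 19), (1, 20)]

err_mean = cv_error(data, fit_mean)
err_group = cv_error(data, fit_group_means)
print(f"cross-validation error, overall mean: {err_mean:.2f}")
print(f"cross-validation error, class means:  {err_group:.2f}")
```

Using the same seed for both models means they are evaluated on identical splits, so the comparison reflects the models rather than the luck of the partition.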

88
- 74349085762
- The problem with cross-validation is that depending on the size of the dataset and the complexity of the models, it can be quite computationally involved. Id. Thus, heuristics like AIC are often preferred. Fortunately, for large datasets, researchers have shown that the results of AIC closely approximate those of cross-validation. Id. at 62.

89
- 74349131337
- See David Draper, Assessment and Propagation of Model Uncertainty, 57 J. Royal Stat. Soc'y: Series B (Stat. Methodology) 45, 51 (1995) (discussing how the range of possible models grows at "a rate much faster than that at which information about the relative plausibility of alternative structural choices accumulates").

90
- 74349123269
- This analysis does not even account for transformations, which again result in an infinite set of possible models. For example, while the dataset may provide height as a potential predictor, we could also potentially use height squared, the square root of height, etc.

91
- 74349118289
- E.g., Kilcoyne v. Plain Dealer Pub. Co., 678 N.E.2d 581, 586 (Ohio Ct. App. 1996) ("A judicial proceeding resolves a dispute among the parties, but does not establish absolutely the 'truth' for all time and all purposes.");

92
- 74349127281
- Morrison v. State, 845 S.W.2d 882, 902 n.2 (Tex. Crim. App. 1992) (Benavides, J., dissenting) ("It is widely accepted that the primary goal of adversary process is a fair resolution of disputes between litigants, not the discovery of objective historical fact.");

93
- 74349083133
- Judicial Panel Discussion on Science and the Law, 25 Conn. L. Rev. 1127, 1132 (1993) (statement of Connecticut Superior Court Judge Martin L. Nigro) ("Don't misconceive the purpose of a trial... . The trial is really a dispute settlement. It's got to come to an end.").

94
- 74349086495
- See supra note 14 and accompanying text (questioning the usefulness of mathematical models due to the reference class problem).

95
- 74349110748
- Allen & Pardo, supra note 12, at 119 ("[T]here may be no data for other plausible reference classes, which means that the mathematics can be done only by picking these or some variant.");

96
- 74349126766
- Sober, Instrumentalism, supra note 62, at S118 (noting that the answer of optimality in AIC depends on the data available).

97
- 74349085493
- Cf., e.g., Floorgraphics, Inc. v. News Am. Mktg. In-Store Servs., Inc., 546 F. Supp. 2d 155, 172 (D.N.J. 2008) ("When challenging the admissibility of . . . expert testimony, a party must move beyond empty criticisms and demonstrate that a proposed alternative approach would yield different results.").

98
- 74349109200
- See Nance, supra note 12, at 268 ("[I]t is plausible that, in the absence of suggestions by the accused, jurors ought to accept the figures provided by the prosecution's witness.");

99
- 74349119042
- see also Jonathan J. Koehler, Why DNA Likelihood Ratios Should Account for Error (Even When a National Research Council Report Says They Should Not), 37 Jurimetrics J. 425, 431-33 (1997) (arguing broader DNA population statistics should be used if local ones are unavailable).

100
- 74349104444
- Alternatively, methods exist for combining multiple models. See Zucchini, supra note 53, at 60 (suggesting combination of multiple models);

101
- 74349114203
- see also Nance, supra note 12, at 263 (discussing use of multiple reference classes);

102
- 74349126527
- Robert Tibshirani, Regression Shrinkage and Selection via the Lasso, 58 J. Royal Stat. Soc'y: Series B (Stat. Methodology) 267, 267-68 (1996) (showing method for combining multiple models).

103
- 74349099493
- Thanks to Jeff Lipshaw for prompting this discussion.

104
- 74349106887
- Elliott Sober calls this the "Humean 'uniformity of nature' assumption." See Sober, Instrumentalism, supra note 62.

105
- 84868053781
- E.g., Fed. R. Civ. P. 23(b)(3) (allowing class action when it "is superior to other available methods for fairly and efficiently adjudicating the controversy").

106
- 74349107155
- See generally David Rosenberg, A New Sampling Method for Reducing the Cost of Resolving Differing Claims Against a Defendant 2-3 (2008) (unpublished manuscript, on file with the Columbia Law Review) (proposing use of sampling methods with "no structuring or pre-screening of the group of claims" to promote greater administrative efficiency).

107
- 74349101366
- Allen & Pardo, supra note 12, at 111-13.

108
- 74349104090
- Id. at 111.

109
- 74349086999
- Id. at 112.

110
- 74349108153
- Id. at 113.

111
- 74349131082
- The dataset used for this hypothetical was from Newton, Massachusetts, and was generously made available on the MIT OpenCourseWare website. MIT OpenCourseWare, Hedonic.dta, at http://ocw.mit.edu/OcwWeb/Economics/14-33Fall-2004/Labs/ (last visited Sept. 13, 2009) (on file with the Columbia Law Review).

112
- 84868053776
- Incidentally, using the mean for the entire dataset, which would have yielded an estimate of $341,139, generates an AIC score of 3096.37, which is less optimal than either model proposed by the parties.

113
- 74349100609
- Whether the estimation method involves reference classes or regression analyses, model selection methods can be helpful for improving predictive accuracy.

114
- 74349104942
- Tillers, supra note 4, at 38 n.21.

* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.