
Commit

add Shafer's 1976 definition of evidence
Lakens committed Feb 18, 2024
1 parent 7c7878c commit e4fdf29
Showing 55 changed files with 90 additions and 70 deletions.
2 changes: 1 addition & 1 deletion 01-pvalue.qmd
@@ -307,7 +307,7 @@ The claim is about the data we have observed, but not about the theory we used t

Even when we have made correct claims, the underlying theory can be false. Popper [-@popper_logic_2002] reminds us that “The empirical basis of objective science has thus nothing ‘absolute’ about it”. He argues that science is not built on a solid bedrock, but on piles driven in a swamp, and notes that “We simply stop when we are satisfied that the piles are firm enough to carry the structure, at least for the time being.” As Hacking [-@hacking_logic_1965] writes: “Rejection is not refutation. Plenty of rejections must be only tentative.” So when we reject the null model, we do so tentatively, aware of the fact we might have done so in error, without necessarily believing the null model is false, and without believing the theory we have used to make predictions is true. For Neyman [-@neyman_inductive_1957] inferential behavior is an “act of will to behave in the future (perhaps until new experiments are performed) in a particular manner, conforming with the outcome of the experiment.” All knowledge in science is provisional.

Some statisticians recommend interpreting *p*-values as measures of *evidence*. For example, Bland [-@bland_introduction_2015] teaches that *p*-values can be interpreted as a "rough and ready" guide for the strength of evidence, and that *p* > 0.1 indicates 'little or no evidence', .05 < *p* < 0.1 indicates 'weak evidence', 0.01 < *p* < 0.05 indicates 'evidence', *p* < 0.001 is 'very strong evidence'. This is incorrect [@johansson_hail_2011; @lakens_why_2022], as is clear from the previous discussions of Lindley's paradox and uniform *p*-value distributions. If you want to quantify *evidence*, see the chapters on [likelihoods](#sec-likelihoods) or [Bayesian statistics](#sec-bayes).
Some statisticians recommend interpreting *p*-values as measures of *evidence*. For example, Bland [-@bland_introduction_2015] teaches that *p*-values can be interpreted as a "rough and ready" guide for the strength of evidence, and that *p* > 0.1 indicates 'little or no evidence', .05 < *p* < 0.1 indicates 'weak evidence', 0.01 < *p* < 0.05 indicates 'evidence', *p* < 0.001 is 'very strong evidence'. This is incorrect [@johansson_hail_2011; @lakens_why_2022], as is clear from the previous discussions of Lindley's paradox and uniform *p*-value distributions. Researchers who claim *p*-values are measures of evidence typically do not *define* the concept of evidence. In this textbook I follow the mathematical theory of evidence as developed by Shafer [-@shafer_mathematical_1976, p. 144], who writes "An adequate summary of the impact of the evidence on a particular proposition $A$ must include at least two items of information: a report on how well $A$ is supported and a report on how well its negation $\overline{A}$ is supported." According to Shafer, evidence is quantified through support functions, and when assessing statistical evidence, support is quantified by the likelihood function. If you want to quantify *evidence*, see the chapters on [likelihoods](#sec-likelihoods) or [Bayesian statistics](#sec-bayes).
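
As an editorial aside (not part of the commit or the book's code), Shafer's two-part summary can be made concrete with a minimal sketch that quantifies support for a hypothesis and for a rival hypothesis through the likelihood function; the binomial setup and all numbers are assumptions chosen for illustration, with the rival point hypothesis standing in for the negation in this simple case.

```python
# Illustrative sketch (not part of the commit): support for two rival
# hypotheses quantified through the likelihood function, in the spirit of
# Shafer's two-part summary. The binomial example and all values are
# assumptions chosen for illustration.
from scipy.stats import binom

successes, trials = 8, 10  # hypothetical data: 8 successes in 10 trials

# Likelihood of the observed data under two point hypotheses about theta
like_h0 = binom.pmf(successes, trials, 0.5)  # H0: theta = 0.5
like_h1 = binom.pmf(successes, trials, 0.8)  # H1: theta = 0.8

# Reporting the support for both hypotheses conveys more than a single number
print(f"Likelihood under H0: {like_h0:.4f}")                 # ~0.0439
print(f"Likelihood under H1: {like_h1:.4f}")                 # ~0.3020
print(f"Likelihood ratio (H1/H0): {like_h1 / like_h0:.2f}")  # ~6.87
```

Reporting the support for both hypotheses, rather than a single *p*-value, mirrors Shafer's requirement that a summary of evidence address a proposition as well as its negation.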

## Preventing common misconceptions about *p*-values {#sec-misconceptions}

Binary file modified 01-pvalue_files/figure-pdf/fig-fig131-1.pdf
Binary file not shown.
Binary file modified 01-pvalue_files/figure-pdf/fig-fig132-1.pdf
Binary file not shown.
Binary file modified 01-pvalue_files/figure-pdf/fig-fig134-1.pdf
Binary file not shown.
Binary file modified 01-pvalue_files/figure-pdf/fig-fig135-1.pdf
Binary file not shown.
Binary file modified 01-pvalue_files/figure-pdf/fig-fig136-1.pdf
Binary file not shown.
Binary file modified 01-pvalue_files/figure-pdf/fig-fig137-1.pdf
Binary file not shown.
Binary file modified 01-pvalue_files/figure-pdf/fig-fig138-1.pdf
Binary file not shown.
Binary file modified 01-pvalue_files/figure-pdf/fig-paradox-1.pdf
Binary file not shown.
Binary file modified 01-pvalue_files/figure-pdf/fig-pdft-1.pdf
Binary file not shown.
Binary file modified 01-pvalue_files/figure-pdf/fig-tdist-1.pdf
Binary file not shown.
Binary file modified 01-pvalue_files/figure-pdf/q1-1.pdf
Binary file not shown.
Binary file modified 01-pvalue_files/figure-pdf/unnamed-chunk-3-1.pdf
Binary file not shown.
Binary file modified 02-errorcontrol_files/figure-pdf/fig-minerror-1.pdf
Binary file not shown.
Binary file modified 02-errorcontrol_files/figure-pdf/justifyalpha1-1.pdf
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file modified 08-samplesizejustification_files/figure-pdf/fig-plot-1-1.pdf
Binary file not shown.
Binary file modified 08-samplesizejustification_files/figure-pdf/fig-plot-4-1.pdf
Binary file not shown.
Binary file modified 08-samplesizejustification_files/figure-pdf/fig-power-2-1.pdf
Binary file not shown.
Binary file modified 08-samplesizejustification_files/figure-pdf/fig-power-3-1.pdf
Binary file not shown.
Binary file not shown.
Binary file modified 09-equivalencetest_files/figure-pdf/fig-ciequivalence1-1.pdf
Binary file not shown.
Binary file modified 09-equivalencetest_files/figure-pdf/fig-ciequivalence2-1.pdf
Binary file not shown.
Binary file modified 09-equivalencetest_files/figure-pdf/fig-intervaltest-1.pdf
Binary file not shown.
Binary file modified 09-equivalencetest_files/figure-pdf/fig-tdistequivalence-1.pdf
Binary file not shown.
Binary file modified 09-equivalencetest_files/figure-pdf/fig-tmet-1.pdf
Binary file not shown.
Binary file modified 10-sequential_files/figure-pdf/fig-boundplot1-1.pdf
Binary file not shown.
Binary file modified 10-sequential_files/figure-pdf/fig-comparison-1.pdf
Binary file not shown.
Binary file modified 10-sequential_files/figure-pdf/fig-fourspendingfunctions-1.pdf
Binary file not shown.
Binary file modified 10-sequential_files/figure-pdf/fig-futility1-1.pdf
Binary file not shown.
Binary file modified 10-sequential_files/figure-pdf/fig-futility2-1.pdf
Binary file not shown.
Binary file modified 10-sequential_files/figure-pdf/fig-futilityq13-1.pdf
Binary file not shown.
Binary file modified 10-sequential_files/figure-pdf/fig-powerseq-1.pdf
Binary file not shown.
Binary file modified 10-sequential_files/figure-pdf/fig-powerseq2-1.pdf
Binary file not shown.
Binary file modified 12-bias_files/figure-pdf/fig-carterbias-1.pdf
Binary file not shown.
Binary file modified 12-bias_files/figure-pdf/fig-funnel1-1.pdf
Binary file not shown.
Binary file modified 12-bias_files/figure-pdf/fig-funnel2-1.pdf
Binary file not shown.
Binary file modified 12-bias_files/figure-pdf/fig-petpeese-1.pdf
Binary file not shown.
Binary file modified 12-bias_files/figure-pdf/fig-petpeeseq4-1.pdf
Binary file not shown.
Binary file modified 12-bias_files/figure-pdf/fig-trimfill1-1.pdf
Binary file not shown.
Binary file modified 12-bias_files/figure-pdf/fig-twoforestplot-1.pdf
Binary file not shown.
Binary file modified 12-bias_files/figure-pdf/metasimq2-1.pdf
Binary file not shown.
5 changes: 4 additions & 1 deletion docs/01-pvalue.html
@@ -546,7 +546,7 @@ <h1 class="title"><span id="sec-pvalue" class="quarto-section-identifier"><span
<p>When researchers “accept” or “reject” a hypothesis in a Neyman-Pearson approach to statistical inferences, they do not communicate any belief or conclusion about the substantive hypothesis. Instead, they utter a Popperian <strong>basic statement</strong> based on a prespecified decision rule that the observed data reflect a certain state of the world. Basic statements describe an observation that has been made (e.g., “I have observed a black swan”) or an event that has occurred (e.g., “students performed better at the exam when being their training is spread out over multiple days, than when they train all information in one day”).</p>
<p>The claim is about the data we have observed, but not about the theory we used to make our predictions, which requires a theoretical inference. Data never ‘proves’ a theory is true or false. A basic statement can <strong>corroborate</strong> a prediction derived from a theory, or not. If many predictions deduced from a theory are corroborated, we can become increasingly convinced the theory is close to the truth. This ‘truth-likeness’ of theories is called <strong>verisimilitude</strong> <span class="citation" data-cites="niiniluoto_verisimilitude_1998 popper_logic_2002">(<a href="references.html#ref-niiniluoto_verisimilitude_1998" role="doc-biblioref">Niiniluoto, 1998</a>; <a href="references.html#ref-popper_logic_2002" role="doc-biblioref">Popper, 2002</a>)</span>. A shorter statement when a hypothesis test is presented would therefore read ‘<em>p</em> = .xx, which corroborates our prediction, at an alpha level of y%’, or ‘<em>p</em> = .xx, which does not corroborate our prediction, at a statistical power of y% for our effect size of interest’. Often, the alpha level or the statistical power is only mentioned in the experimental design section of an article, but repeating them in the results section might remind readers of the error rates associated with your claims.</p>
<p>Even when we have made correct claims, the underlying theory can be false. Popper <span class="citation" data-cites="popper_logic_2002">(<a href="references.html#ref-popper_logic_2002" role="doc-biblioref">2002</a>)</span> reminds us that “The empirical basis of objective science has thus nothing ‘absolute’ about it”. He argues that science is not built on a solid bedrock, but on piles driven in a swamp, and notes that “We simply stop when we are satisfied that the piles are firm enough to carry the structure, at least for the time being.” As Hacking <span class="citation" data-cites="hacking_logic_1965">(<a href="references.html#ref-hacking_logic_1965" role="doc-biblioref">1965</a>)</span> writes: “Rejection is not refutation. Plenty of rejections must be only tentative.” So when we reject the null model, we do so tentatively, aware of the fact we might have done so in error, without necessarily believing the null model is false, and without believing the theory we have used to make predictions is true. For Neyman <span class="citation" data-cites="neyman_inductive_1957">(<a href="references.html#ref-neyman_inductive_1957" role="doc-biblioref">1957</a>)</span> inferential behavior is an “act of will to behave in the future (perhaps until new experiments are performed) in a particular manner, conforming with the outcome of the experiment.” All knowledge in science is provisional.</p>
<p>Some statisticians recommend interpreting <em>p</em>-values as measures of <em>evidence</em>. For example, Bland <span class="citation" data-cites="bland_introduction_2015">(<a href="references.html#ref-bland_introduction_2015" role="doc-biblioref">2015</a>)</span> teaches that <em>p</em>-values can be interpreted as a “rough and ready” guide for the strength of evidence, and that <em>p</em> &gt; 0.1 indicates ‘little or no evidence’, .05 &lt; <em>p</em> &lt; 0.1 indicates ‘weak evidence’, 0.01 &lt; <em>p</em> &lt; 0.05 indicates ‘evidence’, <em>p</em> &lt; 0.001 is ‘very strong evidence’. This is incorrect <span class="citation" data-cites="johansson_hail_2011 lakens_why_2022">(<a href="references.html#ref-johansson_hail_2011" role="doc-biblioref">Johansson, 2011</a>; <a href="references.html#ref-lakens_why_2022" role="doc-biblioref">Lakens, 2022</a>)</span>, as is clear from the previous discussions of Lindley’s paradox and uniform <em>p</em>-value distributions. If you want to quantify <em>evidence</em>, see the chapters on <a href="03-likelihoods.html">likelihoods</a> or <a href="04-bayes.html">Bayesian statistics</a>.</p>
<p>Some statisticians recommend interpreting <em>p</em>-values as measures of <em>evidence</em>. For example, Bland <span class="citation" data-cites="bland_introduction_2015">(<a href="references.html#ref-bland_introduction_2015" role="doc-biblioref">2015</a>)</span> teaches that <em>p</em>-values can be interpreted as a “rough and ready” guide for the strength of evidence, and that <em>p</em> &gt; 0.1 indicates ‘little or no evidence’, .05 &lt; <em>p</em> &lt; 0.1 indicates ‘weak evidence’, 0.01 &lt; <em>p</em> &lt; 0.05 indicates ‘evidence’, <em>p</em> &lt; 0.001 is ‘very strong evidence’. This is incorrect <span class="citation" data-cites="johansson_hail_2011 lakens_why_2022">(<a href="references.html#ref-johansson_hail_2011" role="doc-biblioref">Johansson, 2011</a>; <a href="references.html#ref-lakens_why_2022" role="doc-biblioref">Lakens, 2022</a>)</span>, as is clear from the previous discussions of Lindley’s paradox and uniform <em>p</em>-value distributions. Researchers who claim <em>p</em>-values are measures of evidence typically do not <em>define</em> the concept of evidence. In this textbook I follow the mathematical theory of evidence as developed by Shafer <span class="citation" data-cites="shafer_mathematical_1976">(<a href="references.html#ref-shafer_mathematical_1976" role="doc-biblioref">1976, p. 144</a>)</span>, who writes “An adequate summary of the impact of the evidence on a particular proposition <span class="math inline">\(A\)</span> must include at least two items of information: a report on how well <span class="math inline">\(A\)</span> is supported and a report on how well its negation <span class="math inline">\(\overline{A}\)</span> is supported.” According to Shafer, evidence is quantified through support functions, and when assessing statistical evidence, support is quantified by the likelihood function. If you want to quantify <em>evidence</em>, see the chapters on <a href="03-likelihoods.html">likelihoods</a> or <a href="04-bayes.html">Bayesian statistics</a>.</p>
</section><section id="sec-misconceptions" class="level2" data-number="1.7"><h2 data-number="1.7" class="anchored" data-anchor-id="sec-misconceptions">
<span class="header-section-number">1.7</span> Preventing common misconceptions about <em>p</em>-values</h2>
<p>A <em>p</em>-value is the probability of the observed data, or more extreme data, under the assumption that the null hypothesis is true. To understand what this means, it might be especially useful to know what this doesn’t mean. First, we need to know what ‘the assumption that the null hypothesis is true’ looks like, and which data we should expect if the null hypothesis is true. Although the null hypothesis can be any value, in this assignment we will assume the null hypothesis is specified as a mean difference of 0. For example, we might be interested in calculating the difference between a control condition and an experimental condition on a dependent variable.</p>
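
As an editorial aside (not part of the commit), the definition above can be made concrete with a small simulation: under a null hypothesis of a mean difference of 0, the <em>p</em>-value is approximated by the proportion of simulated mean differences at least as extreme as the observed one. The sample sizes, standard deviation, and observed difference below are assumptions chosen for illustration.

```python
# Illustrative sketch (not part of the commit): the p-value as the probability
# of the observed data, or more extreme data, assuming the null hypothesis of
# a mean difference of 0 is true. All settings are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(2024)
n, sd = 50, 1.0        # per-condition sample size and population SD
observed_diff = 0.4    # hypothetical observed mean difference

# Simulate mean differences between two conditions when H0 (diff = 0) is true
null_diffs = np.array([
    rng.normal(0, sd, n).mean() - rng.normal(0, sd, n).mean()
    for _ in range(100_000)
])

# Two-sided p-value: proportion of null differences at least as extreme
p_value = np.mean(np.abs(null_diffs) >= observed_diff)
print(f"Simulated two-sided p-value: {p_value:.3f}")  # ~0.046 for these settings
```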
@@ -998,6 +998,9 @@ <h1 class="title"><span id="sec-pvalue" class="quarto-section-identifier"><span
<div id="ref-schweder_confidence_2016" class="csl-entry" role="listitem">
Schweder, T., &amp; Hjort, N. L. (2016). <em>Confidence, <span>Likelihood</span>, <span>Probability</span>: <span>Statistical Inference</span> with <span>Confidence Distributions</span></em>. <span>Cambridge University Press</span>. <a href="https://doi.org/10.1017/CBO9781139046671">https://doi.org/10.1017/CBO9781139046671</a>
</div>
<div id="ref-shafer_mathematical_1976" class="csl-entry" role="listitem">
Shafer, G. (1976). <em>A mathematical theory of evidence</em>. <span>Princeton University Press</span>.
</div>
<div id="ref-spanos_probability_1999" class="csl-entry" role="listitem">
Spanos, A. (1999). <em>Probability theory and statistical inference: Econometric modeling with observational data</em>. <span>Cambridge University Press</span>.
</div>
