TY - JOUR

T1 - Misinterpreting p

T2 - The Discrepancy Between p Values and the Probability the Null Hypothesis is True, the Influence of Multiple Testing, and Implications for the Replication Crisis

AU - Anderson, Samantha F.

PY - 2019

Y1 - 2019

N2 - The p value is still misinterpreted as the probability that the null hypothesis is true. Even psychologists who correctly understand that p values do not provide this probability may not realize the degree to which p values differ from the probability that the null hypothesis is true. Importantly, previous research on this topic has not addressed the influence of multiple testing, often a reality in psychological studies, and has not extensively considered the influence of different prior probabilities favoring the null and alternative hypotheses. Simulation studies are presented that emphasize the magnitude by which p values are distinct from the posterior probability that the null hypothesis is true, under an extensive set of conditions including multiple testing. Particular emphasis is placed on p values just under .05, given the prevalence of these p values in the published literature, though p values in other intervals are also assessed. In diverse conditions, results indicate that posterior probabilities favoring the null hypothesis are often far removed from .05, and this pattern quickly gets much worse when multiple testing is conducted. Rather than simply telling researchers that p values do not reflect the probability favoring the null hypothesis, as has been done previously, the results presented here allow psychologists to see the evidence provided by various p values. These results have particularly topical implications for the replication crisis, for how much weight should be placed on a single study, and for how the term statistical significance should be interpreted, particularly in conditions typical in psychological research.

AB - The p value is still misinterpreted as the probability that the null hypothesis is true. Even psychologists who correctly understand that p values do not provide this probability may not realize the degree to which p values differ from the probability that the null hypothesis is true. Importantly, previous research on this topic has not addressed the influence of multiple testing, often a reality in psychological studies, and has not extensively considered the influence of different prior probabilities favoring the null and alternative hypotheses. Simulation studies are presented that emphasize the magnitude by which p values are distinct from the posterior probability that the null hypothesis is true, under an extensive set of conditions including multiple testing. Particular emphasis is placed on p values just under .05, given the prevalence of these p values in the published literature, though p values in other intervals are also assessed. In diverse conditions, results indicate that posterior probabilities favoring the null hypothesis are often far removed from .05, and this pattern quickly gets much worse when multiple testing is conducted. Rather than simply telling researchers that p values do not reflect the probability favoring the null hypothesis, as has been done previously, the results presented here allow psychologists to see the evidence provided by various p values. These results have particularly topical implications for the replication crisis, for how much weight should be placed on a single study, and for how the term statistical significance should be interpreted, particularly in conditions typical in psychological research.

KW - Multiple testing

KW - P values

KW - Replication

KW - Statistical significance

UR - http://www.scopus.com/inward/record.url?scp=85076444032&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85076444032&partnerID=8YFLogxK

U2 - 10.1037/met0000248

DO - 10.1037/met0000248

M3 - Article

C2 - 31829657

AN - SCOPUS:85076444032

JO - Psychological Methods

JF - Psychological Methods

SN - 1082-989X

ER -