Misinterpreting p: The Discrepancy Between p Values and the Probability the Null Hypothesis is True, the Influence of Multiple Testing, and Implications for the Replication Crisis

Samantha F. Anderson

doi:10.1037/met0000248

Psychology

Research output: Contribution to journal › Article › peer-review

33 Scopus citations

Abstract

The p value is still misinterpreted as the probability that the null hypothesis is true. Even psychologists who correctly understand that p values do not provide this probability may not realize the degree to which p values differ from the probability that the null hypothesis is true. Importantly, previous research on this topic has not addressed the influence of multiple testing, often a reality in psychological studies, and has not extensively considered the influence of different prior probabilities favoring the null and alternative hypotheses. Simulation studies are presented that emphasize the magnitude by which p values are distinct from the posterior probability that the null hypothesis is true, under an extensive set of conditions including multiple testing. Particular emphasis is placed on p values just under .05, given the prevalence of these p values in the published literature, though p values in other intervals are also assessed. In diverse conditions, results indicate that posterior probabilities favoring the null hypothesis are often far removed from .05, and this pattern quickly gets much worse when multiple testing is conducted. Rather than simply telling researchers that p values do not reflect the probability favoring the null hypothesis, as has been done previously, the results presented here allow psychologists to see the evidence provided by various p values. These results have particularly topical implications for the replication crisis, for how much weight should be placed on a single study, and for how the term statistical significance should be interpreted, particularly in conditions typical in psychological research.
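The gap the abstract describes can be illustrated with a small calculation. The sketch below is not the article's simulation design: it assumes a two-sided z test with known variance, a single fixed standardized effect under the alternative, and no multiple testing. The function names and the example values (prior of .5, noncentrality of about 2.83, roughly 80% power) are hypothetical choices made for illustration only.

```python
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal, used for p values and quantiles


def prob_p_in_interval(lo, hi, delta):
    """P(lo <= p < hi) for a two-sided z test whose statistic is N(delta, 1).

    delta is the noncentrality (0 under the null). Assumed model, not the
    article's: z test with known variance.
    """
    # p in [lo, hi) corresponds to |z| between these two critical values.
    z_far = STD_NORMAL.inv_cdf(1 - lo / 2)   # larger |z|, smaller p
    z_near = STD_NORMAL.inv_cdf(1 - hi / 2)  # smaller |z|, larger p
    shifted = NormalDist(mu=delta, sigma=1.0)
    upper_tail = shifted.cdf(z_far) - shifted.cdf(z_near)
    lower_tail = shifted.cdf(-z_near) - shifted.cdf(-z_far)
    return upper_tail + lower_tail


def posterior_null(lo, hi, prior_null, delta):
    """Bayes' rule: P(H0 | p in [lo, hi)) given a prior on H0 and a fixed delta."""
    like_null = prob_p_in_interval(lo, hi, 0.0)  # equals hi - lo under H0
    like_alt = prob_p_in_interval(lo, hi, delta)
    numerator = prior_null * like_null
    return numerator / (numerator + (1 - prior_null) * like_alt)


def simulate_posterior_null(lo, hi, prior_null, delta, n_studies=200_000, seed=1):
    """Monte Carlo check: share of studies with p in [lo, hi) where H0 was true."""
    rng = random.Random(seed)
    null_hits = 0
    total_hits = 0
    for _ in range(n_studies):
        is_null = rng.random() < prior_null
        z = rng.gauss(0.0 if is_null else delta, 1.0)
        p = 2 * (1 - STD_NORMAL.cdf(abs(z)))
        if lo <= p < hi:
            total_hits += 1
            null_hits += is_null
    return null_hits / total_hits


if __name__ == "__main__":
    # Hypothetical scenario: p just under .05, equal prior odds, ~80% power.
    exact = posterior_null(0.04, 0.05, prior_null=0.5, delta=2.83)
    approx = simulate_posterior_null(0.04, 0.05, prior_null=0.5, delta=2.83)
    print(f"analytic P(H0 | .04 <= p < .05)  = {exact:.3f}")
    print(f"simulated P(H0 | .04 <= p < .05) = {approx:.3f}")
```

Under these assumed conditions, the posterior probability of the null for a p value just under .05 comes out well above .05. Raising `prior_null` or lowering `delta` in the sketch pushes that posterior higher.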

Original language	English (US)
Journal	Psychological Methods
Volume	25
Issue number	5
DOIs	https://doi.org/10.1037/met0000248
State	Accepted/In press - Oct 2020

Keywords

Multiple testing
P values
Replication
Statistical significance

ASJC Scopus subject areas

Psychology (miscellaneous)

Access to Document

10.1037/met0000248

Cite this

@article{ea30aea7a3f1439dacca596953e4e304,

title = "Misinterpreting p: The Discrepancy Between p Values and the Probability the Null Hypothesis is True, the Influence of Multiple Testing, and Implications for the Replication Crisis",

abstract = "The p value is still misinterpreted as the probability that the null hypothesis is true. Even psychologists who correctly understand that p values do not provide this probability may not realize the degree to which p values differ from the probability that the null hypothesis is true. Importantly, previous research on this topic has not addressed the influence of multiple testing, often a reality in psychological studies, and has not extensively considered the influence of different prior probabilities favoring the null and alternative hypotheses. Simulation studies are presented that emphasize the magnitude by which p values are distinct from the posterior probability that the null hypothesis is true, under an extensive set of conditions including multiple testing. Particular emphasis is placed on p values just under .05, given the prevalence of these p values in the published literature, though p values in other intervals are also assessed. In diverse conditions, results indicate that posterior probabilities favoring the null hypothesis are often far removed from .05, and this pattern quickly gets much worse when multiple testing is conducted. Rather than simply telling researchers that p values do not reflect the probability favoring the null hypothesis, as has been done previously, the results presented here allow psychologists to see the evidence provided by various p values. These results have particularly topical implications for the replication crisis, for how much weight should be placed on a single study, and for how the term statistical significance should be interpreted, particularly in conditions typical in psychological research.",

keywords = "Multiple testing, P values, Replication, Statistical significance",

author = "Anderson, {Samantha F.}",

note = "Publisher Copyright: {\textcopyright} 2019 American Psychological Association.",

year = "2020",

month = oct,

doi = "10.1037/met0000248",

language = "English (US)",

volume = "25",

journal = "Psychological Methods",

issn = "1082-989X",

publisher = "American Psychological Association Inc.",

number = "5",

}

TY - JOUR

T1 - Misinterpreting p

T2 - The Discrepancy Between p Values and the Probability the Null Hypothesis is True, the Influence of Multiple Testing, and Implications for the Replication Crisis

AU - Anderson, Samantha F.

PY - 2020/10

Y1 - 2020/10

N2 - The p value is still misinterpreted as the probability that the null hypothesis is true. Even psychologists who correctly understand that p values do not provide this probability may not realize the degree to which p values differ from the probability that the null hypothesis is true. Importantly, previous research on this topic has not addressed the influence of multiple testing, often a reality in psychological studies, and has not extensively considered the influence of different prior probabilities favoring the null and alternative hypotheses. Simulation studies are presented that emphasize the magnitude by which p values are distinct from the posterior probability that the null hypothesis is true, under an extensive set of conditions including multiple testing. Particular emphasis is placed on p values just under .05, given the prevalence of these p values in the published literature, though p values in other intervals are also assessed. In diverse conditions, results indicate that posterior probabilities favoring the null hypothesis are often far removed from .05, and this pattern quickly gets much worse when multiple testing is conducted. Rather than simply telling researchers that p values do not reflect the probability favoring the null hypothesis, as has been done previously, the results presented here allow psychologists to see the evidence provided by various p values. These results have particularly topical implications for the replication crisis, for how much weight should be placed on a single study, and for how the term statistical significance should be interpreted, particularly in conditions typical in psychological research.

KW - Multiple testing

KW - P values

KW - Replication

KW - Statistical significance

UR - http://www.scopus.com/inward/record.url?scp=85076444032&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85076444032&partnerID=8YFLogxK

U2 - 10.1037/met0000248

DO - 10.1037/met0000248

M3 - Article

C2 - 31829657

AN - SCOPUS:85076444032

SN - 1082-989X

VL - 25

JO - Psychological Methods

JF - Psychological Methods

IS - 5

ER -
