Crowdsourcing Hypothesis Tests: Making transparent how design choices shape research results

Justin F. Landy, Miaolei Liam, Isabel L. Ding, Domenico Viganola, Warren Tierney, Anna Dreber, Magnus Johannesson, Thomas Pfeiffer, Charles R. Ebersole, Quentin F. Gronau, Alexander Ly, Don Van Den Bergh, Maarten Marsman, Koen Derks, Eric Jan Wagenmaker, Andrew Proctor, Daniel M. Bartels, Christopher W. Bauman, William J. Brady, Felix Cheung, Andrei Cimpian, Simone Dohle, M. Brent Donnellan, Adam Hahn, Michael P. Hall, William Jiménez-Leal, David J. Johnson, Richard E. Lucas, Benoît Monin, Andres Montealegre, Elizabeth Mullen, Jun Pang, Jennifer Ray, Diego A. Reinero, Jesse Reynolds, Walter Sowden, Daniel Storage, Runkun Su, Christina M. Tworek, Jay J. Van Bavel, Daniel Walco, Julian Wills, Xiaobing Xu, Kai Chi Yam, Xiaoyu Yang, William A. Cunningham, Martin Schweinsberg, Molly Urwitz, Eric L. Uhlmann, Matú Adamkovic, Ravin Alaei, Casper J. Albers, Aurélien Allard, Ian A. Anderson, Michael R. Andreychik, Peter Babinčák, Bradley J. Baker, Gabriel Baník, Ernest Baskin, Jozef Bavolar, Ruud M.W.J. Berkers, Michal Białek, Joel Blanke, Johannes Breuer, Ambra Brizi, Stephanie E.V. Brown, Florian Brühlmann, Hendrik Bruns, Leigh Caldwell, Jean François Campourcy, Eugene Y. Chan, Yen Ping Chang, Benjamin Y. Cheung, Alycia Chin, Kit W. Cho, Simon Columbus, Paul Conway, Conrad A. Corretti, Adam W. Craig, Paul G. Curran, Alexander F. Danvers, Ian G.J. Dawson, Martin V. Day, Erik Dietl, Johannes T. Doerflinger, Alice Dominici, Vilius Dranseika, Peter A. Edelsbrunner, John E. Edlund, Matthew Fisher, Anna Fung, Oliver Genschow, Timo Gnambs, Matthew H. Goldberg, Lorenz Graf-Vlachy, Andrew C. Hafenbrack, Sebastian Hafenbrädl, Andree Hartanto, Joseph P. Heffner, Joseph Hilgard, Felix Holzmeister, Oleksandr V. Horchak, Tina S.T. Huang, Joachim Hüffmeier, Sean Hughes, Ian Hussey, Roland Imhoff, Bastian Jaeger, Konrad Jamro, Samuel G.B. Johnson, Andrew Jones, Lucas Keller, Olga Kombeiz, Lacy E. Krueger, Anthony Lantian, Justin P. Laplante, Ljiljana B. Lazarevic, Jonathan Leclerc, Nicole Legate, James M. Leonhardt, Desmond W. Leung, Carmel A. Levitan, Hause Lin, Qinglan Liu, Marco Tullio Liuzza, Kenneth D. Locke, Albert L. Ly, Melanie MacEacheron, Christopher R. Madan, Harry Manley, Silvia Mari, Marcel Martončik, Scott L. McLean, Jonathon McPhetres, Brett G. Mercier, Corinna Michels, Michael C. Mullarkey, Erica D. Musser, Ladislas Nalborczyk, Gustav Nilsonne, Nicholas G. Otis, Sarah M.G. Otner, Philipp E. Otto, Oscar Oviedo-Trespalacios, Mariola Paruzel-Czachura, Francesco Pellegrini, Vitor M.D. Pereira, Hannah Perfecto, Gerit Pfuhl, Mark H. Phillips, Ori Plonsky, Maura Pozzi, Danka B. Puric, Brett Raymond-Barker, David E. Redman, Caleb J. Reynolds, Ivan Ropovik, Lukas Röseler, Janna K. Ruessmann, William H. Ryan, Nika Sablaturova, Kurt J. Schuepfer, Astrid Schütz, Miroslav Sirota, Matthias Stefan, Eric L. Stocks, Garrett L. Strosser, Jordan W. Suchow, Anna Szabelska, Kian Siong Tey, Leonid Tiokhin, Jais Troian, Till Utesch, Alejandro Vásquez-Echeverriá, Leigh Ann Vaughn, Mark Verschoor, Bettina Von Helversen, Pascal Wallisch, Sophia C. Weissgerber, Aaron L. Wichman, Jan K. Woike, Iris Žeželj, Janis H. Zickfeld, Yeonsin Ahn, Philippe F. Blaettchen, Xi Kang, Yoo Jin Lee, Philip M. Parker, Paul A. Parker, Jamie S. Song, May Anne Very, Lynn Wong

Research output: Contribution to journal, Article, peer-review

23 Scopus citations

Abstract

To what extent are research results influenced by subjective decisions that scientists make as they design studies? Fifteen research teams independently designed studies to answer five original research questions related to moral judgments, negotiations, and implicit cognition. Participants from 2 separate large samples (total N = 15,000) were then randomly assigned to complete 1 version of each study. Effect sizes varied dramatically across different sets of materials designed to test the same hypothesis: Materials from different teams rendered statistically significant effects in opposite directions for 4 of 5 hypotheses, with the narrowest range in estimates being d = −0.37 to +0.26. Meta-analysis and a Bayesian perspective on the results revealed overall support for 2 hypotheses and a lack of support for 3 hypotheses. Overall, practically none of the variability in effect sizes was attributable to the skill of the research team in designing materials, whereas considerable variability was attributable to the hypothesis being tested. In a forecasting survey, predictions of other scientists were significantly correlated with study results, both across and within hypotheses. Crowdsourced testing of research hypotheses helps reveal the true consistency of empirical support for a scientific claim.

Original language: English (US)
Pages (from-to): 451-479
Number of pages: 29
Journal: Psychological Bulletin
Volume: 146
Issue number: 5
State: Published - 2020
Externally published: Yes

Keywords

  • conceptual replications
  • crowdsourcing
  • forecasting
  • research robustness
  • scientific transparency

ASJC Scopus subject areas

  • Psychology (all)
