Statistical Forensics

P-Hacking

General Information

In some rare cases, researchers manipulate their data or analyses in order to get more pleasing results. One such method is called “p-hacking”: manipulating the data or the analysis until it produces a desired p-value. A p-value is the output of a statistical significance test; it measures how surprising or extreme the observed results would be if there were no real effect. Because small p-values (conventionally below .05) are treated as evidence of a genuine finding, some researchers are tempted to manipulate this value in order to make their work appear notable.
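To make the idea concrete, here is a minimal sketch of how a p-value is computed for a simple two-group comparison, in Python with NumPy and SciPy; the group names, sizes, and effect size are invented for illustration:

```python
# Minimal illustration: computing a p-value for a two-sample t-test.
# Under the null hypothesis of equal group means, the p-value is the
# probability of seeing a difference at least this extreme by chance.
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
control = rng.normal(loc=0.0, scale=1.0, size=30)    # simulated control group
treatment = rng.normal(loc=0.5, scale=1.0, size=30)  # simulated treatment group

t_stat, p_value = stats.ttest_ind(treatment, control)
print(f"t = {t_stat:.2f}, p = {p_value:.3f}")
# By convention, p < .05 is reported as "statistically significant."
```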

Psychologists Uri Simonsohn, Joseph Simmons, and Leif Nelson were curious about how to counteract this phenomenon and decided to deliberately p-hack a ‘ridiculous’ experiment of their own (Aschwanden, 2019). They created a study and manipulated it until it found the result they desired: in this case, that listening to the once popular band The Beatles could take years off your age and make you younger. These psychologists showed that slight changes to the variables they decided to compare, the observations they decided to measure, and the factors they decided to combine could easily produce a multitude of different results. This exploitable flexibility, once known as ‘researcher degrees of freedom,’ is what they renamed ‘p-hacking.’
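A hypothetical simulation (a sketch of the general mechanism, not the authors’ actual study) makes the danger concrete: even when the data contain no effect at all, trying several arbitrary analysis choices and reporting whichever one ‘worked’ inflates the false-positive rate far beyond the nominal 5%:

```python
# Sketch of p-hacking via analytic flexibility. Both groups are drawn
# from the SAME distribution, so every "significant" result is false.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n_experiments = 1000
false_positives = 0

for _ in range(n_experiments):
    a = rng.normal(size=40)  # pure noise
    b = rng.normal(size=40)  # pure noise

    p_values = [
        stats.ttest_ind(a, b).pvalue,                                # full sample
        stats.ttest_ind(a[np.abs(a) < 2], b[np.abs(b) < 2]).pvalue,  # drop "outliers"
        stats.ttest_ind(a[:20], b[:20]).pvalue,                      # peek at first half
        stats.mannwhitneyu(a, b).pvalue,                             # switch tests
    ]

    # The p-hacker reports whichever analysis "worked."
    if min(p_values) < 0.05:
        false_positives += 1

# One pre-specified test would yield ~5%; cherry-picking across four
# analyses pushes the false-positive rate well above that.
print(f"False-positive rate: {false_positives / n_experiments:.1%}")
```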

There are several ways researchers can prevent p-hacking. According to Aschwanden, researchers can preregister their methodology and analysis plan before collecting data, which increases the credibility and validity of the eventual results. Researchers can also replicate their own studies to see whether they obtain the same results and whether those results are truly significant. Lastly, researchers can use technology to their advantage and run reported results through a tool such as p-checker to analyze the distribution of p-values. These are just some of the many ways to audit one’s own data, or the data of others, for signs of p-hacking. Much more information on p-hacking can be found below.
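The intuition behind such checkers can be sketched in a few lines (a simplified illustration of the p-curve idea, not the actual code of any tool): among results reported as significant, genuine effects produce mostly very small p-values, whereas p-hacked results tend to bunch up just below the .05 threshold:

```python
# Simplified p-curve check: bin significant p-values (p < .05) and look
# at where the mass sits. Right-skewed (near zero) suggests a real
# effect; bunching just under .05 is a warning sign of p-hacking.
from collections import Counter

def p_curve_bins(p_values, width=0.01):
    """Count significant p-values in bins of the given width."""
    bins = Counter()
    for p in p_values:
        if p < 0.05:
            bins[round(int(p / width) * width, 2)] += 1
    return dict(sorted(bins.items()))

# Hypothetical sets of reported p-values, invented for illustration.
healthy = [0.001, 0.003, 0.004, 0.010, 0.012, 0.020, 0.031, 0.044]
suspect = [0.012, 0.041, 0.043, 0.044, 0.046, 0.047, 0.048, 0.049]

print("healthy:", p_curve_bins(healthy))  # mass concentrated near zero
print("suspect:", p_curve_bins(suspect))  # mass bunched just below .05
```

A formal version of this skew test is developed in the p-curve papers by Simonsohn, Nelson, and Simmons listed below.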


Here are resources on the phenomenon:

Adda, J., Ottaviani, M., & Decker, C. (2020). P-hacking in clinical trials and how incentives shape the distribution of results across phases. Proceedings of the National Academy of Sciences of the United States of America, 117(24), 13386–13392.

Amici, D. (2016). Are You Guilty of P-hacking? Bitesize Bio.

Aschwanden, C. (2019). We’re All ‘P-Hacking’ Now. Wired.

Baum, J. A. C., & Bromiley, P. (2019). P-hacking in Top-tier Management Journals. Academy of Management Annual Meeting Proceedings.

Berman, R., Pekelis, L., Scott, A., & Van den Bulte, C. (2018). p-Hacking and False Discovery in A/B Testing. SSRN.    

Bettis, R. A. (2012). The search for asterisks: compromised statistical tests and flawed theories. Strategic Management Journal, 1.

Bishop, D. V. M., & Thompson, P. A. (2016). Problems in using p-curve analysis and text-mining to detect rate of p-hacking and evidential value. PeerJ 4:e1715.

Boutron, I., & Ravaud, P. (2018). Misrepresentation and distortion of research in biomedical literature. Proceedings of the National Academy of Sciences of the United States of America, 11, 2613.

Bruns, S. B., & Ioannidis, J. P. (2016). p-Curve and p-Hacking in Observational Research. PLoS ONE, 11(2), e0149144.

Bruns, S. B., & Kalthaus, M. (2020). Flexibility in the selection of patent counts: Implications for p-hacking and evidence-based policymaking. Research Policy, 49(1).

Bruns, S. B., & Stern, D. I. (2019). Lag length selection and p-hacking in Granger causality testing: prevalence and performance of meta-regression models. Empirical Economics, 56, 797–830.

Brodeur, A., Cook, N., & Heyes, A. (2018). Methods Matter: P-Hacking and Causal Inference in Economics. IZA Discussion Papers 11796.

Brodeur, A., Cook, N., & Heyes, A. (2020). A Proposed Specification Check for p-Hacking. AEA Papers and Proceedings, 110, 66-69.

Chen, A. Y. (2020). The Limits of p-Hacking: Some Thought Experiments. SSRN.

Chordia, T., Goyal, A., & Saretto, A. (2017). p-Hacking: Evidence from Two Million Trading Strategies. Swiss Finance Institute Research Paper Series 17-37.

Costello, V. (2015). “P-hacking”: Megan Head on why it’s bad for science. PLOS Blogs.

Coy, P. (2017). Investors always think they’re getting ripped off. Here’s why they’re right. BloombergBusinessweek.

Crane, H. (2018). The Impact of P-hacking on “Redefine Statistical Significance.” Basic and Applied Social Psychology, 40(4), 219-235.

Cumming, G. (2016). One reason so many scientific studies may be wrong. The Conversation blog.

Dean, T. (2017). How we edit science part 2: significance testing, p-hacking and peer review. The Conversation blog.

Dodson, T. B. (2019). The Problem With P-Hacking. Journal of Oral and Maxillofacial Surgery, 77(3), 459-460.

Erdfelder, E., & Heck, D. W. (2019). Detecting Evidential Value and P-Hacking With the P-curve tool: A Word of Caution. ZPID (Leibniz Institute for Psychology Information).

Friese, M., & Frankenbach, J. (2020). p-Hacking and publication bias interact to distort meta-analytic effect size estimates. Psychological Methods, 25(4), 456–471. 

Gelman, A. (2016). The Problems With P-Values are not Just With P-Values. The American Statistician, supplemental materials to ASA Statement on p-Values and Statistical Significance, 70, 1–2.

Hack Your Way to Scientific Glory. FiveThirtyEight interactive simulation.

Hartgerink, C. H. J. (2017). Reanalyzing Head et al. (2015): investigating the robustness of widespread p-hacking. PeerJ 5:e3068.

Head, M. L., Holman, L., Lanfear, R., Kahn, A. T., & Jennions, M. D. (2015). The extent and consequences of p-hacking in science. PLoS Biol, 13(3), e1002106.

How much do we know about p-hacking “in the wild”? (2016). Cross Validated.

Ingre, M. (2017). P-hacking in academic research. Academic dissertation at Stockholm University.

Kahan, B. C., Forbes, G., & Cro, S. (2020). How to design a pre-specified statistical analysis approach to limit p-hacking in clinical trials: the Pre-SPEC framework. BMC Med, 18.

Khan, M. J., & Trønnes, P. C. (2019). p-Hacking in Experimental Audit Research. Behavioral Research in Accounting, 31(1), 119–131.

Lakens, D. (2015). Comment: What p-hacking really looks like: A comment on Masicampo and LaLande (2012). Quarterly Journal of Experimental Psychology, 68(4), 829-832.

Lombrozo, T. (2014). Science, Trust And Psychology In Crisis. National Public Radio.

MacCoun, R. (2019). p-Hacking: A Strategic Analysis. SSRN.

Moody, O. (2017). Psychologist in the soup over food claims. The Times.

Nelson, L. D. (2014). False-positive, p-hacking, statistical power, and evidential value. Berkeley Initiative for Transparency in the Social Sciences.

Neuroskeptic (2015). P-hacking: a talk and further thoughts. Discover Magazine.

Novella, S. (2014). P-hacking and other statistical sins. NEUROLOGICABLOG.

Nuzzo, R. (2014). Statistical errors. Nature, 506(7487), 150.

p-checker. ShinyApps: Experience Statistics.

Prior, M., Hibberd, R., Asemota, N., & Thornton, J. G. (2017). Inadvertent P-hacking among trials and systematic reviews of the effect of progestogens in pregnancy? A systematic review and meta-analysis. BJOG, 124(7), 1008-1015.

Raj, A. T., Patil, S., Sarode, S., & Salameh, Z. (2018). P-Hacking: A Wake-Up Call for the Scientific Community. Sci Eng Ethics, 24(6), 1813-1814.

Raj, A. T., Patil, S., Sarode, S., & Sarode, G. (2017). P-hacking. The Journal of Contemporary Dental Practice, 18(8), 633-634.

Rytchkov, O. & Zhong, X. (2019). Information Aggregation and P-Hacking. Management Science, 66(4), 1509-1782.

Science Isn’t Broken. FiveThirtyEight.

Shafer, G. (2019). On the Nineteenth-Century Origins of Significance Testing and P-Hacking. SSRN.

Simmons, J. P., Nelson, L. D., & Simonsohn, U. (2013). Life after P-Hacking. Meeting of the Society for Personality and Social Psychology, New Orleans, LA, 17-19 January 2013.

Simonsohn, U., Nelson, L. D., & Simmons, J. P. (2013). P-Curve: A Key to the File Drawer. Journal of Experimental Psychology: General, 143(2), 534-547.

Simonsohn, U., Simmons, J. P., & Nelson, L. D. (2015). Better P-curves: Making P-curve analysis more robust to errors, fraud, and ambitious P-hacking, a Reply to Ulrich and Miller (2015). Journal of Experimental Psychology: General, 144(6), 1146–1152. 

Statistical P-Hacking explained. Science in the Newsroom.

Streiner, D. L. (2018). Statistics Commentary Series: Commentary No. 27: P-Hacking. Journal of Clinical Psychopharmacology, 38(4), 286-288.

Thielking, M. (2016). John Oliver rips apart bad science on ‘Last Week Tonight’. STAT.

Turner, D. P. (2018). P-Hacking in Headache Research. The Journal of Head and Face Pain, 58(2), 196-198.

Ulrich, R., & Miller, J. (2015). p-hacking by post hoc selection with multiple opportunities: Detectability by skewness test?: Comment on Simonsohn, Nelson, and Simmons (2014). Journal of Experimental Psychology: General, 144(6), 1137–1145.

Verhulst, B. (2016). In Defense of P Values. AANA J, 84(5), 305-308.

Vermeulen, I. (2015). Blinded by the Light: How a Focus on Statistical “Significance” May Cause p-Value Misreporting and an Excess of p-Values Just Below .05 in Communication Science. Communication Methods and Measures, 9(4), 253-279.

Vidgen, B., & Yasseri, T. (2016). P-Values: Misunderstood and Misused. Front. Phys. 4:6.

What Marketers Are Doing Wrong in Data Analytics. (2018). Knowledge at Wharton.

Wicherts, J. M., Veldkamp, C. L. S., Augusteijn, H. E. M., Bakker, M., van Aert, R. C. M., & van Assen, M. A. L. M. (2016). Degrees of freedom in planning, running, analyzing, and reporting psychological studies: A checklist to avoid p-hacking. Frontiers in Psychology, 7, 1832.