Statistical Perspectives

Analyses of the robustness of published results using statistical tools, such as p-curve analysis and the replicability index, or software that tests for image manipulation.


Aiken, L. S., West, S. G., & Millsap, R. E. (2008). Doctoral training in statistics, measurement, and methodology in psychology: Replication and extension of Aiken, West, Sechrest, and Reno’s (1990) survey of PhD programs in North America. American Psychologist, 63(1), 32-50.

Altman, M., Gill, J., & McDonald, M. P. (2004). Sources of inaccuracy in statistical computation. In Numerical issues in statistical computing for the social scientist (pp. 12-43).

Altman, D. G. (1981). Statistics and ethics in medical research. VIII-improving the quality of statistics in medical journals. British Medical Journal, 282(6257), 44.

Ang, R. P. (1998). Use of the jackknife statistic to evaluate result replicability. The Journal of General Psychology, 125(3), 218-228.

Bakan, D. (1966). The test of significance in psychological research. Psychological Bulletin, 66(6), 423.

Brown, A. W., Kaiser, K. A., & Allison, D. B. (2018). Issues with data and analyses: Errors, underlying themes, and potential solutions. PNAS, 115(11), 2563-2570.

Barto, E. K., & Rillig, M. C. (2012). Dissemination biases in ecology: Effect sizes matter more than quality. Oikos, 121(2), 228-235.

Basch, C. E., Sliepcevich, E. M., Gold, R. S., Duncan, D. F., & Kolbe, L. J. (1985). Avoiding type III errors in health education program evaluations: A case study. Health Education & Behavior, 12(3), 315-331.

Begley, C. G., & Ioannidis, J. P. (2015). Reproducibility in science: Improving the standard for basic and preclinical research. Circulation Research, 116(1), 116-126.

Berk, R., Brown, L., Buja, A., Zhang, K., & Zhao, L. (2013). Valid post-selection inference. Annals of Statistics, 41(2), 802-837.

Bernau, C., Riester, M., Boulesteix, A. L., Parmigiani, G., Huttenhower, C., Waldron, L., & Trippa, L. (2014). Cross-study validation for the assessment of prediction algorithms. Bioinformatics, 30(12), i105-i112.

Berry, D. (2012). Multiplicities in cancer research: Ubiquitous and necessary evils. Journal of the National Cancer Institute, 104(15), 1125-1133.

Betz, M. A., & Gabriel, K. R. (1978). Type IV errors and analysis of simple effects. Journal of Educational and Behavioral Statistics, 3(2), 121-143.

Bland, J. M. (2009). The tyranny of power: Is there a better way to calculate sample size? BMJ, 339.

Bofinger, E. (1985). Multiple comparisons and Type III errors. Journal of the American Statistical Association, 80(390), 433-437.

Boik, R. J. (1979). Interactions, partial interactions, and interaction contrasts in the analysis of variance. Psychological Bulletin, 86(5), 1084-1089.

Button, K. S., Ioannidis, J. P., Mokrysz, C., Nosek, B. A., Flint, J., Robinson, E. S., & Munafò, M. R. (2013). Confidence and precision increase with high statistical power. Nature Reviews Neuroscience, 14(8), 585.

Carmer, S. G., & Swanson, M. R. (1973). An evaluation of ten pairwise multiple comparison procedures by Monte Carlo methods. Journal of the American Statistical Association, 68(341), 66-74.

Carver, R. P. (1978). The case against statistical significance testing. Harvard Educational Review, 48, 378-399.

Clarke, K. A. (2005). The phantom menace: Omitted variable bias in econometric research. Conflict Management and Peace Science, 22(4), 341-352.

Chan, A. W., Hróbjartsson, A., Haahr, M. T., Gøtzsche, P. C., & Altman, D. G. (2004). Empirical evidence for selective reporting of outcomes in randomized trials: Comparison of protocols to published articles. JAMA, 291(20), 2457-2465.

Cohen, J. (1990). Things I have learned (so far). American Psychologist, 45, 1304-1312.

Cohen, J. (1994). The earth is round (p < .05). American Psychologist, 49, 997-1003.

Colquhoun, D. (2014). An investigation of the false discovery rate and the misinterpretation of p-values. Royal Society Open Science, 1(3), 140216.

Cooper, H., & Dent, A. (2011). Ethical issues in the conduct and reporting of meta-analysis. In A. T. Panter & S. K. Sterba (Eds.), Handbook of ethics in quantitative methodology (pp. 417-443). New York, NY: Routledge/Taylor & Francis Group.

Cumming, G. (2008). Replication and p intervals: p values predict the future only vaguely, but confidence intervals do much better. Perspectives on Psychological Science, 3(4), 286-300.

Cumming, G. (2014). The new statistics: How and why. Psychological Science, 25, 7-29.

De Long, J. B., & Lang, K. (1992). Are all economic hypotheses false? Journal of Political Economy, 1257-1272.

Doerfler, L. A., & Chaplin, W. F. (1985). Type III error in research on interpersonal models of depression. Journal of Abnormal Psychology, 94(2), 227.

Djulbegovic, B., Hozo, I., & Ioannidis, J. P. (2014). Improving the drug development process: More not less randomized trials. The Journal of the American Medical Association, 311(4), 355-356.

Dobson, D., & Cook, T. J. (1980). Avoiding type III error in program evaluation: Results from a field experiment. Evaluation and Program Planning, 3(4), 269-276.

Dodhia, R. M. (2005). A review of applied multiple Regression/Correlation analysis for the behavioral sciences (3rd ed.). Journal of Educational and Behavioral Statistics, 30(2), 227-229.

Donoho, D., & Jin, J. (2004). Higher criticism for detecting sparse heterogeneous mixtures. Annals of Statistics, 962-994.

Doshi, P., Goodman, S. N., & Ioannidis, J. P. (2013). Raw data from clinical trials: Within reach? Trends in Pharmacological Sciences, 34(12), 645-647.

Dunn, W. N. (2001). Using the method of context validation to mitigate Type III errors in environmental policy analysis. In M. Hisschemoller, R. Hoppe, & J. R. Ravetz (Eds.), Knowledge, power and participation in environmental policy analysis (pp. 417-436). New Brunswick, NJ: Transaction Publishers.

Dunn, T. J., Baguley, T., & Brunsden, V. (2014). From alpha to omega: A practical solution to the pervasive problem of internal consistency estimation. British Journal of Psychology, 105(3), 399-412.

Dwan, K., Altman, D. G., Arnaiz, J. A., Bloom, J., Chan, A. W., Cronin, E., … & Williamson, P. R. (2008). Systematic review of the empirical evidence of study publication bias and outcome reporting bias. PLoS ONE, 3(8), e3081.

Eklund, A., Nichols, T. E., & Knutsson, H. (2016). Cluster failure: Why fMRI inferences for spatial extent have inflated false-positive rates. Proceedings of the National Academy of Sciences, 201602413.

Games, P. A. (1978). Nesting, crossing, type IV errors, and the role of statistical models. American Educational Research Journal, 15(2), 253-258.

Gelman, A. (2013). Commentary: P values and statistical practice. Epidemiology, 24(1), 69-72.

Gelman, A. (2013). Ethics and statistics: It’s too hard to publish criticisms and obtain data for republication. Chance, 26(3), 49-52.

Gelman, A. (2014). The connection between varying treatment effects and the crisis of unreplicable research: A Bayesian perspective. Journal of Management, 41(2), 632-643.

Goodman, S. N., Altman, D. G., & George, S. L. (1998). Statistical reviewing policies of medical journals. Journal of General Internal Medicine, 13(11), 753-756.

Gurusamy, K. S., Gluud, C., Nikolova, D., & Davidson, B. R. (2009). Assessment of risk of bias in randomized clinical trials in surgery. British Journal of Surgery, 96(4), 342-349.

Falk, R., & Greenbaum, C. W. (1995). Significance tests die hard: The amazing persistence of a probabilistic misconception. Theory & Psychology, 5(1), 75-98.

Fang, F. C., Steen, R. G., & Casadevall, A. (2012). Misconduct accounts for the majority of retracted scientific publications. Proceedings of the National Academy of Sciences, 109(42), 17028-17033.

Fiedler, K. (2011). Voodoo correlations are everywhere – not only in neuroscience. Perspectives on Psychological Science, 6(2), 163-171.

Fiedler, K., Kutzner, F., & Krueger, J. I. (2012). The long way from α-error control to validity proper: Problems with a short-sighted false-positive debate. Perspectives on Psychological Science, 7(6), 661-669.

Filho, D. B. F., Paranhos, R., da Rocha, E. C., Batista, M., da Silva, J. A., Jr., Santos, M. L. W. D., & Marino, J. G. (2013). When is statistical significance not significant? Brazilian Political Science Review, 7(1), 31-55.

Finner, H. (1999). Stepwise multiple test procedures and control of directional errors. The Annals of Statistics, 27, 274-289.

Frick, R. W. (1996). The appropriate use of null hypothesis testing. Psychological Methods, 1(4), 379-390.

Gale, R. P., Hochhaus, A., & Zhang, M.-J. (2016). What is the (p-) value of the P-value? Leukemia, 30(10), 1965-1967.

Games, P. A. (1973). Type IV errors revisited. Psychological Bulletin, 80(4), 304-307.

Gelman, A., & Loken, E. (2014). The statistical crisis in science. American Scientist, 102(6), 460.

Gillett, R. (1994). Post hoc power analysis. Journal of Applied Psychology, 79, 783-785.

Goodie, A. S. (2004). Null hypothesis statistical testing and the balance between positive and negative approaches. Behavioral and Brain Sciences, 27(3), 338-339.

Goodman, S. N. (1992). A comment on replication, p‐values and evidence. Statistics in Medicine, 11(7), 875-879.

Goodman, S. (2008, July). A dirty dozen: Twelve p-value misconceptions. Seminars in Hematology (Vol. 45, No. 3, pp. 135-140).

Greenland, S. (2008, July). Bayesian interpretation and analysis of research results. Seminars in Hematology (Vol. 45, No. 3, pp. 141-149). WB Saunders.

Greenwald, A. G. (1975). Consequences of prejudice against the null hypothesis. Psychological Bulletin, 82(1), 1-20.

Greenwald, A., Gonzalez, R., Harris, R., & Guthrie, D. (1996). Effect sizes and p values: What should be reported and what should be replicated? Psychophysiology, 33(2), 175-183.

Gresham, F. M. (1993). Social skills and learning disabilities as a type III error: Rejoinder to Conte and Andrews. Journal of Learning Disabilities, 26(3), 154-158.

Harmon‐Jones, E., Amodio, D. M., & Harmon‐Jones, C. (2009). Action‐based model of dissonance: A review, integration, and expansion of conceptions of cognitive conflict. Advances in Experimental Social Psychology, 41, 119-166.

Hassan, S., Yellur, R., Subramani, P., Adiga, P., Gokhale, M., Iyer, M. S., & Mayya, S. S. (2015). Research design and statistical methods in Indian medical journals: A retrospective survey. PLoS ONE, 10(4).

Heller, R., & Yekutieli, D. (2014). Replicability analysis for genome-wide association studies. The Annals of Applied Statistics, 8(1), 481-498.

Hozo, I., Schell, M. J., & Djulbegovic, B. (2008, July). Decision-making when data and inferences are not conclusive: risk-benefit and acceptable regret approach. Seminars in Hematology (Vol. 45, No. 3, pp. 150-159). WB Saunders.

Hung, H. M. J., O’Neill, R. T., Bauer, P., & Köhne, K. (1997). The behavior of the p-value when the alternative hypothesis is true. Biometrics, 53, 11-22.

Imai, K. (2005). Do get-out-the-vote calls reduce turnout? The importance of statistical methods for field experiments. American Political Science Review, 99(2), 283-300.

IntHout, J., Ioannidis, J. P., & Borm, G. F. (2014). The Hartung-Knapp-Sidik-Jonkman method for random effects meta-analysis is straightforward and considerably outperforms the standard DerSimonian-Laird method. BMC Medical Research Methodology, 14(1), 25.

Ioannidis, J. P., Hozo, I., & Djulbegovic, B. (2013). Optimal type I and type II error pairs when the available sample size is fixed. Journal of Clinical Epidemiology, 66(8), 903-910.

Ioannidis, J. P., & Trikalinos, T. A. (2007). An exploratory test for an excess of significant findings. Clinical Trials, 4(3), 245-253.

Ioannidis, J. P. (2013). Clarifications on the application and interpretation of the test for excess significance and its extensions. Journal of Mathematical Psychology, 57(5), 184-187.

Ioannidis, J. P. (2005). Contradicted and initially stronger effects in highly cited clinical research. JAMA, 294(2), 218-228.

Ioannidis, J. P. (2008). Effectiveness of antidepressants: An evidence myth constructed from a thousand randomized trials? Philosophy, Ethics, and Humanities in Medicine, 3(1), 14.

Ioannidis, J. P. (2008). Effect of formal statistical significance on the credibility of observational associations. American Journal of Epidemiology, 168(4), 374-383.

Ioannidis, J. P. (2008, July). Interpretation of research results: An indispensable mission impossible? Seminars in Hematology (Vol. 45, No. 3, pp. 133-134). WB Saunders.

Ioannidis, J. P. (2013). Meta-analyses of hydroxyethyl starch for volume resuscitation. JAMA, 309(21), 2209.

Ioannidis, J. P. (2014). Research accomplishments that are too good to be true. Intensive Care Medicine, 40(1), 99-101.

Ioannidis, J. P. (2012). Scientific communication is down at the moment, please check again later. Psychological Inquiry, 23(3), 267-270.

Ioannidis, J. P. (2008). Why most discovered true associations are inflated. Epidemiology, 19(5), 640-648.

Jager, L. R., & Leek, J. T. (2014). An estimate of the science-wise false discovery rate and application to the top medical literature. Biostatistics, 15(1), 1-12.

Judd, C. M., Westfall, J., & Kenny, D. A. (2012). Treating stimuli as a random factor in social psychology: A new and comprehensive solution to a pervasive but largely ignored problem. Journal of Personality and Social Psychology, 103, 54-69.

Kavvoura, F. K., McQueen, M. B., Khoury, M. J., Tanzi, R. E., Bertram, L., & Ioannidis, J. P. (2008). Evaluation of the potential excess of statistically significant findings in published genetic association studies: Application to Alzheimer’s disease. American Journal of Epidemiology, 168(8), 855-865.

Kimball, A. W. (1957). Errors of the third kind in statistical consulting. Journal of the American Statistical Association, 52, 133-142.

Kline, R. B. (2013). Beyond significance testing: Statistical reform in the behavioral sciences (2nd ed.). Washington, DC: American Psychological Association.

Kyzas, P. A., Denaxa-Kyza, D., & Ioannidis, J. P. (2007). Almost all articles on cancer prognostic markers report statistically significant results. European Journal of Cancer, 43(17), 2559-2579.

LeBlond, D. (2009). Understanding hypothesis testing using probability distributions. Journal of Validation Technology, 15(1), 45-61.

Leggett, N. C., Thomas, N. A., Loetscher, T., & Nicholls, M. E. R. (2013). The life of p: “Just significant” results are on the rise. The Quarterly Journal of Experimental Psychology, 66, 2303-2309.

Lenard, C., McCarthy, S., & Mills, T. (2014). Ethics in statistics. Australian Senior Mathematics Journal, 28(1), 38-42.

Levin, J. R., & Marascuilo, L. A. (1972). Type IV errors and interactions. Psychological Bulletin, 78(5), 368-374.

Lenzer, J., Hoffman, J. R., Furberg, C. D., & Ioannidis, J. P. (2013). Ensuring the integrity of clinical practice guidelines: A tool for protecting patients. BMJ, 347.

Liberati, A., Altman, D. G., Tetzlaff, J., Mulrow, C., Gøtzsche, P. C., Ioannidis, J. P., … & Moher, D. (2009). The PRISMA statement for reporting systematic reviews and meta-analyses of studies that evaluate health care interventions: Explanation and elaboration. Annals of Internal Medicine, 151(4), W-65.

Lu, T. H. (2001). International comparisons: They do help and are essential for avoiding type III error. Injury Prevention, 7(4), 270-271.

Luce, B. R., Kramer, J. M., Goodman, S. N., Connor, J. T., Tunis, S., Whicher, D., & Schwartz, J. S. (2009). Rethinking randomized clinical trials for comparative effectiveness research: The need for transformational change. Annals of Internal Medicine, 151(3), 206-209.

Lyons, R. (2011). The spread of evidence-poor medicine via flawed social-network analysis. Statistics, Politics, and Policy, 2(1).

Macdonald, P. (1999). Power, Type I, and Type III error rates of parametric and nonparametric statistical tests. The Journal of Experimental Education, 67(4), 367-379.

Macleod, M. R., Michie, S., Roberts, I., Dirnagl, U., Chalmers, I., Ioannidis, J. P., … & Glasziou, P. (2014). Biomedical research: Increasing value, reducing waste. The Lancet, 383(9912), 101-104.

Marascuilo, L. A., & Levin, J. R. (1970). Appropriate post hoc comparisons for interaction and nested hypotheses in analysis of variance designs: The elimination of type IV errors. American Educational Research Journal, 397-421.

Masicampo, E. J., & Lalande, D. R. (2012). A peculiar prevalence of p values just below .05. The Quarterly Journal of Experimental Psychology, 65, 2271-2279.

McCullough, B. D., & McWilliams, T. P. (2010). Baseball players with the initial “K” do not strike out more often. Journal of Applied Statistics, 37(6), 881-891.

Meyer, D. I. (1991). Misinterpretation of interaction effects: A reply to Rosnow and Rosenthal. Psychological Bulletin, 110(3), 571-573.

Moonesinghe, R., Khoury, M. J., & Janssens, A. C. J. (2007). Most published research findings are false—but a little replication goes a long way. PLoS Medicine, 4(2), e28.

Motulsky, H. J. (in press). Common misconceptions about data analysis and statistics. British Journal of Pharmacology.

Nickerson, R. S. (2000). Null hypothesis significance testing: A review of an old and continuing controversy. Psychological Methods, 5(2), 241.

Nuzzo, R. (2014). Statistical errors. Nature, 506, 150-152.

Osborne, J. W. (2013). Is data cleaning and the testing of assumptions relevant in the 21st century? Frontiers in Psychology, 4, 3.

Patsopoulos, N. A., Analatos, A. A., & Ioannidis, J. P. (2005). Relative citation impact of various study designs in the health sciences. The Journal of the American Medical Association, 293(19), 2362-2366.

Rabbitt, P. M. (1966). Errors and error correction in choice-response tasks. Journal of Experimental Psychology, 71(2), 264.

Raha, S. (2011). A critique of statistical hypothesis testing in clinical research. Journal of Ayurveda and Integrative Medicine, 2(3), 105-114.

Rezmovic, E. L. (1982). Program implementation and evaluation results: A reexamination of type III error in a field experiment. Evaluation and Program Planning, 5(2), 111-118.

Rekdal, O. B. (2014). Academic urban legends. Social Studies of Science, 44(4), 638-654.

Emerson, R. W. (2015). Convenience sampling, random sampling, and snowball sampling: How does sampling affect the validity of research? Journal of Visual Impairment & Blindness, 109(2), 164.

Rodriguez, A., Reise, S. P., & Haviland, M. G. (2016). Evaluating bifactor models: Calculating and interpreting statistical indices. Psychological Methods, 21(2), 137-150.

Rozeboom, W. W. (1960). The fallacy of the null-hypothesis significance test. Psychological Bulletin, 57, 416-428.

Salanti, G., Higgins, J. P., Ades, A. E., & Ioannidis, J. P. (2008). Evaluation of networks of randomized trials. Statistical Methods in Medical Research, 17(3), 279-301.

Schmidt, F. L. (1996). Statistical significance testing and cumulative knowledge in psychology: Implications for training of researchers. Psychological Methods, 1(2), 115.

Schneider, D., Tahk, A., & Krosnick, J. (2007). Reconsidering the impact of behavior prediction questions on illegal drug use: The importance of using proper analytic methods. Social Influence, 2(3), 178-196.

Shaffer, J. P. (2002). Multiplicity, directional (type III) errors, and the null hypothesis. Psychological Methods, 7(3), 356.

Simonsohn, U., Nelson, L. D., & Simmons, J. P. (2014). P-curve: A key to the file drawer. Journal of Experimental Psychology: General, 143, 534-547.

Sterling, T. D., Rosenbaum, W. L., & Weinkam, J. J. (1995). Publication decisions revisited: The effect of the outcome of statistical tests on the decision to publish and vice versa. The American Statistician, 49(1), 108-112.

Sterne, J. A., & Smith, G. D. (2001). Sifting the evidence: What’s wrong with significance tests? Physical Therapy, 81(8), 1464-1469.

Stewart, D. W. (2000). Testing statistical significance testing: Some observations of an agnostic. Educational and Psychological Measurement, 60(5), 685-690.

Stroebe, W., Postmes, T., & Spears, R. (2012). Scientific misconduct and the myth of self-correction in science. Perspectives on Psychological Science, 7(6), 670-688.

Trikalinos, N. A., Evangelou, E., & Ioannidis, J. P. (2008). Falsified papers in high-impact journals were slow to retract and indistinguishable from nonfraudulent papers. Journal of Clinical Epidemiology, 61(5), 464-470.

Tsilidis, K. K., Papatheodorou, S. I., Evangelou, E., & Ioannidis, J. P. (2012). Evaluation of excess statistical significance in meta-analyses of 98 biomarker associations with cancer risk. Journal of the National Cancer Institute, djs437.

Tsilidis, K. K., Panagiotou, O. A., Sena, E. S., Aretouli, E., Evangelou, E., Howells, D. W., … & Ioannidis, J. P. (2013). Evaluation of excess significance bias in animal studies of neurological diseases. PLoS Biology, 11(7), e1001609.

Umesh, U. N., Peterson, R. A., McCann-Nelson, M., & Vaidyanathan, R. (1996). Type IV errors in marketing research: The investigation of ANOVA interactions. Journal of the Academy of Marketing Science, 24, 17-26.

Vardeman, S. B., & Morris, M. D. (2003). Statistics and ethics: Some advice for young statisticians. The American Statistician, 57(1), 21-26.

Vasilopoulos, A. (2012). Hypothesis testing: A statistical procedure for testing the validity of claims. Review of Business, 32(1), 89-110.

Vul, E., Harris, C., Winkielman, P., & Pashler, H. (2009). Puzzlingly high correlations in fMRI studies of emotion, personality, and social cognition. Perspectives on Psychological Science, 4, 274-290.

Wacholder, S., Chanock, S., Garcia-Closas, M., & Rothman, N. (2004). Assessing the probability that a positive report is false: An approach for molecular epidemiology studies. Journal of the National Cancer Institute, 96(6), 434-442.

Wade, D. T. (2001). Research into the black box of rehabilitation: The risks of a Type III error. Clinical Rehabilitation, 15(1), 1-4.

Walfish, S. (2008). The power of hypothesis. BioPharm International, 21(6), 32, 34, 36.

Wicherts, J. M., Bakker, M., & Molenaar, D. (2011). Willingness to share research data is related to the strength of the evidence and the quality of reporting of statistical results. PLoS ONE, 6(11), e26828.

Wolverton, M. L. (2009). Research design, hypothesis testing, and sampling. The Appraisal Journal, 77(4), 370-382.

Yadav, S. B., & Korukonda, A. (1985). Management of type III error in problem identification. Interfaces, 15(4), 55-61.

Yarkoni, T. (2009). Big correlations in little studies: Inflated fMRI correlations reflect low statistical power—Commentary on Vul et al. (2009). Perspectives on Psychological Science, 4(3), 294-298.

Yekutieli, D. (2008). Hierarchical false discovery rate–controlling methodology. Journal of the American Statistical Association, 103(481), 309-316.

Zyphur, M. J., & Pierides, D. C. (2019). Statistics and probability have always been value-laden: An historical ontology of quantitative research methods. Journal of Business Ethics, 1-18.