Statistical Forensics | Best Practices in Science

Simpson’s Paradox

General Information

Simpson’s paradox in statistics is also called the ‘Yule-Simpson effect.’ The paradox occurs “when the marginal association between two categorical variables is qualitatively different from the partial association between the same two variables after controlling for one or more other variables” (Carlson, 2016). It teaches researchers three things. First, a statistical relationship can change, or even reverse, once other variables are taken into account. Second, the paradox is not a curiosity confined to a few unusual datasets. Third, causal inferences can be dangerous, especially in non-experimental studies.
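One way to state the reversal formally is shown below. The notation is chosen here for illustration only: $S$ is success, $T$ is the treatment or group of interest, and $Z$ is the variable being controlled for. The partial associations all point one way:

$$
P(S \mid T, Z = z) > P(S \mid \neg T, Z = z) \quad \text{for every stratum } z,
$$

while the marginal association points the other way:

$$
P(S \mid T) < P(S \mid \neg T).
$$

This is possible because each marginal rate is a weighted average of the stratum rates. By the law of total probability,

$$
P(S \mid T) = \sum_{z} P(S \mid T, Z = z)\, P(Z = z \mid T),
$$

and the weights $P(Z = z \mid T)$ can differ sharply from $P(Z = z \mid \neg T)$.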

A classic illustration of the paradox compares patients treated at two hospitals. Among less severe patients, the success rate at the better hospital is much higher than at the normal hospital, and the same holds among more severe patients. It therefore seems clear that patients do better at the better hospital. However, when the two severity groups are combined, the normal hospital can show the higher overall success rate. This reversal happens when the better hospital treats a much larger share of severe patients, whose success rates are lower at any hospital. That reversal is Simpson’s paradox.
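To make the reversal concrete, here is a small Python sketch with invented counts. The numbers are hypothetical and not taken from any study. The better hospital receives most of the severe cases, and the normal hospital receives most of the mild ones.

```python
# Hypothetical (successes, patients) counts, invented for illustration only.
counts = {
    "better": {"less severe": (95, 100), "more severe": (500, 900)},
    "normal": {"less severe": (800, 900), "more severe": (30, 100)},
}

# Success rate within each severity group (the partial association).
by_severity = {
    hospital: {sev: k / n for sev, (k, n) in strata.items()}
    for hospital, strata in counts.items()
}

# Success rate after pooling both severity groups (the marginal association).
overall = {
    hospital: sum(k for k, _ in strata.values()) / sum(n for _, n in strata.values())
    for hospital, strata in counts.items()
}

for sev in ("less severe", "more severe"):
    print(f"{sev}: better={by_severity['better'][sev]:.1%}, normal={by_severity['normal'][sev]:.1%}")
print(f"overall: better={overall['better']:.1%}, normal={overall['normal']:.1%}")
```

Running it prints the following rates:

- Less severe patients: 95.0% at the better hospital versus 88.9% at the normal one.
- More severe patients: 55.6% versus 30.0%.
- Overall: 59.5% versus 83.0%.

The better hospital's pooled rate is dragged down because 900 of its 1,000 patients are severe cases.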

Analysts can avoid being misled by Simpson’s paradox in several ways. For example, a combination of critical thinking and improved analytics can expose the paradox before it leads to a wrong conclusion (Berman, et al., 2012). Such analytics include reviews of frequency tables and correlations. In addition, “awareness of company, market, consumer and other general trends can enable the analyst to quickly focus in on and test relevant variables during the data-mining process.”
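A review of frequency tables can be partly automated. Below is a minimal sketch of one such check, assuming counts are stored as group → stratum → (successes, total). The names `simpson_reversal` and `print_rates`, the table layout, and the counts are hypothetical illustrations and do not come from Berman et al.

```python
from typing import Dict, Tuple

# Hypothetical layout: group -> stratum -> (successes, total).
Table = Dict[str, Dict[str, Tuple[int, int]]]


def print_rates(table: Table) -> None:
    """Print per-stratum and pooled success rates so the table can be reviewed."""
    for group, strata in table.items():
        cells = [f"{s}={k / n:.1%}" for s, (k, n) in sorted(strata.items())]
        k_all = sum(k for k, _ in strata.values())
        n_all = sum(n for _, n in strata.values())
        print(f"{group}: " + ", ".join(cells) + f", pooled={k_all / n_all:.1%}")


def simpson_reversal(table: Table, a: str, b: str) -> bool:
    """True if the a-vs-b comparison points the same way in every shared
    stratum but the opposite way once those strata are pooled."""
    strata = sorted(table[a].keys() & table[b].keys())
    if not strata:
        return False

    def rate(group: str, stratum: str) -> float:
        k, n = table[group][stratum]
        return k / n

    def pooled(group: str) -> float:
        # Pool only the strata the two groups share.
        k = sum(table[group][s][0] for s in strata)
        n = sum(table[group][s][1] for s in strata)
        return k / n

    diffs = [rate(a, s) - rate(b, s) for s in strata]
    pooled_diff = pooled(a) - pooled(b)
    if all(d > 0 for d in diffs):
        return pooled_diff < 0
    if all(d < 0 for d in diffs):
        return pooled_diff > 0
    return False  # strata disagree with each other, so there is no clean reversal


# Hypothetical counts for illustration only.
skewed = {
    "A": {"young": (40, 50), "old": (120, 300)},
    "B": {"young": (210, 300), "old": (15, 50)},
}
balanced = {
    "A": {"young": (40, 50), "old": (20, 50)},
    "B": {"young": (35, 50), "old": (15, 50)},
}

print_rates(skewed)
print("reversal:", simpson_reversal(skewed, "A", "B"))
print("reversal:", simpson_reversal(balanced, "A", "B"))
```

The check runs on two tables:

- `skewed`: group A has the higher rate in both strata but the lower pooled rate, so the check reports a reversal.
- `balanced`: the groups have similar stratum sizes, so A is ahead both within strata and after pooling, and no reversal is reported.

A flagged reversal only says that aggregation changes the answer. Deciding whether the pooled or the stratified comparison is the right one still requires reasoning about how the data arose.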

Here are resources on the phenomenon:

Appleton, D. R., French, J. M., & Vanderpump, M. P. J. (1996). Ignoring a Covariate: An Example of Simpson’s Paradox. The American Statistician, 50.

Berman, S., DalleMule, L., Greene, M., & Lucker, J. (2012). Simpson’s Paradox: a cautionary tale in advanced analytics. Significance.

Bickel, P. J., Hammel, E. A., & O’Connell, J. W. (1975). Sex Bias in Graduate Admissions: Data From Berkeley. Science, 187(4175), 398–404.

Blyth, C. R. (1972). On Simpson’s Paradox and the Sure-Thing Principle. Journal of the American Statistical Association, 67(338), 364–366.

Carlson, B. W. (2016). Simpson’s paradox. Encyclopedia Britannica, inc.

Freitas, A. A. (2020). Investigating the role of Simpson’s paradox in the analysis of top-ranked features in high-dimensional bioinformatics datasets. Briefings in Bioinformatics, 21(2), 421–428.

Gardner, M. (1976). Mathematical Games: On the fabric of inductive logic, and some probability paradoxes. Scientific American, 234(3), 119.

Hershbein, B. (2015). When average isn’t good enough: Simpson’s Paradox in education and earnings. Brookings.

Liddell, M. (2016). How statistics can be misleading. TED-Ed.

Pearl, J. (2011). Simpson’s Paradox: An Anatomy. eScholarship, University of California.

Pearl, J. (2014). Comment: Understanding Simpson’s paradox. The American Statistician, 68(1), 8–13.

Qian, S. S., Stow, C. A., Nojavan A., F., Stachelek, J., Cha, Y., Alameddine, I., & Soranno, P. (2019). The implications of Simpson’s paradox for cross-scale inference among lakes. Water Research, 163.

Reintjes, R., de Boer, A., van Pelt, W., & Mintjes-de Groot, J. (2000). Simpson’s Paradox: An Example from Hospital Epidemiology. Epidemiology, 11(1), 81.

Rücker, G., & Schumacher, M. (2008). Simpson’s paradox visualized: The example of the Rosiglitazone meta-analysis. BMC Medical Research Methodology, 8, 1–8.

Simpson, E. H. (1951). The Interpretation of Interaction in Contingency Tables. Journal of the Royal Statistical Society, Series B, 13, 238–241.

singingbanana (2010). Maths: Simpson’s Paradox. YouTube.

Wang, B., Wu, P., Kwan, B., Tu, X., & Feng, C. (2018). Simpson’s Paradox: Examples. Shanghai Archives of Psychiatry, 2, 139.

What is Simpson’s Paradox? (2013). Statistics How To.

Woo, E. (2019). Pearls: Simpson’s Paradox-Understanding Numbers That Don’t Seem to Make Sense. Clinical Orthopaedics & Related Research, 477, 2427–2428.