Statistical Forensics


Statistical forensics refers to statistical techniques that can assess the credibility or likely replicability of scientific studies.

Statistical Techniques to Detect Fraud

Checking the Distribution of Rightmost Digits:
When fabricating data, a person must choose the leftmost digits deliberately so that each value has the desired magnitude, but the rightmost digit is usually given little thought. In many circumstances the rightmost digits of genuine data are approximately uniformly distributed. Humans, however, find it very difficult to produce a uniform distribution of digits even when they are trying to do so. This makes a test of the rightmost digits for uniformity a powerful tool for flagging possible fraud.
Check Rightmost Digits for Uniform Distribution
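
The rightmost-digit check described above can be sketched as a chi-square goodness-of-fit test against a uniform distribution over the digits 0 to 9. This is a minimal illustration, not a definitive procedure: it assumes integer-valued data, the function names are hypothetical, and it compares the statistic to the standard 5% critical value for 9 degrees of freedom (16.919) rather than computing an exact p-value.

```python
from collections import Counter

# Critical value of the chi-square distribution with 9 degrees of freedom
# at the 5% significance level (standard table value).
CHI2_CRIT_DF9_05 = 16.919


def rightmost_digit_chi2(values):
    """Chi-square statistic for uniformity of the last digits of integers."""
    digits = [abs(int(v)) % 10 for v in values]
    counts = Counter(digits)
    expected = len(digits) / 10  # uniform: each digit equally likely
    return sum((counts.get(d, 0) - expected) ** 2 / expected for d in range(10))


def looks_non_uniform(values):
    """True if uniformity of the last digits is rejected at the 5% level."""
    return rightmost_digit_chi2(values) > CHI2_CRIT_DF9_05


# Hypothetical data: every last digit equally often vs. a strong preference for 7.
even_digits = list(range(1000))
favours_seven = [i * 10 + 7 for i in range(60)] + list(range(40))

print(looks_non_uniform(even_digits))
print(looks_non_uniform(favours_seven))
```

A rejection only flags a data set for closer inspection. Rounding, measurement resolution, or pricing conventions can also produce non-uniform last digits in genuine data.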

Testing for Linguistic Obfuscation:
Two Stanford researchers, Markowitz and Hancock, found that papers retracted for scientific fraud share similar linguistic patterns. Scientists who commit fraud tend to use more jargon, use less positive and more negative language, and write papers that are generally less readable. In the future, these findings could be used to build a computer program that searches papers for these patterns before they are published. Journals would then be able to investigate flagged studies in more depth to check that they are not fraudulent.
Markowitz, D.M. & Hancock, J.T. (2015). Linguistic Obfuscation in Fraudulent Science. Journal of Language and Social Psychology, 35(4), 435-445.
Stanford researchers uncover patterns in how scientists lie about their data
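
As a rough illustration of the readability signal mentioned above, the sketch below computes the Flesch Reading Ease score, a standard readability formula in which lower scores mean harder text. It is an assumption that this formula is a reasonable stand-in here; it is not presented as the researchers' exact measure. The syllable counter is a crude heuristic, and the function names and sample sentences are hypothetical.

```python
import re


def count_syllables(word):
    """Crude syllable estimate: count groups of consecutive vowels."""
    groups = re.findall(r"[aeiouy]+", word.lower())
    return max(1, len(groups))


def flesch_reading_ease(text):
    """Flesch Reading Ease: higher scores indicate easier text."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")
    syllables = sum(count_syllables(w) for w in words)
    words_per_sentence = len(words) / len(sentences)
    syllables_per_word = syllables / len(words)
    return 206.835 - 1.015 * words_per_sentence - 84.6 * syllables_per_word


# Hypothetical sample sentences: plain wording vs. dense jargon.
plain = "We ran the test. The drug did not work."
dense = ("Notwithstanding methodological heterogeneity, multidimensional "
         "operationalizations substantiated hypothesized interrelationships.")

print(flesch_reading_ease(plain) > flesch_reading_ease(dense))
```

A screening program of the kind the paragraph envisions would presumably combine a readability score like this with counts of jargon and emotion words. A low score alone says nothing about fraud.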

P-Curve Analysis:
Simonsohn, Nelson, and Simmons (2014) propose using p-curves, the distribution of the statistically significant p-values in a set of findings, to determine whether those results have evidential value. When a true effect exists, the p-curve is right skewed: very small p-values are more common than p-values just under 0.05. A researcher who p-hacks is likely to stop as soon as statistical significance (p<0.05) is reached, which concentrates p-values just below 0.05 and makes the p-curve left skewed. P-curve analysis can therefore be used to screen studies: those with left-skewed p-curves can be evaluated in more detail by journals and other researchers.
Simonsohn, U., Nelson, L.D., & Simmons, J.P. (2014). P-Curve: A Key to the File-Drawer. Journal of Experimental Psychology: General, 143(2), 534-547.
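
The skew check described above can be sketched with a simple binomial comparison. If there is no true effect and no p-hacking, significant p-values are uniformly distributed between 0 and 0.05, so about half should fall below 0.025. An excess of low p-values suggests right skew, and an excess of high p-values suggests left skew. This is a minimal illustration of one way to test skew, chosen for simplicity. The function names and the example p-values are hypothetical.

```python
from math import comb


def binomial_tail(k, n, p=0.5):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def p_curve_skew(p_values, alpha=0.05):
    """Binomial tests for right and left skew among significant p-values."""
    significant = [p for p in p_values if p < alpha]
    if not significant:
        raise ValueError("no statistically significant p-values to analyze")
    n = len(significant)
    low = sum(1 for p in significant if p < alpha / 2)  # p below 0.025
    high = n - low                                      # p from 0.025 up to 0.05
    return {
        "n": n,
        "low": low,
        "high": high,
        "p_right_skew": binomial_tail(low, n),   # small value: evidential value
        "p_left_skew": binomial_tail(high, n),   # small value: possible p-hacking
    }


# Hypothetical sets of reported p-values.
evidential = [0.001, 0.002, 0.004, 0.01, 0.012, 0.003, 0.02, 0.008, 0.03, 0.001]
suspicious = [0.049, 0.048, 0.045, 0.041, 0.044, 0.047, 0.039, 0.046, 0.043, 0.02]

print(p_curve_skew(evidential))
print(p_curve_skew(suspicious))
```

With only a handful of p-values the test has little power, so a non-significant result should not be read as evidence that a set of findings is sound.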

Replicability Index:
The Replicability Index is a statistical tool for estimating the replicability of studies without investing the resources needed to actually replicate them.
Replicability Index