Statistical Forensics

Testing for Linguistic Obfuscation

General Information

One way to test for questionable scientific practices is to check for linguistic obfuscation. In work most closely associated with Professor Jeffrey T. Hancock of Stanford University, one can measure the jargon, emotion terms, abstract language, and causal terms across a large set of papers to identify which ones may contain fraudulent information. For example, papers retracted for scientific misconduct use about 1.5 times more jargon than unretracted papers (Carey 2015). “For instance, a fraudulent author may use fewer positive emotion terms to curb praise for the data, for fear of triggering inquiry” (Carey 2015). Researchers who set out to commit scientific fraud tend to muddle their wording, results, and data, and these habits leave measurable linguistic traces, as the sketch below illustrates.
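
To make the cue counting concrete, here is a minimal sketch in Python. The word lists are tiny hypothetical stand-ins, not the validated dictionaries (such as LIWC categories) used in the published studies, so treat this as a toy illustration rather than a reproduction of Markowitz and Hancock's obfuscation index.

    import re
    from collections import Counter

    # Toy cue lexicons -- hypothetical stand-ins for the validated
    # dictionaries (e.g., LIWC categories) used in the published work.
    LEXICONS = {
        "jargon": {"paradigm", "operationalize", "multimodal", "heterogeneity"},
        "positive_emotion": {"remarkable", "excellent", "striking", "novel"},
        "causal": {"because", "cause", "causes", "therefore", "hence"},
        "abstract": {"concept", "theory", "framework", "notion"},
    }

    def cue_rates(text):
        """Return each cue category's frequency per 1,000 word tokens."""
        tokens = re.findall(r"[a-z']+", text.lower())
        n = max(len(tokens), 1)  # avoid dividing by zero on empty input
        counts = Counter(tokens)
        return {
            name: 1000 * sum(counts[w] for w in words) / n
            for name, words in LEXICONS.items()
        }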

Researchers may behave this way because of ‘publish or perish’ pressure, under which they may commit fraud or misconduct to stay relevant, keep their careers, and so on. One could use computer software to measure these linguistic cues and flag papers for further review; a second sketch below shows what such a screening pass might look like. A fully computerized system may not be the best option on its own, however, because of its rate of false positives. Even so, it is one useful technique for detecting possible scientific misconduct. If you are interested in linguistic obfuscation and want to learn more about it, check out the sources below.
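
Building on the cue_rates sketch above, a screening pass might compare each paper's jargon rate to the corpus median and flag outliers for human review. The 1.5x default threshold below merely echoes the ratio reported by Carey (2015); it is an assumption, not a validated cutoff.

    from statistics import median

    def flag_for_review(papers, threshold=1.5):
        """papers maps a paper id to its full text. Returns the ids whose
        jargon rate is at least `threshold` times the corpus median --
        a screening heuristic only, never evidence of misconduct."""
        rates = {pid: cue_rates(text)["jargon"] for pid, text in papers.items()}
        baseline = median(rates.values())
        if baseline == 0:
            return []  # no jargon anywhere, so nothing stands out
        return sorted(pid for pid, r in rates.items() if r / baseline >= threshold)

Because the false-positive rate of any such filter is nontrivial, a flagged paper should only prompt closer reading by a human reviewer, never an accusation on its own.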


Here are some resources on the technique:

Carey, B. (2015). Stanford researchers uncover patterns in how scientists lie about their data. Stanford News.

Dalal, F. (2015). Statistical spin: Linguistic obfuscation—The art of overselling the CBT evidence base. The Journal of Psychological Therapies in Primary Care, 4(1), 1-25.

Little, L. (1998). Hiding with words: Obfuscation, avoidance, and federal jurisdiction opinions. UCLA Law Review, 46(1), 75-160.

Mairs, M. A. Linguistic obfuscation techniques (Master's thesis, The University of Liverpool).

Markowitz, D. M., & Hancock, J. T. (2016). Linguistic obfuscation in fraudulent science. Journal of Language and Social Psychology, 35(4), 435-445.

Markowitz, D. M., & Hancock, J. T. (2014). Linguistic traces of a scientific fraud: The case of Diederik Stapel. PLoS ONE, 9(8), e105937.

Newman, M. L., Pennebaker, J. W., Berry, D. S., & Richards, J. M. (2003). Lying words: Predicting deception from linguistic styles. Personality and Social Psychology Bulletin, 29(5), 665-675.

Simmons, J. P., & Simonsohn, U. (2017). Power posing: P-curving the evidence. Psychological Science.

Simonsohn, U., Simmons, J. P., & Nelson, L. D. (2015). Better P-curves: Making P-curve analysis more robust to errors, fraud, and ambitious P-hacking, a reply to Ulrich and Miller (2015). Journal of Experimental Psychology: General, 144(6), 1146-1152.

Tanner, S. (2015). Evidence of false positives in research clearinghouses and influential journals: An application of P-curve to policy research. Observational Studies, 1, 18-29.

Toma, C. L., & Hancock, J. T. (2012). What lies beneath: The linguistic traces of deception in online dating profiles. Journal of Communication, 62(1), 78-97.