Statistical Forensics

Detecting Survey Data Fabrication

General information

Fraudulent survey data arise when any member of a survey project, including interviewers, supervisors, data entry personnel, project leaders, or the principal investigator, intentionally deviates from the stated guidelines, instructions, or sampling procedures in a way that contaminates the data (Robbins, 2018). Examples include deliberately selecting the wrong respondent, misreading a question, misrecording a response, duplicating survey responses, and inventing data outright.

It is important to distinguish fabricated or fraudulent survey data from survey error. The difference is intent. For example, an interviewer who deliberately selects a house that was not in the survey sampling plan, just because people are home, is collecting fraudulent survey data. An interviewer who selects a house outside the sampling plan by accidentally miscounting the skip pattern has made a survey error.

There are many reasons someone might fabricate survey data or deliberately collect the wrong data: to save time and money, to cover up a mistake, because there is no incentive to improve the methodology, or because the questions are too sensitive to ask.

To detect fraudulent survey data, one could record portions of interviews, use GPS tracking to confirm that data collectors visit the correct locations (a feature commonly available on the devices used for computer-assisted personal interviewing, or CAPI), use a tool such as PercentMatch to flag duplicate or near-duplicate responses, or have supervisors attend interviews. Providing training in survey data collection and offering a financial incentive for high-quality work can also limit fraudulent or fabricated data. To learn more about fabricated or fraudulent survey data, please see the sources below.
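As a rough illustration of the duplicate-checking idea mentioned above, the sketch below computes, for each respondent, the largest share of answers that exactly match those of any other respondent, and flags cases at or above a cutoff. It is not the PercentMatch program itself. The function names, the toy data, and the 0.8 cutoff are hypothetical and chosen only for illustration.

```python
from typing import List, Sequence


def max_percent_match(responses: Sequence[Sequence[object]]) -> List[float]:
    """For each respondent, return the highest share (0.0 to 1.0) of
    identical answers shared with any other respondent."""
    n = len(responses)
    best = [0.0] * n
    for i in range(n):
        for j in range(i + 1, n):
            a, b = responses[i], responses[j]
            # Assumption: every respondent answered the same set of questions.
            assert len(a) == len(b), "all response vectors must have equal length"
            matches = sum(1 for x, y in zip(a, b) if x == y)
            share = matches / len(a)
            best[i] = max(best[i], share)
            best[j] = max(best[j], share)
    return best


def flag_near_duplicates(responses: Sequence[Sequence[object]],
                         threshold: float) -> List[int]:
    """Return the indices of respondents whose maximum match share is at or
    above the threshold. The threshold is an analyst's choice, not a standard."""
    scores = max_percent_match(responses)
    return [i for i, s in enumerate(scores) if s >= threshold]


# Hypothetical toy data: 4 respondents, 5 questions each.
toy = [
    [1, 3, 2, 5, 4],
    [1, 3, 2, 5, 1],  # matches respondent 0 on 4 of 5 questions
    [2, 1, 4, 3, 3],
    [5, 5, 1, 2, 2],
]

scores = max_percent_match(toy)
flagged = flag_near_duplicates(toy, threshold=0.8)
print(scores)   # [0.8, 0.8, 0.0, 0.0]
print(flagged)  # [0, 1]
```

A high match share is only a signal for follow-up, not proof of fabrication: short questionnaires and homogeneous populations can produce high matches among genuine interviews.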

Here are some resources on the techniques: 

Biemer, P. P. & Stokes, S. L. (1989). The Optimal Design of Quality Control Samples to Detect Interviewer Cheating. Journal of Official Statistics, 5 (1), 23-39. 

Birnbaum, B., Borriello, G., Flaxman, A. D., DeRenzi, B. & Karlin, A. R. (2013). Using behavioral data to identify interviewer fabrication in surveys. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, pp. 2911–2920.

Blasius, J. & Thiessen, V. (2018). Perceived Corruption, Trust, and Interviewer Behavior in 26 European Countries. Sociological Methods & Research, 1-38.

Bredl, S., Winker, P., & Kötschau, K. (2008). A statistical approach to detect cheating interviewers (No. 39). Discussion Paper.

Bredl, S., Storfinger, N., & Menold, N. (2011). A literature review of methods to detect fabricated survey data (No. 56). Discussion Paper.

Bredl, S., Winker, P., & Kötschau, K. (2012). A statistical approach to detect interviewer falsification of survey data. Survey Methodology, 38 (1), 1-10.

Bushery, J., J. Reichert, K. Albright, and J. Rossiter (1999). Using date and time stamps to detect interviewer falsification. In Proceedings of the American Statistical Association (Survey Research Methods Section), pp. 316–320.

De Haas, S. & Winker, P. (2014). Identification of partial falsifications in survey data. Statistical Journal of the IAOS, 30, 271–281.

Interviewer Falsification in Survey Research: Current Best Methods for Prevention, Detection and Repair of Its Effects. (2003). American Association for Public Opinion Research.

Kemper, C. J., & Menold, N. (2014). Nuisance or remedy? The utility of stylistic responding as an indicator of data fabrication in surveys. Methodology: European Journal of Research Methods for the Behavioral and Social Sciences, 10(3), 92–99.

Menold, N. & Kemper, C. J. (2014). How Do Real and Falsified Data Differ? Psychology of Survey Response as a Source of Falsification Indicators in Face-to-Face Surveys. International Journal of Public Opinion Research, 26 (1), 41-65.

Menold, N. Identification of Falsifications in Surveys – a Link to the Cross-Cultural Context [Slides]. GESIS Leibniz Institute for the Social Sciences.

Murphy, J., Biemer, P., Stringer, C., Thissen, R., Day, O., & Hsieh, Y. P. (2016). Interviewer falsification: Current and best practices for prevention, detection, and mitigation. Statistical Journal of the IAOS, 32(3), 313–326.

Murphy, J., R. Baxter, J. Eyerman, D. Cunningham, and J. Kennet (2004). A system for detecting interviewer falsification. Paper Presented at the American Association for Public Opinion Research 59th Annual Conference.

Porras, J. and N. English (2004). Data-driven approaches to identifying interviewer data falsification: The case of health surveys. In Proceedings of the American Statistical Association (Survey Research Methods Section), pp. 4223–4228.

Robbins, M. (2019). New frontiers in detecting data fabrication. In T. P. Johnson, B.-E. Pennell, I. A. L. Stoop, & B. Dorer (Eds.), Wiley series in survey methodology. Advances in comparative survey methods: Multinational, multiregional, and multicultural contexts (3MC) (pp. 771–805). John Wiley & Sons, Inc.

Schräpler, J. and G. Wagner (2003). Identification, characteristics and impact of faked interviews in surveys – an analysis by means of genuine fakes in the raw data of SOEP. IZA Discussion Paper Series, 969.

Schräpler, J.-P. (2011). Benford’s Law as an Instrument for Fraud Detection in Surveys Using the Data of the Socio-economic Panel (SOEP). Jahrbücher für Nationalökonomie und Statistik, 231(5–6), 685–718.

Smith, P. B., MacQuarrie, C. R., Herbert, R. J., Cairns, D. L., & Begley, L. H. (2004). Preventing data fabrication in telephone survey research. Journal of Research Administration, 35(2), 13.

Stoop, I., Briceño-Rosas, R., Koch, A., & Vandenplas, C. (2018). Data falsification in the European Social Survey? European Social Survey.

Storfinger, N. & Winker, P. Robustness of Clustering Methods for Identification of Potential Falsifications in Survey Data (No. 57). ZEU Discussion Paper.

Swanson, D., M. Cho, and J. Eltinge (2003). Detecting possibly fraudulent data or error-prone survey data using Benford’s law. In Proceedings of the American Statistical Association (Survey Research Methods Section), pp. 4172–4177.

Thissen, M. R., & Myers, S. K. (2016). Systems and Processes for Detecting Interviewer Falsification and Assuring Data Collection Quality. Statistical Journal of the IAOS, 32(3), 339–347.

Turner, C.F., J.N. Gribble, A.A. Al-Tayyib, J.R. Chromy (2002). Falsification in epidemiologic surveys: detection and remediation [Prepublication Draft]. Technical Papers on Health and Behavior Measurement, No. 53. Washington DC: Research Triangle Institute.

Weinauer, M. (2019). Be a detective for a day: How to detect falsified interviews with statistics. Statistical Journal of the IAOS, 35(4), 569–575.

Yamamoto, K. (2017, May 11). Understanding and Detecting Data Fabrication in Large-Scale Assessments. Educational Testing Service, Organisation for Economic Co-operation and Development, Paris, France.

Yamamoto, K., & Lennon, M. L. (2018). Understanding and Detecting Data Fabrication in Large-Scale Assessments. Quality Assurance in Education: An International Perspective, 26(2), 196–212.