Statistical Forensics

Detecting Fraudulent Survey Responses

General Information

Survey data fraud is any intentional deviation from the stated guidelines, instructions, or sampling procedures by a member of the survey project, including interviewers, supervisors, data entry personnel, project leaders, or the principal investigator, that contaminates the data (Robbins, 2018). Examples include deliberately selecting the wrong respondent, misreading a question, duplicating survey responses, misrecording a response, and inventing data outright.

It is important to distinguish fabricated or fraudulent survey data from survey error. The difference is intent. An interviewer who selects a house that was not in the sampling plan simply because people are home has deviated on purpose, so the resulting data is fraudulent. An interviewer who ends up at a house that was not in the sampling plan because they miscounted the skip pattern has made an unintentional mistake, which is a survey error.

People fabricate survey data, or deliberately collect the wrong data, for many reasons: to save time and money, to cover up a mistake, because they have no incentive to improve their methodology, or because the questions are too sensitive to ask. Several measures can help detect fraudulent survey data: recording portions of interviews, using GPS tracking (a feature often available when data is collected through CAPI, or computer-assisted personal interviewing) to confirm that data collectors are going to the correct locations, running PercentMatch to identify duplicate or near-duplicate responses, and having supervisors attend interviews. Training data collectors and offering a financial incentive for high-quality work can also reduce fraudulent or fabricated data. To learn more about fabricated/fraudulent survey data, please see the sources below.
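To make the duplicate-checking idea concrete, here is a minimal Python sketch of a percent-match style check. It is not the PercentMatch program itself, only an illustration of the general approach: for each respondent, find the highest share of identical answers shared with any other respondent, then flag unusually high values for review. The function name, the toy data, and the 0.85 cutoff are all assumptions made for this example.

```python
from itertools import combinations


def max_percent_match(responses):
    """For each respondent, return the highest share (0.0 to 1.0) of
    identical answers shared with any other respondent.

    `responses` is a list of equal-length answer lists, one per respondent.
    This is a hypothetical helper, not part of any existing package.
    """
    n_questions = len(responses[0])
    if n_questions == 0 or any(len(r) != n_questions for r in responses):
        raise ValueError("every respondent needs the same, non-zero number of answers")

    best = [0.0] * len(responses)
    # Compare every pair of respondents once.
    for i, j in combinations(range(len(responses)), 2):
        matches = sum(1 for a, b in zip(responses[i], responses[j]) if a == b)
        share = matches / n_questions
        best[i] = max(best[i], share)
        best[j] = max(best[j], share)
    return best


# Toy data (made up): four respondents, five questions each.
responses = [
    [1, 3, 2, 5, 4],
    [1, 3, 2, 5, 4],  # exact duplicate of respondent 0
    [2, 1, 4, 3, 5],
    [1, 3, 2, 5, 1],  # differs from respondent 0 on one question
]

scores = max_percent_match(responses)

THRESHOLD = 0.85  # assumed cutoff, for illustration only
flagged = [i for i, score in enumerate(scores) if score >= THRESHOLD]
print(scores)
print(flagged)
```

A high match score is a reason to look more closely, not proof of fraud: short questionnaires or questions with few answer options can produce high matches by chance, so any cutoff should be chosen with the specific survey in mind.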

