
Survey Data Cleansing: Five Steps for Cleaning Up Your Data

Preparing your survey data for analysis can be a messy process, because raw responses usually need to be cleansed before you can trust them. For example, respondents’ answers may not match the pre-defined choices, or respondents may answer questions that don’t really apply to them.

Using an online survey tool can eliminate many of the problems associated with paper surveys by limiting response choices and enabling participants to skip irrelevant questions. But even online survey data may contain records that are missing key variables or that include duplicate responses from the same person. And if your survey is large, the task of cleaning up your data can, at first glance, seem a bit overwhelming.

However, it needn’t be. For instance, I recently completed a survey analysis with 35,000 respondents who answered about 75 questions, which resulted in a 2,625,000-cell spreadsheet. Fortunately, editing and cleansing the data was fairly simple because I used a tried-and-true, five-step process that included:

Step 1: Make a copy of your data and use that version for data cleansing. This isn’t as much of a step as it is a warning. Even the best-laid data cleansing plans sometimes have to be taken back to the drawing board. So, only delete records from a copy of your data, and keep your original file on hand in case you need to put something back in.
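As a rough illustration of Step 1, the sketch below copies a survey export to a separate working file before any cleansing happens. It assumes the data lives in a single file on disk; the function name make_working_copy and the "_working" suffix are hypothetical choices for this example, not part of any survey tool.

```python
import shutil
from pathlib import Path


def make_working_copy(original, suffix="_working"):
    """Copy the raw export next to itself and return the path of the copy.

    The original file is never modified. The "_working" suffix is an
    arbitrary naming convention chosen for this sketch.
    """
    src = Path(original)
    dst = src.with_name(f"{src.stem}{suffix}{src.suffix}")
    if dst.exists():
        # Refuse to overwrite an earlier working copy by accident.
        raise FileExistsError(f"{dst} already exists")
    shutil.copy2(src, dst)  # copies contents and file metadata
    return dst
```

All deletions would then be made in the returned working file, leaving the original export available if a record needs to be restored.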

Step 2: Conduct a few mini data cleansing trial runs. Export smaller subsets of your data and use them to refine your process. It’s a lot easier to get the process down with a data set of 2,000 records than with one of 35,000. Plus, you’ll then know the exact steps to follow when you export all 35,000.
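One way to build a trial subset like the one in Step 2 is to draw a reproducible random sample of records. This is only a sketch: it assumes the responses are already loaded as a list (one item per respondent), and the name sample_rows and the fixed seed are illustrative assumptions.

```python
import random


def sample_rows(rows, n=2000, seed=42):
    """Return up to n randomly chosen rows, kept in their original order.

    A fixed seed makes the trial subset repeatable, so each refinement
    of the cleansing process runs against the same records.
    """
    if len(rows) <= n:
        return list(rows)
    rng = random.Random(seed)
    picked = sorted(rng.sample(range(len(rows)), n))
    return [rows[i] for i in picked]
```

A random sample is used here rather than simply the first 2,000 records, since early respondents may not be typical of the whole data set.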

Step 3: Identify “crucial variables” in your survey efforts and define what constitutes “complete.” In the survey I mentioned above, senior-level executives wanted to identify high-performing managers in geographically defined regions. To the company, these geographical regions were a crucial variable in their survey efforts, as without them, survey responses were useless and had to be deleted. In addition, the scores for each region were based on answers to 7 questions. The company decided that in order for a response to be considered complete, all 7 questions had to be answered.
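The completeness rule from Step 3 can be sketched as a simple filter. The column names below ("region" and "q1" through "q7") are hypothetical stand-ins for the crucial variable and the seven scored questions; each response is assumed to be a dictionary keyed by column name.

```python
# Hypothetical column names: one crucial variable plus the 7 scored questions.
CRUCIAL = ["region"]
SCORED = [f"q{i}" for i in range(1, 8)]


def is_blank(value):
    """Treat None, empty strings, and whitespace-only strings as unanswered."""
    return value is None or str(value).strip() == ""


def is_complete(row, required=CRUCIAL + SCORED):
    """A response is complete only if every required column has an answer."""
    return all(not is_blank(row.get(col)) for col in required)


def keep_complete(rows):
    """Drop every response that is missing a crucial or scored answer."""
    return [row for row in rows if is_complete(row)]
```

Keeping the required columns in one list makes the definition of "complete" explicit and easy to revise if the decision changes.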

Step 4: Remove “speeders” and “flat-liners” – Using an internet survey tool, we were able to place a date/time stamp on each response and find out how much time it took each person to complete it. We know from past experience that respondents who complete the survey too quickly (in less than 30% to 50% of the median completion time) are likely not reading or answering the questions appropriately. The same is true for flat-liners (i.e., those who mark every answer the same), who are often speeders as well. They may have read the questions, but they don’t really think about their answers. Therefore, it’s best to remove speeders and flat-liners from your data to eliminate a lot of meaningless data.
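A minimal sketch of the Step 4 filter follows. It assumes the completion time has already been derived from the date/time stamps and stored in a "seconds" field, and it uses 30% of the median as the cutoff, the low end of the range mentioned above. The field name, function names, and default fraction are assumptions for illustration.

```python
from statistics import median


def is_flat_liner(row, questions):
    """True if the respondent gave the identical answer to every question."""
    answers = [row[q] for q in questions if row.get(q) not in (None, "")]
    return len(answers) > 1 and len(set(answers)) == 1


def remove_speeders_and_flat_liners(rows, questions, fraction=0.3, key="seconds"):
    """Keep responses that took at least `fraction` of the median time
    and that do not give the same answer to every listed question."""
    cutoff = fraction * median(row[key] for row in rows)
    return [
        row
        for row in rows
        if row[key] >= cutoff and not is_flat_liner(row, questions)
    ]
```

The cutoff is computed once from the full set of responses, so removing one speeder does not shift the median used to judge the others.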

Step 5: Eliminate duplicate responses – Usually, it’s hard enough to get people to respond to a survey once. But some people actually care so much that they tell you twice, especially if there are some particularly juicy survey incentives involved (which may tempt them to try to increase their odds of winning) or if your managers are informed that their scores are somehow tied in with response rates (which may cause them to flood the system with duplicate favorable responses). Fortunately, all you have to do in those cases is match the entry information with your survey list, and then use the date/time stamp to identify and delete the later duplicate entries. I recommend keeping the first response and deleting any subsequent responses.
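The "keep the first response" rule in Step 5 might look something like the sketch below. It assumes each response carries a respondent identifier matched against the survey list and an ISO-formatted date/time stamp; the field names "respondent_id" and "submitted_at" are hypothetical.

```python
from datetime import datetime


def keep_first_response(rows, id_key="respondent_id", time_key="submitted_at"):
    """Keep only the earliest response per respondent.

    Timestamps are assumed to be ISO 8601 strings such as
    "2023-05-01T09:00:00". The result is ordered by submission time.
    """
    first = {}
    for row in rows:
        respondent = row[id_key]
        when = datetime.fromisoformat(row[time_key])
        # Replace the stored response only if this one is strictly earlier.
        if respondent not in first or when < first[respondent][0]:
            first[respondent] = (when, row)
    return [row for _, row in sorted(first.values(), key=lambda pair: pair[0])]
```

Comparing parsed timestamps rather than relying on file order means the rule still holds if the export is not sorted chronologically.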

Once you complete these steps, you’ll not only have a cleaner and more accurate data set, but you’ll also be able to ensure that each person who takes your survey only counts as one response instead of several in the results.

Alan Bainbridge, Best Practices Consulting Specialist, Allegiance