How To Handle Missing Data In Surveys & Questionnaires

The main purpose of distributing questionnaires for people to answer is to collect data for the study. But more often than not, data goes missing during surveys.

Missing data poses a potential difficulty in drawing conclusions and making recommendations from the study. Data collection through surveys is also not cheap; it takes time and resources.

So, going back to conducting the survey may not be the best solution because you’d have to spend twice as much time and money.

However, with the right methods, you can handle missing data to make a fair and accurate analysis. You could also avoid it entirely by taking some precautionary measures.

In this article, we’ll discuss how to avoid losing data and what to do if data goes missing in your survey.

What is “Missing Data” in a Questionnaire?

The most common type of missing data in a questionnaire occurs when respondents skip one or more questions.

Missing data could also indicate that respondents dropped out of a survey. This means they began answering a questionnaire and then stopped for whatever reason.

Survey fatigue is one of the most common reasons respondents stop answering questions. When participants get tired of answering survey questions, they skip questions to finish the survey faster.

However, it could be due to other factors, such as skepticism over their data protection. For example, the survey is collecting highly sensitive information about participants, and there is no indication of data security.

People also drop out of surveys if they believe the incentive isn’t worth the stress of completing the survey. This is one reason some researchers are wary of offering respondents incentives for participating in surveys; volunteers may be more likely to answer questions genuinely.

Types of Missing Data

Not all missing data is detrimental to your research; some types of missing data are ignorable while others aren’t. Missing data doesn’t become a major problem until it renders the study invalid.

For example, if an important element of the survey is missing, the remaining responses may become invalid. These types of data must be closely monitored in a survey.

Aside from the type of data that goes missing, the size also matters. If a significant portion of data goes missing, you have to be concerned because the chances of you drawing valid conclusions from a small pool of data are very slim.
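
To judge how much data has gone missing, you can start by counting the share of skipped answers for each question. The snippet below is a minimal sketch using made-up responses; the field names (`age`, `income`, `region`) and the use of `None` to mark a skipped question are assumptions for illustration only.

```python
# Hypothetical responses: one dict per respondent; None marks a skipped question.
responses = [
    {"age": 27, "income": 52000, "region": "North"},
    {"age": 38, "income": None, "region": None},
    {"age": 29, "income": 48000, "region": None},
    {"age": 36, "income": 61000, "region": "South"},
]


def missing_share(rows):
    """Return the fraction of respondents who skipped each question."""
    questions = rows[0].keys()
    return {
        q: sum(1 for row in rows if row[q] is None) / len(rows)
        for q in questions
    }


shares = missing_share(responses)
for question, share in shares.items():
    print(f"{question}: {share:.0%} missing")
```

A question with a high share of missing answers deserves a closer look before you decide how to handle it.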

The following are types of missing data in surveys and questionnaires:

Missing Completely at Random (MCAR)

When data is missing completely at random, the chance that a value is missing has nothing to do with any other information in the survey, or with the missing value itself. The gaps are unrelated to the other data collected during the survey.

There is also no pattern to which responses are missing. For example, no particular group of people is deliberately skipping the question or dropping out of the study.

Since the missing values have no systematic relationship with the other data collected in the research, you can discard the affected responses and go ahead with your data analysis.

For example, you’re surveying how people in their late twenties save and invest in comparison to people in their late thirties.

The most important information collected in this research would be their income, savings frequency, age, occupation, and expenses. So, if a few participants skip the question about their region, it is unlikely to jeopardize the research.

Missing at Random (MAR)

Despite the name, the data in this case isn’t missing purely by chance; it’s missing for an underlying reason that is tied to other information you did collect, not to the missing value itself. Although there is a pattern to the missing data and a reason for it, it’s not a big deal.

When this type of data goes missing, researchers don’t panic because the pattern can be explained by the data they have, so it need not undermine the study’s validity.

For example, you’re running a fitness poll and asking participants to rate their fitness. However, the survey data shows that most people in their teens and early twenties skip this question.

The participants skipping the question are at the point in their lives when they feel the healthiest; answering the question seems unnecessary. So, despite the missing data, your study can still be valid because the gap is explained by age, which you recorded and can account for.
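
One rough way to check for this kind of pattern is to compare how often each group skipped the question. The sketch below uses invented fitness-poll data; the age-group labels and ratings are hypothetical. A clear gap between groups suggests the missing answers are linked to something you observed, which is consistent with MAR, although it cannot rule out MNAR on its own.

```python
from collections import defaultdict

# Hypothetical fitness-poll data: (age_group, fitness_rating); None = skipped.
poll = [
    ("teens", None), ("teens", None), ("teens", 4),
    ("twenties", None), ("twenties", 5), ("twenties", 4),
    ("thirties", 3), ("thirties", 4), ("thirties", 2),
    ("forties", 3), ("forties", None), ("forties", 2),
]


def skip_rate_by_group(rows):
    """Share of respondents in each group who skipped the question."""
    asked = defaultdict(int)
    skipped = defaultdict(int)
    for group, rating in rows:
        asked[group] += 1
        if rating is None:
            skipped[group] += 1
    return {group: skipped[group] / asked[group] for group in asked}


rates = skip_rate_by_group(poll)
for group, rate in rates.items():
    print(f"{group}: {rate:.0%} skipped")
```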

Missing Not at Random (MNAR)

When missing data falls into this category, it’s time to sound the alarm. This is because you’re missing a significant portion of what makes your survey results valid and conclusive.

The main issue when data goes ‘missing not at random’ is that it directly impacts the validity of your results. Whether an answer is missing depends on the answer itself, so there is a consistent pattern to the missing data, but there is little you can do about it because you can’t determine the bias from the responses you did collect.

Most of the time, this is due to the survey question itself being problematic. The questionnaire may be too intrusive, or incomprehensible to participants.

It could also be due to a technical error, such as a white screen, connection error, etc.

Causes of Missing Data

There are several reasons why data in surveys goes missing; here are the most common:

Respondents Skipping Questions

This happens when participants do not answer questions intentionally or due to a technical glitch. It’s the most prevalent cause of missing data in studies.

Insufficient Sample Size

This happens when the sample is difficult to obtain, such as a survey on the symptoms of a rare medical condition.

Participants Dropping Out of Surveys

People abandon surveys for a variety of reasons, including the survey not fitting into their schedule, losing interest in the survey, survey fatigue, and, in some cases, death.

Difficult to Measure Data

Missing data also occurs in surveys and questionnaires when the data is difficult to quantify, e.g., emotions. Measuring qualitative data can be difficult for both respondents and data analysts.

For example, if people are asked in their own words how they feel about a product, measuring this data may be difficult because there is no exact value to go by.

Wrong Data Entry

Participants sometimes unintentionally select options they did not mean to choose. In questionnaires where participants cannot go back to change or review their answers before submitting, the intended answer is lost.

It could also come down to how the data is handled. Data could go missing because it wasn’t entered correctly by the person collating it.

This occurs far more frequently in physical data collection than in online surveys. When manually collating the results, the person entering data may end up mixing up data, resulting in missing or incorrect data entry.

The best way to prevent this is to automate the survey.

Effects and Implications of Missing Data

Reduces Result Validity and Reliability

When important parts of a questionnaire’s data go missing, it becomes difficult to replicate the results.

If the same survey conducted with the same sample produces inconsistent results, it indicates an error in the survey. This means you can’t use the results to draw a meaningful conclusion or to recommend solutions for future studies.

Establishes Survey Bias

Missing data can sometimes be an indication that the survey is being carried out with bias, which means the results are unlikely to be valid.

For example, you’re surveying teens getting braces in Ohio, but your sample is limited to students from a single high school in Ohio. This survey’s data excludes a significant portion of the target population, so drawing conclusions based on data from this high school is biased.

Reduces Elements of Data

When a large amount of data is missing from a survey on a specific subject, you may have to exclude it from the data results. This reduces elements of the data you gathered, and you’d have to reduce the factors you’re using as a basis for your conclusion.

How to Spot Survey Conclusions Where Data Was Missing

Small Sample Size

When a large number of survey participants drop out of a survey, you’d have to exclude their entries from the data, resulting in a smaller sample size.

Small sample sizes are problematic for most surveys, as they can make your conclusions unreliable when generalized to a larger population.

Limited Survey Elements

Surveys are meant to investigate as many factors as possible to obtain accurate results. But when there is missing data, researchers will most likely reduce the factors used to make conclusions from the survey.

For example, a survey was conducted to determine the most preferred deodorant scent, and the scents tested were lemongrass, coconut, cinnamon, and cucumber.

However, because a significant number of participants have not used lemongrass and cucumber deodorant, these scents were removed to obtain a more accurate result.

How to Handle Missing Data

Irrespective of how careful you are when handling data from a survey or a questionnaire, missing data is almost unavoidable.

Here are some proven methods for handling missing data without compromising the validity of your research:

Listwise Deletion

This is also referred to as case deletion. It involves omitting cases where data is missing and proceeding with the remaining data.

This method of missing data handling is best suited when the missing data is Missing Completely At Random. Using this method in other scenarios may result in bias and incorrect conclusions.

Also, this method is only applicable if your sample size is still sufficient after removing the missing data.
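
As a minimal sketch of listwise deletion, the snippet below drops every respondent with at least one skipped answer and reports how much of the sample is left. The records and field names are hypothetical, and `None` is assumed to mark a missing answer.

```python
def listwise_delete(rows):
    """Keep only respondents who answered every question."""
    return [row for row in rows if all(v is not None for v in row.values())]


# Hypothetical savings survey; None marks a skipped question.
survey = [
    {"age": 27, "income": 52000, "savings_per_month": 400},
    {"age": 38, "income": None, "savings_per_month": 650},
    {"age": 29, "income": 48000, "savings_per_month": None},
    {"age": 36, "income": 61000, "savings_per_month": 700},
]

complete_cases = listwise_delete(survey)
retained = len(complete_cases) / len(survey)
print(f"Kept {len(complete_cases)} of {len(survey)} respondents ({retained:.0%})")
```

Checking the retained share before you proceed helps you decide whether the remaining sample is still large enough.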

Recover the Values

In cases where you have contact with the participants, you can contact them and ask them to retake the parts of the survey they missed.

For physical surveys, you could have a quality assurance staff member scan through each questionnaire to ensure that no questions were omitted during the survey. So when participants skip questions, the quality assurance staff member will ask them to complete their responses.

Educated Guessing

This is accomplished by replacing the missing data with a value that the respondent is most likely to choose. For example, if a participant selects ‘strongly agree’ for a set of similar questions and skips one, you can make an educated guess that the respondent’s answer would be ‘strongly agree’.

This is only applicable in surveys where the respondents have established a pattern in their answers and for quantifiable responses.
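
One cautious way to apply educated guessing is to fill a gap only when the respondent gave the same answer to every other question in the set. The function below is an illustrative sketch of that rule; the rule itself and the name `educated_guess` are assumptions, not a standard procedure.

```python
def educated_guess(answers):
    """Fill skipped answers only if all given answers in the set are identical."""
    observed = {a for a in answers if a is not None}
    if len(observed) == 1:
        guess = next(iter(observed))
        return [guess if a is None else a for a in answers]
    # No clear pattern: leave the gaps as they are.
    return list(answers)


consistent = ["strongly agree", "strongly agree", None, "strongly agree"]
mixed = ["agree", None, "disagree", "strongly agree"]

print(educated_guess(consistent))
print(educated_guess(mixed))
```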

Average Imputation

This is also called mean substitution. It involves calculating the average response of respondents to a specific question and using it to fill in the missing value in the survey.

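This method isn’t the best way to handle missing data in your survey, especially if you’re trying to measure variation, because filling every gap with the same average makes the responses look less spread out than they really are.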
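
The sketch below fills skipped ratings with the average of the observed ones and compares the standard deviation before and after. The ratings are invented for illustration, and `None` is assumed to mark a skipped answer.

```python
from statistics import mean, stdev

# Hypothetical ratings for one question; None marks a skipped answer.
ratings = [2, 5, None, 4, None, 1, 3]

observed = [r for r in ratings if r is not None]
average = mean(observed)

# Replace every missing rating with the average of the observed ratings.
imputed = [average if r is None else r for r in ratings]

print("Average used for filling:", average)
print("Spread before imputation:", round(stdev(observed), 3))
print("Spread after imputation:", round(stdev(imputed), 3))
```

The spread after imputation is smaller than before, which is why this method can be misleading when variation matters to your study.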

Common-Point Imputation

This method is a subtle combination of average imputation and educated guessing. It replaces missing data with the most common answer or, failing that, with the middle value of the scale.

For example, if the most common answer in a survey is 4 for a specific question, you’d replace the missing data with 4 using this method.

Also, suppose you’re surveying to see how people rate a product on a scale of 1 to 5. If the responses are so dispersed that there is no common point (popular answer), the missing response would be replaced with 3, which is the middle value of the scale.

Although this method is more reliable than guessing what the participants would choose, it’s not the best solution to missing data because it can introduce bias.
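
Here is a small sketch of common-point imputation for a rating scale. It uses the most common answer when there is a single clear one, and otherwise falls back to the middle of the scale. Treating a tie for the top answer as "no common point" is an assumption made for this example, as are the function names.

```python
from collections import Counter


def common_point(values, scale_min=1, scale_max=5):
    """Return the single most common answer, or the scale midpoint if tied."""
    observed = [v for v in values if v is not None]
    counts = Counter(observed).most_common()
    # A clear common point exists when the top answer is not tied with another.
    if counts and (len(counts) == 1 or counts[0][1] > counts[1][1]):
        return counts[0][0]
    return (scale_min + scale_max) / 2


def impute_common_point(values, scale_min=1, scale_max=5):
    """Replace missing answers (None) with the common point."""
    fill = common_point(values, scale_min, scale_max)
    return [fill if v is None else v for v in values]


clustered = [4, 4, None, 2, 4, 5]   # 4 is clearly the most common answer
dispersed = [1, 2, 3, 4, 5, None]   # every answer appears once

print(impute_common_point(clustered))
print(impute_common_point(dispersed))
```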

Regression Substitution

This is a more structured approach to calculating missing data from available data. It involves substituting missing data with estimated values.

Rather than simply removing missing data and reducing your sample size, you would calculate the closest answer using other survey responses. So, you fit a regression line to the available data and use the values along that line to replace missing data.

The regression imputation method uses the available data to predict each missing value. After that, the missing value is substituted with the predicted value.

The main upside to using this method is that it allows you to keep your sample size and data elements. Also, unlike common-point imputation, it allows for deviation in your results.
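
To illustrate, the sketch below fits a straight line through respondents who answered both questions and uses it to predict a skipped answer. The data (income in thousands against monthly savings) is invented, and a single-predictor least-squares line is assumed; real surveys often need more than one predictor.

```python
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    x_bar, y_bar = mean(xs), mean(ys)
    slope = (
        sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
        / sum((x - x_bar) ** 2 for x in xs)
    )
    intercept = y_bar - slope * x_bar
    return slope, intercept


# Hypothetical (income in thousands, monthly savings); None = savings skipped.
rows = [(40, 300), (50, 400), (60, 500), (70, None), (80, 700)]

# Fit the line using only respondents who answered both questions.
complete = [(x, y) for x, y in rows if y is not None]
slope, intercept = fit_line([x for x, _ in complete], [y for _, y in complete])

# Replace each missing savings value with the value predicted by the line.
filled = [(x, y if y is not None else slope * x + intercept) for x, y in rows]

for income, savings in filled:
    print(income, round(savings, 1))
```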

Multiple Imputation

This is one of the most widely recommended methods for resolving missing data. Instead of producing a single regression estimate, you’d use the relationships between responses, plus some random variation, to generate several plausible values for the missing data.

With this method, you are not replacing missing data with single values, but with a realistic data set. This data set allows for variance and error, bringing you closer to the true value for the missing data.

The first step in this method is to predict missing data using existing data from other variables. Next, replace the missing values with the predicted values to yield a complete data set known as the imputed data set.

The process is then repeated to generate multiple data sets. When you’ve concluded the data set generation, use a statistical analysis method to generate multiple analysis results.

Finally, create a single overall analysis result by combining these analysis results.
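
The steps above can be sketched as follows. This is a simplified illustration, not a full implementation: it assumes one predictor, adds random noise around a fitted regression line to create each imputed data set, analyzes each one by taking the average, and combines the results with Rubin’s rules. A complete multiple imputation would also account for uncertainty in the regression line itself. The data and function names are hypothetical.

```python
import random
from statistics import mean, variance


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    x_bar, y_bar = mean(xs), mean(ys)
    slope = (
        sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
        / sum((x - x_bar) ** 2 for x in xs)
    )
    return slope, y_bar - slope * x_bar


def multiple_imputation_mean(rows, m=20, seed=0):
    """Estimate the mean of y with m imputed data sets, pooled by Rubin's rules."""
    rng = random.Random(seed)
    complete = [(x, y) for x, y in rows if y is not None]
    xs = [x for x, _ in complete]
    ys = [y for _, y in complete]
    slope, intercept = fit_line(xs, ys)

    # Residual spread around the line, used as the noise level for imputations.
    residuals = [y - (slope * x + intercept) for x, y in complete]
    noise_sd = (sum(r * r for r in residuals) / (len(complete) - 2)) ** 0.5

    estimates, within = [], []
    for _ in range(m):
        # Steps 1 and 2: predict each missing value, with noise, and fill it in.
        filled = [
            y if y is not None else slope * x + intercept + rng.gauss(0, noise_sd)
            for x, y in rows
        ]
        # Step 3: analyze each imputed data set (here, its mean and variance).
        estimates.append(mean(filled))
        within.append(variance(filled) / len(filled))

    # Step 4: pool the results. Total variance = within + (1 + 1/m) * between.
    pooled = mean(estimates)
    total_variance = mean(within) + (1 + 1 / m) * variance(estimates)
    return pooled, total_variance ** 0.5


# Hypothetical (income in thousands, monthly savings); None = savings skipped.
rows = [(40, 310), (50, 390), (60, 520), (70, None),
        (80, 690), (90, None), (55, 430), (65, 560)]

pooled_mean, standard_error = multiple_imputation_mean(rows)
print(round(pooled_mean, 1), round(standard_error, 1))
```

Because each imputed data set differs slightly, the pooled standard error reflects the extra uncertainty that comes from not knowing the missing answers.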

Conclusion

Missing data could compromise the validity of your data, making it difficult to reproduce. However, not all types of missing data are detrimental to the survey.

In cases where you can’t ignore the missing data, you can simply remove it from the result. You could also use any of the other methods that allow you to replace missing data with calculated or estimated data.

Moradeke Owa

Aug 30
