Last week Jeffrey Henning gave a great #NewMR lecture on how to improve the representativeness of online surveys (click here to access the slides and recordings). During the lecture he touched lightly on the topic of calculating sampling error from non-probability samples, pointing out that the statistic does not really do what it is supposed to. In this post I want to highlight why I recommend using this statistic as a measure of reliability, but not validity.

If we calculate the sampling error for a non-probability sample, for example one drawn from an online access panel, the calculation does not describe the wider population. The population for this calculation is just those people who might have taken the survey: the members of the online access panel who met the screening criteria and who were willing (during the survey period) to take the study. The sampling error tells us how good our estimates of this population are.

If we take a sample of 1000 people from an online access panel and we calculate that the confidence interval is +/- 3% at the 95% level, what we are saying is this: if we had run the same survey again, on the same day, with the same panel but a different group of people, we are 95% sure that the answer would have been within about 3% of the first one. That is a measure of reliability. But we are not saying that if we had measured the wider population the answer would have been within 3%, or 10%, or any other number we could quote.
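For readers who want to see where the "+/- 3%" comes from, here is a minimal sketch of the usual calculation. It assumes the standard normal-approximation formula for a proportion, the worst case of a 50/50 split, and a z-value of 1.96 for the 95% level; the function name `margin_of_error` is just an illustrative label.

```python
import math

def margin_of_error(n, p=0.5, z=1.96):
    """Half-width of the normal-approximation confidence interval for a proportion.

    n: sample size
    p: observed proportion (0.5 is the worst case, giving the widest interval)
    z: 1.96 corresponds to the 95% level
    """
    return z * math.sqrt(p * (1 - p) / n)

moe = margin_of_error(1000)
print(f"+/- {moe * 100:.1f} percentage points")  # prints "+/- 3.1 percentage points"
```

Note that nothing in the formula knows where the 1000 people came from. It only uses the sample size and the observed proportion, which is exactly why it cannot tell us anything about how well the panel matches the wider population.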

The sampling error statistic from a panel is not about validity, since we can’t estimate how representative the panel is of the wider population. But it does give us a statistical measure of how likely we are to get the same answer again if we repeat the study on the same panel, with the same sample specification, during the same period of time, which is a pretty good statement of reliability.
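The difference between the two ideas can be illustrated with a small simulation. This is only a sketch under made-up assumptions: the figures of 40% for the eligible panel members and 55% for the wider population are hypothetical, chosen purely to show a panel that differs from the population, and each repeat survey is modelled as a simple random draw from the eligible panel members.

```python
import math
import random

def simulate_repeat_surveys(panel_share=0.40, population_share=0.55,
                            n=1000, repeats=2000, seed=1):
    """Repeat the 'same' survey many times on the same panel.

    panel_share and population_share are hypothetical values: the true share
    answering "yes" among eligible panel members and in the wider population.
    Returns how often the survey estimate lands within the margin of error of
    (a) the panel value and (b) the wider-population value.
    """
    rng = random.Random(seed)
    moe = 1.96 * math.sqrt(panel_share * (1 - panel_share) / n)
    near_panel = 0
    near_population = 0
    for _ in range(repeats):
        # Draw n respondents from the eligible panel members.
        yes = sum(rng.random() < panel_share for _ in range(n))
        estimate = yes / n
        if abs(estimate - panel_share) <= moe:
            near_panel += 1
        if abs(estimate - population_share) <= moe:
            near_population += 1
    return near_panel / repeats, near_population / repeats

near_panel, near_population = simulate_repeat_surveys()
print(f"Within the margin of the panel value: {near_panel:.0%}")
print(f"Within the margin of the population value: {near_population:.0%}")
```

Under these assumptions the repeated surveys agree with the panel value roughly 95% of the time (reliable), while almost never coming close to the wider-population value (not valid). In practice we do not know the population figure, which is the whole problem.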

Note that, to researchers, reliability is about whether something measures the same way each time, while validity relates to whether what is measured is correct. A metal metre ruler that is 10 cm short is reliable, because it is always 10 cm short, but it is not as valid as we would like.

My recommendation is to calculate the sampling error and use it to indicate which values from the non-probability sample are at least big enough to be reliable. But let’s not claim it represents the sampling error of the wider population, nor that it directly links to validity.

I would recommend adding text something like: “The sampling reliability of this estimate at the 95% level is +/- X%, which means that if we used the same sampling source 20 times, with the same specification, we would expect the answers to be within X% 19 times.”
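If you want to generate that wording automatically for a report, something like the following would do it. This is a sketch, not a standard: `reliability_statement` is a hypothetical helper name, and it assumes the same normal-approximation formula with z = 1.96 for the 95% level.

```python
import math

def reliability_statement(n, p=0.5):
    """Build the suggested reporting text for a sample of size n.

    p is the observed proportion; 0.5 gives the most cautious (widest) figure.
    The z-value is fixed at 1.96 because the wording refers to the 95% level.
    """
    moe_pct = 100 * 1.96 * math.sqrt(p * (1 - p) / n)
    return (
        f"The sampling reliability of this estimate at the 95% level is "
        f"+/- {moe_pct:.1f}%, which means that if we used the same sampling "
        f"source 20 times, with the same specification, we would expect the "
        f"answers to be within {moe_pct:.1f}% 19 times."
    )

statement = reliability_statement(1000)
print(statement)
```

For a sample of 1000 this fills in X as 3.1.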

**Total Survey Error**

Another reason to be careful with sampling error is that it is only one source of error in a survey. Asking leading questions, asking questions that people can’t answer (for example because we are poor witnesses to our own plans and motivations), or asking questions that people don’t want to answer (for example because of social desirability bias), can all result in much bigger problems than sampling error.

Researchers can sometimes be too worried about sampling error, leading them to ignore much bigger sources of error in their work.