Survey Sampling

Contemporary survey research requires a sophisticated approach to sampling.

The basic intuition behind sampling for survey research has long rested on the idea of simple random sampling (SRS). The process behind SRS is straightforward: the researcher has a complete listing of the population and uses some type of randomization device to select a subset of observations from that list as the survey sample. If the list is accurate and complete, and the researcher has sufficient resources to contact and interview every sampled observation, then this methodology will typically produce reliable survey results.
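As a rough illustration of the SRS process described above, here is a minimal Python sketch. The sampling frame, the customer IDs, and the `simple_random_sample` helper are all hypothetical; the only assumption is that a complete population list is available.

```python
import random

def simple_random_sample(population, n, seed=None):
    """Draw n distinct units from a complete population list, each equally likely."""
    rng = random.Random(seed)
    # random.sample draws without replacement, so no unit is selected twice.
    return rng.sample(population, n)

# Hypothetical sampling frame: a complete list of 10,000 customer IDs.
frame = [f"customer_{i:05d}" for i in range(10_000)]

# Select 500 customers for the survey; the seed only makes the draw repeatable.
sample = simple_random_sample(frame, n=500, seed=42)
print(len(sample), "customers selected")
```

Fixing the seed is a convenience for auditing a draw later; it does not change the equal-probability property of the sample.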

But in today’s world, many complications arise that render SRS impossible or overly costly to implement. For example, many of the customer lists that businesses and organizations maintain are incomplete: a list might have telephone numbers or email addresses for only a fraction of the customers on it, because many are unwilling to provide their contact information. If the customers who withhold their phone numbers or email addresses differ systematically from those who provide them (they are older or wealthier, for example), then the list with contact information is biased.
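To make the biased-list problem concrete, the following simulation is a sketch under invented assumptions: the customer ages and the probabilities of sharing contact information are made up for illustration and do not come from any real list.

```python
import random
from statistics import mean

rng = random.Random(0)

# Hypothetical customer list: each customer has an age and may or may not
# have provided contact information.
customers = []
for _ in range(20_000):
    age = rng.randint(18, 90)
    # Assumption for illustration: older customers share contact details less often.
    p_contact = 0.9 if age < 50 else 0.4
    customers.append({"age": age, "has_contact": rng.random() < p_contact})

full_mean_age = mean(c["age"] for c in customers)
contactable_mean_age = mean(c["age"] for c in customers if c["has_contact"])

print(f"Mean age, full list:        {full_mean_age:.1f}")
print(f"Mean age, contactable only: {contactable_mean_age:.1f}")
```

Under these assumptions, even a perfectly random draw from the contactable customers would skew younger than the full customer base, because the bias sits in the list itself rather than in the draw.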

One way to resolve the biased list problem is to send that list to a firm that specializes in the augmentation or appending of contact information; while that may produce a complete list, such an approach can be costly and may not be as accurate as needed for the subsequent survey problem.

Our team has developed sampling approaches that can be used for situations like these. We also have expertise dealing with problems that can arise during sampling or survey implementation, like significant compliance or non-response problems. The Pivotal Targeting team has substantial experience working with complex survey designs, especially those that include experimental treatments embedded within the survey.

Geographic Targeting

Here’s a challenge: develop walk lists targeting people who aren’t on your address list. How do you contact people who are not registered to vote and who don’t show up in consumer data files? For example, if you are registering voters, the voter file alone can’t tell you where to go or whom to talk to. Read more about how Geographic Targeting can add value to your campaign.

AAPOR

Microtargeting and support scoring have already become standard practice for political campaigns, but new developments in modeling persuasion are set to transform campaigns in 2014, 2016, and beyond. Rather than guessing who will be moved by an ad or contact, we run experiments to test who actually moves (that is, whom the contact persuades), and we tell campaigns where to target their ads to get the most movement possible. At AAPOR, Peter Foley discussed why persuasion matters, how to measure persuasion in experiments, and how to target persuadable voters. Check out the AAPOR presentation here.
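The general idea of measuring persuasion with an experiment can be sketched as follows. This is a simplified simulation, not the approach from the AAPOR presentation: the segment names, baseline support rate, and effect sizes are invented, and the estimator is a plain difference in support rates between randomly contacted and uncontacted voters within each segment.

```python
import random

rng = random.Random(1)

# Hypothetical true persuasion effects by voter segment
# (unknown to the analyst in a real experiment).
TRUE_EFFECT = {"segment_a": 0.10, "segment_b": 0.00}
BASELINE_SUPPORT = 0.50

def run_experiment(n_per_segment=20_000):
    """Simulate a randomized contact experiment; one record per voter."""
    records = []
    for segment, effect in TRUE_EFFECT.items():
        for _ in range(n_per_segment):
            treated = rng.random() < 0.5  # random assignment to contact or no contact
            p_support = BASELINE_SUPPORT + (effect if treated else 0.0)
            records.append((segment, treated, rng.random() < p_support))
    return records

def estimated_effects(records):
    """Support rate among treated minus support rate among control, per segment."""
    effects = {}
    for segment in TRUE_EFFECT:
        treated = [s for seg, t, s in records if seg == segment and t]
        control = [s for seg, t, s in records if seg == segment and not t]
        effects[segment] = sum(treated) / len(treated) - sum(control) / len(control)
    return effects

effects = estimated_effects(run_experiment())
for segment, effect in effects.items():
    print(f"{segment}: estimated persuasion effect = {effect:+.3f}")
```

In this toy setup, a campaign would direct its contacts toward the segment with the larger estimated effect, since random assignment is what allows the difference in support to be read as movement caused by the contact.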

Overfitting

A common problem in predictive modeling is overfitting. It occurs when models start to represent the random noise in a data set rather than the true underlying structure. In the wild, analysts often add and remove variables based on statistical significance, or build up models based on how well they fit the training data. For example, they might add a large set of variables and their interactions simply to maximize some goodness-of-fit statistic. These sorts of approaches greatly increase the risk of overfitting and produce models that perform badly on new data.

In this post, we will walk through an example using some plots of simulated data. The x axis has a predictor, and the y axis is the variable we want to predict. The first plot shows the true relationship between the predictor and outcome – a curving line that mostly trends upwards.
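For readers who want to reproduce the flavor of this kind of example numerically, here is a minimal sketch under stated assumptions. The true curve, the noise level, and both models are hypothetical choices rather than the ones used in the post: a straight line stands in for a simple model, and a 1-nearest-neighbor "memorizer" stands in for an overly flexible model that chases noise.

```python
import math
import random

rng = random.Random(7)
NOISE_SD = 2.0

def true_curve(x):
    # Hypothetical "true" relationship: a curve that mostly trends upward.
    return x + math.sin(x)

def simulate(n):
    """Draw n (predictor, outcome) pairs: true curve plus Gaussian noise."""
    xs = [rng.uniform(0, 10) for _ in range(n)]
    return [(x, true_curve(x) + rng.gauss(0, NOISE_SD)) for x in xs]

def fit_line(data):
    """Ordinary least squares with one predictor; returns a prediction function."""
    mx = sum(x for x, _ in data) / len(data)
    my = sum(y for _, y in data) / len(data)
    slope = sum((x - mx) * (y - my) for x, y in data) / sum((x - mx) ** 2 for x, _ in data)
    return lambda x: my + slope * (x - mx)

def fit_memorizer(data):
    """1-nearest-neighbor: predicts the outcome of the closest training point."""
    return lambda x: min(data, key=lambda point: abs(point[0] - x))[1]

def mse(model, data):
    """Mean squared prediction error of a model on a data set."""
    return sum((model(x) - y) ** 2 for x, y in data) / len(data)

train, test = simulate(200), simulate(2_000)
line, memorizer = fit_line(train), fit_memorizer(train)

for name, model in [("line", line), ("memorizer", memorizer)]:
    print(f"{name:>9}: train MSE = {mse(model, train):.2f}, test MSE = {mse(model, test):.2f}")
```

The memorizer reproduces every training outcome exactly, so its training error is zero, yet its error on fresh data is much larger because it has learned the noise along with the curve. Comparing training error with error on held-out data is one common way to spot this.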
