With the increasingly digital lean in marketing, it’s easy to assume that the “old ways” like direct mail are going the way of the dodo. While it might be tempting to agree, given the almost monolithic focus we put on digital channels, you might be surprised to find that direct mail is doing just fine. In fact, direct mail is more than fine — it’s actually a powerful tool to drive new sales and leads from channels you may not expect.
Consider that in 2018, direct mail remained the largest category for US local advertising dollars, with over $38 billion spent. Surveys also found that 70% of respondents feel direct mail is a more personalized way to receive marketing messages than online interactions. Most importantly, 62% of consumers who responded to direct mail in the previous three months ended up making a purchase.
So, it seems the rumors of direct mail’s death have been greatly exaggerated. Indeed, it’s a great tool to have in your kit, but that doesn’t make it easy to use properly. If you do it wrong, you’ll end up with an expensive and low-yielding marketing strategy. Luckily, Explorium can help you make sure that doesn’t happen, and that you build a powerful addition to your marketing campaigns. Let’s see how.
The problem: Targeting the audience most likely to convert on direct mail offers
One of the biggest challenges is finding those people who are most likely to respond positively and convert into leads or sales. Unlike email, which is incredibly cost-effective in terms of volume, direct mail costs can be quite high per lead. In the example we’ll work through below, we’re going to focus on a healthcare company trying to build better targeting models for their upcoming campaign.
Let’s build a targeting model with Explorium
Explorium can add thousands of relevant data sources and enrichments to your data, but it needs to start somewhere. Before we can actually run an enrichment, we need to take stock of the data we already have. Keep in mind that, while we have some information that’s public and easily accessible, using personally identifiable information is not always permitted by regulations, industries, and governments. Therefore, we have two columns (first_name and last_name), which have been anonymized to ensure that the model doesn’t run afoul of any compliance standards. Now, let’s see what we have:
- Users’ address, which can help us place our targets in specific regions and demographic clusters.
- State, city, and ZIP codes, which help us add geospatial enrichments, socio-economic indicators, and more.
In this case, our Y (the value we want to predict) is whether our users are qualified as high-quality leads. This dataset is fairly standard for a direct mail campaign. However, there are very few ways this data can be used for targeting unless it’s connected to a CRM that can add important information, and in this case, we can assume that’s not what’s happening. Instead, we’ll have to add some information so that we can build better customer profiles. Let’s see what we get once we enrich the dataset.
Our first step is to run the augmented data discovery engine, to see what new data sources we can connect to our core.
Right away, we can notice a few things. First, we ended up with 21 new data sources. Second, these all provide a good uplift to our internal data. Let’s dive in a little further to see what we got:
- Geocoding data that enriches our full address column by turning addresses into coordinates, giving us the ability to use latitude and longitude to build better clusters based on relevant data points.
- Individual insurance information, which gives us data about consumers’ insurance behaviors, propensities, and preferences. More importantly, it gave us 42 potential features with an impressive 92.74% coverage.
- Individual credit and bank card preferences (another name enrichment), which give us greater visibility into individuals’ habits and activities, especially preferences such as card usage volume, number of different cards, type of cards, and frequency of purchases.
- On a broader level, there are also enrichments such as US population by ZIP code, which can improve targeting by making clearer distinctions between demographic groups and help clarify socioeconomic status by region.
- Additionally, US rental statistics, which admittedly have a relatively low uplift, might still provide some valuable insights into the demographics of a direct marketing campaign.
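To make the coverage figure concrete, here is a minimal sketch of how such a percentage could be computed, assuming coverage means the share of rows for which an enrichment returned a value. The function name and the sample column are hypothetical and are not part of Explorium's platform.

```python
from typing import Optional, Sequence


def coverage(values: Sequence[Optional[object]]) -> float:
    """Percentage of rows for which an enrichment returned a non-missing value."""
    if not values:
        return 0.0
    matched = sum(1 for v in values if v is not None)
    return 100.0 * matched / len(values)


# Hypothetical enriched column: None marks rows the external source could not match.
insurance_propensity = [4, 2, None, 6, 1, 3, None, 5]

print(f"coverage: {coverage(insurance_propensity):.2f}%")
```

A source with high uplift but low coverage only helps a small slice of the mailing list, so it is worth reading the two numbers together.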
It’s worth noting that in this example, the majority of external sources that Explorium added are individual — which makes sense, since we’re looking to dive into user preferences to find the best targets for our direct mail campaigns. Now it’s time to take these data enrichments and convert them into a feature set that can boost our predictive capabilities.
Let’s generate and select the best features to make predictions
The great news is that after finding our data sources and enriching the core dataset, Explorium has already generated over 700 features that could give us accurate predictions. Let’s dive a little deeper to find the best ones. Before going further, we should mention that while there are hundreds of possible features to test, not all of them will be as relevant or provide as much uplift as others, so you can always select the best features for your own projects based on your own expertise and domain knowledge.
In this case, however, let’s move forward with the complete feature list. Explorium will create new features out of your core dataset enriched with our external sources and list them all by relevance level. Let’s see what some of the most relevant features include:
- Whether someone invests in stocks and bonds (graded one to six)
- The likelihood someone will have medical insurance for themselves or others in their home
- The likelihood (range of one to six) that someone will have a credit life insurance policy
- How heavily someone uses a credit card
- The number of different credit card types in use
- Whether someone opened an IRA in the previous 12 months
- A consumer’s economic stability score
- The likelihood someone will obtain medical insurance via Medicaid
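The article does not describe how Explorium scores relevance, so the sketch below uses a simple stand-in: ranking each feature by the absolute Pearson correlation between its values and the qualified-lead label. The feature names and values are made up for illustration.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation; returns 0.0 when either series is constant."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def rank_features(features, target):
    """Return (name, score) pairs sorted from most to least relevant."""
    scores = {name: abs(pearson(col, target)) for name, col in features.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical one-to-six graded features for six consumers.
features = {
    "invests_in_stocks_bonds": [1, 2, 5, 6, 1, 4],
    "credit_card_usage": [3, 3, 3, 3, 3, 3],  # constant, so it carries no signal
    "economic_stability": [6, 5, 2, 1, 5, 3],
}
qualified = [1, 1, 0, 0, 1, 0]  # 1 = high-quality lead

ranking = rank_features(features, qualified)
for name, score in ranking:
    print(f"{name}: {score:.3f}")
```

Correlation is only one possible relevance measure, and a high score here says nothing about causation, which is why domain knowledge still matters when trimming the list.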
Again, these are just the most relevant out of the more than 700 features Explorium produced. Down the line, the platform might find some features that, while interesting for curiosity’s sake, might show correlation but not causation.
Training, testing, and deploying our model for predictions
Now that we have an enriched dataset and a fully optimized feature set, it’s time to get to the nitty-gritty of building a model. We’ll start by training our model. When you use Explorium, the platform automatically handles this step for you. Instead of testing a single model at a time, Explorium runs several different models simultaneously, including different iterations of the same model with slightly different hyperparameters to find the right balance of predictive ability and accuracy for your desired predictive question.
The top-performing model was a CatBoost algorithm that uses 55 estimators, giving us an AUC score of 73.58, with 32.21% precision and 71.98% accuracy. After this gradient boosting model, the next best model by AUC was a random forest with 388 estimators, which delivered a slightly lower 72.98 AUC but slightly better accuracy, at 74.50%. The third-best model was an XGBoost, though there was some drop-off in both accuracy and AUC.
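For readers less familiar with these metrics, the following sketch shows how AUC, precision, and accuracy can be computed from a model's scores. The labels, scores, and the 0.5 threshold are illustrative assumptions and do not come from the models above.

```python
def auc(y_true, scores):
    """Probability that a random positive outranks a random negative (ties count half)."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def precision_and_accuracy(y_true, scores, threshold=0.5):
    """Turn scores into yes/no predictions at the threshold, then score them."""
    preds = [1 if s >= threshold else 0 for s in scores]
    true_pos = sum(1 for y, p in zip(y_true, preds) if y == 1 and p == 1)
    false_pos = sum(1 for y, p in zip(y_true, preds) if y == 0 and p == 1)
    correct = sum(1 for y, p in zip(y_true, preds) if y == p)
    precision = true_pos / (true_pos + false_pos) if (true_pos + false_pos) else 0.0
    return precision, correct / len(y_true)


# Hypothetical holdout labels (1 = qualified lead) and model scores.
y_true = [1, 0, 0, 1, 0, 0, 1, 0]
scores = [0.9, 0.6, 0.2, 0.7, 0.4, 0.1, 0.3, 0.55]

model_auc = auc(y_true, scores)
precision, accuracy = precision_and_accuracy(y_true, scores)
print(f"AUC={model_auc:.2f} precision={precision:.2f} accuracy={accuracy:.3f}")
```

When qualified leads are a minority of the list, accuracy can look healthy while precision stays modest, so it helps to compare models on more than one metric.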
From here, it’s just a matter of selecting the model you’re most comfortable with, and that gives you the best results. Now, we’re ready to test it, and make sure our model’s ready to start making predictions for us.
To test our top-performing model, we’re going to use Explorium’s Realtime Sandbox to select a single row and see if we can make a prediction. Keep in mind that, while our core dataset might already include whether someone is qualified or not, Explorium splits datasets to avoid any leakage that might impact a model’s accuracy. Therefore, we’re running this test blind.
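The idea of a blind test can be sketched as a simple holdout split: some rows are set aside before training, and the label is stripped from a row before it is handed to the model. The helper name, the 20% test fraction, and the sample rows are assumptions for illustration; the platform's actual splitting logic is not described here.

```python
import random


def holdout_split(rows, test_fraction=0.2, seed=42):
    """Shuffle a copy of the rows and set aside a fraction the model never trains on."""
    if not 0 < test_fraction < 1:
        raise ValueError("test_fraction must be between 0 and 1")
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, int(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


# Hypothetical core rows; "qualified" is the label we want to predict.
rows = [{"id": i, "zip": str(10000 + i), "qualified": i % 3 == 0} for i in range(10)]

train, test = holdout_split(rows)

# For a blind prediction, the label is removed before the row reaches the model.
blind_row = {key: value for key, value in test[0].items() if key != "qualified"}
print(len(train), len(test), blind_row)
```

Keeping the held-out rows and their labels away from training is what makes the reported metrics a fair estimate of how the model will behave on new addresses.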
To run the test, we simply need to press “Predict” and we’ll get an answer (in this case “Yes”), but what’s interesting here is that we can see how each feature played into the model’s decision-making. We can see that the model’s probability of a Yes was much lower than a No (which makes sense, when you consider the low average response rates direct mail campaigns have by default), but let’s take a bit of a dive into our signals and features.
We can see that for several of the ranged indicators (those on a scale of one to six), our test row had several with low scores (we’re counting any score lower than three as a low mark), including whether the consumer had opened an IRA in the previous 12 months, whether they had any type of IRA, whether they invest in stocks and bonds, and whether they had completed a trade in the previous year.
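The low-mark rule described above (any score lower than three on the one-to-six scale) is easy to express in code. This is a small sketch with hypothetical indicator names and scores, not the actual values from the test row.

```python
LOW_SCORE_THRESHOLD = 3  # scores below this count as a low mark, per the rule above


def low_scoring_indicators(row):
    """Return the names of ranged indicators whose score falls below the threshold."""
    return sorted(name for name, score in row.items() if score < LOW_SCORE_THRESHOLD)


# Hypothetical ranged indicators for a single test row (scale of one to six).
test_row = {
    "opened_ira_last_12_months": 1,
    "has_any_ira": 2,
    "invests_in_stocks_bonds": 1,
    "completed_trade_last_year": 2,
    "medicaid_likelihood": 5,
}

print(low_scoring_indicators(test_row))
```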
Additionally, the estimated maximum income was relatively low, at $19,999. On the other hand, risk-indicating measures stood out, such as whether they exhibit risk-taking behaviors in their investments and whether they’ve acquired insurance through Medicaid (i.e., through government assistance programs and not privately). Moreover, indicators that show a low likelihood of already having coverage or looking for it suggest that the target for our mail campaign is likely someone who is in a lower socioeconomic stratum, who doesn’t have insurance, and who exhibits risk-taking behaviors.
Excellent! Our model works; now, all that’s left is to deploy it.
Supercharge your marketing with Explorium
Direct mail campaigns remain a powerful tool for your marketing strategies, but only when you can ensure a certain level of success. Simply mass-mailing coupons and promotions will leave you disappointed and with a significantly lighter budget. Instead, you can use Explorium to quickly build a machine learning model that helps you qualify your leads, so that every envelope or flier you mail out has a high chance of converting. Rather than leaving your campaign’s vital parts to chance and intuition, use Explorium’s data science platform to build a smarter strategy in minutes.