How to Find and Onboard External Data Without the Headaches
“External data sources are helping businesses personalize marketing offers, improve HR decisions, gain new revenue streams by launching new products or services, enhance risk visibility and mitigation, and better anticipate shifts in demand for their products and services.”
Deloitte Insights, “Data Ecosystems: How Third-Party Information Can Enhance Data Analytics”, September 2019
Organizations across all sectors, regions, and sizes are waking up to the fact that their internal data gives them only part of the picture. To really understand their business in context, they need to incorporate external sources, too. They need to expand their range of data signals. They need to fill in the gaps in their knowledge.
But, crucially, they also need to find a way to connect to the wider data ecosystem in ways that enhance their ML and analytics workflows, rather than slowing them down. Without the right technology, this is easier said than done.
Who is Acquiring External Data?
From retail to real estate, manufacturing to marketing, all types of businesses now use external data.
In fact, the overwhelming majority of companies now recognize that external data is important to their success, even if they have not yet developed a complete external data strategy.
As we discovered while polling organizations for our 2021 State of External Data Report:
“Our respondents overwhelmingly indicated that the acquisition and onboarding of external data were important to their business, with 79% calling it ‘very valuable’ and none saying they saw no value at all.”
Why is External Data Essential?
“The COVID-19 crisis provides an example of just how relevant external data can be. In a few short months, consumer purchasing habits, activities, and digital behavior changed dramatically, making pre-existing consumer research, forecasts, and predictive models obsolete. Moreover, as organizations scrambled to understand these changing patterns, they discovered little of use in their internal data.”
McKinsey Digital, “Harnessing the Power of External Data”, February 2021
External data provides scope, nuance, and context well beyond what you can gain from your internal sources. At the very minimum, the right data will help you address gaps, improving the accuracy of your predictive models.
In tumultuous times, when there are hard-to-predict market events or trends that seem to come from nowhere, you really can’t look to your own historical data – it won’t tell you anything useful. In these situations, tapping into external data means you can get large volumes of very recent data to help you make sense of emerging patterns.
The Challenges of Finding and Integrating External Data
Using external data is vital, but it can be tricky. Here are some of the biggest hurdles:
Finding your way around the market
Perhaps the biggest headache when working with external data is navigating relationships with multiple vendors. A key finding of our report was that the majority of companies that use third-party data are working with at least two vendors, while a small but significant minority work with up to five.
Whenever you work with a new vendor, you need to verify that they are dependable, that the data they offer is high quality and compliant, and that a specific dataset is relevant to your data science question. Different vendors will have their own policies, processes, standards, conventions, and approaches to annotation and labeling. Navigating these is a major effort, before you’ve even started thinking about how to combine and integrate the data.
Plus, as Deloitte has highlighted in their report, the data provider market is big and complex. Simply identifying who you should choose to work with is an ordeal. Negotiating access is a constant battle, especially if you need ongoing, real-time access, for example, to refresh your ML models.
Compatibility and integration hurdles
Every time you want to add a new dataset to your workflow, you need to make sure it’s compatible with your existing data. Depending on how each one is formatted, you may need to spend considerable time cleaning and harmonizing the data before you can even think about using it for your predictive analytics. This extra work lengthens your time to deployment.
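To make the harmonization step concrete, here is a minimal sketch of what it can look like in practice. It assumes a hypothetical internal sales table and a hypothetical vendor feed that describe the same ZIP codes and dates but format them differently; all field names (`zip_code`, `ZIP`, `FootTraffic`, and so on) are invented for illustration and do not come from any particular vendor.

```python
from datetime import datetime


def normalize_zip(value):
    """Return a 5-digit ZIP string from an int, a string, or a ZIP+4 value."""
    digits = str(value).strip().split("-")[0]
    return digits.zfill(5)


def normalize_date(value):
    """Parse two common date layouts into ISO 8601 (YYYY-MM-DD)."""
    for fmt in ("%Y-%m-%d", "%m/%d/%Y"):
        try:
            return datetime.strptime(value.strip(), fmt).date().isoformat()
        except ValueError:
            continue
    raise ValueError(f"unrecognized date format: {value!r}")


def harmonize(internal_rows, external_rows):
    """Left-join external rows onto internal rows by (ZIP, date)."""
    lookup = {
        (normalize_zip(r["ZIP"]), normalize_date(r["Date"])): r
        for r in external_rows
    }
    merged = []
    for row in internal_rows:
        key = (normalize_zip(row["zip_code"]), normalize_date(row["date"]))
        match = lookup.get(key, {})
        # Rows with no external match keep a None placeholder.
        merged.append({**row, "foot_traffic": match.get("FootTraffic")})
    return merged


# Hypothetical sample data: the vendor stores ZIPs as integers
# (dropping the leading zero) and dates as MM/DD/YYYY.
internal = [
    {"zip_code": "02139", "date": "2021-03-01", "sales": 120},
    {"zip_code": "10001", "date": "2021-03-01", "sales": 95},
]
external = [{"ZIP": 2139, "Date": "03/01/2021", "FootTraffic": 870}]

merged = harmonize(internal, external)
```

Even in this toy case, two small formatting differences are enough to make a naive join return no matches, which is why this kind of work adds up quickly across many vendors.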
Even after they have prepared the external data and matched it with their internal data, organizations often find that integrating it into production pipelines is complex and costly. Ongoing monitoring and maintenance are also required to catch data drift (unexpected changes to the input data) and keep predictive models accurate.
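One way to monitor for data drift, sketched here as an illustration rather than a recommendation, is the Population Stability Index (PSI), which compares how a feature was distributed when the model was trained with how it is distributed in newly delivered data. The function name, the equal-width binning, and the sample values below are all assumptions made for this example.

```python
import math


def psi(baseline, current, bins=10):
    """Population Stability Index between two samples of one numeric feature."""
    lo, hi = min(baseline), max(baseline)
    # Equal-width bins over the baseline range; fall back to 1.0 if it is flat.
    width = (hi - lo) / bins or 1.0

    def shares(values):
        counts = [0] * bins
        for v in values:
            idx = int((v - lo) / width)
            # Values outside the baseline range go into the edge bins.
            idx = min(max(idx, 0), bins - 1)
            counts[idx] += 1
        # A small floor keeps the logarithm defined for empty bins.
        return [max(c / len(values), 1e-6) for c in counts]

    b, c = shares(baseline), shares(current)
    return sum((ci - bi) * math.log(ci / bi) for bi, ci in zip(b, c))


# Hypothetical feature values: the new delivery is shifted upward.
training_values = list(range(100))
new_values = [v + 50 for v in training_values]

score = psi(training_values, new_values)
```

A commonly quoted rule of thumb treats a PSI below 0.1 as stable and above 0.25 as a significant shift, but suitable thresholds depend on the feature and the model, so treat those numbers as a starting point only.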
Until you’ve actually bought a dataset, it’s hard to figure out if it definitely contains the exact information you need for your advanced analytics or ML model. The chances are, you’ll need to cherry-pick the most relevant data from multiple sources and either combine these or use them to augment your original dataset.
If you have to scour through dozens or hundreds of different datasets to find all the data points you need, that’s both resource-intensive and potentially very expensive. That’s before you’ve even factored in the costs of managing your licensing agreements and contracts, or compiling risk assessments in case problems in the data create liability later on.
To make matters worse, some vendors demand payment in the form of a share in any revenue derived from the data. This is really tricky to track or measure, especially if you’re using that data for AI or analytics, building models that then inform business-critical decisions. How can you be sure what role a particular slice of data had in revenue generation? And how can you turn a profit from your predictive analytics efforts if you’re constantly paying back a share of any successes, but absorbing the cost if a model leads nowhere?
There Must Be a Better Way, Right?
Yes – absolutely. The simplest, most effective way to address these problems is to seek out a unified platform that automates your connections to hundreds of external data sources. That way, you won’t have to shop around multiple data vendors – you can get everything from one place.
The platform you use should pre-vet and harmonize the data sources for you, guaranteeing quality and accuracy while making integration a breeze. The best options out there will suggest the most relevant data signals, data points, and features, helping you to enhance and augment your original datasets. They’ll also feed the data directly into your advanced analytics projects or ML models.
Final Thoughts: What Happens if You Lag Behind?
Remember the 79% of respondents who told us that external data is “very valuable” to their business? Well, in the same survey, we also discovered that fewer than a third of companies have actually acted on that realization by developing a proper data acquisition strategy. This is an enormous problem, because – as we’ve also seen – without a plan, identifying the right data signals and managing third-party data flows can get really complicated. You simply won’t be able to fully seize the opportunity.
It also means that gaps are opening up between companies that have their external data strategies all wrapped up and those that don’t.
Imagine that you’re competing against an organization that manages the whole data acquisition and consumption process seamlessly through an external data platform. They can go from idea to identifying the data they need, to data acquisition, to advanced analytics at lightning speed. Meanwhile, every time you want to update your own ML models or predictive analytics projects, you have to go through a lengthy, complex search, negotiation, procurement, and data preparation process.
Clearly, in this scenario, you would fall behind. Your competitors would beat you to the finish line, pivoting towards lucrative markets and demographics or developing game-changing products faster than you could dream of.
We all know how important external data is. That much was clear from our report. The next question is: what are you doing about it? Are you acting on that realization? Are you investing in the tools and platforms that let you capitalize on the opportunity? Or are you going to let your competitors steam ahead, while your business lags behind?
Looking to get to grips with the most important trends, challenges, and opportunities in the data acquisition market today? Get the insights you need right here! Download your free copy of the research report: 2021 State of External Data Acquisition.
1 Deloitte Insights, “Data Ecosystems: How Third-Party Information Can Enhance Data Analytics”, September 2019
2 McKinsey Digital, “Harnessing the Power of External Data”, February 2021