What are unknown unknowns?
An unknown unknown is something you don’t know that you don’t know. The term, made famous by former United States Secretary of Defense Donald Rumsfeld, refers to situations where you don’t know the root cause of a problem, and therefore don’t know what resources you need to solve it. Unknown unknowns are essentially blind spots. For example, a business could see a dip in completed customer orders. The company knows that something is preventing customers from completing orders, but it does not know what. Finding the root cause can be challenging when looking into internal data, analytics, or BI solutions, which can misrepresent the situation and make it hard to know which issues need to be solved and how to solve them.
When it comes to external data, many organizations might know that they need external data but don’t understand the unknown variables and complex problems that need to be solved, making it difficult to know precisely what types of external data to look for. These unidentified datasets are unknown unknowns; an organization might realize that data relevant to their business problems exists, but the specifics are entirely outside their scope of knowledge.
Unknown unknowns are common in big data, especially as the volume of available datasets grows. Organizations store immense amounts of unstructured data without fully knowing its potential value. Without understanding the value of the data they have on hand, it becomes even more challenging to understand where the gaps are and which external datasets are needed to fill them.
It is tough to know what you don’t know before it’s too late. This article will discuss the importance of using external data in your analytics and how to identify unknowns as quickly as possible by using an external data platform.
Why use external data in your analytics?
Most decision-makers today understand the value of incorporating alternative data sets into data pipelines to solve complex business problems. The challenge is finding the right external data that they can use to build and improve predictive models.
Sourcing external data can be challenging. Organizations report difficulties that include navigating the size and complexity of the data provider market, assessing data quality and accuracy, and resolving inconsistencies between external and internal data.
Instead of simply deriving static reports from data moved in and out of data warehouses, companies can benefit from using advanced analytics tools that simultaneously collect, mix, and match diverse data from different data sources.
For example, companies that combine data from a variety of internal and external sources can improve customer service, boost sales and marketing campaigns, and enhance products and services.
To do all of this, it is imperative to use the right kind of data. Relying on historical internal data doesn’t provide the entire picture as it is often limited in scope. To quote computer scientist and technology entrepreneur Andrew Ng: “Data is food for AI, so don’t feed it junk.”
With external data, companies can build more accurate predictive models to fuel more effective, data-driven decision-making processes. Given the challenges of finding and integrating external data, companies should start leveraging the best new technology that makes external data easier to use.
How new technologies can help find unknown unknowns
Sometimes, when you are looking for data, you don’t know what data to look for, or where to start. The missing data might be a spreadsheet on someone’s monitor halfway around the world, or the foot traffic of certain points of interest around the country. It might be another external data source that will add context to help you solve a complex business problem. You might know you need some data, but not what type of data you need, the format you need it in, how many external datasets you need, or where to find them. What you are looking for are unknown unknowns.
Unknown unknowns can put you at a disadvantage, especially if your competitors are already using the data you have not yet found to gain an advantage and take market share.
Finding unknown unknowns is one of the many challenges that external data platforms solve. An external data platform can help with every step of the data sourcing process, from data discovery to data prep, integration, model training, compliance, deployment, and model retraining. When the unknown unknown is the data itself, external data platforms help you discover what you need and then integrate the new datasets into your existing pipelines. A capable external data platform should not only provide access to the most relevant data sources, but also automatically show you which data sources and features will provide the best model uplift.
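To make the idea of ranking data sources by likely model uplift more concrete, here is a minimal sketch. It is not how any particular platform works. It assumes you already have a baseline model built on internal data, and it uses a crude proxy for uplift: how strongly each candidate external signal correlates with the errors the baseline still makes. The signal names and all numbers are made up for illustration.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column carries no signal
    return cov / sqrt(var_x * var_y)


def rank_signals(residuals, candidates):
    """Sort candidate signals by |correlation| with the baseline's errors."""
    scores = {name: abs(pearson(values, residuals))
              for name, values in candidates.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Made-up numbers: what the baseline model got wrong for six customers.
residuals = [2.0, -1.0, 3.0, -2.0, 1.0, -3.0]

# Hypothetical external signals, already matched to the same six customers.
candidates = {
    "foot_traffic": [30.0, 15.0, 40.0, 10.0, 25.0, 5.0],
    "domain_age_years": [5.0, 5.0, 5.0, 5.0, 5.0, 5.0],
    "review_count": [3.0, 4.0, 1.0, 2.0, 5.0, 6.0],
}

ranking = rank_signals(residuals, candidates)
for name, score in ranking:
    print(f"{name}: {score:.2f}")
```

A signal that tracks the baseline's errors is a good candidate to test properly, by retraining with it and comparing results on held-out data. Correlation alone only suggests where to look first.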
Online B2B lending is a great example of leveraging external data to build more accurate predictive models. Many online lenders are seeking to improve their approach to mitigating risk and uncovering fraudulent loan applications. Internal data alone is not enough to create an effective machine learning model that can accurately predict the likelihood of a potential borrower defaulting on a loan. By connecting to alternative data sources, online lenders can obtain the data they need to make better lending decisions. Using relevant alternative data improves the speed and accuracy of B2B credit decisions and helps firms evaluate the creditworthiness of potential borrowers who may not be able to obtain credit in the mainstream credit system. In the past, lenders would only look at traditional data on loan applications such as income, revenue, or past credit repayments. Now, lenders can get more context by looking at third-party data such as website registrations, online reviews, and other online alternative signals to determine the creditworthiness of small business loan applicants. The same principles can be applied to other use cases such as B2B marketing, where external data can help generate leads and enrich existing lead data with added context.
With the right external data platform, you can connect to the following alternative data signals:
- Government filings
- Business registrations
- Social media activity and engagement, such as number of LinkedIn followers
- Domain information
- Search engine results
- Foot traffic
- Pandemic recovery signals
While it may be obvious that external data is valuable, it is not always easy to obtain. External data platforms also help organizations overcome data access and data usage issues, augment data for quicker time to value, and understand compliance risks.
The Benefits of Using an External Data Platform
Understanding privacy risks and ensuring compliance
While identifying the relevant unknown unknowns is vital, organizations must also maintain data quality, protect consumers, and comply with regulations. Proper data privacy compliance involves identifying, classifying, and documenting internal and external personal information.
The EU General Data Protection Regulation (GDPR) requires businesses to correct inaccurate or incomplete personal data, yet many organizations neglect the importance of data validation. Without comprehensive data quality controls, organizations cannot locate and resolve data inaccuracies involving personal data.
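As a rough illustration of what a basic data quality control for personal data might look like, the sketch below flags records with missing required fields or an obviously malformed email address. The field names, the schema, and the sample records are hypothetical, and the email pattern is deliberately loose. Real validation rules would depend on your own data and legal requirements.

```python
import re

# Deliberately loose pattern: it only catches obviously malformed addresses.
EMAIL_PATTERN = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")
REQUIRED_FIELDS = ("name", "email", "country")  # hypothetical schema


def find_issues(record):
    """Return a list of human-readable problems found in one record."""
    issues = []
    for field in REQUIRED_FIELDS:
        value = record.get(field)
        if value is None or not str(value).strip():
            issues.append(f"missing {field}")
    email = record.get("email")
    if email and not EMAIL_PATTERN.match(email.strip()):
        issues.append("malformed email")
    return issues


# Made-up sample records for illustration only.
records = [
    {"id": 1, "name": "Ada Example", "email": "ada@example.com", "country": "DE"},
    {"id": 2, "name": "", "email": "bob@example", "country": "FR"},
    {"id": 3, "name": "Cy Example", "email": None, "country": "ES"},
]

# Keep only the records that have at least one problem, keyed by record id.
flagged = {r["id"]: find_issues(r) for r in records if find_issues(r)}
for record_id, issues in flagged.items():
    print(record_id, issues)
```

Checks like these locate inaccurate or incomplete records so that they can be corrected. They do not establish compliance on their own.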
Unfortunately, resolving important data quality issues does not guarantee compliance. Instead, organizations must eliminate all siloed data tasks by integrating data quality efforts with data governance and data catalog initiatives.
As regulations grow more complex and data environments swell, new technologies help maintain compliance. An external data platform treats security and compliance as core requirements, and a centralized data governance framework provides a unified method and promotes collaboration and shared responsibility for enterprise data.
The right technology and approach can enrich the quality of your data pool, bringing extensive benefits to you and your customers while still playing by the rules.
Overcoming data access and usage issues
More efficient data sharing improves overall business efficiency. By sharing live data with internal and external business partners, organizations can optimize spend, provide superior customer service, and streamline operations.
However, a few factors have kept data sharing from happening at scale:
- Problems with data management
- Insufficient tools and technologies
- Perceived regulatory prohibitions and regulatory risks
Overcoming access issues can be a resource-heavy process and does not guarantee correct insights. Procurement for one data source can take months. Choosing the right external data platform, however, unlocks instant access to premier, proprietary, and public data sets.
Delivering quicker time to value
Many business analysts look for data that is relevant to their use cases. An external data platform finds the most relevant signals automatically, saving the time and effort typically associated with data discovery.
A marketing firm might rely heavily on internal data such as social media engagements, website views, or webinar registrations. By connecting to an external data platform and enriching that internal data, they can then begin to understand metrics such as:
- Social media interactions with the product and others in the category
- Spending potential and financial stability metrics
- Number of previous purchases in the same category
An external data platform not only automates the discovery of relevant external data signals, but also integrates them into a company’s existing data pipelines, delivering faster time to insights.
How an external data platform fits into your data architecture
Sourcing external data is a lengthy process. When leveraging external data, manually conceptualizing, testing, and performing analysis for each project takes too much time and distracts from the end goal. Organizations may place undue importance on hoarding every gigabyte of internal data and thus make the incorrect assumption that internal data is the only data available to them.
Subsequently, all of the datasets stored in data warehouses or data lakes eventually become too voluminous to integrate into a single dashboard. An automated external data platform can help organizations enrich models with premium external data that connects seamlessly into a modern data architecture.
The datasets form a single, collective catalog that eliminates the need to separate and integrate each dataset. The data catalog then connects end-users like data scientists and business analysts to the data sources. It also helps end-users find their way around the sources, suggests the most relevant data points, and provides ways to match and integrate them with internal data sources automatically.
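Matching external records to internal ones usually comes down to agreeing on a shared key. The sketch below shows one simple way this could work, assuming company name is the only field the two sources have in common. The suffix list, field names, and sample rows are assumptions made for illustration. Production matching typically needs fuzzier logic and more identifiers, such as domains or registration numbers.

```python
import re

# Hypothetical list of legal suffixes to ignore when comparing names.
LEGAL_SUFFIXES = {"inc", "llc", "ltd", "corp", "co", "gmbh"}


def normalize_company(name):
    """Lowercase, strip punctuation, and drop common legal suffixes."""
    tokens = re.sub(r"[^a-z0-9 ]", " ", name.lower()).split()
    return " ".join(t for t in tokens if t not in LEGAL_SUFFIXES)


def enrich(internal_rows, external_rows, key="company"):
    """Left-join external attributes onto internal rows by normalized name."""
    lookup = {normalize_company(row[key]): row for row in external_rows}
    enriched = []
    for row in internal_rows:
        match = lookup.get(normalize_company(row[key]), {})
        # Copy every external attribute except the join key itself.
        extra = {k: v for k, v in match.items() if k != key}
        enriched.append({**row, **extra, "matched": bool(match)})
    return enriched


# Made-up rows for illustration only.
internal = [
    {"company": "Acme, Inc.", "open_invoices": 3},
    {"company": "Globex LLC", "open_invoices": 0},
]
external = [
    {"company": "ACME INC", "employees": 120, "review_score": 4.2},
]

result = enrich(internal, external)
for row in result:
    print(row)
```

Keeping unmatched internal rows, with a `matched` flag, makes the coverage of each external source visible instead of silently dropping records.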
Ideally, an external data platform should have:
- Data access
- Automated data matching and harmonization
- Data enrichment and transformation
- Machine learning
- Data orchestration
Overlooking external data is a missed opportunity for companies. External data provides important context not always captured in internal data. The insights generated create a competitive edge, keeping companies a step ahead of industry peers by improving customer acquisition, streamlining operational efficiency, and managing risk.
Explorium offers the industry’s first end-to-end external data platform for advanced analytics and machine learning. Our unique all-in-one platform automatically matches external data with internal enterprise data to uncover thousands of signals to enhance ML models, dramatically decrease time to superior predictive power and decision-making, and improve business outcomes. Learn more at www.explorium.ai.