
    Artificial intelligence (AI) has been the main driver of big data, robotics, and IoT. This year, however, the trend is shifting: access to external data is emerging as a strong competitive advantage.

    Today, there is more interest than ever in AI, machine learning, and data science. AI is in use for applications ranging from recommendation systems to self-driving cars. However, AI on its own will not solve the world’s most critical problems. Successful AI depends on data. Businesses of all types raced to be the first to implement AI projects or build AI algorithms. Now, the race is to outperform competitors by training AI and ML models on the most up-to-date, relevant data.

    Experts, including Omer Har, co-founder and CTO at Explorium, agree with Andrew Ng that the best way to improve AI performance is with better data, not better algorithms. In fact, that is Omer’s major prediction for 2022: this year, external data will emerge as a strong competitive advantage.

    Read on to learn more about why AI advancements depend on good-quality data and why it is important to incorporate relevant, carefully curated external data signals.

    Why is good data essential for AI algorithms?

    Building and deploying AI and machine learning (ML) applications requires big data. Forbes highlights that when it comes to identifying patterns of threat and opportunity, AI can help detect both expected and unexpected signals, events, and patterns, surfacing anomalies that might warrant attention.

    It can also help in decision management to improve customer experiences or in goal management to assist in resource performance evaluation and growth.

    However, algorithms need to constantly be fed with data to carry out these functions optimally. With the amount of big data available, it can be difficult for organizations to understand which datasets will be relevant to help them solve their business problems.

    Research suggests that organizations can improve their existing AI algorithms by retraining them with more comprehensive data beyond what is available within the four walls of their organization. In the coming years, we will continue to see access to external data emerge as a strong competitive advantage as it directly influences business metrics and ROI.

    The data must also be of high quality for the algorithm to produce the desired insights and outcomes; as the old adage goes, “garbage in, garbage out.” An Information Age article highlights that the more facets the data covers, the faster the algorithms can learn and fine-tune their predictive analyses.

    In a recent blog, Noam Cohen, Data Products Team Leader at Explorium, points out that many business questions rely on data for their answers, and all of those answers rest on the assumption that the data represents reality.

    There are three measures of data quality: correctness, freshness, and completeness.

    Data Correctness: How accurately the data values describe real-world facts. Correctness is usually measured with classification metrics such as precision: the share of examined data points that are correct. There are many potential root causes of correctness issues, such as collection noise, faulty data transformations, outdated data, or an incorrect schema description.

    Data Freshness: How relevant the data is to describing the current state of an entity, taking into consideration the timeliness of the data and how frequently it is updated. This metric is typically measured in units of time, such as the time since the last update.

    Data Completeness: How whole and complete a data asset is, which is especially important when you want to attach new attributes to existing data. Coverage also matters if you want to extract insights from your dataset: insufficient coverage increases the risk of bias, such as strata that go unrepresented in your conclusions.
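    As a rough illustration of how these three measures might be quantified in practice, the sketch below computes a simple correctness (precision-style), freshness, and completeness score for a small, hypothetical company table using pandas. The column names, reference values, and data are assumptions made for the example, not part of any specific workflow.

```python
import pandas as pd

# Hypothetical company records; "verified_revenue" stands in for an audited reference sample.
records = pd.DataFrame({
    "company_id": [1, 2, 3, 4],
    "reported_revenue": [1.2e6, 3.4e6, None, 9.9e5],
    "verified_revenue": [1.2e6, 3.1e6, 2.0e6, 9.9e5],
    "last_updated": pd.to_datetime(["2022-01-05", "2021-06-30", "2022-01-02", "2020-11-15"]),
})

# Correctness: share of checked values that match the verified figures (a precision-style metric).
checked = records.dropna(subset=["reported_revenue"])
correctness = (checked["reported_revenue"] == checked["verified_revenue"]).mean()

# Freshness: how long ago each record was updated, summarized here as the median age in days.
as_of = pd.Timestamp("2022-02-01")
freshness_days = (as_of - records["last_updated"]).dt.days.median()

# Completeness: share of non-missing values in the attribute we care about.
completeness = records["reported_revenue"].notna().mean()

print(f"correctness={correctness:.2f}, median_age_days={freshness_days:.0f}, completeness={completeness:.2f}")
```

    In a real pipeline, checks like these would typically run against a sampled, audited reference set rather than a column that ships with the data itself.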

    Data quality is also directly linked to a data team’s productivity: with high-quality data, the team can spend more time solving business problems, making decisions, and ensuring compliance, rather than getting bogged down in the tedious work of cleaning and preparing data.

    Poor-quality data is hard to detect and can easily lead to bad data-driven conclusions, because data is often treated as the main source of truth in decision-making. When quality issues surface, stakeholders lose faith in the data and fall back on suboptimal, intuition-based decisions.

    The role of external data

    Businesses that want to leverage accurate predictive models in their decision-making processes will not get a complete picture if they only train their models on historical, internal data. Integrating internal data with external datasets provides richer insights, increases machine learning model accuracy, and boosts advanced analytics.

    An MIT Sloan article states that businesses have so far relied primarily on internal structured and unstructured data. With the onset of COVID-19, they have been forced to look outside their internal databases to rebuild, retrain, and recalculate their predictive analytics and machine learning models.

    Over the summer of 2020, Hershey’s found more people were making s’mores snacks as a way to spend time with friends and family during the pandemic — noting that in areas where COVID-19 cases were on the rise, demand for s’mores-related ingredients surged.

    This, combined with other data points, enabled the company to predict future demand for its 6-pack Hershey bars and to increase production and inventory levels accordingly.

    The key to marrying internal and external data lies in how it is done. A good approach is to start with a single, discrete line of business and limit the effort to that scope. Once the accuracy of the enriched forecast can be compared against predictions made with traditional methods, a use case can be presented; a minimal sketch of such a comparison follows below.
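    The example below is a minimal sketch of that comparison, built on entirely synthetic data and hypothetical feature names (not drawn from any of the companies cited here). It trains the same regression model twice, once on internal features only and once with an external signal added, and compares forecast error on a held-out period.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error

rng = np.random.default_rng(0)
n = 200

# Internal features: e.g., last period's sales and promotional spend (hypothetical).
internal = rng.normal(size=(n, 2))
# External signal: e.g., a regional demand index a data provider might supply (hypothetical).
external = rng.normal(size=(n, 1))
# Simulated demand that partly depends on the external signal.
demand = 3 * internal[:, 0] + internal[:, 1] + 2 * external[:, 0] + rng.normal(scale=0.5, size=n)

train, test = slice(0, 150), slice(150, n)

# Baseline: model trained on internal data only.
base = LinearRegression().fit(internal[train], demand[train])
base_mae = mean_absolute_error(demand[test], base.predict(internal[test]))

# Enriched: the same model trained on internal plus external features.
enriched_X = np.hstack([internal, external])
enriched = LinearRegression().fit(enriched_X[train], demand[train])
enriched_mae = mean_absolute_error(demand[test], enriched.predict(enriched_X[test]))

print(f"MAE internal-only: {base_mae:.2f}  MAE with external data: {enriched_mae:.2f}")
```

    Because the synthetic demand partly depends on the external signal, the enriched model shows a lower error here; a before-and-after comparison of this kind is exactly the evidence a use case presentation would rely on.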

    One area where more accurate predictive models are built using external data is B2B lending and risk management. Small to medium businesses (SMBs) are increasingly turning to fintech start-ups for loans and credit lines. Because these lenders run less stringent background checks than traditional banks, they can attract higher-risk borrowers.

    Lenders need new ways of measuring default and fraud risks. By incorporating alternative data, lenders can retrain existing risk models on new datasets and reduce default rates. Improved risk modeling can also lead to better operational efficiency: assessing credit risk more accurately means fewer personnel and less budget allocated to recouping bad loans.

    Another area where external data comes into play is lead enrichment. Lead scoring models need more information to determine which leads are higher quality and more likely to become profitable customers.

    Training a lead scoring model using external data helps companies more accurately predict how likely a prospect is to convert to a paid customer, helping field sales teams understand which targets are worth pursuing. This accelerates the sales process, enables sellers to build better relationships with customers, and ultimately increases revenue.  
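    As a purely illustrative sketch of this kind of enrichment (the lead records, external attributes, and column names below are invented for the example), internal CRM leads can be joined with external firmographic signals before training a scoring model:

```python
import pandas as pd
from sklearn.ensemble import GradientBoostingClassifier

# Internal CRM leads with the outcome we want to predict (hypothetical data).
leads = pd.DataFrame({
    "domain": ["acme.com", "globex.com", "initech.com", "umbrella.com"],
    "emails_opened": [5, 1, 8, 0],
    "demo_requested": [1, 0, 1, 0],
    "converted": [1, 0, 1, 0],
})

# External firmographic signals keyed on the same company domain (hypothetical provider output).
firmographics = pd.DataFrame({
    "domain": ["acme.com", "globex.com", "initech.com", "umbrella.com"],
    "employee_count": [250, 12, 900, 40],
    "web_traffic_rank": [12000, 480000, 3000, 150000],
})

# Enrich: attach external attributes to each lead, then train a simple scoring model.
enriched = leads.merge(firmographics, on="domain", how="left")
features = enriched[["emails_opened", "demo_requested", "employee_count", "web_traffic_rank"]]
model = GradientBoostingClassifier().fit(features, enriched["converted"])

# Score: probability that each lead converts, used to prioritize sales outreach.
enriched["conversion_score"] = model.predict_proba(features)[:, 1]
print(enriched[["domain", "conversion_score"]])
```

    The resulting conversion scores can then be used to rank leads so the field sales team focuses on the targets most worth pursuing.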

    The consensus is that good data is essential when it comes to data science, machine learning, and artificial intelligence. However, it has become increasingly apparent that models are only as good as the data they are fed, which means that relying on internal data alone provides a limited view of business processes and potential strategic decisions.

    By integrating internal data with external data, companies can enrich their existing insights, fine-tune their operations, and unlock further growth.

    About Explorium

    Explorium offers the industry’s first end-to-end external data platform for advanced analytics and machine learning. Our unique all-in-one platform automatically matches external data with internal enterprise data to uncover thousands of signals to enhance ML models, dramatically decrease time to superior predictive power and decision-making, and improve business outcomes. Learn more at www.explorium.ai.

    References:

    1. Jim Sinur and Ed Peters, “AI & Big Data; Better Together”, Forbes, Sep 30, 2019, https://www.forbes.com/sites/cognitiveworld/2019/09/30/ai-big-data-better-together/?sh=510fa90160b3.
    2. Nick Ismail, “The success of artificial intelligence depends on data”, Information Age, April 23, 2018, https://www.information-age.com/success-artificial-intelligence-data-123471607/.
    3. Noam Cohen, “Data Quality – An Interview with a Data Science Expert”, Explorium, Jan 10, 2022, https://www.explorium.ai/blog/data-quality-an-interview-with-a-data-science-expert/.
    4. Bridget Kimball, “The Importance of Good Data in AI/ML”, LinkedIn, July 22, 2020, https://www.linkedin.com/pulse/importance-good-data-aiml-bridget-kimball/.
    5. Sara Brown, “Why external data should be part of your data strategy”, MIT Sloan School of Management, Feb 18, 2021, https://mitsloan.mit.edu/ideas-made-to-matter/why-external-data-should-be-part-your-data-strategy.
    6. Christopher Doering, “How Hershey chocolate bar sales caught fire amid surge in s’mores consumption”, Food Dive, July 1, 2021, https://www.fooddive.com/news/how-hershey-chocolate-bar-sales-caught-fire-amid-surge-in-smores-consumpti/602321/.
    7. Robert Freedman, “How to combine external data with AI to improve forecasting”, CFO Dive, Jan 26, 2020, https://www.cfodive.com/news/combine-external-data-with-ai-forecasting/571022/.