The Top External Data Trends to Look out for in 2022
2021 was an exciting year for Explorium. We raised our Series C round of funding in May, were named a Cool Vendor by Gartner, and launched two new offerings to build audiences and add data enrichments. In addition, we have been growing and expanding rapidly, with new offices in New York, Utah, and California. As we shift our attention to 2022, we have our eye on several key trends that will impact our industry. Read on to hear some thoughts from Explorium employees on the top data trends to look out for in 2022.
Prediction 1: Continue moving away from Big Data systems
It is widely understood that without relevant data, you cannot get great insights from BI reporting or build great predictive models. Years ago, as Big Data systems arrived and storage costs fell, many companies decided to capture and store every piece of data they could find. The resulting internal data lakes became essentially useless, because teams needed a map worthy of Indiana Jones to find the data for a specific project. The cost and time required to compute and aggregate the data into a usable form made matters worse. Analysts started looking elsewhere for the data they needed. Data scientists spent too much time sifting through mounds of useless data in search of data relevant to their models. In some cases, they couldn’t access these Big Data systems at all for security reasons.
After gaining access, jumping through security hoops, downloading massive amounts of data, sorting, filtering, aggregating, normalizing, and preparing the data for consumption, internal data alone still doesn’t provide the full picture.
The trend is toward streamlined, better-organized stores that hold the data needed for BI reports and model training. To build them, companies will bring together not only their internal data but also relevant external data. External data platforms will facilitate this and help companies build a full 360-degree view of customers and prospects.
Victor Ghadban, Head of Data Science and Data Evangelist
Prediction 2: Data access will no longer limit business ideas
As businesses realize the importance of external data, they will continue to seek a wider variety of data to help launch new products, business models, and revenue streams. Business teams will demand easier access to external data and simpler usage that does not require deep integration and data preparation skills.
With almost unlimited access to external data, and easier ways to use it, business leaders, analysts, and data scientists can focus on business outcomes rather than data collection and cleanup. To realize the vision of data democratization, a few things will happen:
- Emergence of External Data Clouds across industries that will bring together the world’s data to generate, enrich, and prepare datasets for analytics and machine learning. We’ll see a decrease in dependence on data marketplaces that leave it up to users to clean and combine data. Users will demand ready-to-use data that is easily matched with any data source.
- Intelligent recommendations about the best data in the context of the business problem. With access to such a large variety of data, it will not be possible to search and test each data property or signal for its impact on the business. Here, ML-driven recommendation engines will suggest the best data to use to solve a business problem.
- Use of AI/ML to ensure data quality. Data quality will continue to challenge data practitioners. ML will help measure and monitor data quality, not only in terms of correctness, coverage, ageing, and impact, but also by highlighting regulatory compliance risks.
- Data observability and anomaly detection. With a large volume of data created every minute, data systems will incorporate tools for data observability and anomaly detection to ensure data accuracy and confirm that data pipelines work as expected.
- Managing entities rather than data tables. Data teams will look for data entities and their relationships when they build their data solutions. Understanding a customer and its relationships with other entities, such as products, stores, households, and channels, will help offer hyper-personalized experiences.
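To make the anomaly detection point above concrete, here is a minimal sketch in Python. It is an illustration only, not a description of any particular product: the function name `find_anomalies`, the sample row counts, and the threshold of 3.5 are all assumptions. It flags daily pipeline row counts that sit far from the median, using a modified z-score based on the median absolute deviation.

```python
from statistics import median

def find_anomalies(values, threshold=3.5):
    """Return indices of values whose modified z-score exceeds the threshold.

    Hypothetical helper for illustration. The score is based on the median
    absolute deviation (MAD), which is less distorted by outliers than the mean.
    """
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        # More than half the values equal the median, so any other value stands out.
        return [i for i, v in enumerate(values) if v != med]
    return [
        i for i, v in enumerate(values)
        if 0.6745 * abs(v - med) / mad > threshold
    ]

# Made-up daily row counts from a pipeline; day 4 looks like a partial load.
daily_row_counts = [1020, 998, 1011, 1005, 40, 1017, 992]
anomalies = find_anomalies(daily_row_counts)
print(anomalies)
```

A check like this could run after each load and pause downstream jobs when it returns any index. Real observability tools track many more signals, such as schema changes and data freshness.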
Ajay Khanna, Chief Marketing Officer
Prediction 3: External data platforms become an essential component of the modern data stack
One of the lasting business impacts from the COVID-19 pandemic is the importance of external data. While many companies used external data for traditional sales and marketing initiatives such as outbound prospecting, it was used sparingly in analytics.
This proved to be a gaping hole for businesses: their analytical models lacked the context to understand what was happening in the world as shelter-in-place orders drove unpredictable customer behavior (sold-out toilet paper, for example).
External data’s importance will only grow in 2022 and beyond as organizations realize internal data doesn’t provide a complete picture. AWS, Snowflake, and Google have all invested in growing their data marketplaces, but external data platforms will revolutionize how companies search for, access, integrate, and use external data. External data platforms allow organizations to generate new data sets from external sources, enrich their internal data with individual external data signals, and build predictive models that include external data signals to improve their accuracy.
External data platforms address many of the challenges of using external data by providing:
- a centralized platform to manage the sources
- automated matching and integration of the data
- reduced risk associated with governance, compliance, and security requirements.
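As a rough illustration of what matching and enriching involve, the sketch below joins internal records to external signals on a normalized company name. Everything here is hypothetical: the helper names `normalize_name` and `enrich`, the field names, and the sample records are assumptions for the example, and real platforms use far more robust entity matching than this simple key normalization.

```python
import re

def normalize_name(name):
    """Build a match key: lowercase, strip punctuation and common legal suffixes."""
    key = re.sub(r"[^a-z0-9 ]", "", name.lower())
    key = re.sub(r"\b(inc|llc|ltd|corp|co)\b", "", key)
    return " ".join(key.split())

def enrich(internal_rows, external_rows, fields):
    """Add the requested external fields to each internal row (None if unmatched)."""
    lookup = {normalize_name(r["company"]): r for r in external_rows}
    enriched = []
    for row in internal_rows:
        match = lookup.get(normalize_name(row["company"]), {})
        enriched.append({**row, **{f: match.get(f) for f in fields}})
    return enriched

# Made-up internal CRM rows and external signals.
internal = [
    {"company": "Acme, Inc.", "deal_size": 12000},
    {"company": "Globex LLC", "deal_size": 8000},
]
external = [
    {"company": "ACME Inc", "employees": 250, "web_traffic_rank": 1200},
]

result = enrich(internal, external, ["employees", "web_traffic_rank"])
for row in result:
    print(row)
```

Unmatched rows keep their internal fields and get `None` for the external ones, which makes match coverage easy to measure before the data is used in a model.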
Stephen Archut, Product Marketing Director
Prediction 4: Increased emphasis on data quality
As data sets get bigger and more data sources emerge, ensuring data quality becomes more difficult. The amount of data available is not what drives the most accurate insights; the mentality is shifting towards quality over quantity. There is more emphasis on data quality in enterprise systems as organizations increasingly use data analytics and predictive models to help drive business decisions. Improving data quality reduces operational risks and costs, and good-quality data provides a better understanding of customers and prospects.
Yet when purchasing external data, verifying its quality (completeness, validity, uniqueness, consistency, timeliness, and accuracy) can be difficult. Companies that understand the value of external data and actively seek it out will also choose the data vendors that can prove the quality of the data they sell. The technology for measuring data quality will advance, including machine learning and AI algorithms. Companies are already starting to adopt data quality checks that pick up on anomalies in datasets before they continue through the data pipeline. Currently, these techniques are mostly manual, but more automated options are on the way. Data quality is a competitive advantage that data leaders need to improve upon continuously.
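Some of the quality dimensions named above can be measured with very little code. The following Python sketch scores a small dataset on completeness, uniqueness, and validity. It is an assumption-laden example: the function `quality_report`, the field names, the sample rows, and the deliberately simple email pattern are all invented for illustration, and consistency, timeliness, and accuracy usually need reference data that is not shown here.

```python
import re

# Intentionally simple pattern for illustration; not a full email validator.
EMAIL_PATTERN = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")

def quality_report(rows, key_field, email_field):
    """Score rows on three dimensions, each as a fraction between 0 and 1."""
    total = len(rows)
    if total == 0:
        return {"completeness": 0.0, "uniqueness": 0.0, "validity": 0.0}
    # Completeness: share of rows with no missing (None or empty) values.
    complete = sum(all(v not in (None, "") for v in r.values()) for r in rows)
    # Uniqueness: distinct key values relative to the number of rows.
    unique_keys = len({r[key_field] for r in rows})
    # Validity: share of rows whose email matches the expected format.
    valid = sum(bool(EMAIL_PATTERN.match(r[email_field] or "")) for r in rows)
    return {
        "completeness": complete / total,
        "uniqueness": unique_keys / total,
        "validity": valid / total,
    }

# Made-up vendor sample with a missing email, a duplicate id, and a bad email.
sample = [
    {"id": 1, "email": "a@example.com"},
    {"id": 2, "email": ""},
    {"id": 2, "email": "b@example.com"},
    {"id": 3, "email": "not-an-email"},
]

report = quality_report(sample, key_field="id", email_field="email")
print(report)
```

Running a report like this on a vendor's sample file before purchase, and again on each delivery, gives a simple baseline for comparing data sources over time.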
Stephanie Casola, Content Writer
Prediction 5: Reinvention is driven by data
This year, at AWS re:Invent 2021, it was inspiring to hear Dr. Swami Sivasubramanian talk in his AI/ML keynote about how data is powering the ML revolution, and how customers are reinventing their business with data:
“Data is the underlying force that fuels the insights and the predictions that help you make better decisions and stimulate completely new innovations … With such diverse data growing and spreading faster than most organizations can keep track of, having data and actually getting value out of this data is a challenging thing to do.”
He also spoke about the survival of the most informed, explaining that those who use data to make more informed decisions, respond faster to the unexpected, and uncover new opportunities will thrive the most. He added that harnessing the right data is imperative to current and future business.
Adding external data to this mix is critical but brings extra challenges! Enterprises need external data to inject context and timeliness into their business decisions, but getting the right data is complicated and resource-intensive. Explorium solves this problem by automatically discovering, connecting, and matching internal data with thousands of relevant external data signals so that you can quickly find and access the right external data. External data is used to enhance analytical and ML models that forecast demand, understand buyer behavior, improve conversion rates, assess risk, and detect fraud.
Iris Zarecki, Senior Partner Marketing Manager
Prediction 6: Data explainability
With more data available than ever before, data explainability will be a key factor in businesses deciding which external data to incorporate in their systems. As the use of external data becomes more mainstream, businesses are looking to better understand both the collection processes and the data itself. Data explainability makes data accessible, helps build trust with customers, and contributes to model transparency. It is critical for a better overall experience and will be the mark of data providers who recognize businesses’ need to truly understand their data.
As part of these explainability efforts, my forecast is that data visualisation will also be integrated into a multi-faceted, explainable data experience for customers. Data visualisation will give users additional mediums through which they can understand and get a feel for the external data they are adding to their models in 2022.
Shera Mantver, Data Catalog Manager