
    Data has been a hot topic of discussion for years now. As organizations create and collect more and more data, it’s key that they use it to make actionable decisions and open up new lines of business. However, most organizations are only scratching the surface of what their data can do, typically using it only to derive insights into their business operations rather than to build AI models that improve business functions.

    When it comes to data science, much of the focus has been on hardware and infrastructure, which cloud services like AWS, Azure, and GCP have commoditized, and on algorithms, many of which are now freely available as open source. The data feeding those algorithms, however, has become a proprietary asset, and it gives a competitive advantage to the companies able to apply breakthroughs in AI technology to it. We believe that the combination of a need for competitive advantage and commoditized algorithms can mean only one thing:

    2020 will be the year data science focuses on data. 

    Let’s take a look at exactly how we believe the next 365 days will play out.

    1. Data enrichments on the rise

    Data enrichments, which merge authoritative third-party data with your internal data sources to improve model performance, are coming to the forefront of the data conversation. Earlier this month the Federal Reserve Board, the Consumer Financial Protection Bureau (CFPB), the Federal Deposit Insurance Corporation (FDIC), the National Credit Union Administration (NCUA), and the Office of the Comptroller of the Currency (OCC) released a joint statement highlighting the potential external data has “to expand access to credit and produce benefits for customers.” They even encouraged the responsible use of external data in credit underwriting models.

    We suspect this is just the beginning. As more industries recognize how much external data sources can move the needle, a growing number of data science teams will be tasked with improving their models through external data.

    That’s why, in 2020, we predict data enrichments will take center stage as forward-thinking businesses look beyond their own data to sources they had never considered before. The resulting insights will enable modern organizations to make better decisions, open up new lines of business, and increase revenue, giving them a major competitive advantage over slower competitors. We also expect to see platforms that support this focus by enabling, facilitating, improving, and eventually automating data enrichment. ;)
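
    As a rough illustration of what a data enrichment looks like mechanically, the sketch below left-joins a third-party dataset onto internal records by a shared key. Everything in it is hypothetical: the field names (`customer_id`, `zip_code`, `median_income`, `population_density`), the made-up values, and the `enrich` helper are assumptions for illustration, not any specific vendor's data or API. Real pipelines would typically do this with a dataframe library or a SQL join.

```python
from typing import Any

# Hypothetical internal records: what a company already knows about its customers.
internal_customers = [
    {"customer_id": 1, "zip_code": "10001", "monthly_spend": 120.0},
    {"customer_id": 2, "zip_code": "94105", "monthly_spend": 80.0},
    {"customer_id": 3, "zip_code": "60601", "monthly_spend": 45.0},
]

# Hypothetical third-party dataset keyed by ZIP code (made-up values).
external_by_zip = {
    "10001": {"median_income": 85000, "population_density": 33000},
    "94105": {"median_income": 120000, "population_density": 12000},
}

# External attributes we want to attach to each internal record (an assumption).
EXTERNAL_FIELDS = ("median_income", "population_density")


def enrich(
    rows: list[dict[str, Any]],
    external: dict[str, dict[str, Any]],
    key: str,
) -> list[dict[str, Any]]:
    """Left-join external attributes onto each internal row by `key`.

    Rows with no external match keep all their original fields and get
    None for the external fields, so no internal record is dropped.
    """
    enriched = []
    for row in rows:
        match = external.get(row[key], {})
        extra = {field: match.get(field) for field in EXTERNAL_FIELDS}
        enriched.append({**row, **extra})
    return enriched


for record in enrich(internal_customers, external_by_zip, key="zip_code"):
    print(record)
```

    A left join is used so that no internal record is lost when the external source has no match; unmatched rows simply get empty (`None`) external fields, which the downstream model then has to handle.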


    2. Data economy growth

    We’re certainly not the first to say that data is the new oil. However, we believe that in 2020 the data economy will finally mature and spread widely as businesses come to treat data as a top commodity. This will allow businesses to improve and grow in two ways:

    1. Businesses will begin to look at the data assets they have and find ways to monetize them. This will become a top priority for chief data officers (CDOs), who are responsible not only for internal data strategies but also for external “data as a service” revenue streams.
    2. As businesses monetize their data assets, other businesses will be able to use those assets to improve their AI-based solutions, which will accelerate the data enrichment trend from our first prediction.

    We also believe that platforms and tools that make data-related transactions easier and more widespread will become a tech-stack must-have.

    3. KPI-driven DS teams

    Businesses that use data science and machine learning to improve their processes and products have long been seen as early adopters. In 2020, we believe the market will shift from the early-adopter stage to the early-majority stage, driving mass demand for companies to transform from BI-driven (passive use of data) to AI-driven (proactive use of data).

    As these predictive models drive growth, the focus will be on the accuracy and quality of the results they deliver. In many cases, a model’s accuracy will become directly tied to business results, and data science teams will be managed as a target-driven business function with quarterly accuracy-lift targets. Effectively, companies will move past the “magic” of machine learning and focus on models that deliver measurable business results, just as they expect from other strategic functions.
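
    One simple way to express an accuracy-lift target is to compare a candidate model (for example, one trained on enriched data) against the current baseline on the same labeled evaluation set. The sketch below assumes plain classification accuracy as the metric and uses hypothetical predictions; the function names are illustrative, and in practice teams might track AUC, precision, or a business metric instead.

```python
def accuracy(predictions: list[int], labels: list[int]) -> float:
    """Fraction of predictions that match the true labels."""
    if not labels or len(predictions) != len(labels):
        raise ValueError("predictions and labels must be non-empty and the same length")
    correct = sum(p == y for p, y in zip(predictions, labels))
    return correct / len(labels)


def accuracy_lift(
    baseline: list[int], candidate: list[int], labels: list[int]
) -> tuple[float, float]:
    """Return (absolute lift in percentage points, relative lift in percent)."""
    base_acc = accuracy(baseline, labels)
    cand_acc = accuracy(candidate, labels)
    if base_acc == 0:
        raise ValueError("relative lift is undefined when baseline accuracy is 0")
    absolute = (cand_acc - base_acc) * 100
    relative = (cand_acc - base_acc) / base_acc * 100
    return absolute, relative


# Hypothetical evaluation set: the baseline gets 7/10 right, the enriched model 8/10.
labels = [1, 0, 1, 1, 0, 0, 1, 0, 1, 1]
baseline_preds = [0, 1, 0, 1, 0, 0, 1, 0, 1, 1]
enriched_preds = [0, 1, 1, 1, 0, 0, 1, 0, 1, 1]

absolute, relative = accuracy_lift(baseline_preds, enriched_preds, labels)
print(f"{absolute:.1f} pp absolute, {relative:.1f}% relative")  # prints: 10.0 pp absolute, 14.3% relative
```

    Reporting both numbers avoids ambiguity: a 10-point gain on a 70% baseline is a 14.3% relative improvement, and quarterly targets should state which of the two they mean.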

    To meet these new targets, we believe the focus will shift back to data. Because models are only as good as the data that feeds them, data science teams will pursue new, proven data sources that measurably move the needle on their core predictive models and growth initiatives. To drive results, they will look beyond their internal data lakes to the vast, rich ecosystem of data sources available online.

    4. Full-funnel AI

    Speaking of business results, as AI becomes commoditized, businesses will be able to afford and deploy AI models beyond their core operations. Business units from marketing to sales to customer success will all benefit from the ability to develop, maintain, and leverage AI models at scale. To get there, data science and analytics leaders will be searching for the best easy-to-deploy, scalable solutions for managing models.

    Even Gartner predicts that in the next several years “competitive advantage for 30% of organizations will come from the workforce’s ability to creatively exploit emerging technologies such as artificial intelligence (AI), the Internet of Things (IoT) and augmented analytics.”

    However, we can’t expect already scarce data science resources to scale at the same rate. Machine learning tools will help meet this demand in part by automating the tedious, time-consuming tasks that data scientists often get bogged down in. These tools will go beyond standard AutoML solutions by providing a platform that performs ETL, connects to external data sources, distills features, and delivers production-ready models, freeing data scientists to spend more time on strategic, ROI-driven projects.
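
    To make the “distills features” step a little more concrete, here is a deliberately naive sketch of automated feature generation: it brute-forces pairwise products and ratios of numeric columns and ranks every candidate by its absolute Pearson correlation with the target. This is an assumption-laden toy, not how any particular platform works; the helper names, column names, and data are hypothetical, and a real system would search a far larger space and validate candidates on held-out data, since correlations measured on training data alone can be spurious.

```python
import itertools
import math


def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation; returns 0.0 if either series is constant."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def generate_features(columns: dict[str, list[float]]) -> dict[str, list[float]]:
    """Brute-force candidates: every raw column, pairwise products, and pairwise ratios."""
    candidates = dict(columns)
    for a, b in itertools.combinations(columns, 2):
        candidates[f"{a}*{b}"] = [x * y for x, y in zip(columns[a], columns[b])]
    for a, b in itertools.permutations(columns, 2):
        # Skip a ratio whose denominator contains zeros rather than inventing a fill value.
        if all(d != 0 for d in columns[b]):
            candidates[f"{a}/{b}"] = [x / d for x, d in zip(columns[a], columns[b])]
    return candidates


def top_features(columns: dict[str, list[float]], target: list[float], k: int = 3) -> list[str]:
    """Names of the k candidates with the highest absolute correlation to the target."""
    scores = {name: abs(pearson(values, target)) for name, values in generate_features(columns).items()}
    return sorted(scores, key=scores.get, reverse=True)[:k]


# Hypothetical data where the target is really "spend per visit".
data = {"spend": [100.0, 200.0, 150.0, 80.0, 300.0], "visits": [10.0, 40.0, 5.0, 8.0, 20.0]}
target = [s / v for s, v in zip(data["spend"], data["visits"])]

print(top_features(data, target, k=3))
```

    Neither raw column is strongly correlated with the target on its own, but the generated `spend/visits` ratio matches it exactly, which is the kind of signal this sort of search is meant to surface automatically.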

    5. AI will grow…through AI

    Our last prediction gets a bit meta, but stay with us here…

    Traditional software development saw the creation of software whose purpose is to radically improve the process of developing new software: compilers, IDEs, source control, and automated software quality assurance, for example. We expect to see a similar turn of events in AI.

    We believe that AI’s progress will become exponential as AI-based technologies that actually build AI are more widely adopted. We can already see the beginning of this process in methods such as neural architecture search, automated feature generation during model creation, and AI-augmented chip design, which produces better hardware for running AI applications.

    This is not something to be feared. AI is not going to progress beyond human control and take over the world. If anything, AI’s ability to create AI lays the groundwork for our first four predictions. As AI tools that build AI gain momentum, machine learning will scale even faster and wider, leaving data teams hyperfocused on improving models by tapping into an economy of data for enrichment.

    The year of better models

    If your business isn’t taking action based on predictive models yet, now is the time to start. Machine learning is no longer a tool only for early adopters; in fact, implementing machine learning and predictive models is relatively old news. The challenge now is feeding those models the data that will give you the best results for any predictive question across your entire business. We believe 2020 is the year to focus on exactly that: data.

    Here’s to a year filled with better data, better features, and better models!  
