Over the last few years, the amount and variety of available data has grown at an unprecedented rate.
At the same time, predictive models have become commoditized, mainly due to well-designed open-source libraries. This has gotten us to the point where it’s relatively easy to train predictive models but increasingly difficult to find the right data to feed them.
You can rarely depend on new models to improve your predictive power. So, what do you do? Start the long and time-consuming process of data acquisition, right?
Here’s the thing: it’s impossible to interact with, test, and understand the value of every data source and provider out there. Doing so with even one data provider involves (among other things) research, endless proof-of-concept engagements, testing, API integrations, legal documents, money, and due diligence. Extracting actual machine learning features from these data sources only adds another layer of complexity and does not guarantee an uplift in your model.