Article by Patrick Nelson for siliconANGLE
The key to successful artificial intelligence-based advanced analytics training is augmenting internal data with external data, according to data science platform startup Explorium Inc.
The process is not easy, though: challenges include getting hold of, and then managing, that mass of external material.
“The key to any analytical problem is having the right data,” said Zach Booth (pictured), director of global partnerships and channels at Explorium.
Models are only as strong as the data they train on, but getting that needed, external data is challenging. “It’s manual, it’s tedious and it’s extremely time consuming,” he said.
Explorium claims it has a solution that includes a curated, data-source catalog coupled with a platform to integrate everything within an organization’s existing, internal material.
Booth spoke with John Furrier, host of theCUBE, SiliconANGLE Media’s livestreaming studio, during the AWS Startup Showcase: The Next Big Things in AI, Security & Life Sciences. They discussed how external data source platforms — taking advantage of the massive amounts of data being generated in the world — could improve and speed up analytical machine learning for individual organizations. (* Disclosure below.)
Businesses must bring variety into analytical pipelines, according to Booth. The way one does that is by adding more data, and in particular augmenting the existing, high-quality data produced by the organization with external data.
Internal data is good data, Booth notes, but the problem is that organization-created data lacks context. Augmenting it with connections to highly varied, additional data in the cloud solves that isolation issue. “We’re helping customers to reach better, more relevant external data to feed into their predictive and analytical models,” he said.
Fraud detection is one use case that will benefit, according to Booth. Verticals could include financial services, insurance, e-commerce, consumer goods and software-as-a-service.
Generating decisions and actions through traditional analog means, such as rule-based approaches or simple spreadsheets, just isn’t dynamic or flexible enough. “It’s highly limited in its ability to change,” Booth stated. So, one needs to plug in analytics and perform the work in the cloud.
Explorium is in the Amazon Partner Network with the goal of bringing context to decision-making.
“Modeling and using data, it’s really a huge arsenal at our fingertips,” Booth said. “The trick is extracting value.” Filtering, navigating and connecting to the increasing quantities of data being created outside of an organization is how he wants to do it.
In the case of fraud analysis, with the end goal of reducing fraud, integrating external data can be broken down into three steps, according to Booth. The first takes advantage of the fact that companies have historical data on hand showing which customers have acted fraudulently over time. That’s the internal training data; it could be supplied to Explorium as an Excel spreadsheet, for instance. Each record in that initial data carries a binary label indicating whether the business was fraudulent or not.
Data enrichments are then added as a second step by the machine learning analytical platform. That step generates signals, called “features” in machine learning.
The platform then matches and connects the internal and external data, inferring and understanding the meaning of the data along with automating the process of distilling the signals.
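The three steps above can be sketched in a few lines of pandas. This is only an illustrative toy example, not Explorium’s actual pipeline; every identifier, column name and value below is hypothetical:

```python
import pandas as pd

# Step 1: internal training data -- historical records with a binary fraud
# label (in practice this might arrive as an Excel upload, via pd.read_excel)
internal = pd.DataFrame({
    "business_id": [101, 102, 103, 104],
    "monthly_volume": [12000, 800, 56000, 4300],
    "is_fraud": [0, 1, 0, 1],  # binary target: was this business fraudulent?
})

# Step 2: external enrichment -- hypothetical third-party signals keyed on
# the same identifier (company age, web-presence score and so on)
external = pd.DataFrame({
    "business_id": [101, 102, 103, 104],
    "years_registered": [12, 0, 25, 1],
    "web_presence_score": [0.9, 0.1, 0.8, 0.2],
})

# Step 3: match and connect internal and external records, then distill
# candidate signals ("features") from the combined data
enriched = internal.merge(external, on="business_id", how="left")
enriched["is_new_business"] = (enriched["years_registered"] < 2).astype(int)

# The enriched feature matrix and label column would then feed a model
features = enriched[["monthly_volume", "years_registered",
                     "web_presence_score", "is_new_business"]]
target = enriched["is_fraud"]
print(features.shape)  # (4, 4)
```

The point of the sketch is the shape of the workflow: the merge step is what a platform like Explorium automates at scale, across many candidate external sources rather than one hand-picked table.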
This three-step process generates “an edge in your modeling efforts,” Booth said. Fraud rates can thus be reduced, and customers can in turn be offered better price points or other benefits, such as a more compelling service.
But it’s not just end-customer offerings that benefit. An organization itself, too, sees extensive value, Booth claims. That’s because the data scientists can spend more time on features and signal engineering and less on managing and formatting the data. One of Explorium’s platforms, called Signal Studio, is geared toward data science teams finding external data signals for analytics pipelines.
“You don’t have to go out and spend all of these one-off efforts on time finding data, organizing it, cleaning it, et cetera,” Booth said. Scale in modeling is also made possible through the platform.
Interestingly, assessing whether there’s a return on investment is another area that Booth thinks can be optimized by using an external data source platform for analytics data. The time spent manually sourcing, cleaning and validating data detracts from ROI if the model ultimately doesn’t work out, and discovering that takes time. It becomes an opportunity cost.
“That takes months,” Booth said of the manual method. “Every day there’s more and more data that’s being created outside of our org. Can I use a tool that can effectively query all that data?” is the question organizations should be asking themselves, he said.
Stay tuned for the complete video interview, part of SiliconANGLE’s and theCUBE’s coverage of the AWS Startup Showcase: The Next Big Things in AI, Security & Life Sciences. (* Disclosure: Explorium sponsored this segment of theCUBE. Neither Explorium nor other sponsors have editorial control over content on theCUBE or SiliconANGLE.)