A good data scientist is a bit like a good journalist: they know how to ask questions so precise and to the point that there can be no vague, misleading answer. Data science is all about asking these sharp, specific questions of your data, and the techniques involved in machine learning guide you to clear answers by requiring that you figure out exactly what you want to know before you start.
How are artificial intelligence and machine learning different from business intelligence?
Until the data science revolution changed the game, most businesses were using data analytics or business intelligence (BI) to ask critical questions about their performance.
BI is a great tool to use when you want to analyze historic trends and figure out how well you did in the past. It’s a useful starting point for justifying your decision to keep doing more of the same or to alert you that you ought to try something else. It’s great when you want to turn enormous datasets into reports, dashboards, and visualizations you can understand.
BI also has some predictive capabilities, albeit limited ones: you can draw out actionable insights and patterns from the past that help you make smarter decisions. At least, provided the wider business context hasn’t really changed. If things are very different now to a year ago, these insights will be limited.
That’s the main problem with BI: it only lets you look backward. You can get answers to pre-established questions, but you can’t dive into the unknown and figure out what might work unless you’ve tried it before and succeeded. To really get ahead, you need to look forward, and that’s where machine learning (and automated machine learning) come in.
Unlike the data analysis involved in BI, machine learning relies on mathematical models that use historical data to predict future outcomes. These models often combine data from multiple sources to reveal patterns and emerging trends, and they can account for many more external factors than a typical BI report. This means you really are making data-driven decisions: not by simply extrapolating from the experiences of the past, but by modeling scenarios that haven’t happened yet. This helps with everything from deciding whether a customer can be trusted with credit to predicting which segments of your customer base will react in a particular way to a proposed advertising campaign.
What’s more, machine learning is becoming increasingly accessible. As more and more organizations are able to tap into the benefits without hiring a huge team of data scientists, it’s becoming clearer that automated machine learning will make BI redundant for many. That said, data science, AI, and machine learning can only give you valuable answers if you know what questions to ask in the first place.
Classification allows you to figure out whether a data point belongs to a predefined group that predicts a particular outcome. For example, you may be asking the question “What is the likelihood of this customer making a purchase?” In machine learning terms, what you’re really asking is, “Does this customer’s profile or behavior show traits we usually see in others in the buyer category?”
Classification can also be used to identify anomalies and, with them, suspicious behavior. So it helps answer questions like “Is this a fraudulent transaction?” or “Is this a cyberattack?”
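To make the buyer question concrete, here is a minimal sketch of one very simple classifier, a nearest-centroid model, written with only the Python standard library. The customer features (site visits and items added to cart), the sample data, and the function names are all hypothetical and chosen purely for illustration; a real project would more likely use an established machine learning library and far more data.

```python
from math import dist


def fit_centroids(rows, labels):
    """Average the feature vectors of each class into one centroid per label."""
    sums, counts = {}, {}
    for row, label in zip(rows, labels):
        acc = sums.setdefault(label, [0.0] * len(row))
        for i, value in enumerate(row):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [s / counts[label] for s in acc] for label, acc in sums.items()}


def predict(centroids, row):
    """Return the label whose centroid is closest to the new data point."""
    return min(centroids, key=lambda label: dist(centroids[label], row))


# Hypothetical history: (site visits last month, items added to cart)
history = [(1, 0), (2, 0), (3, 1), (8, 4), (9, 5), (7, 3)]
outcomes = ["non-buyer", "non-buyer", "non-buyer", "buyer", "buyer", "buyer"]

centroids = fit_centroids(history, outcomes)
new_customer = (6, 3)
prediction = predict(centroids, new_customer)
print(prediction)
```

The model answers the question "does this customer look more like past buyers or past non-buyers?" by comparing the new profile with the average profile of each group.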
This is a way of checking whether findings in your data still hold when applied to a new situation. For example, “If we mention food in the subject line, will our emails get a better open rate?”
Feature selection is a way of pinpointing exactly which aspect of the data predicts a certain outcome. It’s particularly useful for facial recognition and other types of image processing. For example: “Does this type of photograph contravene our community policy?”
With scenario prediction, you analyze multiple future outcomes under different conditions. This means you can create many possible scenarios and predict what would happen in each one. For example: “Would we save money if we cut one of our product lines?” or “Would changing a specific price impact our revenues in these promotions?”
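As a rough illustration of the pricing question, the sketch below compares revenue across three pricing scenarios. The demand function, its parameters, and the prices are invented assumptions; in practice the demand model would be learned from your own historical data rather than written by hand.

```python
def predicted_units(price, base_units=1000.0, units_lost_per_dollar=40.0):
    """Hypothetical demand model: sales fall linearly as the price rises."""
    return max(0.0, base_units - units_lost_per_dollar * price)


# Hypothetical scenarios: scenario name -> unit price in dollars
scenarios = {"current": 10.0, "discount": 8.0, "premium": 12.0}

# Predict revenue for each scenario, then pick the most promising one
revenue = {name: price * predicted_units(price) for name, price in scenarios.items()}
best_scenario = max(revenue, key=revenue.get)

for name, value in revenue.items():
    print(f"{name}: {value:.0f}")
print("best:", best_scenario)
```

The point is the shape of the workflow: one predictive model, evaluated under several "what if" conditions, with the outcomes compared side by side.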
Optimization questions deal with minimization and maximization problems: they seem straightforward but are actually very complex to answer, with a lot of contributing factors to take into account. For example: “What route should our delivery drivers take?”
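The delivery route question can be sketched as a search for the shortest round trip. The example below simply tries every possible ordering of stops, which is only workable for a handful of stops because the number of orderings grows factorially; real routing tools rely on heuristics and also weigh traffic, time windows, and vehicle capacity. The coordinates and function names are hypothetical.

```python
from itertools import permutations
from math import dist


def route_length(depot, stops):
    """Total straight-line distance of depot -> stops in order -> depot."""
    path = [depot, *stops, depot]
    return sum(dist(a, b) for a, b in zip(path, path[1:]))


def best_route(depot, stops):
    """Brute force: evaluate every ordering and keep the shortest."""
    return min(permutations(stops), key=lambda order: route_length(depot, order))


# Hypothetical map coordinates for one depot and three deliveries
depot = (0, 0)
stops = [(0, 3), (4, 3), (4, 0)]

best = best_route(depot, stops)
best_length = route_length(depot, best)
print(best, best_length)
```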
Clustering is a way of grouping data points based on characteristics they share, without defining the groups in advance. So an example question might be: “Are there certain ‘types’ of people who share our posts on Facebook?”
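One common way to find such "types" is k-means clustering. The following is a minimal sketch of the idea using only the standard library, with made-up data (age and shares per month) and a deliberately naive starting point: the first k data points serve as the initial cluster centers. Production code would normally use a library implementation with better initialization.

```python
from math import dist


def k_means(points, k, iterations=10):
    """Group points into k clusters by repeatedly reassigning and re-averaging."""
    # Naive, deterministic start (an assumption for this sketch)
    centroids = [list(p) for p in points[:k]]
    assignment = []
    for _ in range(iterations):
        # Step 1: assign every point to its nearest cluster center
        assignment = [
            min(range(k), key=lambda c: dist(centroids[c], p)) for p in points
        ]
        # Step 2: move each center to the mean of the points assigned to it
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return assignment, centroids


# Hypothetical audience data: (age, posts shared per month)
sharers = [(19, 30), (22, 28), (21, 35), (45, 4), (50, 6), (48, 3)]
groups, centers = k_means(sharers, k=2)
print(groups)
```

Note that nobody told the algorithm what the groups were. It only received the number of clusters to look for, which is what separates clustering from classification.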
This is a way of reducing complex data down to its key indicators to tease out trends and make predictions about the future. For example, it could help answer the question “Which of these books will be a bestseller?”
AI and machine learning can be deployed all across the funnel with huge benefits to your business, but figuring out how to frame your questions is half the battle. It’s important to understand what kinds of questions it’s possible to ask of data so that you can translate these into machine learning models. It’s also important to ensure that the datasets you have chosen are suitable for these questions and, if they’re not, to think about where to get the high-quality external data these models need to work. You can get the answers you need; you just need to be smart about the questions you ask.