Domain Knowledge in Data Science: Are Your Models Ready for Business?
Data science for business is a very different beast than building models in an academic or purely scientific context. You aren’t looking for patterns simply because they’re food for thought, inspire further questions, or demonstrate interesting dynamics and behaviors. You’re looking for patterns because someone in your company wants to know what action they should take next.
From a data science perspective, it may seem like a subtle difference, but it completely changes how you interact with your team.
When your colleagues ask you a question about their data, they are coming from a different place than a data scientist. They aren’t particularly interested in what’s going on under the hood, or in dissecting ambiguities, or in anything superfluous to their workflows and decision-making. They expect you to understand why they are asking the question and why it’s commercially relevant. They expect you to get the context.
In other words, they presume a level of domain knowledge.
What is domain knowledge?
Simply put, having domain knowledge means that you understand a particular field and are comfortable talking about how it functions and what aspects of it matter the most.
You know the industry acronyms, what areas people who operate in this domain focus on, and how they structure their workflows. You get who their customers or stakeholders are and how they interact with them. You understand how this domain interplays with and deviates from other, connected domains. You appreciate the pressing business challenges and concerns.
You really can’t work effectively in any sector or discipline without some domain knowledge. That said, there’s obviously a big difference between being aware of what matters to your team and knowing so much that you’re a bona fide domain expert. You don’t necessarily need to be the latter to do great work. But more on that in a moment.
The role of domain knowledge in data science
Without an extensive contextual understanding of the industry and sector, you will struggle to move beyond event framing to more sophisticated forms of data science, such as predictive analytics, condition monitoring, mapping, and conflict avoidance. When you get to the really advanced stuff, including diagnostic analytics, root-cause analysis, and drawing actionable insights out of your models, you absolutely need deep domain knowledge to draw on, whether it’s your own or that of the experts around you.
The importance of domain knowledge in data science also becomes clear when you think about the four factors you need to consider at the beginning of any data science project: precision, accuracy, representativeness, and significance.
Start with precision. Without understanding the sector and industry you’re working in, including how and where the data is captured, it’s difficult to appreciate how much uncertainty is attached to each value.
Accuracy is a similar story. If you aren’t familiar with the broader context, it’s tricky to estimate how far your data could deviate from the reality of the situation. How certain can you be about the patterns your model produces? How high do you need to set the bar for evidence?
Then there’s representativeness. To trust your results, you need to know what’s missing. Do the datasets you’re working with reflect all the important and relevant aspects of the domain? Do you need to introduce other datasets to get a fuller picture?
Finally, consider significance. Do these datasets reveal something important about domain dynamics, patterns, and behavior? Do you know enough about this area to tell whether you’re stating the obvious or adding genuine insight? Would other behavioral patterns drawn from other datasets be more pertinent, revealing, and useful to your team?
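To make the "what’s missing?" question a little more concrete, here is a minimal Python sketch of one possible representativeness check: comparing how often each segment appears in your sample with the mix a domain expert would expect to see. Everything in it is an assumption made for illustration, including the `share_gaps` helper, the sales-channel field, and the expected shares, which in practice would come from your colleagues rather than from the data.

```python
from collections import Counter


def share_gaps(records, key, expected_shares):
    """Return observed share minus expected share for every segment."""
    counts = Counter(record[key] for record in records)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no records to compare")
    # Include segments the expert mentions even if the sample never saw them.
    segments = set(counts) | set(expected_shares)
    return {
        segment: counts.get(segment, 0) / total - expected_shares.get(segment, 0.0)
        for segment in segments
    }


# Hypothetical sample: ten orders tagged by sales channel.
orders = [{"channel": "web"}] * 6 + [{"channel": "store"}] * 4
# Hypothetical estimate of the real channel mix, supplied by a domain expert.
expected = {"web": 0.5, "store": 0.3, "phone": 0.2}

gaps = share_gaps(orders, "channel", expected)
for segment in sorted(gaps):
    print(f"{segment}: {gaps[segment]:+.2f}")
```

In this made-up example the phone channel never appears in the sample at all, so its gap is strongly negative. A result like that doesn’t prove anything on its own, but it is a useful prompt for a conversation about where the missing data lives.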
Working with a domain knowledge expert
You may be thinking to yourself, “But my background is data science! How exactly am I supposed to get to a point where I understand the business context without years of experience behind me?”
It’s a valid concern. The important thing is to remember that you are part of a team and that different voices and experiences enrich the process by bringing new ideas and perspectives to the table. You aren’t supposed to magically absorb everyone else’s expertise – you just need to find better ways to tap into that expertise.
Put it this way: you are (presumably) surrounded by other people with domain knowledge. Everyone in your company who works on the business side has built up that understanding over the course of their careers. They need your skills to make sense of data to help them further their business goals – and you need their contextual understanding to enrich your models. The key is to involve them in the process, right from the planning stage.
In other words, you need to speak to existing domain experts and stakeholders among your colleagues and ask them to help you define the task. You can then figure out which toolset will best help you solve the problem they describe or get the answers they need, and work together to select the most significant and representative data.
What data scientists need to ask domain knowledge experts
- What are you expecting to see?
One of the great things about approaching the problem from a purely scientific, data-orientated perspective is that you’re less likely to bring your own biases to the table. That said, if you don’t have a little bit of framing, it’s hard to know which patterns and trends your colleagues are looking out for – and whether what you have built is actually useful to them.
By getting an idea of your colleagues’ preconceptions, you can think about how to build a model that tests a specific hypothesis. It also means you can highlight things that specifically challenge those preconceptions, ensuring that your company doesn’t get swept up in confirmation bias, only seeing what they expect to see.
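As an illustration of testing a preconception rather than simply confirming it, here is a minimal sketch of a permutation test in Python. It is only one of many ways to test a hypothesis, chosen here because it needs nothing beyond the standard library. The scenario (colleagues expect customers with a discount to place more orders), the function name `permutation_p_value`, and all of the numbers are hypothetical.

```python
import random
from statistics import mean


def permutation_p_value(group_a, group_b, n_permutations=2000, seed=0):
    """Two-sided permutation test for a difference in group means."""
    rng = random.Random(seed)
    observed = abs(mean(group_a) - mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    hits = 0
    for _ in range(n_permutations):
        # Shuffle the labels: if the grouping is irrelevant, a random split
        # should produce a gap this large reasonably often.
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        if diff >= observed:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return (hits + 1) / (n_permutations + 1)


# Hypothetical monthly order counts per customer.
with_discount = [12, 15, 14, 16, 13, 17, 15, 14]
without_discount = [11, 12, 10, 13, 12, 11, 12, 10]

p_value = permutation_p_value(with_discount, without_discount)
print(f"p-value: {p_value:.4f}")
```

A small p-value suggests the gap between the two groups is unlikely to be a fluke of how customers happened to be split. It says nothing about why the gap exists, which is exactly where your colleagues’ domain knowledge comes back in.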
- What do you already know?
There’s nothing worse than proudly revealing your model only to be told that it’s already redundant or adds very little new knowledge and understanding of the domain. It’s disappointing for everyone to have you waste time proving things that are already obvious to your colleagues!
Before you start, ask stakeholders to give you a run-down of the aspects and patterns they already take for granted. From here, you can focus on adding nuance, challenging underlying assumptions, and bringing new, valuable insights to the table.
- How exactly will you use the results?
Before you start any data science project, you need to know why you’re doing it and what your team needs to get out of it. Don’t stop at asking domain knowledge experts about their overarching objectives, though.
Really drill down to find out exactly what details they’re looking for and how these will alter their future decisions. The more details you get now, the clearer the picture you’ll have of how their world works and the role your model plays within it. This should open up new questions, ideas, and avenues that improve your approach.
- Is this model giving you the precision you need?
One reason a collaborative relationship works well is that you’re attuned to different types of problems.
Once you’ve built and tested your model, you will be in a good position to spot training issues such as overfitting. You can explain to business-minded colleagues when the model has too many parameters for the data available, and why that means it is picking up noise rather than meaningful patterns.
Meanwhile, though they may not know the relevant data science terms, your domain expert colleague will quickly be able to tell when you have the opposite problem, underfitting, because they can tell you when the model isn’t precise or accurate enough to be genuinely useful in the real world. Conversations like these will help you refine and develop the model into something robust and valuable.
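The two failure modes can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not a recipe: it uses a tiny k-nearest-neighbour regressor as a stand-in for a real model, where a very small `k` plays the role of "too many parameters" and a very large `k` makes the model too coarse. The data is synthetic, and the `tolerance` value is a hypothetical figure standing in for what your domain expert says is precise enough to be useful.

```python
import random
from statistics import mean


def knn_predict(train, x, k):
    """Average the targets of the k training points nearest to x."""
    nearest = sorted(train, key=lambda point: abs(point[0] - x))[:k]
    return mean(y for _, y in nearest)


def mean_abs_error(train, data, k):
    """Mean absolute error of the k-NN model on `data`."""
    return mean(abs(knn_predict(train, x, k) - y) for x, y in data)


def diagnose(train_err, holdout_err, tolerance):
    """Rough rule of thumb; `tolerance` comes from the domain expert."""
    if train_err > tolerance:
        return "underfitting"  # too coarse even on data it has already seen
    if holdout_err > tolerance:
        return "overfitting"   # fine on seen data, poor on unseen data
    return "acceptable"


# Synthetic data: a linear trend plus noise, split into train and holdout.
rng = random.Random(42)
points = [(i * 0.5, 2 * (i * 0.5) + rng.gauss(0, 1)) for i in range(40)]
train, holdout = points[::2], points[1::2]

for k in (1, 3, len(train)):
    train_err = mean_abs_error(train, train, k)
    holdout_err = mean_abs_error(train, holdout, k)
    verdict = diagnose(train_err, holdout_err, tolerance=1.5)
    print(f"k={k}: train={train_err:.2f} holdout={holdout_err:.2f} -> {verdict}")
```

With `k=1` the model simply memorises the training points, so its training error is zero however noisy the data is. With `k` equal to the size of the training set it predicts the same overall average everywhere. The useful part of the sketch is the `tolerance` argument: that number is a business judgment, not something you can derive from the data.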
- What data do you wish you had?
Unless you know this sector or industry inside-out, you will probably be left to draw your conclusions from the kind of data the company has amassed in-house. The trouble is, you can’t tell if this gives a limited picture, or if there are actually better ways – and better data – that would lead you to the answers your team needs.
Ask your domain expert colleague if they can explain what kind of data they would kill to have but don’t collect internally. They may not know that it’s possible for you to connect to external data sources until you tell them!
Final thoughts: honesty is the best policy
The key to nailing data science for business is to be open, humble, and upfront about what you don’t know.
Keep asking questions and getting domain experts in your orbit to explain any terms or concepts you aren’t familiar with. If something isn’t obvious, question it. The likelihood is either that this is something you need to understand better to improve your models, or that it’s something your colleagues take for granted, but could do with being challenged and tested by a data scientist (like you).
Either way, encouraging a culture of asking questions and challenging assumptions is really important. That applies on both sides; the more you get non-technical colleagues querying what you do and how it works, the more confident they’ll be about explaining what they need and pointing out potential problems.
Pretending to have more domain knowledge than you really do might save face in the moment, but it will slow you down in the long run.