The bad hire has been a problem in recruiting for as long as we’ve been doing it, and no matter how hard we try, it remains difficult to predict how well someone will pan out once they actually start their job. Even in today’s data-rich world, with the incredibly granular view we have of potential hires, it’s not uncommon to hear about the amazing interview that turned into a nightmare employee and later an uncomfortable firing.
Even worse, given the costs of hiring a new employee (in time, resources, and money), these failures get put under a microscope when we have to start the process all over. But what if we could predict whether a potential hire would be a successful one?
Could a machine learning model for HR help us predict the best candidate for the job? What data would we need, and what would it look like? More importantly, does the data we have match the answers we need, or do we need to reframe our questions to get answers that are relevant?
Before writing a line of code and building a predictive model, it’s worth examining the data we’re starting from to make sure we’ve framed the problem (and thus the solution) the right way.
Right off the bat, there are multiple questions we could ask the model to predict about a potential hire, but not all of them would be well suited to the task at hand. For instance, we could ask “how long will a hire stay at the company?”, but that is slightly too open-ended — determining a time frame may go beyond the scope of a simple model.
Alternatively, we could ask a more concrete question: “how long will an employee have to stay in the position to justify the cost of hiring them?” In this case, we don’t really need machine learning to answer the question. We could use BI tools to find the break-even point.
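To see why this is arithmetic rather than a prediction problem, here is a minimal sketch of a break-even calculation. The function name and every figure in it are hypothetical placeholders, not numbers from our data.

```python
def break_even_months(hiring_cost, monthly_value, monthly_cost):
    """Months an employee must stay before their net contribution covers hiring cost.

    All inputs are hypothetical: hiring_cost is the one-off recruiting and
    onboarding spend, monthly_value is the value the employee produces per
    month, and monthly_cost is their salary plus overhead per month.
    """
    net_per_month = monthly_value - monthly_cost
    if net_per_month <= 0:
        raise ValueError("the hire never breaks even at these rates")
    return hiring_cost / net_per_month


# Illustrative numbers only.
months = break_even_months(hiring_cost=12_000, monthly_value=9_000, monthly_cost=7_000)
print(f"Break-even after {months:.1f} months")
```

A calculation like this needs only averages we can pull from existing reports, which is why a model adds little here.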
One of the factors that is often overlooked during a hiring process is that while a first impression is crucial, it’s just that: the first of many steps that lead to an eventual decision. Hiring processes unfold over time, weeding out some candidates and advancing those who could be ideal, so any model we build should keep this in mind. That is, instead of asking a single question (will this person be a good hire?), we should attempt to create a layered model that can help select the right candidates at every stage. In this scenario, our first step isn’t necessarily to identify the perfect candidate for the job, but simply those most likely, based on the features we’re looking for, to get to our second round of interviews.
Instead of simply asking if a person will be hired, we need to understand those features that can help us predict which candidates are more likely to make it through the hiring process. To do so, we need to understand which features are relevant to our scoring model, and what questions we can derive from them.
When considering the question we want to ask, the first step should be to make sure the data we have can give us a relevant answer. This leads us to examine the dataset we’re using — in this case, the historical records of previous recruitment efforts. In our scenario, we’ll be using a set of 300 previous candidates, with our Y1 being whether they were given a second interview, and our Y2 whether they were actually hired.
We can collect this data by raiding our own databases for relevant information. In this case, HR probably has the most valuable data for our model, but we need to be careful not to take more than we need or information that’s irrelevant. We’re interested in keeping things as leak-free as possible for our test, so let’s focus on pre-recruitment data. Let’s take a look at what our dataset looks like (keep in mind we’re looking only at the first 30 rows):
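The original table isn’t reproduced here, so as a stand-in, here is a minimal sketch that builds a synthetic dataset of the same shape and prints its first 30 rows. The column names, value ranges, and the way the targets are generated are all assumptions made for illustration; none of it is real candidate data.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n = 300  # matches the 300 historical candidates described above

# Hypothetical pre-recruitment features; the real column names may differ.
candidates = pd.DataFrame({
    "years_experience": rng.integers(0, 15, size=n),
    "education_level": rng.integers(1, 5, size=n),  # 1 = lowest, 4 = highest
    "relevant_projects": rng.integers(0, 10, size=n),
    "certifications": rng.integers(0, 4, size=n),
})

# Synthetic Y1: whether the candidate was given a second interview.
score = 0.3 * candidates["years_experience"] + 0.5 * candidates["relevant_projects"]
candidates["y1_second_interview"] = (
    score + rng.normal(0, 1.5, n) > score.median()
).astype(int)

# Synthetic Y2: only candidates who reached the second interview can be hired.
reached = candidates["y1_second_interview"] == 1
candidates["y2_hired"] = (reached & (rng.random(n) < 0.3)).astype(int)

# Preview only the first 30 rows.
print(candidates.head(30))
```

Keeping both targets in the same table makes it easy to check that Y2 is only ever positive when Y1 is, which is the structure the layered approach relies on.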
To start the process of filtering our candidate pool, we focused on qualification metrics that cover both education and real-world experience. The hypothesis is that the more actual work a candidate has done in the field, the likelier they are to make it to a second-round interview.
Here we have our first opportunity to consider the features we’ll use when we actually build a model. Our goal at this point is to reduce the amount of time spent prequalifying potential candidates (and maybe even remove the need for a multi-round hiring process, which is both time- and resource-consuming).
The first step in any model should be to make sure we’ve selected the right features. For this, we can use a variety of tools, but let’s use two of the most common ones: F-scores and mutual information (MI) scores. The F-score (an ANOVA F-test in this setting) measures how strongly each feature, taken on its own, separates candidates who advanced from those who didn’t, assuming a roughly linear relationship. MI measures how much knowing a feature reduces our uncertainty about the outcome, and it can pick up non-linear dependencies that the F-test misses. Both rank features one at a time, so they help us decide what to keep and how to weight it, narrowing down the candidate pool before we even interview our first candidate.
We can quickly compute both by importing the corresponding scoring functions from scikit-learn and running our dataset through them. Although we’re using relatively few features in this example (just ten), there are hundreds more we could extract from a CV, and more still that we could include when considering second-round interviews.
However, by measuring the importance of each feature, we can get a better idea of what we should be looking for, and we can start building weighted scores to help us make better, faster decisions when it comes to who we should invite in for an interview.
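As a rough sketch of what this scoring step could look like, the example below uses scikit-learn’s f_classif and mutual_info_classif. The data is synthetic and the feature names are hypothetical, including one deliberately irrelevant column to show what a low score looks like.

```python
import numpy as np
import pandas as pd
from sklearn.feature_selection import f_classif, mutual_info_classif

rng = np.random.default_rng(0)
n = 300

# Synthetic stand-in data with hypothetical feature names.
X = pd.DataFrame({
    "years_experience": rng.integers(0, 15, size=n),
    "education_level": rng.integers(1, 5, size=n),
    "relevant_projects": rng.integers(0, 10, size=n),
    "shoe_size": rng.integers(36, 47, size=n),  # deliberately irrelevant
})

# Synthetic target (second interview yes/no) driven by two of the features.
signal = 0.3 * X["years_experience"] + 0.5 * X["relevant_projects"]
y1 = (signal + rng.normal(0, 1.0, n) > signal.median()).astype(int)

# ANOVA F-test per feature, plus its p-value.
f_scores, p_values = f_classif(X, y1)

# Mutual information per feature; all features here are integer-valued.
mi_scores = mutual_info_classif(X, y1, discrete_features=True, random_state=0)

ranking = pd.DataFrame(
    {"f_score": f_scores, "p_value": p_values, "mi_score": mi_scores},
    index=X.columns,
).sort_values("mi_score", ascending=False)
print(ranking)
```

Features that score near zero on both measures are candidates for removal, while the relative sizes of the remaining scores can serve as a starting point for weights.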
As a second step in our model, and a refinement of the hiring process, we can add new features to our dataset. These could include the number of recommendation letters, the source of each reference (is it a C-level executive, or a mid-level manager?), performance at previous companies, scores on take-home assignments, and more. Together they can give us a deeper picture of the potential hire without requiring us to assess in depth every individual who submits an application.
With the right features selected, it’s time to choose the model we’ll be using. Our features and problem mean that we have a few options to choose from, but due to our more limited feature set, we can safely run our data through several models to see which gives us the best results. This may still take some time though, and based on the questions we’re asking, we could safely choose decision-tree-based models.
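Running the data through several models can be sketched with cross-validation, as below. The candidate models, their settings, and the choice of ROC AUC as the metric are assumptions made for illustration, and the data is synthetic.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
n = 300
X = rng.normal(size=(n, 10))  # ten synthetic features
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(0, 0.5, n) > 0).astype(int)

# Hypothetical shortlist of models to compare.
models = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "decision_tree": DecisionTreeClassifier(max_depth=4, random_state=0),
    "random_forest": RandomForestClassifier(n_estimators=100, random_state=0),
    "gradient_boosting": GradientBoostingClassifier(random_state=0),
}

# 5-fold cross-validated ROC AUC for each model.
results = {}
for name, model in models.items():
    scores = cross_val_score(model, X, y, cv=5, scoring="roc_auc")
    results[name] = scores.mean()
    print(f"{name:>20}: mean ROC AUC = {scores.mean():.3f}")
```

With only 300 rows, the scores will vary noticeably from fold to fold, so small differences between models shouldn’t be over-interpreted.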
We could choose a single decision tree, which would come with some clear benefits for our specific example. For instance, we can build a set of rules based on each branch that would give us a clear process to pre-qualify. However, based on our dataset, a single decision tree may not be enough, and could indeed become too complex to accurately reduce our workload.
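To make the idea of branch-based rules concrete, here is a minimal sketch that fits a shallow tree and prints it as text with scikit-learn’s export_text. The feature names and the rule used to generate the synthetic labels are hypothetical.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier, export_text

rng = np.random.default_rng(0)
n = 300
feature_names = ["years_experience", "relevant_projects", "education_level"]
X = np.column_stack([
    rng.integers(0, 15, size=n),
    rng.integers(0, 10, size=n),
    rng.integers(1, 5, size=n),
])

# Synthetic rule: advance candidates with enough experience and projects.
y = ((X[:, 0] >= 5) & (X[:, 1] >= 3)).astype(int)

# A shallow tree keeps the resulting rule set short enough to read.
tree = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X, y)
rules = export_text(tree, feature_names=feature_names)
print(rules)
```

Each path from the root to a leaf reads as one pre-qualification rule. On real data the tree usually has to grow much deeper to fit well, which is where the readability advantage starts to fade.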
Instead, we’ll opt for a gradient boosting model such as XGBoost. It gives us some margin to play with, and on a dataset this small it trains quickly. Moreover, its ability to handle missing values natively is particularly useful, and because it builds its trees iteratively, with regularization and early stopping available, we can tune and rerun it without piling on unnecessary complexity. From here, it’s a matter of building out our model based on the concept we’ve designed.
Real projects aren’t always as simple as this conceptual example, but it does illustrate the importance of how we plan our models. Before starting to code, or even working out the math, it’s important to lay the foundations for a model that will not just work, but that will provide the most accurate results possible.
In our HR use case, asking different questions may give us different answers, but they may not be the ones we want or need. By drawing up a conceptual plan before we build, we can avoid problems that grow into crises and, eventually, catastrophes.