Using Data Science to Predict the Next Hit Song (Part 2)

November 20, 2019 Maël Fabien Data Science

In part one of this two-part series, we explored basic models and data enrichments for our hit song classifier. In this article, we will try to improve the model's performance through further data enrichment and feature engineering.

Before we get started, let’s recall the context.

The context

As I explained previously, there is no shortage of articles and papers trying to explain why a song became a hit and which features hit songs share. Here, we will go a bit further and enrich the hit song classifier we built in part one.

In part one, we used data from the Billboard Year-End Hot 100 Singles Chart between 2010 and 2018. We then enriched the data using Spotify’s API. Our model achieved an accuracy of 93% on the test set.

In part two, we will use data from Genius to enrich our model and do some sentiment analysis. As a reminder, we consider a song a hit only if it reached the top 10 of the most popular songs of the year; otherwise, it does not count as a hit.

Data enrichment through Genius

Genius is a great resource if you are looking for song lyrics. It offers an API, which is wrapped in a Python library called lyricsgenius. Start by installing the package (instructions can be found on GitHub).

You will need to get an API token from the Genius developer website.

Start by importing the package:

As before, the API has a powerful search functionality:
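The import and search steps can be sketched as follows. `lyricsgenius.Genius(token)` and `search_song(title, artist)` are the library's entry points; the helper names `make_client` and `find_lyrics` and the token placeholder are introduced here for illustration only.

```python
def make_client(token):
    """Create a Genius API client (requires `pip install lyricsgenius`)."""
    import lyricsgenius  # imported lazily so find_lyrics works without it
    return lyricsgenius.Genius(token)


def find_lyrics(client, title, artist):
    """Return the lyrics of the best search match, or None if nothing is found."""
    song = client.search_song(title, artist)
    return song.lyrics if song is not None else None


# Usage (needs a real token and network access):
# genius = make_client("YOUR_GENIUS_TOKEN")
# print(find_lyrics(genius, "Lover", "Taylor Swift")[:200])
```

Returning `None` for a missed search keeps the later steps simple: songs without lyrics can be filtered out or imputed in one place.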

You’ll need to create a column “lyrics” that contains the lyrics of each song. This one might take some time.
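A minimal sketch of this step, assuming each song is a dict with "title" and "artist" keys (hypothetical names) and that `fetch` is any function returning lyrics for a title and artist. It caches repeated lookups and tolerates failed ones, since songs can appear in several yearly charts and a single API error should not abort a long run.

```python
import time


def add_lyrics(rows, fetch, pause=0.0):
    """Return copies of `rows` with a "lyrics" key filled in by `fetch`.

    rows  : list of dicts with "title" and "artist" keys (assumed names)
    fetch : callable (title, artist) -> lyrics string or None
    pause : seconds to wait between API calls
    """
    cache = {}
    enriched = []
    for row in rows:
        key = (row["title"], row["artist"])
        if key not in cache:
            try:
                cache[key] = fetch(*key)
            except Exception:
                # A failed lookup should not abort the whole run.
                cache[key] = None
            time.sleep(pause)
        enriched.append({**row, "lyrics": cache[key]})
    return enriched
```

With a pandas DataFrame, the equivalent would be along the lines of `df["lyrics"] = df.apply(lambda r: fetch(r["title"], r["artist"]), axis=1)`, again assuming those column names.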

Notice how some of the text is not clean and contains \n to denote a new line or has text between brackets to split sections:
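One possible cleaning step, sketched with regular expressions: drop anything between square brackets (section markers such as "[Chorus]") and collapse newlines and repeated spaces. The function name and the sample text are illustrative.

```python
import re


def clean_lyrics(text):
    """Remove bracketed section markers and collapse whitespace."""
    no_sections = re.sub(r"\[[^\]]*\]", " ", text)  # e.g. "[Chorus]"
    return re.sub(r"\s+", " ", no_sections).strip()  # "\n" and extra spaces


print(clean_lyrics("[Verse 1]\nHello hello\n\n[Chorus]\nOh yeah"))
# prints: Hello hello Oh yeah
```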

Some features we could add are:

  • The length of the lyrics
  • The number of unique words used
  • The length of the lyrics without stopwords
  • The number of unique words used without stopwords

We will use NLTK's list of English stop words. Keep in mind, however, that some of the songs on the Billboard Year-End Hot 100 Singles Chart are not in English, so an English-only list will not filter their stop words.
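The four features listed above could be computed as in the sketch below. It assumes "length" means the number of words, uses a simple regular-expression tokenizer, and ships a tiny demo stop list so that it runs without downloads; with NLTK installed, the real list comes from `nltk.corpus.stopwords.words("english")` after `nltk.download("stopwords")`. The feature names are hypothetical.

```python
import re

# Tiny illustrative stop list; in practice, pass in NLTK's English list.
DEMO_STOP_WORDS = {"i", "you", "the", "a", "and", "to", "it", "me", "my", "in"}


def lyric_features(lyrics, stop_words=DEMO_STOP_WORDS):
    """Compute the four word-count features for one song."""
    words = re.findall(r"[a-z']+", lyrics.lower())
    content = [w for w in words if w not in stop_words]
    return {
        "n_words": len(words),
        "n_unique": len(set(words)),
        "n_words_no_stop": len(content),
        "n_unique_no_stop": len(set(content)),
    }


print(lyric_features("Oh baby, you and me, baby"))
# prints: {'n_words': 6, 'n_unique': 5, 'n_words_no_stop': 3, 'n_unique_no_stop': 2}
```

Calling this function once per song and storing the returned values as new columns gives the enriched dataset.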

Next, apply this to the dataset:

Data exploration

Just like in the first article, some data exploration might bring us additional insights.

How many words are used in the lyrics?

[Figure: histogram of the number of words per song]

The histogram above does not show outliers, but a few songs have over 2,000 words. On average, a song has 467 words, 166 of them unique. This can be verified by:
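A sketch of that check on toy numbers (the three songs below are made up, and the key names are hypothetical); on the real dataset, the same two averages would be taken over the word-count columns.

```python
from statistics import mean

# Toy per-song counts standing in for the real feature columns.
songs = [
    {"n_words": 420, "n_unique": 150},
    {"n_words": 510, "n_unique": 180},
    {"n_words": 300, "n_unique": 120},
]

avg_words = mean(s["n_words"] for s in songs)
avg_unique = mean(s["n_unique"] for s in songs)
print(avg_words, avg_unique)
```

With pandas, `df[["n_words", "n_unique"]].mean()` would give the same two values, assuming those column names.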

The ratio of unique words over total words is 35%. We can also plot the distribution of this ratio:

[Figure: distribution of the ratio of unique words to total words]

The vast majority of songs do not exceed 40% unique words, which reflects the balance hit songs strike between repetitive lyrics and a diversified vocabulary. To illustrate the diversity of the vocabulary used in the songs, we can compute the ratio of words that are not stop words over all words:

[Figure: distribution of the ratio of non-stop words to total words]

Once the stop words are removed, the average ratio is much higher: a large part of the vocabulary used in these songs seems to consist of stop words. What are the most common words that singers use in their songs?
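Counting the most frequent words can be sketched with `collections.Counter`. The tokenizer, the function name, and the two demo lines are illustrative; in practice the stop list would be the NLTK one.

```python
import re
from collections import Counter


def top_words(lyrics_list, stop_words=frozenset(), n=10):
    """Return the n most frequent non-stop words across all lyrics."""
    counts = Counter()
    for lyrics in lyrics_list:
        words = re.findall(r"[a-z']+", lyrics.lower())
        counts.update(w for w in words if w not in stop_words)
    return counts.most_common(n)


demo = ["Oh baby, yeah yeah", "Yeah, the night is young, baby"]
print(top_words(demo, stop_words={"the", "is"}, n=2))
# prints: [('yeah', 3), ('baby', 2)]
```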

[Figure: most common words in the lyrics]

We won’t spend too much time commenting on this, but “yeah,” “oh,” and “baby” should definitely be on your hit-song to-do list.

Lyrics sentiment

Should a song be positive? Negative? Neutral? To assess the positivity of a song and its intensity, we will use the Valence Aware Dictionary and sEntiment Reasoner (VADER), a lexicon- and rule-based sentiment analysis tool available on GitHub. This method relies on a lexicon of over 7,500 entries rated by human annotators. Lexicon-based approaches like this one predate the rise of machine-learning-based natural language processing, but they can still be useful in cases like ours, where we have neither labeled data nor a trained model for song sentiment classification.

We can also create a feature that is the difference between the positive and the negative score:
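A sketch of the sentiment features, assuming the `vaderSentiment` package. Its `SentimentIntensityAnalyzer.polarity_scores(text)` returns a dict with the keys "neg", "neu", "pos" and "compound"; the output feature names below are hypothetical.

```python
def make_analyzer():
    """Create a VADER analyzer (requires `pip install vaderSentiment`)."""
    from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer
    return SentimentIntensityAnalyzer()


def sentiment_features(analyzer, lyrics):
    """Score lyrics and add the positive-minus-negative difference."""
    scores = analyzer.polarity_scores(lyrics)
    return {
        "sent_neg": scores["neg"],
        "sent_pos": scores["pos"],
        "sent_compound": scores["compound"],
        "sent_diff": scores["pos"] - scores["neg"],
    }


# Usage with the library installed:
# analyzer = make_analyzer()
# sentiment_features(analyzer, "I love this song")
```

Creating the analyzer once and reusing it for every song avoids reloading the lexicon on each call.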

What are the sentiments expressed in the songs?

[Figure: distribution of sentiment scores]

On average, the sentiment is slightly positive. Some songs have strong sentiments attached to them (i.e., more than 0.5 in absolute value), but most songs are more moderate. This approach is limited, however: it derives the overall sentiment of a song by aggregating word-level sentiments and does not understand the content or the context.

New model

Now, let’s train a new model and see whether the performance was improved. First, we create the train and test sets and apply oversampling:
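A dependency-free sketch of this step. It makes several assumptions: the label lives under a hypothetical "hit" key with values 0 and 1, the oversampling is plain random duplication of the minority class (the oversampling method actually used is not named here), and only the training split is oversampled so that duplicated songs cannot leak into the test set.

```python
import random


def split_and_oversample(rows, label="hit", test_size=0.2, seed=0):
    """Split rows into train/test, then balance the training set only."""
    rng = random.Random(seed)
    shuffled = rows[:]
    rng.shuffle(shuffled)
    n_test = int(len(shuffled) * test_size)
    test, train = shuffled[:n_test], shuffled[n_test:]

    hits = [r for r in train if r[label] == 1]
    non_hits = [r for r in train if r[label] == 0]
    minority, majority = sorted([hits, non_hits], key=len)
    if minority:
        # Duplicate random minority rows until both classes are the same size.
        extra = rng.choices(minority, k=len(majority) - len(minority))
        train = train + extra
        rng.shuffle(train)
    return train, test
```

The split here is not stratified; with very few hits, a stratified split (for example scikit-learn's `train_test_split` with its `stratify` argument) would be a safer choice.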

Then, we define the random forest classifier and train the model:
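A minimal sketch with scikit-learn. The hyperparameters (`n_estimators=100`, a fixed `random_state`) are assumptions made for reproducibility, not the settings behind the reported score.

```python
def train_random_forest(X_train, y_train, X_test, y_test, seed=0):
    """Fit a random forest and return (model, test accuracy)."""
    # Requires scikit-learn; imported here so the module loads without it.
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.metrics import accuracy_score

    model = RandomForestClassifier(n_estimators=100, random_state=seed)
    model.fit(X_train, y_train)
    accuracy = accuracy_score(y_test, model.predict(X_test))
    return model, accuracy
```

After fitting, `model.feature_importances_` holds one importance value per input column, which is what a feature-importance plot is built from.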

The accuracy score improved by roughly 5 percentage points, reaching 98.3%.

What are the most important features of this new model?

[Figure: feature importances of the new model]

The relative order of the previously important features remains the same, but the compound sentiment feature now ranks among the most important ones.

Making predictions

Prediction function

We can build a predictor that takes the name of the song and the singer as input, creates the features, and outputs the probability of the song being a hit. Since the algorithm has never been trained on songs from 2019, we can feed it recent songs and observe the outcome.

Let’s recall the whole pipeline first:

Let’s build this pipeline and try it with “Lover” by Taylor Swift, a song that had recently been released when we wrote this article:
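A sketch of such a predictor, assuming the pipeline follows the steps covered so far: audio features from Spotify, lyrics from Genius, word-count and sentiment features, then the trained random forest. Every stage is passed in as a function, so the names and signatures below are hypothetical; `model` only needs a scikit-learn style `predict_proba` method trained on labels 0 and 1.

```python
def predict_hit_probability(title, artist, get_audio_features, get_lyrics,
                            lyric_features, sentiment_features, model,
                            feature_order):
    """Return the model's estimated probability that a song is a hit.

    get_audio_features(title, artist) -> dict  (e.g. from Spotify)
    get_lyrics(title, artist)         -> str   (e.g. from Genius)
    lyric_features(lyrics)            -> dict  (word counts)
    sentiment_features(lyrics)        -> dict  (e.g. VADER scores)
    feature_order lists the columns in the order used for training.
    """
    lyrics = get_lyrics(title, artist)
    features = {
        **get_audio_features(title, artist),
        **lyric_features(lyrics),
        **sentiment_features(lyrics),
    }
    row = [features[name] for name in feature_order]
    # predict_proba returns one [P(not hit), P(hit)] pair per input row.
    return model.predict_proba([row])[0][1]
```

Keeping `feature_order` explicit guards against a silent mismatch between the columns seen at training time and those built at prediction time.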

We can make this interactive in the notebook by asking the user for the name of the artist and the title of the song, then outputting the prediction.

And in the next cell, type:
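One way to wire this up, sketched with the built-in `input()` (which also works in a notebook cell) rather than any particular widget library. `predict` stands for whatever prediction function was built above; the function name and prompts are illustrative.

```python
def ask_and_predict(predict, ask=input, show=print):
    """Prompt for artist and title, then display the hit probability.

    `predict` is any callable (title, artist) -> probability in [0, 1].
    """
    artist = ask("Artist: ").strip()
    title = ask("Title: ").strip()
    probability = predict(title, artist)
    show(f"{title} by {artist}: {probability:.0%} chance of being a hit")
    return probability
```

Passing `ask` and `show` as parameters keeps the function easy to test without a live prompt.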

[Figure: prediction output for “Lover” by Taylor Swift]

According to our algorithm, there is only a 22% chance that “Lover” by Taylor Swift will make it into the top 10 most popular songs of 2019. This seems plausible, since the song “only” peaked at number 10 on the Billboard Hot 100 for a few days.


Conclusion

Through this article, we illustrated the importance of external data sources for most data science problems. A good enrichment dataset can boost the performance of your model, and relevant feature engineering can help you gain even more.

Here is a performance summary of the different steps of our model:

Data                                 Model           Performance
Data from Billboard                  Decision Tree   F1-score: 6.6%
Enrich with Spotify and oversample   Random Forest   Accuracy: 93%
Enrich with Genius                   Random Forest   Accuracy: 98%