Spotify Playlists 🤝 Data Science

In the fall of 2019 my buddies and I created a collaborative Spotify playlist titled Hungry Guys Radio (“hungry guys” is a reference to one of our favorite SNL sketches and has become an inside joke in our friend group). After years of listening to this spectacularly curated playlist (shameless plug), I am now able to guess with pretty decent accuracy, just by listening to newly added songs, which of my other three friends added a song based on its vibe alone. What I mean by “vibe” is the combination of various song attributes like artist, tempo, genre and many others that I will dive into shortly.

As someone who loves discovering insights within data, this led me to wonder how I could use my data science skills to quantify and visualize these differences and maybe even find some machine learning use cases!

Important Definitions

Before moving forward I want to define all of the audio features that Spotify provides within its API that I was able to utilize in this analysis…

  • Acousticness: A confidence measure of how acoustic a song is. A score of 1.0 represents high confidence that the track is acoustic.
  • Danceability: Describes how suitable a track is for dancing based on a combination of musical elements including tempo, rhythm stability, beat strength, and overall regularity. A value of 0.0 is least danceable and 1.0 is most danceable.
  • Energy: Represents a perceptual measure of intensity and activity. Typically, energetic tracks feel fast, loud, and noisy.
  • Liveness: Describes the probability that the song was recorded with a live audience. “A value above 0.8 provides strong likelihood that the track is live”.
  • Loudness: Overall loudness of a track in decibels (dB). Loudness values are averaged across the entire track. Values typically range between -60 and 0 dB.
  • Speechiness: Detects the presence of spoken words in a track. If the speechiness of a song is above 0.66, it is probably made entirely of spoken words; a score between 0.33 and 0.66 indicates a track that may contain both music and speech; and a score below 0.33 most likely represents music and other non-speech-like tracks.
  • Instrumentalness: Predicts whether a track contains no vocals. The closer the value is to 1.0, the more likely the track is instrumental.
  • Valence: Measure from 0.0 to 1.0 describing the musical positiveness conveyed by a track. Tracks with high valence sound more positive (e.g. happy, cheerful, euphoric), while tracks with low valence sound more negative (e.g. sad, depressed, angry).

1. Our Perception of Hungry Guys Radio

Prior to starting this analysis I sent out a questionnaire to the three other contributors (Jack, Conor and Ian) as well as a fourth that listens frequently (Shane) to gauge how we perceive our own playlist.

Which adjectives best describe HGR? Groovy, fun, eclectic and happy were the most common answers with 3/5 saying “groovy” which is amazing seeing as this was not a multiple choice question.

Which audio feature(s) best captures the vibe of HGR? 5/5 said “valence”, 4/5 said “energy” and 3/5 said “danceability”.

When adding songs, are you selective in your additions to maintain the playlist vibe? Everyone responded “yes”.

All in all, we believe the playlist to be fun, groovy and high energy, and we aim to maintain this vibe throughout all tracks contributed. I will go in depth on more questions as they become relevant throughout the analysis.

2. Data Collection & Cleaning

Spotify maintains a Web API that allows developers to pull data on individual tracks, entire playlists, artist discographies and so much more. To access it I used Spotipy, a lightweight Python library for the Spotify Web API, and wrote a function that pulls every track within a playlist along with its corresponding audio features. Big shoutout to Samantha Jones, whose article on the API was a huge help in getting started! Her blog can be viewed here:
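In rough terms, the pull looks something like this minimal sketch, assuming `sp` is an authenticated `spotipy.Spotify` client; the exact fields kept here are illustrative, not my actual code:

```python
def get_playlist_audio_features(sp, playlist_id):
    """Pull every track in a playlist plus its audio features.

    `sp` is assumed to be an authenticated spotipy.Spotify client.
    """
    rows, page = [], sp.playlist_items(playlist_id)
    while page:  # the endpoint returns at most 100 tracks per page
        for item in page["items"]:
            track = item["track"]
            rows.append({
                "name": track["name"],
                "artist": track["artists"][0]["name"],
                "added_by": item["added_by"]["id"],
                "added_at": item["added_at"],
                "popularity": track["popularity"],
                "id": track["id"],
            })
        page = sp.next(page) if page["next"] else None
    # audio_features accepts up to 100 track ids per call
    for start in range(0, len(rows), 100):
        chunk = rows[start:start + 100]
        feats = sp.audio_features([r["id"] for r in chunk])
        for row, f in zip(chunk, feats):
            row.update(f or {})
    return rows
```

From here the list of dicts drops straight into a DataFrame for analysis.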

Additionally, all of the code used throughout this analysis can be found on my GitHub page below. FYI, this blog will mostly cover my insights rather than my code; if you are interested in the code, I would definitely check it out, as I provide a comprehensive walkthrough of everything (especially the machine learning material).

3. Exploratory Data Analysis

While Python is great for making API calls and training machine learning models, I prefer to do any EDA and analytics work in R because of how easy dplyr’s pipe makes manipulating datasets; plus I’m a huge fan of ggplot2 for creating visualizations.

[Image: distribution of each audio feature across Hungry Guys Radio]

To start, I wanted to get a high level overview of the playlist to shed light on features I may want to take a deeper look at. A few features in particular stand out to me in the above plot…

Loudness: Most songs skew toward the louder end of the scale. That fits the questionnaire answers: louder tracks lend themselves to a groovier, more fun and happy feel. Taking this a step further, the questionnaire also asked each of us which audio feature(s) we feel we ourselves contribute most to the playlist, as well as what we think everyone else contributes most. While I do listen to a wide variety of music, I especially enjoy an alternative rock vibe (my favorite band is The Band Camino) that tends to run loud. With that in mind, I was pretty confident I would be contributing the most loudness to the playlist, and it turns out my friends agreed: both I and my friends answered that I contributed most to the loudness and energy of the playlist. But were we right?

[Image: loudness density plot by contributor]

As shown in the loudness density plot above, I do contribute the most loudness to Hungry Guys Radio, with an average track loudness of -5.7 dB compared to the next loudest, Conor, at -6.3 dB. My next assumption was that loudness and energy would go hand in hand, which in our case proved accurate: I also had the highest average energy at 0.74, with Conor next at 0.72.

Popularity: Perhaps the most interesting distribution. There is a large cluster of tracks centered around a popularity score of 55, and then another 50 tracks with scores centered around 0. The questionnaire asked whether we avoid adding popular tracks to the playlist; I responded “yes” while everyone else responded “maybe / sometimes”. To break this distribution down further (based on the scores within this playlist alone), I labeled songs with a popularity score of 20–100 “Popular” and all others “Unpopular”, since 20 is where the largest gap falls in the audio feature distribution plot.
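In pandas terms, the labeling is a one-liner; this sketch assumes the playlist lives in a DataFrame with a `popularity` column (the column name and sample values are illustrative):

```python
import pandas as pd

tracks = pd.DataFrame({
    "name": ["Song A", "Song B", "Song C"],
    "popularity": [55, 3, 74],
})
# The 20-point cutoff mirrors the gap observed in the distribution plot.
tracks["label"] = tracks["popularity"].apply(
    lambda p: "Popular" if p >= 20 else "Unpopular"
)
print(tracks["label"].tolist())  # → ['Popular', 'Unpopular', 'Popular']
```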

[Image: popular vs. unpopular song counts by contributor]

At first glance you might think that Jack contributes most to the popularity of the playlist; however, what we’re really seeing is that he simply contributes the most songs overall (both popular and unpopular). Taking an even deeper look, we can see who is truly keen on popular songs…

[Table: popular songs added, median popularity, and percentage of popular songs by user]


Since there is an uneven distribution of songs added per user I wanted to break down both the number of popular songs added as well as the percentage of songs added that were popular by user. From the above table it becomes clear that even though Jack has added the highest number of popular songs, he does not have the highest median popularity and he actually has the lowest percentage of songs added that are popular. It turns out that Ian contributes most to the popularity of the playlist (when we take into account total songs added) having the highest median popularity at 74 (19 more than the next highest, Jack) and the highest percentage of popular songs at 86%.
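The per-user breakdown boils down to a groupby; here is a minimal sketch with made-up users and scores (not the playlist’s real numbers):

```python
import pandas as pd

tracks = pd.DataFrame({
    "added_by": ["Jack", "Jack", "Jack", "Ian", "Ian"],
    "popularity": [60, 5, 55, 74, 70],
})
tracks["is_popular"] = tracks["popularity"] >= 20

# Count of popular songs, median popularity, and share popular per user
summary = tracks.groupby("added_by").agg(
    popular_songs=("is_popular", "sum"),
    median_popularity=("popularity", "median"),
    pct_popular=("is_popular", "mean"),
)
print(summary)
```

Normalizing by each user’s total additions (`pct_popular`) is what lets a lower-volume contributor like Ian come out on top.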

It’s important to note that the above metrics are all based on the popularity distribution of Hungry Guys Radio and not any playlists outside of it. To compare how our playlist stacks up to one that is designed to be popular, I also pulled data from Spotify’s Today’s Top Hits playlist.

[Image: popularity distribution of Hungry Guys Radio vs. Today’s Top Hits]

It now becomes obvious that we put an emphasis on contributing lesser-known songs, given the wider range and lower concentration of scores. Out of curiosity I also fed this data into a linear regression model to see whether any audio features meaningfully predict a popular song. Unfortunately, the model showed no relationship whatsoever (very low R-squared and very high p-value); this could be a project I tackle on its own in the future with a much larger sample of songs.
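For the curious, the shape of that regression check looks like this; the data below is synthetic noise standing in for the real audio features, so R-squared comes out near zero, mirroring the null result:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(42)
X = rng.random((300, 8))           # 8 audio features for 300 tracks
y = rng.integers(0, 100, 300)      # popularity scores, unrelated to X

model = LinearRegression().fit(X, y)
r2 = model.score(X, y)             # in-sample R-squared
print(round(r2, 3))                # near zero: features don't explain popularity
```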

Valence: This playlist shows high valence which bodes well for our initial descriptions of fun, groovy and happy. What is also interesting is that when asked what we all think we contribute most to the playlist, all four of us had valence as one of our answers. 

[Image: valence by contributor]

While we all thought we contributed the most valence, Jack ended up leading the pack with Conor as a close second. What is even more interesting is that all three of my friends assumed I contributed the most valence and I ended up being last.

(Bonus) When are most songs added?: Something else my friends and I were interested in was when throughout the year we contribute the most songs. The below plot shows that we are definitely most active in our track additions during September; maybe it has something to do with the nice fall weather, since those are in fact great vibes? I will also note that when Jack initially populated this playlist in September 2019 he added a large number of songs in one day, so those were removed from the plot. On average we add 16 songs per month, which is great for people who want to hear new songs regularly! (Another shameless plug to give it a follow.)
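Counting additions by month is straightforward once the `added_at` timestamps from the playlist endpoint are in a DataFrame (the dates below are illustrative):

```python
import pandas as pd

tracks = pd.DataFrame({
    "added_at": ["2019-09-14", "2019-09-20", "2020-09-02", "2021-03-05"],
})
tracks["added_at"] = pd.to_datetime(tracks["added_at"])

# Tally additions by calendar month across all years
by_month = tracks["added_at"].dt.month_name().value_counts()
print(by_month.idxmax())  # → September
```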

[Image: songs added per month]

4. Machine Learning

After gaining a better understanding of the playlist I began to brainstorm possible machine learning applications. Since the playlist contained fewer than 300 songs at the time of this analysis, I was skeptical of training a model to predict which contributor added which song, and I didn't see much utility in that beyond sheer curiosity. Instead, I decided to train several models that would predict the probability that any given track fits the vibe of Hungry Guys Radio. Spotify currently has a similar feature (found at the bottom of any playlist under "Recommended") that suggests songs it thinks will go well with the given playlist. The main difference with my model, however, is that we can score any track we're curious about adding, whereas Spotify's feature has full control over the tracks it shows.

To tackle this problem I needed not only data from our playlist, but data from other playlists that varied in style, genre etc. that would allow a model to pick up on patterns and differences between them (that's essentially what machine learning does in a very basic sense). My dataset ended up being comprised of Today's Top Hits, Hot Country, It's ALT Good, All New Jazz (all created by Spotify), Morgan Freeman (created by Jack) and of course Hungry Guys Radio. As I mentioned earlier, I'll mostly be discussing my insights here but to see a deep dive into my code and model evaluation metrics check out my Github!

Algorithm Selection & Model Training: Since this is a classification problem I decided to work with the following algorithms...

  • Logistic Regression: Commonly used classification algorithm that estimates the probability that an instance belongs to a particular class (in this case HGR).
  • Decision Tree Classifier: The goal is to create a model that predicts the value of a target variable by learning simple decision rules inferred from the data features. Within my code you can view the actual decision tree and its rules!
  • K-Nearest Neighbors Classifier: In k-NN classification, the output is a class membership. An object is classified by a plurality vote of its neighbors, with the object being assigned to the class most common among its k nearest neighbors (k is a positive integer, typically small). If k = 1, then the object is simply assigned to the class of that single nearest neighbor.
  • Random Forest Classifier with GridSearchCV: A random forest creates a set of decision trees from randomly selected subsets of the training set, then aggregates the votes from the different trees to decide the final class of a test object. Grid search is a tuning technique that exhaustively searches over a specified grid of hyperparameter values for a model (also called an estimator) to find the best-performing combination, saving us time, effort and resources.
  • Precision, Recall & F1 Score: Important evaluation metrics for classification models. Precision is the accuracy of the positive (HGR) predictions. Recall is the proportion of actual positive (HGR) tracks that the model correctly identifies. F1 Score is the harmonic mean of the two, ranging from 0 to 1, and cannot be high unless both precision and recall are high. There is typically a trade-off between precision and recall: pushing one up tends to push the other down.
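To make the setup concrete, here is a hedged sketch of how the four models might be wired up in scikit-learn; the synthetic data, feature count and grid values are assumptions for illustration, not my exact configuration:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic stand-in for the real dataset: 8 audio features per track,
# label 1 = HGR track, 0 = track from another playlist.
rng = np.random.default_rng(0)
X = rng.random((400, 8))
y = (X[:, 0] + X[:, 1] > 1).astype(int)  # toy decision boundary
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

models = {
    "logreg": LogisticRegression(max_iter=1000),
    "tree": DecisionTreeClassifier(max_depth=4, random_state=0),
    "knn": KNeighborsClassifier(n_neighbors=5),
    "forest": GridSearchCV(
        RandomForestClassifier(random_state=0),
        param_grid={"n_estimators": [50, 100], "max_depth": [4, 8]},
        scoring="recall",
    ),
}
for name, model in models.items():
    model.fit(X_train, y_train)
    print(name, round(model.score(X_test, y_test), 2))
```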

Throughout the training process (when the model learns the patterns of the dataset that constitute a HGR song and a non-HGR song) I was able to identify the threshold per model that would give me ~80% recall. I decided to focus on recall instead of precision because music is so subjective. In other words I preferred a model that would be better able to identify possible songs that fit the vibe (high recall) than a model that would definitively tell me a song fits the vibe (high precision). Because of this, the model is able to ingest a long list of songs and tell us which ones we should give a listen to for us to ultimately decide if we want to add it.
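Threshold selection of this kind can be sketched with scikit-learn's precision_recall_curve; the toy data below stands in for the real playlist features:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import precision_recall_curve

rng = np.random.default_rng(1)
X = rng.random((400, 4))
y = (X[:, 0] > 0.5).astype(int)
clf = LogisticRegression().fit(X, y)

probs = clf.predict_proba(X)[:, 1]
precision, recall, thresholds = precision_recall_curve(y, probs)
# thresholds is sorted ascending and recall decreases along it, so the
# last index with recall >= 0.8 gives the highest usable threshold
idx = np.where(recall[:-1] >= 0.8)[0][-1]
print(round(float(thresholds[idx]), 3))
```

Any track whose predicted probability clears that threshold goes on the "give it a listen" list.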

[Images: evaluation metrics (precision, recall, F1) for each of the four models]

Out of the four models I trained and tested, logistic regression seems to be the best option, though I do like how the random forest performed as well. I think logistic regression shined here due to the limited data; otherwise my guess is that the KNN would've performed better than it did. Finally, each of my friends and I selected two songs: one we would add to the playlist and one we would not (depicted in the "Actual" column in the below chart). We can then see the corresponding probabilities from each of the four models...

[Table: actual add/skip decisions alongside each model's predicted probabilities]

5. Conclusion

Overall I think these models perform pretty well but for some reason they all love "Vetement Socks" by NAV which definitely doesn't fit the vibe but has similar audio features. In the future I would love to add more features for the model to lean on including artist, genre, year released etc. and maybe even run some unsupervised clustering algorithms; but for now my friends and I will certainly have some fun with the existing models.

Links...

Hungry Guys Radio

Spotipy Documentation

My Github

Samantha Jones' Article

