Applying Netflix Recommendation Algorithm to an Anime Dataset

Juandiego
May 5, 2022

How many of you have ever been asked, ‘What kind of films or series do you like most?’ It’s a hard question to answer because most of us have multifaceted interests. Yet streaming services like Netflix and YouTube seem to know our tastes remarkably well and keep us entertained for hours; our attention is their core business.

How do they find the right combination of ingredients to serve us the perfect film? Some 15 years ago, recommendation algorithms used an approach called ‘content filtering’, which builds a profile of each user from the features of the movies they react to positively or negatively.

Content Filtering

With these systems, users get movie recommendations according to feature scores such as drama, horror, comedy, or historical fiction. Say each movie is scored from 0 to 5 on each feature: if ‘user a’ loves comedy (5) and hates drama (0), they will be recommended movies or series with high comedy scores. With this approach, the algorithm doesn’t need other users’ ratings; it only needs the features, and it recommends the movies whose features best match what the user likes. It’s important to note that the features and scores are assigned by a human to feed the algorithm.

During recommendation, the similarity metrics…are calculated from the item’s feature vectors and the user’s preferred feature vectors from his/her previous records. Then, the top few are recommended — Abhijit Roy
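To make the quote concrete, here is a minimal content-filtering sketch in Python. The titles and their 0 to 5 genre scores are made up, and cosine similarity is assumed as the similarity metric (one common choice; the quote doesn’t name a specific one).

```python
import numpy as np

# Made-up genre scores (0-5) assigned by a human: [comedy, drama, horror]
movies = {
    "Movie A": np.array([5.0, 1.0, 0.0]),
    "Movie B": np.array([1.0, 5.0, 1.0]),
    "Movie C": np.array([4.0, 0.0, 1.0]),
    "Movie D": np.array([0.0, 2.0, 5.0]),
}


def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine of the angle between two feature vectors (1 = same direction)."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


def recommend(user_profile: np.ndarray, k: int = 2) -> list:
    """Return the k titles whose features are most similar to the user's profile."""
    scores = {title: cosine_similarity(user_profile, feats) for title, feats in movies.items()}
    return sorted(scores, key=scores.get, reverse=True)[:k]


# 'user a' loves comedy (5), hates drama (0), and barely tolerates horror (1)
user_a = np.array([5.0, 0.0, 1.0])
print(recommend(user_a))  # ['Movie C', 'Movie A']
```

Every number here comes from a human-assigned score, which is exactly the limitation discussed next.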

The problem with this approach is that it doesn’t reflect real user preferences accurately. We like movies for many different reasons: perhaps not for the story or the cast, but because a movie brings back good memories from childhood, because of the mood it conveys, or because of a myriad of other factors.

Often we simply don’t know why we like a certain movie, and that’s OK. Netflix and other streaming services understand us well enough to help anyway, and that’s a big part of why they are so successful.

Collaborative Filtering

With collaborative filtering, users get movie recommendations based on other users’ preferences. The system doesn’t need features like drama, horror, or comedy assigned to each movie, and that’s the key insight behind this approach. With content filtering, the features are assigned by a human, whereas with collaborative filtering they are extracted from patterns in the data. In other words, the algorithm learns which features users like by observing their behavior.

It considers other users’ reactions while recommending a particular user. It notes which items a particular user likes and also the items that the users with behavior and likings like him/her likes, to recommend items to that user — Abhijit Roy
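The user-based idea in the quote can be sketched in a few lines of Python. This is an illustration under assumptions, not any particular service’s method: the ratings are invented, NaN marks an unrated title, user similarity is Pearson correlation over co-rated titles, and the prediction is the target user’s mean plus a similarity-weighted average of the other users’ deviations from their own means.

```python
import numpy as np

# Invented ratings: rows = users, columns = titles, np.nan = not rated
ratings = np.array([
    [5.0, 4.0, np.nan, 1.0],  # target user (index 0): hasn't rated title 2
    [5.0, 5.0, 4.0, 1.0],     # similar taste
    [1.0, 1.0, 2.0, 5.0],     # opposite taste
])


def pearson(u: np.ndarray, v: np.ndarray) -> float:
    """Pearson correlation between two users over the titles both have rated."""
    both = ~np.isnan(u) & ~np.isnan(v)
    if both.sum() < 2:
        return 0.0
    a = u[both] - u[both].mean()
    b = v[both] - v[both].mean()
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom else 0.0


def predict(target: int, item: int) -> float:
    """Target's mean rating plus similarity-weighted deviations of other users."""
    base = np.nanmean(ratings[target])
    num = den = 0.0
    for other, row in enumerate(ratings):
        if other == target or np.isnan(row[item]):
            continue
        sim = pearson(ratings[target], row)
        num += sim * (row[item] - np.nanmean(row))  # how far above/below their own mean
        den += abs(sim)
    return float(base + num / den) if den else float(base)


print(round(predict(target=0, item=2), 2))  # 3.58
```

The like-minded user rated title 2 slightly above their own average and the opposite-taste user rated it slightly below theirs, so both push the prediction above the target user’s mean of about 3.33.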

That’s enough theory for now.

Project: Anime Recommendation System

For this project I used a dataset that you can download at this link. It contains data on 300k users, metadata for 14k anime, and 80 million ratings from MyAnimeList. Why anime? I like anime, and I don’t have access to Netflix’s data. If you just want the code, go to my GitHub; otherwise, here’s the step-by-step process.

It’s a large dataset, so I recommend dropping the unnecessary columns and keeping only the important ones: user_id, anime_id, rating, and Name. If you want to follow along, save the result as ‘Anime Recommendation System.csv’.
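A minimal sketch of that cleanup with pandas. The column names follow the ones used later in this walkthrough (‘user_id’, ‘anime_id’, ‘rating’, ‘Name’); the sample rows, the extra columns, and the helper name slim_down are hypothetical. For the real data, pass the path of the file you downloaded instead of the in-memory sample.

```python
import io

import pandas as pd

# Columns to keep (assumed names; rename to match your export if it differs)
KEEP = ["user_id", "anime_id", "rating", "Name"]


def slim_down(source, dest="Anime Recommendation System.csv") -> pd.DataFrame:
    """Hypothetical helper: read only the needed columns and write them to dest."""
    # usecols skips the other columns while parsing, which saves memory on a big file
    df = pd.read_csv(source, usecols=KEEP)[KEEP]
    df.to_csv(dest, index=False)
    return df


# Made-up sample standing in for the downloaded file; for the real data you
# would call slim_down("path/to/downloaded_file.csv") to write the CSV to disk.
raw = io.StringIO(
    "user_id,anime_id,rating,Name,watched_episodes,status\n"
    "1,1,9,Cowboy Bebop,26,completed\n"
    "2,1,8,Cowboy Bebop,26,completed\n"
    "2,2,7,Gintama,12,watching\n"
)
out = io.StringIO()  # write to memory here instead of to disk
slim = slim_down(raw, out)
print(slim)
```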

First, import pandas and NumPy, and load the dataset.

The full file has 55 million rows, but I used 45 million; if your PC or virtual machine can handle all 55 million, go ahead. Then check the size of the DataFrame with its shape attribute.
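A sketch of the loading step, assuming the slimmed-down CSV described above. The helper load_ratings is hypothetical; nrows is the standard pd.read_csv parameter for reading only the first N rows, which is one way to work with 45 million rows instead of 55 million. The smaller dtypes are an optional choice that assumes numeric IDs and ratings.

```python
import io

import pandas as pd


def load_ratings(source, max_rows=None) -> pd.DataFrame:
    """Hypothetical helper: load the ratings file, optionally only the first max_rows rows.

    nrows=None reads the whole file. int32/float32 use half the memory of the
    default int64/float64 for those three columns (assumes numeric values).
    """
    return pd.read_csv(
        source,
        nrows=max_rows,
        dtype={"user_id": "int32", "anime_id": "int32", "rating": "float32"},
    )


# For the real file, something like:
# df = load_ratings("Anime Recommendation System.csv", max_rows=45_000_000)
sample = io.StringIO(
    "user_id,anime_id,rating,Name\n"
    "1,1,9,Cowboy Bebop\n"
    "2,1,8,Cowboy Bebop\n"
    "2,2,7,Gintama\n"
    "3,2,10,Gintama\n"
)
df = load_ratings(sample, max_rows=3)
print(df.shape)  # (3, 4)
```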

You will get something like this.

Next, build a user-by-anime ratings matrix: set the index to ‘user_id’, the columns to ‘Name’, and the values to ‘rating’, then get rid of the extra column level (the multi-index) with the .xs function.
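The pivot can be sketched with pandas’ pivot_table on a made-up sample. Passing values as a list is an assumption about where the multi-index comes from: it produces two column levels (‘rating’, anime name), which .xs then flattens. Passing values="rating" as a plain string would give single-level columns directly.

```python
import pandas as pd

# Made-up sample in the slimmed-down format
ratings = pd.DataFrame({
    "user_id": [1, 1, 2, 2, 3],
    "anime_id": [1, 2, 1, 3, 2],
    "rating": [9, 7, 8, 10, 6],
    "Name": ["Cowboy Bebop", "Paprika", "Cowboy Bebop", "Gintama", "Paprika"],
})

# One row per user, one column per anime; duplicate (user, anime) pairs are averaged
matrix = ratings.pivot_table(index="user_id", columns="Name", values=["rating"])

# Columns are now ('rating', <anime name>); drop the outer 'rating' level
matrix = matrix.xs("rating", axis=1)
print(matrix)
```

Keep an eye on memory: at full scale, a dense float64 matrix of roughly 300k users by 14k titles would need on the order of 34 GB, which is one reason to work with a subset of the rows.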

You will get something like this.

Don’t worry about the NaN values: they just mean a user hasn’t rated that anime, which is the case for most user and anime pairs, since many users don’t rate the anime they watch.

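Next, I compute a correlation matrix using the Pearson method, which assigns each pair of anime a value from -1 to 1 depending on how closely their ratings move together: 1 means users rate them the same way, -1 the opposite way, and 0 means no linear relationship. Does it sound familiar? This is collaborative filtering applied to items: two anime are treated as similar when the same users rate them similarly.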
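In pandas this is a single call on the user-by-anime matrix. The toy matrix below is invented, and the min_periods threshold is an assumed value; on the real data a much higher threshold is probably wise, so that correlations based on only a handful of shared users are discarded.

```python
import numpy as np
import pandas as pd

# Invented user-by-anime matrix (NaN = not rated)
matrix = pd.DataFrame(
    {
        "Cowboy Bebop": [9, 8, 2, np.nan],
        "Paprika": [8, 9, 3, 7],
        "Gintama": [3, 2, 9, 8],
    },
    index=pd.Index([1, 2, 3, 4], name="user_id"),
)

# Pearson r for every pair of columns (anime), using only users who rated both:
#   r = sum((x - mean_x) * (y - mean_y)) / sqrt(sum((x - mean_x)^2) * sum((y - mean_y)^2))
# min_periods sets the minimum number of shared raters; pairs below it become NaN.
corr_matrix = matrix.corr(method="pearson", min_periods=3)
print(corr_matrix.round(2))
```

Cowboy Bebop and Paprika come out strongly positive because the same users rated both highly, while Cowboy Bebop and Gintama come out strongly negative.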

You will get something like this.

Notice that each pair of anime now has a score derived from users’ ratings, not from features pre-established by a human.

Now that the algorithm has set the scores, it’s time to define a function to get similar anime.
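The article’s function isn’t reproduced here, so this is a minimal sketch of one common way to write it. The name get_similar, the midpoint parameter, and its default of 5.5 (the middle of MyAnimeList’s 1 to 10 scale) are assumptions, and the correlation values in the example are invented.

```python
import pandas as pd


def get_similar(corr_matrix: pd.DataFrame, name: str, rating: float, midpoint: float = 5.5) -> pd.Series:
    """Hypothetical helper: score every anime by its correlation with `name`.

    The correlation is weighted by (rating - midpoint), so a title the user liked
    pushes correlated titles up, and a title they disliked pushes them down.
    """
    scores = corr_matrix[name] * (rating - midpoint)
    return scores.dropna().sort_values(ascending=False)


titles = ["Cowboy Bebop", "Paprika", "Gintama"]
corr = pd.DataFrame(
    [[1.0, 0.8, -0.6], [0.8, 1.0, -0.4], [-0.6, -0.4, 1.0]],
    index=titles,
    columns=titles,
)
print(get_similar(corr, "Cowboy Bebop", 10))
```

Centering the rating is a design choice: without it, every rating (even a 1) would be treated as a vote in favor of correlated titles.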

Now let’s simulate two users with different tastes: ‘user’ likes Cowboy Bebop, Paprika, and Koukaku Kidotai, and ‘user 2’ likes Kono Subarashi Sekai Ni Shukufuku Wo! and Gintama.

Then, with a for loop, append the similar-anime scores for each of the user’s favorites to a DataFrame called ‘similar_movies’, and sum the values for each anime to get a single ranked list.
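A sketch of that aggregation step, with a compact get_similar-style scoring function (hypothetical) included so the block runs on its own. The correlation values and the ratings of 10 and 9 given to each user’s favorites are invented, and Koukaku Kidotai is left out because it isn’t in the toy matrix. pd.concat is used because DataFrame.append was removed in pandas 2.0.

```python
import pandas as pd


def get_similar(corr_matrix, name, rating, midpoint=5.5):
    """Hypothetical helper: correlation with `name`, weighted by the centered rating."""
    return corr_matrix[name] * (rating - midpoint)


def recommend_for(corr_matrix, preferences, midpoint=5.5):
    """Sum the weighted similarity scores of every (title, rating) pair a user gives."""
    # One column per favorite title, one row per anime
    similar_movies = pd.concat(
        [get_similar(corr_matrix, name, rating, midpoint) for name, rating in preferences],
        axis=1,
    )
    return similar_movies.sum(axis=1).sort_values(ascending=False)


titles = ["Cowboy Bebop", "Paprika", "Gintama", "Kono Subarashi Sekai Ni Shukufuku Wo!"]
corr = pd.DataFrame(
    [
        [1.0, 0.7, -0.2, -0.3],
        [0.7, 1.0, -0.1, -0.4],
        [-0.2, -0.1, 1.0, 0.6],
        [-0.3, -0.4, 0.6, 1.0],
    ],
    index=titles,
    columns=titles,
)

user = [("Cowboy Bebop", 10), ("Paprika", 9)]
user_2 = [("Gintama", 10), ("Kono Subarashi Sekai Ni Shukufuku Wo!", 9)]
print(recommend_for(corr, user))
print(recommend_for(corr, user_2))
```

With a real correlation matrix you would likely also drop the titles already in the user’s list before showing the top results.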

Using the preferences for ‘user’, I got this.

Using the preferences for ‘user 2’ instead, I got this.

The scores on the right are computed by the algorithm, not by me. Note that all the recommendations are based on user ratings alone: titles like Gintama and Kono Subarashi Sekai Ni Shukufuku Wo! are comedy series, but the algorithm doesn’t need to know that; the only thing that matters is user preferences. If you are an anime lover, you can see that these recommendations make sense.

Conclusion

Recommendation algorithms have evolved to understand user preferences deeply and act accordingly.

I hope my project gives you a better understanding of how these algorithms work. See you next time.

Originally published at https://juandiegorr.com on May 5, 2022.


Juandiego

Business Analyst | Data Analyst | Marketing Data-Driven Creator and Occasional Writer