Collaborative filtering approach for recommender engine
If you want to know a bit about a content-based recommender engine first, you can go through these two previous parts.
Part 1: Overview of recommender engine (https://www.garudax.id/pulse/recommender-system-himanshu-singh/)
Part 2: Content-based recommender engine (https://www.garudax.id/pulse/content-based-recommender-system-himanshu-singh/)
The general framework of a collaborative filtering recommender engine goes as follows:
- Consider a user and her preferences
- Find her neighborhood, i.e. the set of users whose preferences match those of the user under consideration
- Estimate the rating of items for the chosen user based on the ratings given by the users in the neighborhood. If the estimated rating is high enough, recommend the item to her.
Preferences here mean the likes and dislikes of users. The key step is to find the set of users whose preferences match those of the chosen user, and for that we first need a notion of similarity between users. Consider movies M1, M2, …, M7 and users U1, U2, U3, and U4. From the rating table we can see that U1 and U2 have both rated one common movie (M1), and both rated it fairly high. U1 and U3 have rated two common movies (M4 and M5), but U1 gave a high rating to M4 and a low rating to M5, while U3 gave a low rating to M4 and a high rating to M5. From this observation we can say that the preferences of U1 and U2 are similar, while the preferences of U1 and U3 are not; in fact, they are opposite. In other words, we need a similarity measure that captures the intuition: similarity(U1, U2) > similarity(U1, U3)
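To make the example concrete, here is one possible rating matrix in Python. The specific rating values are an assumption chosen for illustration, consistent with the observations above (U1 and U2 share only M1, both rated high; U1 and U3 share M4 and M5 with opposite ratings); the actual table may differ.

```python
# Hypothetical ratings (0-5 scale), assumed for illustration:
# each user maps movie -> rating; a missing key means "not rated".
ratings = {
    "U1": {"M1": 4, "M4": 5, "M5": 1},
    "U2": {"M1": 5, "M2": 5, "M3": 4},
    "U3": {"M4": 2, "M5": 4, "M6": 5},
}

# Movies co-rated by each pair of users
common_12 = set(ratings["U1"]) & set(ratings["U2"])  # only M1
common_13 = set(ratings["U1"]) & set(ratings["U3"])  # M4 and M5
```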
Let’s see some similarity measures and see which one fits our purpose here.
Jaccard Similarity
J(U1, U2) = |movies rated by both U1 and U2| / |movies rated by either U1 or U2| = 1/5 (M1 is the only movie rated by both, out of five movies rated by either)
J(U1, U3) = 2/4 => similarity(U1, U2) < similarity(U1, U3), the opposite of what we want.
Thus, Jaccard similarity does not capture the intended intuition. This is because it ignores the rating values: it only checks whether a movie was rated by both users, not whether they liked or disliked it. We clearly need a different similarity measure.
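A minimal sketch of Jaccard similarity on the example (the rating values are assumed, consistent with the counts in the text; Jaccard only uses the *sets* of rated movies anyway):

```python
def jaccard(a: dict, b: dict) -> float:
    """Jaccard similarity over the sets of rated movies; rating values are ignored."""
    ra, rb = set(a), set(b)
    return len(ra & rb) / len(ra | rb)

# Assumed example ratings (movie -> rating); the values don't matter for Jaccard.
u1 = {"M1": 4, "M4": 5, "M5": 1}
u2 = {"M1": 5, "M2": 5, "M3": 4}
u3 = {"M4": 2, "M5": 4, "M6": 5}

jaccard(u1, u2)  # 1/5 = 0.2
jaccard(u1, u3)  # 2/4 = 0.5, higher than J(U1, U2): the opposite of our intuition
```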
Cosine Similarity
similarity(U1, U2) = cos(r_U1, r_U2), where r_U is the rating vector of user U.
To calculate cosine similarity, we first need to fill in the missing ratings in the table. One simple way is to fill them with 0.
similarity(U1, U2) = 0.38
and similarity(U1, U3) = 0.32
similarity(U1, U2) > similarity(U1, U3), but only marginally.
This still does not capture the intuition that U1 and U2 are much more similar than U1 and U3. The problem is that the missing ratings were filled with 0, and 0 is itself a low rating, i.e. it reads as the movie being negatively rated. So, by default, a movie a user has not rated is treated as a movie she won't like. We need to fix this, and one way is to use the centered cosine.
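Here is a sketch of zero-filled cosine similarity on the same assumed ratings; with these values the similarities come out to the numbers quoted above.

```python
import math

def cosine_zero_filled(a: dict, b: dict, movies) -> float:
    """Cosine similarity between two users' rating vectors, unrated movies filled with 0."""
    va = [a.get(m, 0.0) for m in movies]
    vb = [b.get(m, 0.0) for m in movies]
    dot = sum(x * y for x, y in zip(va, vb))
    norm = lambda v: math.sqrt(sum(x * x for x in v))
    return dot / (norm(va) * norm(vb))

movies = ["M1", "M2", "M3", "M4", "M5", "M6", "M7"]
u1 = {"M1": 4, "M4": 5, "M5": 1}   # assumed ratings, as before
u2 = {"M1": 5, "M2": 5, "M3": 4}
u3 = {"M4": 2, "M5": 4, "M6": 5}

round(cosine_zero_filled(u1, u2, movies), 2)  # 0.38
round(cosine_zero_filled(u1, u3, movies), 2)  # 0.32, only marginally smaller
```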
Centered Cosine (Pearson Correlation)
- Calculate the mean of each row.
- Normalize each row by subtracting the row mean from each rated value.
Notice that the sum of the rated values in each row is now 0. This is because we centered each user's ratings around 0, so 0 becomes the average rating: a positive value means the user liked the movie more than average, and a negative value means she liked it less than average. Now if we fill the missing values with 0 and calculate the cosine similarity, we get:
similarity(U1, U2) = 0.09
similarity(U1, U3) = -0.56
We can see that this similarity measure captures the intuition that U1 and U2 have similar preferences while U1 and U3 do not have similar preferences.
This method effectively assigns the user's average rating (i.e. 0 after centering) to unrated movies. It also handles "tough raters" and "easy raters": some users rate very generously, while others rate only between 0 and 3 even though they are allowed to rate in the range of 0 to 5. Centering normalizes each user's ratings and makes them comparable.
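The centered-cosine computation can be sketched as follows (same assumed ratings as before; each user's rated values are centered around her own mean before a zero-filled cosine):

```python
import math

def center(user: dict) -> dict:
    """Subtract the user's mean rating from each of her rated movies."""
    mu = sum(user.values()) / len(user)
    return {m: r - mu for m, r in user.items()}

def cosine(a: dict, b: dict, movies) -> float:
    """Cosine similarity with missing entries filled as 0."""
    va = [a.get(m, 0.0) for m in movies]
    vb = [b.get(m, 0.0) for m in movies]
    dot = sum(x * y for x, y in zip(va, vb))
    norm = lambda v: math.sqrt(sum(x * x for x in v))
    return dot / (norm(va) * norm(vb))

movies = ["M1", "M2", "M3", "M4", "M5", "M6", "M7"]
u1 = {"M1": 4, "M4": 5, "M5": 1}   # assumed ratings, as before
u2 = {"M1": 5, "M2": 5, "M3": 4}
u3 = {"M4": 2, "M5": 4, "M6": 5}

round(cosine(center(u1), center(u2), movies), 2)  # 0.09
round(cosine(center(u1), center(u3), movies), 2)  # -0.56
```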
Predicting Ratings
Consider a user u for whom we want to predict the rating of movie m. Find the k users most similar to u in terms of preferences. One way to predict u's rating for m would be to simply average the ratings of these k users for movie m. But similarity varies a lot within the neighborhood: some of the k users might be very similar to u while others are only mildly similar. With this in mind, we instead use a weighted average of the k users' ratings for movie m, with weights given by the centered cosine similarity: predicted_rating(u, m) = Σ_v sim(u, v) * rating(v, m) / Σ_v sim(u, v), summing over the k neighbors v. The more similar a user is to u, the more weight her rating gets in the average.
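The weighted-average prediction can be sketched like this (the neighbor similarities and ratings below are made-up numbers for illustration, and all similarities are assumed positive):

```python
def predict_rating(neighbor_sims, neighbor_ratings):
    """Similarity-weighted average of the k neighbors' ratings for the target movie."""
    num = sum(s * r for s, r in zip(neighbor_sims, neighbor_ratings))
    den = sum(neighbor_sims)
    return num / den

# k = 3 neighbors with made-up similarities to user u and ratings for movie m
sims = [0.9, 0.5, 0.2]
movie_ratings = [4, 3, 5]
predict_rating(sims, movie_ratings)  # (3.6 + 1.5 + 1.0) / 1.6 ≈ 3.8125
```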
So far, what we have discussed is called user-user collaborative filtering because, given a user, we try to find users similar to the given user and use the ratings of these users to predict the rating of the given user for a movie m.
A dual approach to user-user collaborative filtering is item-item collaborative filtering.
The general way of item-item collaborative filtering goes as follows:
- For item i, find other items that are similar to it
- Estimate the user's rating for item i based on her ratings for those similar items.
Similarity metrics and prediction functions used in item-item filtering are the same as in user-user filtering.
Let’s understand item-item collaborative filtering using an example. The matrix represents ratings of movies by different users. Our target is to predict the rating of movie 1 by user 5. For the sake of simplicity, consider the size of the neighborhood to be 2, i.e. identify the top two movies that are most similar to movie 1 among the movies user 5 has rated.
Rating prediction is calculated using a weighted average method where weight is the similarity value calculated using the Pearson coefficient (centered cosine method) and is on the rightmost side of the matrix in green color for this example. Predicted rating of movie 1 by user 5 = (0.41*2 + 0.59*3)/(0.41+0.59) = 2.6
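The arithmetic of this example, sketched in Python (the similarities 0.41 and 0.59 and ratings 2 and 3 are the values from the example above):

```python
sims = [0.41, 0.59]      # similarity of movie 1 to the two neighbor movies
user5_ratings = [2, 3]   # user 5's ratings for those neighbor movies

# Similarity-weighted average, exactly as in the formula above
pred = sum(s * r for s, r in zip(sims, user5_ratings)) / sum(sims)
round(pred, 2)  # 2.59, which the article rounds to 2.6
```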
It is better not to recommend this movie to her, since its predicted rating is low.
Item-Item vs User-User collaborative filtering
In theory, both should perform equally well. In practice, however, item-item collaborative filtering often outperforms user-user collaborative filtering in domains like movies, music, and books, because items are simpler than humans.
- Items usually belong to a small set of genres, while users have varied tastes: a song is very unlikely to belong to both the classical and rock genres, but a user might like both.
- Item similarity is therefore more meaningful than user similarity, because users can have very unique tastes. After all, Einstein said, "Everyone is unique in their own way". Wait a minute... when did Einstein say that :D
The biggest advantage of collaborative filtering is that it does not depend on the item type, since it needs no feature selection. This matters because it is very difficult to find useful features for items such as movies or music.
The computational complexity of collaborative filtering
- The most expensive step in collaborative filtering is finding the neighborhood, i.e. the k most similar users in user-user collaborative filtering or the k most similar items in item-item collaborative filtering.
- We have to match a given user against all the users in the utility matrix to find the k most similar users. This is obviously impractical to do at run time when the number of users is in the millions, so we have to precompute the neighborhoods. A naive precomputation takes on the order of O(n * size_of_utility_matrix), which is still very large: with n = 10^6 users and 10^5 movies, that is O(10^6 * (10^5 * 10^6)) = O(10^17) operations, which is very difficult to do in practice.
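The back-of-the-envelope count above, as a quick check:

```python
n_users = 10**6                              # 1 million users
n_movies = 10**5                             # 100 thousand movies
utility_matrix_size = n_users * n_movies     # 10**11 entries
# Naive precomputation: compare every user against the whole matrix
naive_precompute_ops = n_users * utility_matrix_size
naive_precompute_ops == 10**17  # True
```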
Some common approaches to make it computationally achievable:
- Since the utility matrix is very sparse (assumption: most of the users have not watched most of the movies), dimensionality reduction can greatly improve the computation time.
- Another method is to first cluster the users, so that similar users fall into the same cluster, and then limit the k-nearest-neighbor search to the user's own cluster.
- Use locality-sensitive hashing for computing the k nearest neighbors in higher-dimensional space.
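As a minimal sketch of the locality-sensitive hashing idea for cosine similarity (random-hyperplane hashing; the vector and the toy dimensions below are made up for illustration):

```python
import random

def lsh_signature(vec, planes):
    """Random-hyperplane LSH for cosine similarity: the signature is the
    sign pattern of the vector's dot products with random hyperplanes.
    Vectors pointing in similar directions tend to share a signature,
    so the kNN search can be limited to one hash bucket instead of all users."""
    return tuple(
        1 if sum(p_i * v_i for p_i, v_i in zip(plane, vec)) >= 0 else 0
        for plane in planes
    )

random.seed(42)
dim, n_planes = 5, 8  # toy sizes, assumed for illustration
planes = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(n_planes)]

u = [1.0, 2.0, 0.5, 0.0, 1.5]  # a made-up (centered) rating vector
sig = lsh_signature(u, planes)
# The signature depends only on direction: scaling u does not change it.
lsh_signature([2 * x for x in u], planes) == sig  # True
```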
We’ll look at these techniques for making collaborative filtering computationally feasible in the coming notes.