Simple Popularity Based Recommendation
This article shows a simple popularity-based recommender system using Python. The dataset we are using is the Restaurant & Consumer data, hosted at: https://archive.ics.uci.edu/ml/datasets/Restaurant+%26+consumer+data
A popularity-based recommender is a primitive form of collaborative filtering, in which items are recommended to users based on how popular those items are among other users. Commonly, popularity-based recommenders rely on purchase history, but in our case the data the recommender works from is not purchase history. It is a set of ratings that consumers gave to restaurants.
So, recommendations are made based on how many ratings each place has received. One disadvantage of popularity-based recommenders is that they cannot personalize: because we do not take any information about the individual user into account, every user gets the same recommendations.
Let's not delve into the concepts too much and just get started.
Start by importing the basic libraries, pandas and numpy, then read the files and print the first few rows of the ratings and cuisine data.
import pandas as pd
import numpy as np
frame = pd.read_csv('rating_final.csv')
cuisine = pd.read_csv('chefmozcuisine.csv')
frame.head()
cuisine.head()
Let's count the number of ratings given to each eatery. We will assume that the places with the most ratings or reviews are the most popular ones. Based on this assumption, we make the popularity-based recommendation that one place is preferable to another.
Here, we group the data frame by place ID and, for each unique place ID, count how many entries there are in the rating column. Because we want the result to be its own data frame, we wrap the output of the group-by in the pandas DataFrame constructor, pd.DataFrame. We call this new data frame rating_count. To sort the places in descending order of the number of ratings they received, we call the sort_values method on rating_count, passing 'rating' as the column to sort by (after the count, this column holds the number of ratings, not the rating values) and ascending=False to sort from highest to lowest.
rating_count = pd.DataFrame(frame.groupby('placeID')['rating'].count())
rating_count.sort_values('rating', ascending=False).head()
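To make the group-and-count step easier to follow, here is a minimal sketch on a tiny made-up table. The toy_ratings data and the names toy_count and alt_count are hypothetical and only stand in for rating_final.csv. The sketch also shows that value_counts on the placeID column gives the same numbers when no ratings are missing.

```python
import pandas as pd

# Hypothetical toy ratings, standing in for rating_final.csv
toy_ratings = pd.DataFrame({
    'userID':  ['U1', 'U2', 'U3', 'U1', 'U2', 'U1'],
    'placeID': [101, 101, 101, 102, 102, 103],
    'rating':  [2, 1, 2, 0, 1, 2],
})

# Same pattern as in the article: count ratings per place, then sort
toy_count = pd.DataFrame(toy_ratings.groupby('placeID')['rating'].count())
toy_count = toy_count.sort_values('rating', ascending=False)
print(toy_count)

# value_counts counts rows per placeID and sorts in descending order;
# it matches the group-by count as long as no rating is missing
alt_count = toy_ratings['placeID'].value_counts()
print(alt_count)
```

Note that the 'rating' column of toy_count holds counts, not rating values, so a place with many low ratings still ranks as popular under this approach.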
It looks like the most reviewed place is the one with ID 135085, with a total of 36 ratings. We can take the five most often rated places and look for similarities between the cuisines they serve. To do that, we first make a data frame of the place IDs of the most often rated places, then merge that data frame with the cuisine data frame.
Let's create the data frame with the place IDs of the most reviewed places in the data set. We then merge this data frame, most_rated_places, with the cuisine data frame to see whether there are any similarities between the cuisines served at the most popular places in town. We use the pandas merge function, passing most_rated_places on the left and cuisine on the right, and merging on the field called placeID. Printing the output shows the most popular places in town and the cuisine types served at each of them.
most_rated_places = pd.DataFrame([135085, 132825, 135032, 135052, 132834], index=np.arange(5), columns=['placeID'])
summary = pd.merge(most_rated_places, cuisine, on='placeID')
summary
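The article hard-codes the five place IDs. As a sketch, the same list can be taken straight from the counts, which avoids copying IDs by hand. The toy tables and the names toy_top, toy_summary and toy_summary_left below are hypothetical. The sketch also shows one thing to watch for: pd.merge defaults to an inner join, so a top place with no row in the cuisine table disappears from the summary, while a place with several cuisines appears several times.

```python
import pandas as pd

# Hypothetical toy data; place 102 has no cuisine entry,
# place 104 is listed under two cuisines
toy_ratings = pd.DataFrame({
    'placeID': [101, 101, 101, 102, 102, 103, 104, 104, 104, 104],
    'rating':  [2, 1, 2, 0, 1, 2, 1, 1, 2, 0],
})
toy_cuisine = pd.DataFrame({
    'placeID':  [101, 103, 104, 104],
    'Rcuisine': ['Mexican', 'Bar', 'Mexican', 'Cafeteria'],
})

# Count ratings per place and keep the IDs of the three most rated places
toy_count = toy_ratings.groupby('placeID')['rating'].count()
top_ids = toy_count.sort_values(ascending=False).head(3).index.tolist()
toy_top = pd.DataFrame({'placeID': top_ids})

# Inner join (the default): place 102 is dropped, place 104 appears twice
toy_summary = pd.merge(toy_top, toy_cuisine, on='placeID')
print(toy_summary)

# Left join: every top place is kept, with NaN where no cuisine is known
toy_summary_left = pd.merge(toy_top, toy_cuisine, on='placeID', how='left')
print(toy_summary_left)
```

If the summary for the real data shows fewer than five distinct places, a missing cuisine entry is one possible explanation, and how='left' makes that visible.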
Let's see how many types of cuisine are available across the places in this data set by selecting the Rcuisine column and calling the describe method on it.
cuisine['Rcuisine'].describe()
We can see here that 59 unique types of cuisine are represented in our data, and that the most frequently occurring one is Mexican, which accounts for 239 of the 916 rows (about 26%). Looking back at our summary table, we can observe that two of the most rated places in town serve Mexican food. The recommender is effectively suggesting that Mexican food is popular and that places serving it are good candidates for recommendation. In other words, it assumes that places serving the most popular types of cuisine are more likely to be appreciated by the average restaurant goer in the city. It does make sense, right?
count 916
unique 59
top Mexican
freq 239
Name: Rcuisine, dtype: object
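To see where the four numbers in the describe output come from, here is a small sketch on a made-up cuisine table (toy_cuisine is hypothetical). For a text column, describe reports the number of non-missing rows, the number of distinct values, the most frequent value and how often it occurs. value_counts gives the full breakdown behind the top and freq entries.

```python
import pandas as pd

# Hypothetical toy cuisine table, standing in for chefmozcuisine.csv
toy_cuisine = pd.DataFrame({
    'placeID':  [101, 102, 103, 104, 104],
    'Rcuisine': ['Mexican', 'Bar', 'Mexican', 'Mexican', 'Cafeteria'],
})

# Summary of the text column: count, unique, top, freq
desc = toy_cuisine['Rcuisine'].describe()
print(desc)

# Full frequency table; its first entry matches top and freq above
counts = toy_cuisine['Rcuisine'].value_counts()
print(counts)
```

Keep in mind that these counts are over rows of the cuisine table, so a place listed under several cuisines contributes one row to each of them.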
Hope you found this useful. Look out for my next article on correlation-based recommendation, which is one of the basic forms of collaborative filtering.