Building a simple recommendation system in Python is straightforward. I’ll show you how, using a movie dataset of user ratings. Here we compare just two movies, Star Wars and Liar Liar, against all the others, using the correlation between user ratings to recommend what someone who enjoyed either film might also like.

In [ ]:

```
#Initial Imports
import numpy as np
import pandas as pd
import seaborn as sns
%matplotlib inline
```

In [81]:

```
#Set column names
column_names = ['user_id','item_id','rating','timestamp']
#Read in data
df = pd.read_csv('u.data',sep='\t',names=column_names)
```

In [82]:

```
df.head()
```

Out[82]:

In [83]:

```
#Read in the movie titles lookup (item_id -> title)
movie_titles = pd.read_csv('Movie_Id_Titles')
```

In [84]:

```
movie_titles.head()
```

Out[84]:

In [85]:

```
#Merge these two datasets on the item id
df = pd.merge(df,movie_titles,on='item_id')
```
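
One thing worth knowing about this merge: `pd.merge` defaults to an inner join, so any rating whose `item_id` has no entry in the titles file is silently dropped. Here is a small sketch on made-up data (the toy frames and titles are hypothetical, not the real files) that shows the effect and uses `validate='many_to_one'` to check that each `item_id` maps to a single title.

```python
import pandas as pd

# Hypothetical toy data standing in for u.data and Movie_Id_Titles
toy_ratings = pd.DataFrame({
    'user_id': [1, 1, 2, 3],
    'item_id': [10, 20, 10, 99],   # item 99 has no title entry
    'rating':  [5, 3, 4, 2],
})
toy_titles = pd.DataFrame({
    'item_id': [10, 20],
    'title':   ['Movie A', 'Movie B'],
})

# Inner join (the default); validate raises if item_id is duplicated in toy_titles
merged = pd.merge(toy_ratings, toy_titles, on='item_id', validate='many_to_one')
dropped = len(toy_ratings) - len(merged)

print(merged)
print(dropped, 'rating row(s) had no matching title')
```

If the number of dropped rows is not zero on real data, passing `how='left'` keeps those ratings with a missing title so they can be inspected.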

In [86]:

```
df.head()
```

Out[86]:

In [88]:

```
#Create base ratings dataframe with avg. rating by title
ratings = pd.DataFrame(df.groupby('title')['rating'].mean())
```

In [89]:

```
ratings.head()
```

Out[89]:

In [90]:

```
#Add a 'num of ratings' column holding the number of users who rated each movie
ratings['num of ratings'] = df.groupby('title')['rating'].count()
```
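
The two steps above (mean rating, then rating count) can also be done in a single pass. This is just a sketch of an alternative on hypothetical toy data, not a change to the notebook's approach; the column names are renamed to match the ones used in this post.

```python
import pandas as pd

# Hypothetical toy ratings: Movie A has three ratings, Movie B has one
toy = pd.DataFrame({
    'title':  ['Movie A', 'Movie A', 'Movie A', 'Movie B'],
    'rating': [5, 4, 3, 2],
})

# Compute mean and count together, then rename to the names used in this post
toy_summary = (
    toy.groupby('title')['rating']
       .agg(['mean', 'count'])
       .rename(columns={'mean': 'rating', 'count': 'num of ratings'})
)

print(toy_summary)
```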

In [91]:

```
ratings.head()
```

Out[91]:

In [40]:

```
sns.set_style('whitegrid')
ratings['num of ratings'].hist(bins=70)
```

Out[40]:

In [41]:

```
ratings['rating'].hist(bins=70)
```

Out[41]:

In [43]:

```
sns.jointplot(x='rating',y='num of ratings',data=ratings,alpha=0.5)
```

Out[43]:

In [97]:

```
#Create pivoted dataframe to show user ratings by movie titles
moviemat = df.pivot_table(index='user_id',columns='title',values='rating')
```
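
To make the shape of this matrix concrete, here is a minimal sketch on made-up data (the users and titles are hypothetical). Each row is a user, each column a title, and a cell is `NaN` wherever that user never rated that movie. Note that `pivot_table` aggregates with the mean by default, so if a user somehow rated the same title twice the two ratings would be averaged.

```python
import pandas as pd

# Hypothetical toy ratings in the same long format as df
toy = pd.DataFrame({
    'user_id': [1, 1, 2, 3, 3],
    'title':   ['Movie A', 'Movie B', 'Movie A', 'Movie B', 'Movie C'],
    'rating':  [5, 3, 4, 2, 1],
})

# One row per user, one column per title, NaN where there is no rating
toy_mat = toy.pivot_table(index='user_id', columns='title', values='rating')

# Fraction of user/title cells that are empty
sparsity = toy_mat.isna().to_numpy().mean()

print(toy_mat)
print('share of missing cells:', sparsity)
```

On real rating data most cells are empty, which is why the correlation step later needs `dropna`.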

In [98]:

```
moviemat.head()
```

Out[98]:

In [99]:

```
#Grab the ratings for Star Wars and Liar Liar individually
starwars_user_ratings = moviemat['Star Wars (1977)']
liarliar_user_ratings = moviemat['Liar Liar (1997)']
```

In [48]:

```
starwars_user_ratings.head()
```

Out[48]:

In [145]:

```
#Correlate every movie's user ratings with the Star Wars ratings
similar_to_starwars = moviemat.corrwith(starwars_user_ratings)
```
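
Here is a small sketch of what `corrwith` does with a ratings matrix that contains gaps, using made-up data (the titles and values are hypothetical). For each column it computes the Pearson correlation with the target using only the users who rated both movies, and it returns `NaN` when there is no usable overlap.

```python
import numpy as np
import pandas as pd

# Hypothetical user x title matrix with missing ratings
toy_mat = pd.DataFrame({
    'Movie A': [5, 4, 1, 2, np.nan],
    'Movie B': [4, 5, 2, 1, 3],                      # tends to agree with Movie A
    'Movie C': [1, np.nan, 5, 4, 2],                 # tends to disagree with Movie A
    'Movie D': [np.nan, np.nan, np.nan, np.nan, 4],  # no raters in common with Movie A
}, index=pd.Index([1, 2, 3, 4, 5], name='user_id'))

target = toy_mat['Movie A']

# Column-by-column correlation with the target, over shared raters only
toy_corr = toy_mat.corrwith(target)

print(toy_corr)
```

Movie B comes out positively correlated, Movie C negatively, and Movie D is `NaN`, which is the kind of row that gets dropped in the next step.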

In [146]:

```
#Build correlation dataframe for Star Wars
corr_starwars = pd.DataFrame(similar_to_starwars, columns=['Correlation'])
#Drop out NaN
corr_starwars.dropna(inplace=True)
```

In [147]:

```
corr_starwars.head()
```

Out[147]:

In [148]:

```
corr_starwars.sort_values('Correlation',ascending=False).head(10)
```

Out[148]:

In [149]:

```
#Pull number of ratings for these titles
tempDF = corr_starwars.sort_values('Correlation',ascending=False).head(10)
pd.merge(ratings,tempDF,on='title')
```

Out[149]:

In [150]:

```
#Pull mean of num of ratings
ratings['num of ratings'].mean()
```

Out[150]:

In [151]:

```
#Join in the number of ratings for each movie
corr_starwars = corr_starwars.join(ratings['num of ratings'])
```

In [152]:

```
corr_starwars.head()
```

Out[152]:

In [153]:

```
#Keep only movies with more than 60 ratings (the mean), then sort by correlation
corr_starwars[corr_starwars['num of ratings'] > 60].sort_values('Correlation',ascending=False)
```

Out[153]:
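
The reason this filter matters: a correlation computed from very few shared ratings is not trustworthy. In the extreme case of only two shared raters whose ratings differ, the Pearson coefficient is always +1 or -1 regardless of the movies. A tiny sketch with made-up numbers (the series below are hypothetical):

```python
import pandas as pd

# Two hypothetical movies that share only two raters
movie_x = pd.Series([5, 3], index=['user_1', 'user_2'])
movie_y = pd.Series([2, 1], index=['user_1', 'user_2'])

# With two points, any non-constant pair is perfectly correlated
r = movie_x.corr(movie_y)

print(r)
```

Filtering on the total number of ratings per movie, as done above, is a simple proxy for this; it does not directly count how many users rated both movies.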

In [155]:

```
#Correlate every movie's user ratings with the Liar Liar ratings
similar_to_liarliar = moviemat.corrwith(liarliar_user_ratings)
```

In [156]:

```
#Build correlation dataframe
corr_liarliar = pd.DataFrame(similar_to_liarliar, columns=['Correlation'])
#Drop NaN
corr_liarliar.dropna(inplace=True)
```

In [157]:

```
#Join in the num of ratings
corr_liarliar = corr_liarliar.join(ratings['num of ratings'])
```

In [158]:

```
#Keep only movies with more than 100 ratings, then sort by correlation
corr_liarliar[corr_liarliar['num of ratings'] > 100].sort_values('Correlation',ascending=False)
```

Out[158]:
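
Since the Star Wars and Liar Liar steps are identical apart from the title and the threshold, they can be wrapped in one function. The sketch below is my own illustration rather than part of the original notebook: `similar_movies`, `min_ratings` and `top_n` are hypothetical names, the rating counts are taken from the pivoted matrix with `count()` (which assumes each user rated a title at most once), and the target movie is dropped from its own results. It is shown on made-up toy data.

```python
import numpy as np
import pandas as pd

def similar_movies(moviemat, title, min_ratings=100, top_n=10):
    """Hypothetical helper: rank titles by rating correlation with `title`."""
    # Correlate every column with the target movie's ratings
    corr = moviemat.corrwith(moviemat[title]).to_frame('Correlation')
    # Number of non-missing ratings per title, taken from the matrix itself
    corr['num of ratings'] = moviemat.count()
    # Drop titles with undefined correlation, and the target movie itself
    corr = corr.dropna(subset=['Correlation']).drop(index=title, errors='ignore')
    # Keep sufficiently rated titles, best matches first
    popular = corr[corr['num of ratings'] > min_ratings]
    return popular.sort_values('Correlation', ascending=False).head(top_n)

# Hypothetical user x title matrix, standing in for moviemat
toy_mat = pd.DataFrame({
    'Movie A': [5, 4, 1, 2, np.nan],
    'Movie B': [4, 5, 2, 1, 3],
    'Movie C': [1, np.nan, 5, 4, 2],
    'Movie D': [np.nan, np.nan, np.nan, np.nan, 4],
}, index=pd.Index([1, 2, 3, 4, 5], name='user_id'))

result = similar_movies(toy_mat, 'Movie A', min_ratings=2)
print(result)
```

With the real matrix, a call such as `similar_movies(moviemat, 'Star Wars (1977)', min_ratings=60)` should give a comparable ranking to the filtered table above, minus the Star Wars row itself.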
