Recommender Systems with Surprise

  • Created by Andrés Segura Tinoco
  • Created on May 27, 2019

Tune algorithm parameters

  • Model built from a Pandas dataframe
  • The algorithm used is: Singular Value Decomposition (SVD)
  • Model trained using train and test datasets (80/20) and cross-validation
  • The RMSE and MAE metrics (defined below) were used to estimate the model error
  • Type of filtering: collaborative
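
For reference, both metrics are computed over the set of predictions $\hat{R}$, comparing each estimated rating $\hat{r}_{ui}$ with the true rating $r_{ui}$ (these are the standard definitions, the same ones used by Surprise's accuracy module):

$$ RMSE = \sqrt{\frac{1}{|\hat{R}|} \sum_{\hat{r}_{ui} \in \hat{R}} (r_{ui} - \hat{r}_{ui})^2} $$
$$ MAE = \frac{1}{|\hat{R}|} \sum_{\hat{r}_{ui} \in \hat{R}} |r_{ui} - \hat{r}_{ui}| $$
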
In [1]:
# Load the Python libraries
import os
import pandas as pd
import numpy as np
from collections import defaultdict
from sklearn.model_selection import train_test_split
In [2]:
# Load Surprise libraries
from surprise import SVD
from surprise import Reader
from surprise import Dataset
from surprise import accuracy
from surprise.model_selection import GridSearchCV
In [3]:
# Load plotting libraries
import matplotlib.pyplot as plt
import seaborn as sns
In [4]:
# Path to dataset file
file_path = os.path.expanduser('../data/u.data')

# Read current ratings of the users
rawdata = pd.read_csv(file_path, sep = '\t', names = ['user_id','item_id','rating','timestamp'])
rawdata.head()
Out[4]:
user_id item_id rating timestamp
0 196 242 3 881250949
1 186 302 3 891717742
2 22 377 1 878887116
3 244 51 2 880606923
4 166 346 1 886397596

1. Manual Tuning

In [5]:
# Split data in training and test
train_data, test_data = train_test_split(rawdata, test_size = 0.2)
print("Train size:", train_data.shape)    # 80.00%
print("Test size:", test_data.shape)      # 20.00%
Train size: (80000, 4)
Test size: (20000, 4)
In [6]:
# Read the data into a Surprise dataset
reader = Reader(rating_scale = (1, 5))
data_train = Dataset.load_from_df(train_data[['user_id', 'item_id', 'rating']], reader)
data_test = Dataset.load_from_df(test_data[['user_id', 'item_id', 'rating']], reader)
In [7]:
# Build the full trainsets from both dataframes
data_train = data_train.build_full_trainset()
data_test = data_test.build_full_trainset()

# Convert them to testset format (lists of (user, item, rating) tuples) for evaluation
data_trainset = data_train.build_testset()
data_testset = data_test.build_testset()
In [8]:
# Plot the model RMSE
def plot_model_rmse(xs, ys, title, x_label, y_label):
    # Set up the matplotlib figure
    fig, ax = plt.subplots(figsize = (10, 5))
    ax.plot(xs, ys, marker = 'o')
    
    for x,y in zip(xs,ys):
        label = "{:.2f}".format(y)
        plt.annotate(label, (x,y), textcoords="offset points", xytext=(0,10), ha='center')
    
    plt.title(title, fontsize = 12)
    plt.xlabel(x_label, fontsize = 10)
    plt.ylabel(y_label, fontsize = 10)
    plt.draw()

Number of factors (k)

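The parameter n_factors (k) is the dimensionality of the latent factor vectors. As a reminder of what is being tuned, the biased SVD model used here estimates a rating as

$$ \hat{r}_{ui} = \mu + b_u + b_i + q_i^{\top} p_u $$

where $\mu$ is the global mean rating, $b_u$ and $b_i$ are the user and item biases, and $p_u, q_i \in \mathbb{R}^k$ are the user and item latent vectors.
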
In [9]:
# Factors list
k_factors = [5, 10, 25, 50, 75, 100]
In [10]:
# CV results
train_rmse = []
test_rmse = []

# Loop in which errors are calculated
for k in k_factors:
    algo = SVD(n_factors=k, n_epochs=200, biased=True, lr_all=0.005, reg_all=0, init_mean=0, init_std_dev=0.01, verbose=False)
    algo.fit(data_train)
    
    # The error of the training data is calculated and saved
    predictions = algo.test(data_trainset)
    error = accuracy.rmse(predictions, verbose = False)
    train_rmse.append(error)
    
    # The error of the testing data is calculated and saved
    predictions_test = algo.test(data_testset)
    error = accuracy.rmse(predictions_test, verbose = False)
    test_rmse.append(error)
In [11]:
# Train RMSE dataframe
error_data = {'k': k_factors, 'error': train_rmse}
pd.DataFrame(error_data)
Out[11]:
k error
0 5 0.746269
1 10 0.646057
2 25 0.421978
3 50 0.205627
4 75 0.097156
5 100 0.049158
In [12]:
# Plotting the RMSE behaviour
plot_model_rmse(error_data['k'], error_data['error'], 'Train Model Errors', 'k', 'rmse')
In [13]:
# Test RMSE dataframe
error_data = {'k': k_factors, 'error': test_rmse}
pd.DataFrame(error_data)
Out[13]:
k error
0 5 0.966206
1 10 1.026923
2 25 1.125814
3 50 1.145770
4 75 1.101524
5 100 1.053419
In [14]:
# Plotting the RMSE behaviour
plot_model_rmse(error_data['k'], error_data['error'], 'Test Model Errors', 'k', 'rmse')

Regularization term for all parameters (reg_all). The default value is 0.02

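In the training objective, reg_all corresponds to the constant $\lambda$ that penalizes all model parameters: the algorithm minimizes the regularized squared error

$$ \sum_{r_{ui} \in R_{train}} \left( r_{ui} - \hat{r}_{ui} \right)^2 + \lambda \left( b_u^2 + b_i^2 + \lVert p_u \rVert^2 + \lVert q_i \rVert^2 \right) $$

so larger values of reg_all shrink the biases and latent vectors, trading training error for better generalization.
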
In [15]:
# List of regularization values
k = 5
reg_all = [0.01, 0.02, 0.05, 0.1, 0.5]
In [16]:
# CV results
train_rmse = []
test_rmse = []

# Loop in which errors are calculated
for reg in reg_all:
    algo = SVD(n_factors=k, n_epochs=200, biased=True, lr_all=0.005, reg_all=reg, init_mean=0, init_std_dev=0.01, verbose=False)
    algo.fit(data_train)
    
    # The error of the training data is calculated and saved
    predictions = algo.test(data_trainset)
    error = accuracy.rmse(predictions, verbose = False)
    train_rmse.append(error)
    
    # The error of the testing data is calculated and saved
    predictions_test = algo.test(data_testset)
    error = accuracy.rmse(predictions_test, verbose = False)
    test_rmse.append(error)
In [17]:
# Train RMSE dataframe
error_data = {'reg_all': reg_all, 'error': train_rmse}
pd.DataFrame(error_data)
Out[17]:
reg_all error
0 0.01 0.749832
1 0.02 0.752630
2 0.05 0.768882
3 0.10 0.801547
4 0.50 0.937865
In [18]:
# Plotting the RMSE behaviour
plot_model_rmse(error_data['reg_all'], error_data['error'], 'Train Model Errors', 'reg all', 'rmse')
In [19]:
# Test RMSE dataframe
error_data = {'reg_all': reg_all, 'error': test_rmse}
pd.DataFrame(error_data)
Out[19]:
reg_all error
0 0.01 0.958093
1 0.02 0.953766
2 0.05 0.933283
3 0.10 0.919134
4 0.50 0.964596
In [20]:
# Plotting the RMSE behaviour
plot_model_rmse(error_data['reg_all'], error_data['error'], 'Test Model Errors', 'reg all', 'rmse')

2. Automatic Tuning

The GridSearchCV class computes accuracy metrics for an algorithm on various combinations of parameters, over a cross-validation procedure. This is useful for finding the best set of parameters for a prediction algorithm.

In [21]:
# Read the raw data into a Surprise dataset
reader = Reader(rating_scale = (1, 5))
dataset = Dataset.load_from_df(rawdata[['user_id', 'item_id', 'rating']], reader)
In [22]:
# SVD param grid: 3 * 3 * 3 * 3 = 81 combinations
param_grid = {'n_factors': [5, 10, 20],
              'n_epochs': [20, 30, 50],
              'lr_all': [0.002, 0.005, 0.01],
              'reg_all': [0.02, 0.05, 0.1]}
In [23]:
# Tune algorithm parameters with GridSearchCV and 4-fold cross-validation
gs = GridSearchCV(SVD, param_grid, measures = ['rmse', 'mae'], cv = 4)
gs.fit(dataset)
In [24]:
# Best RMSE and MAE scores
gs.best_score
Out[24]:
{'rmse': 0.9166393930190868, 'mae': 0.7226028608788528}
In [25]:
# Combination of parameters that gave the best scores
gs.best_params
Out[25]:
{'rmse': {'n_factors': 20, 'n_epochs': 50, 'lr_all': 0.01, 'reg_all': 0.1},
 'mae': {'n_factors': 20, 'n_epochs': 50, 'lr_all': 0.01, 'reg_all': 0.1}}
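
Once the search has finished, the tuned model can be reused directly. The following sketch (not executed in this notebook) shows one possible way to retrain the best RMSE estimator on the full dataset and estimate a single rating; the (user, item) pair is just an illustrative example taken from the raw data.

# Hypothetical follow-up: retrain the best RMSE model and score one (user, item) pair
best_algo = gs.best_estimator['rmse']
best_algo.fit(dataset.build_full_trainset())
print(best_algo.predict(uid = 196, iid = 302))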

3. Compute precision@k and recall@k

An item is considered relevant if its true rating $r_{ui}$ is greater than a given threshold. An item is considered recommended if its estimated rating $\hat{r}_{ui}$ is greater than the threshold, and if it is among the k highest estimated ratings.

$$ Precision@k = \frac{| TP |}{| TP + FP |} = \frac{| \{Recommended \; items \; that \; are \; relevant\} |}{| \{Recommended \; items\} |} \tag{1} $$
$$ Recall@k = \frac{| TP |}{| TP + FN |} = \frac{| \{Recommended \; items \; that \; are \; relevant\} |}{| \{Relevant \; items\} |} \tag{2} $$
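For instance (a purely hypothetical user), with threshold 3.5 and k = 3: if the 3 items with the highest estimated ratings are all estimated above 3.5 and their true ratings are 5, 3 and 4, then 2 of the 3 recommended items are relevant, so Precision@3 = 2/3; if that user has 4 relevant items in total, Recall@3 = 2/4 = 0.5.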
In [26]:
# Return precision and recall at k metrics for each user
def precision_recall_at_k(predictions, k = 10, threshold = 3.5):
    
    # First map the predictions to each user.
    user_est_true = defaultdict(list)
    for uid, _, true_r, est, _ in predictions:
        user_est_true[uid].append((est, true_r))
    
    precisions = dict()
    recalls = dict()
    for uid, user_ratings in user_est_true.items():
        
        # Sort user ratings by estimated value
        user_ratings.sort(key=lambda x: x[0], reverse=True)
        
        # Number of relevant items
        n_rel = sum((true_r >= threshold) for (_, true_r) in user_ratings)
        
        # Number of recommended items in top k
        n_rec_k = sum((est >= threshold) for (est, _) in user_ratings[:k])
        
        # Number of relevant and recommended items in top k
        n_rel_and_rec_k = sum(((true_r >= threshold) and (est >= threshold)) for (est, true_r) in user_ratings[:k])
        
        # Precision@K: Proportion of recommended items that are relevant
        precisions[uid] = n_rel_and_rec_k / n_rec_k if n_rec_k != 0 else 1
        
        # Recall@K: Proportion of relevant items that are recommended
        recalls[uid] = n_rel_and_rec_k / n_rel if n_rel != 0 else 1
    
    return precisions, recalls
In [27]:
# Return the precision, recall and F1 score of the model for each value of k from 1 to k_max
def get_precision_vs_recall(algo, k_max = 10, verbose = False):
    precision_list = []
    recall_list = []
    f1_score_list = []
    
    if algo:
        for k_curr in range(1, k_max + 1):
            algo.fit(data_train)
            predictions = algo.test(data_testset)
            
            # Get precision and recall at k metrics for each user
            precisions, recalls = precision_recall_at_k(predictions, k = k_curr, threshold = 4)
            
            # Precision and recall can then be averaged over all users
            precision = sum(prec for prec in precisions.values()) / len(precisions)
            recall = sum(rec for rec in recalls.values()) / len(recalls)
            f1_score = 2 * (precision * recall) / (precision + recall)
            
            # Save measures
            precision_list.append(precision)
            recall_list.append(recall)
            f1_score_list.append(f1_score)
            
            if verbose:
                print('K =', k_curr, '- Precision:', precision, ', Recall:', recall, ', F1 score:', f1_score)
    
    return {'precision': precision_list, 'recall': recall_list, 'f1_score': f1_score_list}
In [28]:
# Show best params for SVD algo
gs.best_params['rmse']
Out[28]:
{'n_factors': 20, 'n_epochs': 50, 'lr_all': 0.01, 'reg_all': 0.1}
In [29]:
# Use the SVD algorithm (default params; swap in gs.best_estimator['rmse'] to reuse the best params found above)
algo = SVD()
In [30]:
# Calculate the precision and recall of the model at k metrics
k_max = 20
metrics = get_precision_vs_recall(algo, k_max, True)
K = 1 - Precision: 0.886411889596603 , Recall: 0.10718462087913463 , F1 score: 0.1912440740833082
K = 2 - Precision: 0.8795116772823779 , Recall: 0.169447583530288 , F1 score: 0.28415046030807667
K = 3 - Precision: 0.8791578202406222 , Recall: 0.20326047352732038 , F1 score: 0.33018295399517034
K = 4 - Precision: 0.872434536447275 , Recall: 0.2376092759887271 , F1 score: 0.37349613813508686
K = 5 - Precision: 0.8726999292285911 , Recall: 0.2525420967759154 , F1 score: 0.3917263395611901
K = 6 - Precision: 0.8678874734607218 , Recall: 0.27146692548631346 , F1 score: 0.4135723604634434
K = 7 - Precision: 0.872725204731573 , Recall: 0.283912137018604 , F1 score: 0.42844419587977856
K = 8 - Precision: 0.8714551107067028 , Recall: 0.291218419798995 , F1 score: 0.4365520906894345
K = 9 - Precision: 0.8573842381963406 , Recall: 0.3065108943383628 , F1 score: 0.4515829687660294
K = 10 - Precision: 0.8603124052168634 , Recall: 0.30742238523149756 , F1 score: 0.45297835402243336
K = 11 - Precision: 0.8682346669607819 , Recall: 0.3171578453565094 , F1 score: 0.46460127489551944
K = 12 - Precision: 0.8584259718813855 , Recall: 0.31319496004774927 , F1 score: 0.4589448355530106
K = 13 - Precision: 0.8653974122126985 , Recall: 0.31888408800841744 , F1 score: 0.46604031981714883
K = 14 - Precision: 0.8656337322738595 , Recall: 0.3247169340600693 , F1 score: 0.47227416174535836
K = 15 - Precision: 0.859049722982844 , Recall: 0.32586313495428215 , F1 score: 0.47249489097476055
K = 16 - Precision: 0.8689343182177576 , Recall: 0.3235016464291065 , F1 score: 0.47147468026166933
K = 17 - Precision: 0.860531121321679 , Recall: 0.32423520534384065 , F1 score: 0.4710034013403737
K = 18 - Precision: 0.8627281985772997 , Recall: 0.3378237955104116 , F1 score: 0.48552685093612097
K = 19 - Precision: 0.8563642283616748 , Recall: 0.33039343700305207 , F1 score: 0.47682375095158974
K = 20 - Precision: 0.8552265935307832 , Recall: 0.33484476929988183 , F1 score: 0.48126215007609785
In [31]:
# Get data
c1 = metrics['precision']
c2 = metrics['recall']
c3 = metrics['f1_score']
x = np.arange(1, len(c1) + 1)   # k values from 1 to k_max

# Set up the matplotlib figure
fig, ax1 = plt.subplots(figsize = (10, 5))
plt.xticks(np.arange(min(x), max(x) + 1, 1.0))
plt.ylim(0, 1)
ax1.plot(x, c1, marker = 'o')
ax1.plot(x, c2, marker = 'o')
ax1.plot(x, c3, marker = 'o')
ax1.axvline(x = 10, color = "#8b0000", linestyle = "--")

# Chart setup
plt.title("Model's metrics", fontsize = 12)
plt.xlabel("k", fontsize = 10)
plt.ylabel("Precision and Recall", fontsize = 10)
plt.legend(("Precision", "Recall", "F1 score"), loc = "best")
plt.draw()

Based on this graph, we can select k = 10, since beyond that value the model's metrics remain practically constant.

In [32]:
# Get data
x = metrics['recall']
y = metrics['precision']

# Create scatter plot with the precision and recall results
fig, ax2 = plt.subplots(figsize = (10, 10))

# Create 2D scatter plot
sns.regplot(ax = ax2, x = x, y = y, fit_reg = False, marker = "o", color = "#1f77b4", scatter_kws = {"s": 30})

# Plot setup
ax2.set_title("Precision vs Recall", fontsize = 12)
ax2.set_xlabel("Recall", fontsize = 10)
ax2.set_ylabel("Precision", fontsize = 10)
ax2.grid()