Recommender Systems with Surprise

This project collects examples of recommender systems built with the Surprise framework. It explores several collaborative filtering algorithms, such as KNN and SVD.

Examples

1. RS with KNN

Model built from a plain text file
The algorithm used is: KNNBasic
Model trained and evaluated with 5-fold cross-validation
The RMSE and MAE metrics were used to estimate the model error
Type of filtering: user-based collaborative
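
To illustrate the ideas in this example, here is a minimal pure-Python sketch of a user-based KNN prediction, together with the RMSE and MAE error metrics. It is not Surprise's implementation of KNNBasic: the toy ratings, the choice of cosine similarity, k=2, and the function names are all assumptions made for illustration.

```python
import math

# Toy ratings (user -> {item: rating}); the values are made up for illustration.
ratings = {
    "u1": {"a": 5, "b": 3, "c": 4},
    "u2": {"a": 4, "b": 2, "c": 5, "d": 4},
    "u3": {"a": 1, "b": 5, "d": 2},
    "u4": {"a": 5, "b": 3, "d": 5},
}


def cosine_sim(r_u, r_v):
    """Cosine similarity computed over the items both users have rated."""
    common = set(r_u) & set(r_v)
    if not common:
        return 0.0
    dot = sum(r_u[i] * r_v[i] for i in common)
    norm_u = math.sqrt(sum(r_u[i] ** 2 for i in common))
    norm_v = math.sqrt(sum(r_v[i] ** 2 for i in common))
    return dot / (norm_u * norm_v)


def predict(user, item, k=2):
    """Similarity-weighted mean of the ratings that the k most similar
    users (among those who rated `item`) gave to `item`."""
    neighbors = [
        (cosine_sim(ratings[user], r_v), r_v[item])
        for v, r_v in ratings.items()
        if v != user and item in r_v
    ]
    top_k = sorted(neighbors, reverse=True)[:k]
    sim_sum = sum(sim for sim, _ in top_k)
    if sim_sum == 0:
        return None  # no usable neighbor for this (user, item) pair
    return sum(sim * r for sim, r in top_k) / sim_sum


def rmse(pairs):
    """Root mean squared error over (true, estimated) rating pairs."""
    return math.sqrt(sum((t - e) ** 2 for t, e in pairs) / len(pairs))


def mae(pairs):
    """Mean absolute error over (true, estimated) rating pairs."""
    return sum(abs(t - e) for t, e in pairs) / len(pairs)


# Estimate how user "u1" would rate item "d", which "u1" has not rated.
estimate = predict("u1", "d")

# Error metrics on two hypothetical (true, estimated) pairs.
held_out = [(4, 3.5), (2, 3.0)]
print(estimate, rmse(held_out), mae(held_out))
```

In Surprise itself, the equivalent workflow would go through KNNBasic and cross_validate from surprise.model_selection, which report RMSE and MAE per fold.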

2. RS with SVD

Model built from a Pandas dataframe
The algorithm used is: Singular Value Decomposition (SVD)
Model trained using train and test datasets (80/20)
The error of the model was estimated using the RMSE metric
Type of filtering: collaborative
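
The two central ideas of this example can be sketched in plain Python: an 80/20 split of the ratings, and the prediction rule used by biased matrix factorization (global mean plus user bias plus item bias plus the dot product of the latent factors). This is only a sketch under stated assumptions: the function names, the seed, and the numeric values of the biases and factors are hypothetical, and a real model learns them from the training set.

```python
import random


def split_ratings(rows, test_size=0.2, seed=42):
    """Shuffle the rows and hold out a `test_size` fraction for testing."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = int(round(len(shuffled) * test_size))
    return shuffled[n_test:], shuffled[:n_test]


def predict_svd(mu, b_u, b_i, p_u, q_i, low=1.0, high=5.0):
    """Estimate a rating as mu + b_u + b_i + dot(q_i, p_u),
    clipped to the rating scale [low, high]."""
    raw = mu + b_u + b_i + sum(p * q for p, q in zip(p_u, q_i))
    return min(high, max(low, raw))


# Ten placeholder rows standing in for (user, item, rating) records.
rows = list(range(10))
train, test = split_ratings(rows)

# Hypothetical learned parameters for one user/item pair.
estimate = predict_svd(mu=3.5, b_u=0.2, b_i=-0.1, p_u=[0.5, -0.2], q_i=[0.4, 0.3])
print(len(train), len(test), estimate)
```

With Surprise, the split would typically be done with train_test_split from surprise.model_selection (test_size=0.2), and the parameters would be learned by the SVD class.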

3. Tune model (SVD)

Model tuning: manual
Model tuning: automatic
Compute precision@k and recall@k
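
As a rough illustration of the last item, precision@k and recall@k can be computed per user from a list of predictions. The sketch below assumes a simplified prediction layout of (user id, true rating, estimated rating) tuples and a relevance threshold of 3.5; both are assumptions for illustration, as are the function name and the sample values.

```python
from collections import defaultdict


def precision_recall_at_k(predictions, k=10, threshold=3.5):
    """Return per-user precision@k and recall@k.

    predictions: iterable of (user_id, true_rating, estimated_rating).
    An item is relevant if its true rating >= threshold, and recommended
    if it is in the user's top k by estimate with estimate >= threshold.
    """
    by_user = defaultdict(list)
    for uid, true_r, est in predictions:
        by_user[uid].append((est, true_r))

    precisions, recalls = {}, {}
    for uid, user_ratings in by_user.items():
        # Rank the user's items by estimated rating, best first.
        user_ratings.sort(key=lambda pair: pair[0], reverse=True)
        top_k = user_ratings[:k]

        n_rel = sum(true_r >= threshold for _, true_r in user_ratings)
        n_rec_k = sum(est >= threshold for est, _ in top_k)
        n_rel_and_rec_k = sum(
            est >= threshold and true_r >= threshold for est, true_r in top_k
        )

        # Convention used here: the metric is 0 when its denominator is 0.
        precisions[uid] = n_rel_and_rec_k / n_rec_k if n_rec_k else 0.0
        recalls[uid] = n_rel_and_rec_k / n_rel if n_rel else 0.0
    return precisions, recalls


# Four made-up predictions for a single user.
sample = [("u1", 5, 4.5), ("u1", 2, 4.0), ("u1", 4, 3.0), ("u1", 4, 3.8)]
precisions, recalls = precision_recall_at_k(sample, k=2)
print(precisions["u1"], recalls["u1"])
```

Here two items are recommended in the top 2 and one of them is relevant, while three items are relevant overall, so precision@2 is 1/2 and recall@2 is 1/3.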

Data

MovieLens datasets were collected by the GroupLens Research Project at the University of Minnesota.

The MovieLens 100K data set used here consists of:

100,000 ratings (1-5) from 943 users on 1,682 movies.
Each user has rated at least 20 movies.
Simple demographic info for the users (age, gender, occupation, zip)

Table format: u.data

user id	item id	rating	timestamp
196	242	3	881250949
186	302	3	891717742
22	377	1	878887116
244	51	2	880606923
166	346	1	886397596
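
A minimal sketch of how rows in this format could be parsed with the Python standard library. It assumes the file is tab-separated and has no header row (the column names above are shown only for readability); the function name is hypothetical, and the sample string reuses the first rows of the table.

```python
import csv
import io

# First rows of the table above, as they would appear in the raw file.
SAMPLE = (
    "196\t242\t3\t881250949\n"
    "186\t302\t3\t891717742\n"
    "22\t377\t1\t878887116\n"
)


def read_ratings(handle):
    """Parse tab-separated rows into (user_id, item_id, rating, timestamp).

    IDs are kept as strings; the rating becomes a float and the
    timestamp an int.
    """
    rows = []
    for user_id, item_id, rating, timestamp in csv.reader(handle, delimiter="\t"):
        rows.append((user_id, item_id, float(rating), int(timestamp)))
    return rows


ratings = read_ratings(io.StringIO(SAMPLE))
print(ratings[0])
```

To read the real file, the same function can be given an open file handle, for example open("u.data", newline=""), assuming the file sits in the working directory.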

Table format: u.item

movie id	movie title	release date	IMDb URL
1	Toy Story (1995)	01-Jan-1995	http://us.imdb.com/M/title-exact?Toy%20Story%20(1995)
2	GoldenEye (1995)	01-Jan-1995	http://us.imdb.com/M/title-exact?GoldenEye%20(1995)
3	Four Rooms (1995)	01-Jan-1995	http://us.imdb.com/M/title-exact?Four%20Rooms%20(1995)
4	Get Shorty (1995)	01-Jan-1995	http://us.imdb.com/M/title-exact?Get%20Shorty%20(1995)
5	Copycat (1995)	01-Jan-1995	http://us.imdb.com/M/title-exact?Copycat%20(1995)

Table format: u.user

user id	age	gender	occupation	zip code
1	24	M	technician	85711
2	53	F	other	94043
3	23	M	writer	32067
4	24	M	technician	43537
5	33	F	other	15213

The original dataset is distributed by the GroupLens Research Project.

Python Dependencies

  conda install -c conda-forge scikit-surprise

Contributing and Feedback

Any kind of feedback or criticism is greatly appreciated (algorithm design, documentation, improvement ideas, spelling mistakes, etc.).

Authors

Created by Andrés Segura Tinoco
Created on May 23, 2019

License

This project is licensed under the terms of the MIT license.

Acknowledgments

I would like to show my gratitude to:

F. Maxwell Harper and Joseph A. Konstan. 2015. The MovieLens Datasets: History and Context. ACM Transactions on Interactive Intelligent Systems (TiiS) 5, 4, Article 19 (December 2015), 19 pages. DOI = http://dx.doi.org/10.1145/2827872
