Recommender Systems with Surprise
This project collects examples of recommender systems built with the Surprise framework. It explores collaborative filtering algorithms such as KNN and SVD.
Examples
1. RS with KNN
- Model built from a plain text file
- The algorithm used is: KNNBasic
- Model trained and evaluated with 5-fold cross-validation
- The RMSE and MAE metrics were used to estimate the model error
- Type of filtering: user-based collaborative
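
The steps above can be sketched roughly as follows. This is a minimal illustration, not the project's actual notebook code: the function names `rmse`, `mae` and `cross_validate_knn`, the file path argument and the tab-separated `user item rating timestamp` line format are assumptions, and `k=40` is simply Surprise's default neighbourhood size. The two metric helpers are plain Python so the definitions of RMSE and MAE are visible.

```python
import math


def rmse(pairs):
    """Root mean squared error over (true, estimated) rating pairs."""
    return math.sqrt(sum((t - e) ** 2 for t, e in pairs) / len(pairs))


def mae(pairs):
    """Mean absolute error over (true, estimated) rating pairs."""
    return sum(abs(t - e) for t, e in pairs) / len(pairs)


def cross_validate_knn(path, k=40, folds=5):
    """Run user-based KNNBasic with k-fold cross-validation on a ratings file."""
    # Imported lazily so the metric helpers above work without Surprise installed.
    from surprise import Dataset, KNNBasic, Reader
    from surprise.model_selection import cross_validate

    # Assumed layout: one "user<TAB>item<TAB>rating<TAB>timestamp" row per line.
    reader = Reader(line_format="user item rating timestamp", sep="\t")
    data = Dataset.load_from_file(path, reader=reader)

    # user_based=True compares users with each other (user-based filtering).
    algo = KNNBasic(k=k, sim_options={"user_based": True})

    # Returns a dict with per-fold arrays under "test_rmse" and "test_mae".
    return cross_validate(algo, data, measures=["RMSE", "MAE"], cv=folds, verbose=True)
```

Calling `cross_validate_knn("u.data")` would return one RMSE and one MAE value per fold; averaging `results["test_rmse"]` gives a single summary figure.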
2. RS with SVD
- Model built from a Pandas dataframe
- The algorithm used is: Singular Value Decomposition (SVD)
- Model trained on 80% of the data and tested on the remaining 20%
- The error of the model was estimated using the RMSE metric
- Type of filtering: collaborative
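
A possible shape for this example is sketched below. It assumes a DataFrame with hypothetical column names `user_id`, `item_id` and `rating`, a 1-5 rating scale, and a fixed seed for reproducibility; none of these details are taken from the project's code, and `evaluate_svd` is an illustrative name.

```python
def evaluate_svd(ratings_df, test_size=0.2, seed=42):
    """Train SVD on a train split and return the RMSE on the held-out split."""
    # Imported lazily so this module can be loaded without Surprise installed.
    from surprise import SVD, Dataset, Reader, accuracy
    from surprise.model_selection import train_test_split

    # load_from_df expects exactly three columns: user, item, rating (in that order).
    reader = Reader(rating_scale=(1, 5))
    data = Dataset.load_from_df(ratings_df[["user_id", "item_id", "rating"]], reader)

    # 80/20 split when test_size=0.2.
    trainset, testset = train_test_split(data, test_size=test_size, random_state=seed)

    algo = SVD(random_state=seed)
    algo.fit(trainset)
    predictions = algo.test(testset)

    # verbose=False returns the score without printing it.
    return accuracy.rmse(predictions, verbose=False)
```

The function returns a single float, so different hyperparameter choices can be compared by calling it with the same seed.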
3. Tune model (SVD)
- Model tuning: manual
- Model tuning: automatic
- Compute precision@k and recall@k
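
As a rough sketch of the automatic tuning and ranking-metric steps, the block below uses Surprise's `GridSearchCV` and a plain-Python precision@k / recall@k helper modelled on the approach in the Surprise FAQ. The parameter grid values, the relevance threshold of 3.5 and the function names are assumptions chosen for illustration, not the settings used in this project.

```python
from collections import defaultdict


def tune_svd(data, folds=3):
    """Grid-search a few SVD hyperparameters and return the best RMSE setup."""
    # Imported lazily so the metric helper below works without Surprise installed.
    from surprise import SVD
    from surprise.model_selection import GridSearchCV

    # Hypothetical grid; widen or narrow it to fit the time budget.
    param_grid = {
        "n_epochs": [5, 10],
        "lr_all": [0.002, 0.005],
        "reg_all": [0.4, 0.6],
    }
    gs = GridSearchCV(SVD, param_grid, measures=["rmse", "mae"], cv=folds)
    gs.fit(data)
    return gs.best_params["rmse"], gs.best_score["rmse"]


def precision_recall_at_k(predictions, k=10, threshold=3.5):
    """Per-user precision@k and recall@k.

    `predictions` is an iterable of (user, item, true_rating, estimate) tuples.
    An item is relevant if its true rating is >= threshold and recommended
    if its estimate is >= threshold and it is in the user's top k.
    """
    by_user = defaultdict(list)
    for uid, _iid, true_r, est in predictions:
        by_user[uid].append((est, true_r))

    precisions, recalls = {}, {}
    for uid, ratings in by_user.items():
        # Rank this user's items by estimated rating, best first.
        ratings.sort(key=lambda pair: pair[0], reverse=True)
        top_k = ratings[:k]

        n_rel = sum(true_r >= threshold for _, true_r in ratings)
        n_rec = sum(est >= threshold for est, _ in top_k)
        n_hit = sum(est >= threshold and true_r >= threshold for est, true_r in top_k)

        # Define the metric as 0 when its denominator is 0.
        precisions[uid] = n_hit / n_rec if n_rec else 0.0
        recalls[uid] = n_hit / n_rel if n_rel else 0.0
    return precisions, recalls
```

With Surprise predictions, the helper can be fed `[(p.uid, p.iid, p.r_ui, p.est) for p in predictions]`; averaging the returned dictionaries' values gives overall precision@k and recall@k.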
Data
MovieLens datasets were collected by the GroupLens Research Project at the University of Minnesota.
This data set consists of:
- 100,000 ratings (1-5) from 943 users on 1,682 movies.
- Each user has rated at least 20 movies.
- Simple demographic info for the users (age, gender, occupation, zip)
Table format: u.data
user id | item id | rating | timestamp |
---|---|---|---|
196 | 242 | 3 | 881250949 |
186 | 302 | 3 | 891717742 |
22 | 377 | 1 | 878887116 |
244 | 51 | 2 | 880606923 |
166 | 346 | 1 | 886397596 |
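
For readers who want to inspect the ratings file without Surprise, a minimal parsing sketch follows. It assumes `u.data` is tab-separated with the four columns shown above; the `Rating` type and `read_ratings` function are illustrative names, and the sample rows are the five from the table.

```python
import csv
import io
from typing import NamedTuple


class Rating(NamedTuple):
    user_id: int
    item_id: int
    rating: float
    timestamp: int


def read_ratings(lines):
    """Parse tab-separated 'user item rating timestamp' rows into Rating tuples."""
    ratings = []
    for row in csv.reader(lines, delimiter="\t"):
        if not row:  # skip blank lines
            continue
        user, item, rating, ts = row
        ratings.append(Rating(int(user), int(item), float(rating), int(ts)))
    return ratings


# The five sample rows from the table above, in the assumed on-disk format.
sample = io.StringIO(
    "196\t242\t3\t881250949\n"
    "186\t302\t3\t891717742\n"
    "22\t377\t1\t878887116\n"
    "244\t51\t2\t880606923\n"
    "166\t346\t1\t886397596\n"
)
ratings = read_ratings(sample)
mean_rating = sum(r.rating for r in ratings) / len(ratings)
print(len(ratings), "ratings, mean =", mean_rating)
```

For the real file, pass an open file handle instead of the `StringIO` object, for example `open("u.data", newline="")`.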
Table format: u.item
movie id | movie title | release date | IMDb URL |
---|---|---|---|
1 | Toy Story (1995) | 01-Jan-1995 | http://us.imdb.com/M/title-exact?Toy%20Story%20(1995) |
2 | GoldenEye (1995) | 01-Jan-1995 | http://us.imdb.com/M/title-exact?GoldenEye%20(1995) |
3 | Four Rooms (1995) | 01-Jan-1995 | http://us.imdb.com/M/title-exact?Four%20Rooms%20(1995) |
4 | Get Shorty (1995) | 01-Jan-1995 | http://us.imdb.com/M/title-exact?Get%20Shorty%20(1995) |
5 | Copycat (1995) | 01-Jan-1995 | http://us.imdb.com/M/title-exact?Copycat%20(1995) |
Table format: u.user
user id | age | gender | occupation | zip code |
---|---|---|---|---|
1 | 24 | M | technician | 85711 |
2 | 53 | F | other | 94043 |
3 | 23 | M | writer | 32067 |
4 | 24 | M | technician | 43537 |
5 | 33 | F | other | 15213 |
You can see the original dataset here
Python Dependencies
conda install -c conda-forge scikit-surprise
Contributing and Feedback
Any kind of feedback/criticism would be greatly appreciated (algorithm design, documentation, improvement ideas, spelling mistakes, etc…).
Authors
- Created by Andrés Segura Tinoco
- Created on May 23, 2019
License
This project is licensed under the terms of the MIT license.
Acknowledgments
I would like to show my gratitude to:
F. Maxwell Harper and Joseph A. Konstan. 2015. The MovieLens Datasets: History and Context. ACM Transactions on Interactive Intelligent Systems (TiiS) 5, 4, Article 19 (December 2015), 19 pages. DOI = http://dx.doi.org/10.1145/2827872