Recommender Systems for Last.fm

Recommender systems with collaborative filtering created with Apache Mahout framework. The system uses a Music Recommendation dataset for research purposes as input, but you can train it and predict recommendations with any other dataset. This project explores the calibration and accuracy of user-based and item-based models.

Data

The original dataset contains <user, timestamp, artist, song> tuples collected from Last.fm API, using the user.getRecentTracks() method. This dataset represents the whole listening habits (till May, 5th 2009) for nearly 1,000 users.

The pre-processed dataset below contains <user, artist, rating> tuples. The rating field was calculated by normalizing the number of times a user listened to a specific artist’s songs in Last.fm.

Table format: u.data.csv

user id artist id rating
1 100001 5.0
3 101943 4.6
6 100906 4.3
11 101722 3.6
15 107070 3.9

You can see the original dataset here

Model Tuning

User-Based 1 - Model tuning

User-Based 2 - Model tuning

Item-Based 1 - Model tuning

Technologies and Techniques

Program Execution Rules

The project has an executable in the ‘jar’ folder. The JAR name is: RS_CF_LastFm-v1.jar and you must send as input parameters:

Execution examples:

    java -jar RS_CF_LastFm-v1.jar 1 ../data/in/u.data.csv ../data/out/output.txt USER COSINE 101 20
    java -jar RS_CF_LastFm-v1.jar 10 ../data/in/u.data.csv ../data/out/output.txt ITEM PEARSON 0 10
    java -jar RS_CF_LastFm-v1.jar 10 ../data/in/u.data.csv ../data/out/output.txt ITEM JACCARD

The .JAR program must be run with Java 7 or higher.

Program Output

Once trained the model, the system can make recommendations (on demand) for users, as follows:

user id artist id rating
1 130710 4.366509
1 114674 3.0061495
1 143895 2.9370918
1 103116 2.8950827
1 104052 2.7250140
1 135747 2.6153402
1 135743 2.5869453
1 102936 2.5726979
1 113273 2.5512722
1 114145 2.5447776

Conclusions

  1. Apache Mahout is an excellent fast development framework for Recommender Systems projects. It has a wide variety of Machine Learning algorithms to make predictions (recommendations) and an extensive list of similarity functions.
  2. In static or semi-static scenarios, recommendation systems with items-based collaborative filtering offer better results than those users-based , since it is easier to calculate the similarity between items than between users.
  3. As a general rule, both in the user-based and in the item-based models, the predictive results improved when the models were exposed to more data (75/25 or 80/20), since the Machine Learning algorithm used has more information from which to learn.
  4. The construction of a model items-based takes more time than the construction of a user-based model. However, once the model is built, it makes predictions more quickly and above all, more accurately than the user-based one.
  5. When a new user is created, the user-based recommender model will have a cold start for that user, until the user performs enough interactions to be able to look like someone else. Analogously, it occurs for the item-based model when a new item is created.

Contributing and Feedback

Any kind of feedback/criticism would be greatly appreciated (algorithm design, documentation, improvement ideas, spelling mistakes, etc…).

Authors

License

This project is licensed under the terms of the MIT license.

Acknowledgements:

Thanks to Last.fm for providing the access to this data via their web services.