Collaborative Filtering Based Recommendation System using Jaccard Similarity


The practical applications and use cases of recommendation systems are endless in today’s connected lifestyle. From e-commerce websites to dating apps and social media platforms, the user’s experience depends heavily on the suggested next step.


In this context, many algorithms have been developed over the past decades in an attempt to solve the problem of giving meaningful and useful recommendations. Machine learning methods are particularly accurate when large amounts of data are available, but what if your recommendations need to perform well with a very limited amount of information?

One might be tempted to look exclusively at classical algorithms such as the well-known Apriori or FP-Growth, both based on frequent itemset mining. Although these are extremely useful in many circumstances, it is worth noting that a simpler, less computationally expensive alternative exists, one that requires even less data and gives recommendations almost as good as those of these well-known approaches: a recommendation system using the Jaccard similarity coefficient.

For instance, let’s work with an e-commerce dataset, similar to the one publicly available at . Generally, in this kind of dataset the relevant information is found in the userID (“user”), productID (“sku”), product_name (“name”), product_group (“group”) and/or order_ID (“order”) columns, where each row corresponds to an item purchased by a client at the store as part of an order (basket).

Let’s suppose that one wants to suggest products for a user who has already bought something at the store. For the sake of simplicity, let’s assume that the number of items to be recommended will be the same as the number of different items initially purchased by the user.

It is extremely simple to make a recommendation based on Jaccard coefficients. For every item (item_1) purchased by the user, select every basket in the dataset containing that specific item. Then calculate the Jaccard coefficient for every item (item_2) found in any of these selected baskets, that is, the ratio between the number of baskets containing both item_2 and item_1 and the number of baskets containing item_2 or item_1. Finally, for every purchased item (item_1), choose the candidate item (item_2) with the highest coefficient.
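To make the ratio concrete, here is a minimal sketch of the coefficient for a single pair of items. The function name `jaccard` and the toy baskets are hypothetical, chosen only for illustration; each basket is assumed to be the set of items in one order.

```python
def jaccard(baskets, item_1, item_2):
    """Jaccard coefficient of two items over a collection of baskets (sets of items)."""
    # Baskets containing both items (the intersection).
    both = sum(1 for b in baskets if item_1 in b and item_2 in b)
    # Baskets containing at least one of the two items (the union).
    either = sum(1 for b in baskets if item_1 in b or item_2 in b)
    return both / either if either else 0.0


# Hypothetical toy data: four orders.
baskets = [
    {"bread", "butter"},
    {"bread", "butter", "jam"},
    {"bread", "milk"},
    {"milk", "jam"},
]

print(jaccard(baskets, "bread", "butter"))  # 2 baskets with both / 3 with either
print(jaccard(baskets, "bread", "jam"))     # 1 basket with both / 4 with either
```

In this toy data, butter would be preferred over jam as a suggestion for someone who bought bread, since it shares a larger fraction of baskets with it.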

This approach is highly versatile and gives stable, good results on both very small and large datasets, making it a convenient way to grasp the fundamental purchase trends. The caveat is that extremely rare items must be analysed more carefully in order to provide unbiased recommendations for users who purchase such products.

An implementation of such an algorithm can be seen below, considering a dataset with column names as given in the above example.
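What follows is a minimal sketch under stated assumptions, not a definitive implementation. It assumes the dataset has been loaded as a list of dictionaries (for example with `csv.DictReader`) with the keys “user”, “sku” and “order”. The function name `recommend`, the toy rows, the exclusion of items the user already owns, and the alphabetical tie-breaking are all choices made here for illustration.

```python
from collections import defaultdict


def recommend(rows, user):
    """For each item bought by `user`, suggest the co-purchased item
    with the highest Jaccard coefficient."""
    orders_by_sku = defaultdict(set)  # sku -> set of orders containing it
    skus_by_order = defaultdict(set)  # order -> set of skus in that basket
    for row in rows:
        orders_by_sku[row["sku"]].add(row["order"])
        skus_by_order[row["order"]].add(row["sku"])

    purchased = {row["sku"] for row in rows if row["user"] == user}

    recommendations = {}
    for item_1 in sorted(purchased):
        orders_1 = orders_by_sku[item_1]
        # Candidates: items in any basket containing item_1.
        # Assumption: items the user already bought are not suggested again.
        candidates = set().union(*(skus_by_order[o] for o in orders_1)) - purchased
        best, best_score = None, 0.0
        for item_2 in sorted(candidates):  # sorted: ties go to the first sku alphabetically
            orders_2 = orders_by_sku[item_2]
            # Baskets with both items divided by baskets with either item.
            score = len(orders_1 & orders_2) / len(orders_1 | orders_2)
            if score > best_score:
                best, best_score = item_2, score
        if best is not None:
            recommendations[item_1] = (best, best_score)
    return recommendations


# Hypothetical toy dataset: one row per purchased item.
rows = [
    {"user": "u1", "order": "o1", "sku": "A"},
    {"user": "u1", "order": "o1", "sku": "B"},
    {"user": "u2", "order": "o2", "sku": "A"},
    {"user": "u2", "order": "o2", "sku": "C"},
    {"user": "u3", "order": "o3", "sku": "A"},
    {"user": "u3", "order": "o3", "sku": "C"},
    {"user": "u3", "order": "o3", "sku": "D"},
    {"user": "u4", "order": "o4", "sku": "B"},
    {"user": "u4", "order": "o4", "sku": "D"},
]

for bought, (suggested, score) in recommend(rows, "u1").items():
    print(f"bought {bought}: suggest {suggested} (Jaccard = {score:.2f})")
```

Indexing orders by sku up front means each coefficient reduces to two set operations, which keeps the sketch cheap even when the same item appears in many baskets. A purchased item that never shares a basket with a new item simply yields no suggestion.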







