The practical applications and use cases of recommendation systems are endless in today’s connected lifestyle. From e-commerce websites to dating apps and social media platforms, the user’s experience depends heavily on the suggested next step.
In this context, many algorithms have been developed over the past decades in an attempt to solve the problem of giving meaningful and useful recommendations. Machine learning methods are particularly accurate when large amounts of data are available, but what if your recommendations need to perform well with a very limited amount of information?
One might be tempted to look exclusively at classical algorithms such as the famous Apriori or FP-Growth, both based on frequent itemset mining. Although these are extremely useful in many circumstances, it is interesting to note that a simpler, less computationally expensive alternative exists. It requires even less data and gives recommendations almost as good as those obtained with the well-known approaches: a recommendation system based on the Jaccard similarity coefficient.
For instance, let’s work with an e-commerce dataset, similar to the one publicly available at http://archive.ics.uci.edu/ml/machine-learning-databases/00352/Online%20Retail.xlsx . Generally, in this kind of dataset the relevant information is found in the userID (“user”), productID (“sku”), product_name (“name”), product_group (“group”) and/or order_ID (“order”) columns, where each row corresponds to an item or itemset purchased by a client at the store.
Let’s suppose that one wants to suggest products to a user who has already bought something at the store. For the sake of simplicity, let’s assume that the number of items to be recommended is the same as the number of different items initially purchased by the user.
It is extremely simple to make a recommendation based on Jaccard coefficients. For every item (item_1) purchased by the user, select every basket in the dataset containing that specific item. Then calculate the Jaccard coefficient for every item (item_2) found in any of the selected baskets, that is, the ratio between the number of baskets containing both item_2 and item_1 and the number of baskets containing item_2 or item_1. Finally, for every purchased item (item_1), choose the candidate item (item_2) with the highest coefficient.
This approach is highly versatile and provides good, stable results for both very small and large datasets, making it the perfect way to grasp the fundamental purchase trends easily. The caveat is that extremely rare items must be analysed more carefully in order to provide non-biased recommendations for users who purchase such products.
An implementation of such an algorithm can be seen below, considering a dataset with column names as given in the above example.
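As a minimal sketch of the procedure described above, the snippet below uses only the Python standard library and assumes the data has been reduced to (order, sku) pairs taken from the “order” and “sku” columns. The function names (build_baskets, jaccard, recommend) and the sample rows are hypothetical, introduced for illustration only. Two choices here are assumptions rather than part of the description: items the user already owns are excluded from the candidates, and ties are broken alphabetically.

```python
from collections import defaultdict


def build_baskets(rows):
    """Map each sku to the set of order ids (baskets) that contain it."""
    baskets_by_item = defaultdict(set)
    for order, sku in rows:
        baskets_by_item[sku].add(order)
    return baskets_by_item


def jaccard(a, b):
    """Jaccard coefficient |A & B| / |A | B| of two sets of basket ids."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def recommend(rows, purchased):
    """For each purchased item, pick the co-purchased item with the highest coefficient."""
    baskets_by_item = build_baskets(rows)
    items_by_basket = defaultdict(set)
    for order, sku in rows:
        items_by_basket[order].add(sku)

    recommendations = {}
    for item_1 in purchased:
        item_1_baskets = baskets_by_item.get(item_1, set())
        # Candidates: every item found in a basket that also contains item_1.
        candidates = set()
        for order in item_1_baskets:
            candidates |= items_by_basket[order]
        # Assumption: never suggest something the user has already bought.
        candidates -= set(purchased)
        if not candidates:
            continue
        # sorted() makes tie-breaking deterministic (alphabetical order).
        recommendations[item_1] = max(
            sorted(candidates),
            key=lambda item_2: jaccard(item_1_baskets, baskets_by_item[item_2]),
        )
    return recommendations


# Hypothetical toy data: (order, sku) pairs.
rows = [
    (1, "bread"), (1, "butter"),
    (2, "bread"), (2, "butter"), (2, "milk"),
    (3, "bread"), (3, "milk"),
    (4, "milk"), (4, "cereal"),
]
print(recommend(rows, ["bread"]))
```

In this toy data, bread appears in three baskets, butter shares two of them and appears nowhere else (coefficient 2/3), while milk shares two but also appears in a fourth basket (coefficient 2/4), so butter is suggested for bread. An item that shows up in a single basket, such as cereal here, illustrates the rare-item caveat: its coefficients rest on very little evidence.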
The study of time series has been of great importance to the scientific community over the years. It is a field with many applications, from the financial market, where it helps identify whether it is worth buying or selling stocks the next day, to weather forecasting, where it helps identify whether the coming week will be sunny or cloudy.
I began my data career as an MIS analyst, then moved on to BI (Business Intelligence), and that was when the challenge of entering the immense world of Big Data arose. At first it was frightening. How could I use the charts I already knew with a gigantic amount of information?
Despite the growing list of QC simulators readily available, one of the best ways to learn something is to do it yourself. In this series of posts we are going to learn exactly how to do that. We are not going to worry about performance or ways to simulate a large number of qubits; instead, we will focus on a simple way to simulate smaller systems.