From Netflix, Google, and Amazon to smaller webshops, recommendation systems are everywhere. In fact, this type of system is probably one of the most successful business applications of machine learning. Their ability to predict what users would like to read, watch, and buy has proved to be good not only for the business but for the users as well. For users, they provide a way to explore the product space; for businesses, they increase user engagement and provide more knowledge about the customers. These systems are also widespread and available on almost every big cloud platform. YouTube video recommendations? They are there. Netflix menus with suggested series? They are turning the wheels behind the scenes. Google Maps suggested routes? You can bet. These systems have become one of the building blocks of our industry, and it would be a shame not to know anything about them. In this article, we get familiar with these systems and see how we can build one using ML.NET.

1. Dataset and Prerequisites

Everyone loves Netflix. One reason is that its recommendations are top-notch. The company has invested a lot in this. It is famous for the Netflix Prize competition, in which engineers tried to predict user ratings for films based on previous ratings, without any other information about the users or films. Netflix even provided a training dataset of 100,480,507 ratings that 480,189 users gave to 17,770 movies. Each sample in the dataset is formatted as a set of four features: user ID, movie ID, grade, and date of grade. The user ID and movie ID features are integer IDs, while grades range from 1 to 5. In this article, we will not use the dates. Here is what that data looks like.

The implementations provided here are done in C#, and we use the latest .NET 5. So make sure that you have installed this SDK. If you are using Visual Studio this comes with version 16.8.3. Also, make sure that you have installed the following packages:

You can do the same from the Package Manager Console:

Note that this will install the default Microsoft.ML package as well. You can do a similar thing using Visual Studio’s Manage NuGet Packages option:

If you need to catch up with the basics of machine learning with ML.NET check out this article. 

2. Types of Recommendation Systems

As we mentioned, the Netflix dataset contains information on how users rated movies. Based on this, how do we create a recommendation for a user? We could consider some features of the movies that the user has watched and rated, and then recommend similar items. Alternatively, we could find similar users based on those ratings and suggest items that those users watched and liked. But what does it mean that two items are similar? What does it mean that two users are similar? And how do we calculate that similarity and express it in mathematical terms?

Different types of recommendation systems take different approaches to these questions. In general, there are four types of recommendation systems:

  • Content-Based Recommendation Systems – This type of recommendation system is focused, well, on content. These systems use only features and information from the items and create recommendations for the user based on them. They don’t take into account information from other users.
  • Collaborative Filtering Recommendation Systems – The biggest power of recommendation systems is that they can suggest items for users based on their behavior on a certain platform or based on the behavior of other users of the same platform. For example, Netflix suggests your next series to binge based on the series you’ve previously watched, but also based on the series enjoyed by users who watched and liked the same content as you.
  • Knowledge-Based Recommendation Systems – This type of recommendation system uses explicit knowledge about the user’s preferences, the items, and/or the recommendation criteria. In this scenario, the recommendation system asks users about their preferences and builds recommendations based on that feedback.
  • Hybrid Solutions Recommendation Systems – Often, we combine several of these types into a custom solution.

If you want to learn more about how each of these systems works, check out this article. Of these types, the first two are the most popular and the most often used. In practice, we often build hybrid solutions to get better results.

ML.NET supports only collaborative filtering, or to be more specific – matrix factorization. That is why in this article we focus on this type of recommendation system. Let’s learn more about how these systems function under the hood.

3. Collaborative Filtering Intuition

One of the most popular techniques to create recommendation systems is Collaborative Filtering. Unlike Content-Based Filtering, this approach places users and items within a common embedding space along dimensions (read – features) they have in common. For example, let’s consider two users from Netflix and the shows that they rated.

We can present that like this in TensorFlow (no worries, we will not go into TensorFlow details, this is just for example purposes :)):

Now, we can take the features of each show, which is just a k-hot encoding of the genre:

Or in TensorFlow:

Then we can take a simple dot product of these matrices and get the affinities of each user:

We can see that the top feature for both users is Comedy, which means they like similar stuff. What have we done here? Well, we not only described items in terms of the mentioned genres, but we have done the same for each user in the same terms. For User1, for example, this means that she likes Comedy with a weight of 0.5 but Action with only 0.1. Note that if we multiply the users’ embedding matrix with the transpose of the item embedding matrix, we recreate the user-item interaction matrix. Now, this works well for simple examples with few users and items. However, as more items and users are added to the system, it stops scaling. Also, how can we be sure that the features we picked are the relevant ones? What if there are latent features that we are unable to recognize? How can we pick the correct features then? This brings us to matrix factorization.
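To make the dot product step concrete, here is a minimal C# sketch. The ratings, the show-to-genre encoding, and the names `GenreAffinity`, `SampleRatings`, and `SampleShowGenres` are all made up for illustration and are not taken from the original example; the resulting affinities are raw sums rather than the normalized weights quoted above.

```csharp
using System;

public static class GenreAffinity
{
    // Plain matrix product: (users x shows) * (shows x genres) = (users x genres).
    public static double[,] Multiply(double[,] left, double[,] right)
    {
        int rows = left.GetLength(0);
        int inner = left.GetLength(1);
        int cols = right.GetLength(1);
        if (right.GetLength(0) != inner)
            throw new ArgumentException("Inner dimensions must match.");

        var result = new double[rows, cols];
        for (int i = 0; i < rows; i++)
            for (int j = 0; j < cols; j++)
                for (int k = 0; k < inner; k++)
                    result[i, j] += left[i, k] * right[k, j];
        return result;
    }

    // Index of the genre with the highest affinity for the given user row.
    public static int TopGenre(double[,] affinities, int user)
    {
        int best = 0;
        for (int j = 1; j < affinities.GetLength(1); j++)
            if (affinities[user, j] > affinities[user, best])
                best = j;
        return best;
    }

    // Hypothetical ratings: 2 users x 4 shows (0 means "not rated").
    public static double[,] SampleRatings() => new double[,]
    {
        { 4, 0, 5, 1 },
        { 5, 2, 3, 0 }
    };

    // Hypothetical k-hot genre encoding: 4 shows x 3 genres (Comedy, Action, Drama).
    public static double[,] SampleShowGenres() => new double[,]
    {
        { 1, 0, 0 },
        { 0, 1, 0 },
        { 1, 0, 1 },
        { 0, 1, 1 }
    };

    public static void Main()
    {
        string[] genres = { "Comedy", "Action", "Drama" };
        double[,] affinities = Multiply(SampleRatings(), SampleShowGenres());
        for (int user = 0; user < affinities.GetLength(0); user++)
            Console.WriteLine($"User{user + 1} top genre: {genres[TopGenre(affinities, user)]}");
    }
}
```

With this made-up data, both users end up with Comedy as their strongest genre, which mirrors the situation described above.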

4. Matrix Factorization Intuition

We mentioned that human-defined features for items and users might not be the best option overall. Fortunately, these embeddings can be learned from data. This means that we don’t manually assign features to the items and users; instead, we use the user-item interaction matrix to learn the latent factors that best factorize it. As in the previous thought exercise, this process results in a user factor embedding matrix and an item factor embedding matrix. Technically, we are compressing a sparse user-item interaction matrix and extracting latent factors (somewhat like PCA). That is what matrix factorization is all about: factorizing a matrix into two smaller matrices from which we can reconstruct the original one:
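In symbols, the idea can be sketched as follows. The notation is an assumption chosen to match the U and V used later in this article: R is the m × n user-item interaction matrix, U is the m × k user factor matrix, V is the n × k item factor matrix, and k is the number of latent factors.

```latex
R \approx U V^{T}, \qquad
\hat{r}_{ij} = u_i \cdot v_j = \sum_{f=1}^{k} u_{if}\, v_{jf}
```

To get a feeling for the compression, take the Netflix numbers from above: the full matrix has 480,189 × 17,770 ≈ 8.53 billion cells, while with k = 50 the two factor matrices together hold (480,189 + 17,770) × 50 ≈ 24.9 million values.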

Similar to other dimensionality reduction techniques, the number of latent features is a hyperparameter that lets us trade off more information compression against more reconstruction error. We can make a prediction in two ways: either take the dot product of a user with the item factors or the dot product of an item with the user factors. Matrix factorization helps us with one more problem. Imagine that you have thousands of users in your system and you want to calculate the similarity matrix between them. That matrix would get quite big. Matrix factorization compresses that information for us.
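A prediction from learned factors is just a dot product. The sketch below assumes two hand-picked factor matrices with k = 2 latent features; the values and the `LatentFactorPrediction` class are hypothetical and only illustrate the mechanics.

```csharp
using System;

public static class LatentFactorPrediction
{
    // Dot product of two vectors of equal length.
    public static double Dot(double[] a, double[] b)
    {
        if (a.Length != b.Length)
            throw new ArgumentException("Vectors must have the same length.");
        double sum = 0;
        for (int f = 0; f < a.Length; f++)
            sum += a[f] * b[f];
        return sum;
    }

    // Predicted rating = user's latent factors dotted with the item's latent factors.
    public static double PredictRating(double[][] userFactors, double[][] itemFactors, int user, int item)
        => Dot(userFactors[user], itemFactors[item]);

    public static void Main()
    {
        // Hypothetical factors: 2 users and 3 items, each described by 2 latent features.
        double[][] userFactors = { new[] { 1.0, 2.0 }, new[] { 0.5, 1.5 } };
        double[][] itemFactors = { new[] { 2.0, 1.0 }, new[] { 1.0, 0.0 }, new[] { 0.0, 2.0 } };

        for (int user = 0; user < userFactors.Length; user++)
            for (int item = 0; item < itemFactors.Length; item++)
                Console.WriteLine($"user {user}, item {item}: {PredictRating(userFactors, itemFactors, user, item)}");
    }
}
```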

4.1 Matrix Factorization Algorithms

There are several good matrix factorization algorithms out there, so let’s explore some of the more popular ones. A couple of years back, Netflix ran a $1M competition for recommendation systems, the Netflix Prize mentioned earlier. The goal was to improve the accuracy of their system based on users’ ratings. Models based on SVD (Singular Value Decomposition) were a key part of the winning solution, and this algorithm is still very popular. Formally, it can be defined like this.

Let $A$ be an $m \times n$ matrix. The Singular Value Decomposition (SVD) of $A$ is

$$A = U \Sigma V^{T},$$

where $U$ is $m \times m$ and orthogonal, $V$ is $n \times n$ and orthogonal, and $\Sigma$ is an $m \times n$ diagonal matrix with nonnegative diagonal entries $\sigma_1 \geq \sigma_2 \geq \cdots \geq \sigma_p$, $p = \min\{m, n\}$, known as the singular values of $A$.

Another very popular algorithm is Alternating Least Squares, or ALS, along with its variations. As the name suggests, it alternately solves for U while holding V constant and then solves for V while holding U constant, and it works only for least-squares problems. However, since it is specialized, ALS can be parallelized, and it is quite a fast algorithm.
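The alternating idea is easiest to see with a single latent factor, where each least-squares subproblem has a closed-form solution. The following is a deliberately simplified sketch, not a production ALS: it assumes rank 1, marks missing ratings with `double.NaN`, and uses a hypothetical `RankOneAls` class with an optional regularization term `lambda`.

```csharp
using System;

public static class RankOneAls
{
    // Fits ratings[i, j] ~ u[i] * v[j]. Missing ratings are double.NaN and are skipped in every sum.
    public static (double[] U, double[] V) Fit(double[,] ratings, int iterations, double lambda = 0.0)
    {
        int users = ratings.GetLength(0);
        int items = ratings.GetLength(1);
        var u = new double[users];
        var v = new double[items];
        Array.Fill(v, 1.0); // arbitrary non-zero starting point

        for (int iteration = 0; iteration < iterations; iteration++)
        {
            // Hold V constant and solve the one-dimensional least-squares problem for each user.
            for (int i = 0; i < users; i++)
            {
                double numerator = 0, denominator = lambda;
                for (int j = 0; j < items; j++)
                {
                    if (double.IsNaN(ratings[i, j])) continue;
                    numerator += ratings[i, j] * v[j];
                    denominator += v[j] * v[j];
                }
                u[i] = denominator > 0 ? numerator / denominator : 0;
            }

            // Hold U constant and solve the one-dimensional least-squares problem for each item.
            for (int j = 0; j < items; j++)
            {
                double numerator = 0, denominator = lambda;
                for (int i = 0; i < users; i++)
                {
                    if (double.IsNaN(ratings[i, j])) continue;
                    numerator += ratings[i, j] * u[i];
                    denominator += u[i] * u[i];
                }
                v[j] = denominator > 0 ? numerator / denominator : 0;
            }
        }
        return (u, v);
    }

    public static void Main()
    {
        // Hypothetical ratings taken from the rank-one matrix [1, 2]^T * [1, 2, 3], with one entry hidden.
        double[,] ratings = { { 1, 2, 3 }, { 2, 4, double.NaN } };
        var (u, v) = Fit(ratings, iterations: 50);
        Console.WriteLine($"Predicted missing rating: {u[1] * v[2]:F3}");
    }
}
```

Because the observed entries here come from an exactly rank-one matrix, the estimate for the hidden cell approaches its true value of 6. Real rating data is noisy and needs more than one factor, in which case each step solves a small linear system instead of a single division.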

One variation of it is Weighted Alternating Least Squares, or WALS. The difference is in the way missing data is treated. As we mentioned a couple of times in the previous articles, one of the biggest enemies of recommendation systems is sparse data. WALS adds weights for specific entries; these weight vectors can be linearly or exponentially scaled to normalize row and/or column frequencies.

NMF is another popular matrix factorization algorithm. It stands for non-negative matrix factorization. This technique is based on obtaining a low-rank representation of matrices with non-negative or positive elements. NMF uses an iterative procedure to modify the initial values of U and V so that their product approaches the original matrix.

5. Implementation with ML.NET

ML.NET currently supports just standard matrix factorization with stochastic gradient descent. This is exposed through the MatrixFactorizationTrainer, as we will see later.
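To build intuition for what such a trainer does, here is a small, self-contained sketch of matrix factorization trained with per-sample gradient updates. It is not ML.NET's implementation; the `SgdMatrixFactorization` class, the sample ratings, and the hyperparameter values are assumptions made for illustration. Ratings are visited in a fixed order to keep the run reproducible, whereas a real SGD would shuffle them.

```csharp
using System;
using System.Collections.Generic;

public sealed class SgdMatrixFactorization
{
    private readonly double[][] _userFactors;
    private readonly double[][] _itemFactors;

    public SgdMatrixFactorization(int users, int items, int rank, int seed = 42)
    {
        var random = new Random(seed);
        _userFactors = RandomFactors(users, rank, random);
        _itemFactors = RandomFactors(items, rank, random);
    }

    // Small random starting values for every latent factor.
    private static double[][] RandomFactors(int count, int rank, Random random)
    {
        var factors = new double[count][];
        for (int i = 0; i < count; i++)
        {
            factors[i] = new double[rank];
            for (int f = 0; f < rank; f++)
                factors[i][f] = random.NextDouble() * 0.1;
        }
        return factors;
    }

    // Predicted rating: dot product of the user's and the item's latent factors.
    public double Predict(int user, int item)
    {
        double sum = 0;
        for (int f = 0; f < _userFactors[user].Length; f++)
            sum += _userFactors[user][f] * _itemFactors[item][f];
        return sum;
    }

    // One gradient step per observed rating, repeated for the given number of iterations.
    public void Fit(IReadOnlyList<(int User, int Item, double Rating)> ratings,
                    int iterations, double learningRate, double lambda)
    {
        for (int iteration = 0; iteration < iterations; iteration++)
        {
            foreach (var (user, item, rating) in ratings)
            {
                double error = rating - Predict(user, item);
                double[] p = _userFactors[user];
                double[] q = _itemFactors[item];
                for (int f = 0; f < p.Length; f++)
                {
                    double oldP = p[f]; // keep the old value so both updates use the same point
                    p[f] += learningRate * (error * q[f] - lambda * oldP);
                    q[f] += learningRate * (error * oldP - lambda * q[f]);
                }
            }
        }
    }

    // Root mean squared error over the given ratings.
    public double Rmse(IReadOnlyList<(int User, int Item, double Rating)> ratings)
    {
        double squaredError = 0;
        foreach (var (user, item, rating) in ratings)
        {
            double error = rating - Predict(user, item);
            squaredError += error * error;
        }
        return Math.Sqrt(squaredError / ratings.Count);
    }

    public static void Main()
    {
        // Hypothetical (user, item, rating) triples; unlisted pairs are unknown.
        var ratings = new (int User, int Item, double Rating)[]
        {
            (0, 0, 5.0), (0, 1, 3.0), (1, 0, 4.0), (1, 2, 1.0), (2, 1, 2.0), (2, 2, 5.0)
        };
        var model = new SgdMatrixFactorization(users: 3, items: 3, rank: 2);
        Console.WriteLine($"RMSE before training: {model.Rmse(ratings):F3}");
        model.Fit(ratings, iterations: 500, learningRate: 0.02, lambda: 0.01);
        Console.WriteLine($"RMSE after training:  {model.Rmse(ratings):F3}");
    }
}
```

The number of iterations, the rank, and the learning rate here play the same role as the hyperparameters we tune on the ML.NET trainer later in the article.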

5.1 High-Level Architecture

Before we dive deeper into this implementation, let’s consider its high-level architecture. Just like in previous ML.NET guides, we want to build a solution that we can easily extend with new matrix factorization algorithms that ML.NET could include in the future. The solution we propose here is a simple form of Auto ML. The folder structure of our solution looks like this:

[Figure: folder structure of the solution]

The Data folder contains .csv with input data and the MachineLearning folder contains everything that is necessary for our algorithm to work. The architectural overview can be represented like this:

[Figure: architectural overview of the solution]

At the core of this solution, we have an abstract TrainerBase class. This class is in the Common folder and its main goal is to standardize the way this whole process is done. It is in this class that we process data and perform feature engineering. This class is also in charge of training the machine learning algorithm. The classes that implement this abstract class are located in the Trainers folder. These classes wrap ML.NET algorithms and define which algorithm should be used. In this particular case, we have only one Predictor, located in the Predictor folder.

5.2 Data Models

In order to load data from the dataset and use it with ML.NET algorithms, we need to implement classes that model this data. Two files can be found in the Data folder: MovieRating and MovieRatingPredictions. The MovieRating class models input data and it looks like this:

As you can see, we don’t use the date from the dataset.

The MovieRatingPredictions class models output data:
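A minimal sketch of what these two data models might look like is shown below. The property names, the column order (user ID, movie ID, rating), and the use of `Label` for the rating are assumptions based on the dataset description above, not the original listing. The date column is simply not mapped.

```csharp
using Microsoft.ML.Data;

// Input row: one rating that a user gave to a movie.
public class MovieRating
{
    [LoadColumn(0)]
    public int UserId { get; set; }

    [LoadColumn(1)]
    public int MovieId { get; set; }

    // The rating (1 to 5) is the value we want to predict.
    // ML.NET trainers look for a column called "Label" by default.
    [LoadColumn(2)]
    public float Label { get; set; }
}

// Output row: the actual rating (if known) and the predicted score.
public class MovieRatingPredictions
{
    public float Label { get; set; }

    // ML.NET regression-style trainers write their prediction into "Score".
    public float Score { get; set; }
}
```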

5.3 TrainerBase and ITrainerBase

As we mentioned, this class is the core of this implementation. In essence, there are two parts to it. The first is the interface that describes this class, and the second is the abstract class that implements the interface methods and is extended by the concrete implementations. Here is the ITrainerBase interface:
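Based on the members discussed in this section (a Name property plus Fit(), Evaluate() and Save() methods), the interface could look roughly like the sketch below. The exact signatures, including the `trainingFileName` parameter name, are assumptions.

```csharp
using Microsoft.ML.Data;

public interface ITrainerBase
{
    // Human-readable name of the algorithm variation.
    string Name { get; }

    // Loads the .csv file at the given path and trains the model.
    void Fit(string trainingFileName);

    // Scores the test split and returns regression metrics.
    RegressionMetrics Evaluate();

    // Writes the trained model to disk.
    void Save();
}
```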

The TrainerBase class implements this interface. However, it is abstract since we want to inject specific algorithms:

That is one large class. It controls the whole process. Let’s split it up and see what it is all about. First, let’s observe the fields and properties of this class:

The Name property is used by the class that inherits this one to add the name of the algorithm. The ModelPath field is there to define where we will store our model once it is trained. Note that the file name has .mdl extension. Then we have our MlContext so we can use ML.NET functionalities. Don’t forget that this class is a singleton, so there will be only one in our solution. The _dataSplit field contains loaded data. Data is split into train and test datasets within this structure.

The field _model is used by the child classes. These classes define which machine learning algorithm is used in this field. The _trainedModel field is the resulting model that should be evaluated and saved. In essence, the only job of the class that inherits and implements this one is to define the algorithm that should be used, by instantiating an object of the desired algorithm as _model. 

Cool, let’s now explore the Fit() method:

This method is the blueprint for the training of the algorithms. As an input parameter, it receives the path to the .csv file. After we confirm that the file exists, we use the private method LoadAndPrepareData. This method loads data into memory and splits it into two datasets, a train and a test dataset. We store the returned value in _dataSplit because we need the test dataset for the evaluation phase. Then we call BuildDataProcessingPipeline().

This is the method that performs data pre-processing and feature engineering. For this data, there is no need for some heavy work, we just encode it. Here is the method:
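The loading, splitting and encoding steps could be sketched like this. The `DataPreparation` class, the encoded column names, the `hasHeader` and `testFraction` parameters, and the shape of `MovieRating` (repeated so the snippet compiles on its own) are assumptions; the ML.NET calls themselves (`LoadFromTextFile`, `TrainTestSplit`, `MapValueToKey`) are part of the public API.

```csharp
using Microsoft.ML;
using Microsoft.ML.Data;

// Assumed input model, repeated here so that the snippet is self-contained.
public class MovieRating
{
    [LoadColumn(0)] public int UserId { get; set; }
    [LoadColumn(1)] public int MovieId { get; set; }
    [LoadColumn(2)] public float Label { get; set; }
}

public static class DataPreparation
{
    // Loads the ratings from a comma-separated file and splits them into train and test sets.
    public static DataOperationsCatalog.TrainTestData LoadAndSplit(
        MLContext mlContext, string csvPath, bool hasHeader = true, double testFraction = 0.1)
    {
        IDataView data = mlContext.Data.LoadFromTextFile<MovieRating>(
            csvPath, separatorChar: ',', hasHeader: hasHeader);
        return mlContext.Data.TrainTestSplit(data, testFraction: testFraction);
    }

    // Matrix factorization needs key-typed row and column indices,
    // so the raw user and movie IDs are mapped to keys.
    public static IEstimator<ITransformer> BuildEncodingPipeline(MLContext mlContext)
    {
        return mlContext.Transforms.Conversion
            .MapValueToKey(outputColumnName: "UserIdEncoded", inputColumnName: nameof(MovieRating.UserId))
            .Append(mlContext.Transforms.Conversion
                .MapValueToKey(outputColumnName: "MovieIdEncoded", inputColumnName: nameof(MovieRating.MovieId)));
    }
}
```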

Next is the Evaluate() method:

It is a pretty simple method that creates a Transformer object by using _trainedModel and the test dataset. Then we utilize MlContext to retrieve regression metrics. Finally, let’s check out the Save() method:

This is another simple method that just uses MLContext to save the model into the defined path.
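As a rough sketch of these two steps, assuming the default "Label" and "Score" column names and a hypothetical `ModelEvaluation` helper class (in the article these steps live inside TrainerBase):

```csharp
using Microsoft.ML;
using Microsoft.ML.Data;

public static class ModelEvaluation
{
    // Scores the held-out test set and compares the "Score" column with "Label".
    public static RegressionMetrics Evaluate(MLContext mlContext, ITransformer trainedModel, IDataView testSet)
    {
        IDataView predictions = trainedModel.Transform(testSet);
        return mlContext.Regression.Evaluate(predictions, labelColumnName: "Label", scoreColumnName: "Score");
    }

    // Persists the trained model together with the schema of its input data.
    public static void Save(MLContext mlContext, ITransformer trainedModel, DataViewSchema inputSchema, string modelPath)
    {
        mlContext.Model.Save(trainedModel, inputSchema, modelPath);
    }
}
```

The returned `RegressionMetrics` object exposes values such as `RootMeanSquaredError`, `MeanAbsoluteError` and `RSquared`, which are handy when comparing several hyperparameter variations.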

5.4 Trainers

Thanks to all the heavy lifting that we have done in the TrainerBase class, the only Trainer class is simple and focused only on instantiating the ML.NET algorithm. Let’s take a look at the matrix factorization trainer class:

As you can see, this class is pretty simple. We override the Name and _model. We use the MatrixFactorization class from the Recommendation extension. Notice how we use some of the hyperparameters that this algorithm provides. With this, we can create more experiments. 
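The core of such a trainer is the creation of the ML.NET estimator. The sketch below shows one way to do that with `MatrixFactorizationTrainer.Options`; the `MatrixFactorizationSetup` helper and the encoded column names are assumptions, while the option properties come from the Microsoft.ML.Recommender package.

```csharp
using Microsoft.ML;
using Microsoft.ML.Trainers;

public static class MatrixFactorizationSetup
{
    // Creates a matrix factorization trainer with the given hyperparameters.
    public static MatrixFactorizationTrainer Create(
        MLContext mlContext, int numberOfIterations, int approximationRank, double learningRate)
    {
        var options = new MatrixFactorizationTrainer.Options
        {
            // Assumed names of the key-encoded ID columns produced during preprocessing.
            MatrixColumnIndexColumnName = "UserIdEncoded",
            MatrixRowIndexColumnName = "MovieIdEncoded",
            LabelColumnName = "Label",
            NumberOfIterations = numberOfIterations,
            ApproximationRank = approximationRank, // number of latent factors
            LearningRate = learningRate
        };
        return mlContext.Recommendation().Trainers.MatrixFactorization(options);
    }
}
```

For example, `MatrixFactorizationSetup.Create(mlContext, 10, 50, 0.01)` would correspond to a variation with 10 iterations, approximation rank 50 and learning rate 0.01.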

5.5 Predictor

The Predictor class is here to load the saved model and run some predictions. Usually, this class is not a part of the same microservice as the trainers. We usually have one microservice that performs the training of the model. This model is saved into a file, from which the other microservice loads it and runs predictions based on the user input. Here is what this class looks like:

In a nutshell, the model is loaded from a defined file, and predictions are made on the new sample. Note that we need to create PredictionEngine to do so.
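Loading a saved model and predicting a single sample could be sketched as follows. The `RatingPredictor` class and the shape of the two data models (repeated so the snippet compiles on its own) are assumptions; `Model.Load` and `CreatePredictionEngine` are the standard ML.NET calls for this job.

```csharp
using Microsoft.ML;
using Microsoft.ML.Data;

// Assumed data models, repeated here so that the snippet is self-contained.
public class MovieRating
{
    [LoadColumn(0)] public int UserId { get; set; }
    [LoadColumn(1)] public int MovieId { get; set; }
    [LoadColumn(2)] public float Label { get; set; }
}

public class MovieRatingPredictions
{
    public float Label { get; set; }
    public float Score { get; set; }
}

public static class RatingPredictor
{
    // Loads the model stored at modelPath and predicts the rating for one user/movie pair.
    public static MovieRatingPredictions Predict(MLContext mlContext, string modelPath, MovieRating sample)
    {
        ITransformer model = mlContext.Model.Load(modelPath, out DataViewSchema _);

        // PredictionEngine is a convenience wrapper for single-sample predictions.
        // It is not thread-safe, so create one per thread or per call.
        using var engine = mlContext.Model.CreatePredictionEngine<MovieRating, MovieRatingPredictions>(model);
        return engine.Predict(sample);
    }
}
```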

5.6 Usage and Results

Ok, let’s put all of this together.

Note the TrainEvaluatePredict() method. This method does the heavy lifting here. Into this method, we inject an instance of a class that inherits TrainerBase and a new sample that we want a prediction for. Then we call the Fit() method to train the algorithm. Next, we call the Evaluate() method and print out the metrics. Finally, we save the model. Once that is done, we create an instance of Predictor, call the Predict() method with the new sample, and print out the predictions. In Main, we create a list of trainer objects, and then we call TrainEvaluatePredict on each of them.

In the list of algorithms, we relied on the hyperparameters to create several variations of matrix factorization. Here are the results:

For testing, we used user ID 6 and movie ID 11. If you take a look at the dataset, you will find that the rating for this pair is 4. As you can see, most of the matrix factorization variations have done a good job. The variation with 10 iterations, approximation rank 50, and learning rate 0.01 seems to have gotten closest. Also, its metrics seem very good. However, further tests are necessary to determine which variation performs best.

Conclusion

In this article, we covered a lot of ground. We learned about different types of recommendation systems. Then we explored collaborative filtering and matrix factorization. We also had a chance to see how it can be used for movie recommendations. Finally, we implemented it all using ML.NET.

Thank you for reading!

Nikola M. Zivkovic

CAIO at Rubik's Code

Nikola M. Zivkovic is the CAIO at Rubik’s Code and the author of the book “Deep Learning for Programmers”. He loves knowledge sharing and is an experienced speaker. You can find him speaking at meetups, conferences and as a guest lecturer at the University of Novi Sad.