From Netflix, Google, and Amazon to smaller webshops, recommendation systems are everywhere. In fact, this type of system is probably one of the most successful business applications of machine learning. Their ability to predict what users would like to read, watch, and buy has proved to be good not only for businesses but for users as well. For users, they provide a way to explore the product space; for businesses, they increase user engagement and provide more knowledge about customers. These systems are also widespread and exist on almost every big cloud platform. YouTube video recommendations? They are there. Netflix menus with suggested series? They are turning the wheels behind the scenes. Google Maps suggested routes? You can bet. These systems have become one of the building blocks of our industry, and it would be bad not to know anything about them. That is why we prepared a series of articles on recommendation systems, in which we will write about how these systems function, which types of recommendation systems exist, and how to implement such a system with machine learning and deep learning techniques.
The main goal of recommendation engines is to identify things that a user may like based on items that they’ve interacted with in the past. Observe that we used terms like “item” and “interacted”. Items are the products in the system you want to build a recommendation system for. For example, for YouTube those are videos, for Amazon those are different types of products, and for Netflix those are series and movies. Users interact with these items differently depending on their type, i.e. they watch them, read them, or buy them. Another goal of recommendation systems is the personalization of the user experience. For example, Google search takes into account where you are located and may yield different search results for different locations. In a nutshell, these systems give a user the ability to find new content that they might like but didn’t know to search for.
Types of Recommendation Systems
In order to better understand how the recommendation system functions, let’s consider this example. Let’s say that we need to build a recommendation system for a musical instruments shop. We have a database with users and instruments, along with users’ past purchases and users’ ratings. With this information, we can build a so-called user-item interaction matrix. Here is what that matrix looks like:
In this matrix, each row represents one user and each column represents an item. Items in our example are musical instruments, but depending on the platform they can be anything. If a user has rated an instrument, we have a checkmark (usually along with some value that represents the rating) in that cell. So, based on this, how do we create a recommendation for a certain user? We could consider some features of the instruments that the user has already rated or purchased and then recommend similar items. Alternatively, we could find similar users based on those ratings and suggest items that those users purchased. But what does it mean that two items are similar? What does it mean that two users are similar? How do we calculate that similarity and express it in mathematical terms?
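As a minimal sketch, a user-item interaction matrix for the instrument shop could be stored like this. The user names, instruments, and rating values are made up for illustration, and `None` is our own choice for marking a cell where the user has not rated the item.

```python
# Hypothetical users, instruments, and ratings (1-5), invented for illustration.
users = ["ana", "bob", "cleo"]
items = ["violin", "guitar", "midi_controller"]

# Sparse storage: only the ratings that actually exist.
ratings = {
    ("ana", "violin"): 5,
    ("ana", "guitar"): 4,
    ("bob", "guitar"): 3,
    ("bob", "midi_controller"): 5,
    ("cleo", "violin"): 2,
}

def build_interaction_matrix(users, items, ratings):
    """Return one row per user and one column per item.

    A cell holds the rating, or None when the user has not rated the item.
    """
    return [[ratings.get((user, item)) for item in items] for user in users]

matrix = build_interaction_matrix(users, items, ratings)
for user, row in zip(users, matrix):
    print(user, row)
```

Keeping the known ratings in a dictionary and expanding them into rows only when needed is one way to avoid storing all the empty cells.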
Different types of recommendation systems take different approaches to these questions. In general, there are four types of recommendation systems:
- Content-Based Recommendation Systems
- Collaborative Filtering Recommendation Systems
- Knowledge-Based Recommendation Systems
- Hybrid Solutions Recommendation Systems
Of these four types, the first two are the most popular and the most often used. In practice, we often build hybrid solutions to get better results. All of these approaches calculate some type of similarity, whether it is a similarity between items or a similarity between users. So, before we dive into the details of each type of recommendation system, let’s take a moment to explain how similarity is measured using cosine similarity.
Cosine Similarity
To calculate the similarity between two items or two users, we often use cosine similarity or some variation of it. This approach works really well in the majority of situations, and it is a great starting point when you begin implementing recommendation systems. The basis of this approach is really simple. If we go back to our instrument shop, we can say that each instrument has two features: it can be a classical music instrument, a popular music instrument, or both. We can represent these two features as the axes of a vector space.
Now, if we take a violin, for example, it is clearly a classical music instrument, so if we map it to the vector space it has coordinates (1, 0) and we would put it somewhere over here:
On the other hand, some kind of MIDI controller is clearly a popular music instrument, so its coordinates would be (0, 1):
Finally, we can have a guitar, which could fall in both categories, so let’s give it coordinates (1, 1):
The angle between two items tells us how similar they are. However, the raw angle is not a convenient measure. Ideally, we want the similarity to be a number between 0 and 1, where 0 means that the items have nothing in common and 1 means that they represent the same thing. In order to achieve this, we calculate the cosine of that angle, which is really cool, because as the angle approaches 0 degrees the cosine approaches 1, and as the angle approaches 90 degrees it approaches 0. With non-negative feature values, as in our example, the angle never exceeds 90 degrees, so that is exactly what we need.
Back to our example, the similarity between the MIDI controller and the violin would be 0, because the angle between the vectors that represent these items is 90 degrees. The angle between the MIDI controller and the guitar, on the other hand, is 45 degrees, so the cosine similarity is 0.7071.
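To check these numbers, here is a small sketch that computes the cosine directly from the two-dimensional coordinates used above. The function name `cosine_2d` is our own; it simply divides the dot product of the two vectors by the product of their lengths.

```python
import math

def cosine_2d(a, b):
    """Cosine of the angle between two 2-D vectors given as (x, y) tuples."""
    dot = a[0] * b[0] + a[1] * b[1]
    return dot / (math.hypot(*a) * math.hypot(*b))

# Coordinates from the text: (classical, popular)
violin = (1, 0)
midi_controller = (0, 1)
guitar = (1, 1)

print(round(cosine_2d(midi_controller, violin), 4))  # 0.0
print(round(cosine_2d(midi_controller, guitar), 4))  # 0.7071
print(round(cosine_2d(violin, guitar), 4))           # 0.7071
```

Note that the guitar is equally similar to the violin and to the MIDI controller, which matches the intuition that it belongs to both categories.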
Now, in the real world, we would have more than two features, so we need to scale this concept to a multidimensional vector space. To calculate the cosine similarity between two items x and y that each have n features, we use the formula:

$$\text{similarity}(x, y) = \cos\theta = \frac{\sum_{i=1}^{n} x_i y_i}{\sqrt{\sum_{i=1}^{n} x_i^2}\,\sqrt{\sum_{i=1}^{n} y_i^2}}$$
Let’s explain this a little bit. Technically, we just go through every feature, i.e. every dimension, take the value that each of the two items has for that feature, and plug those values into the formula. Python code that shows how that is done looks like this:
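import math

def cosineSimilarity(item1, item2, item_features):
    # Look up the feature vector of each item
    features1 = item_features[item1]
    features2 = item_features[item2]
    sum_xy, sum_xx, sum_yy = 0, 0, 0
    # Accumulate the dot product and the squared lengths, one feature at a time
    for i in range(len(features1)):
        x = features1[i]
        y = features2[i]
        sum_xx += x * x
        sum_yy += y * y
        sum_xy += x * y
    # Note: this divides by zero if either vector contains only zeros
    return sum_xy / math.sqrt(sum_xx * sum_yy)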
There are variations of this formula and completely different similarity metrics. We won’t go into details of each metric in this blog post, but you can explore them on your own:
- Adjusted Cosine
- Pearson Similarity
- Spearman Rank Correlation
- Mean Squared Difference
- Jaccard Similarity
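To give a feel for how two of these alternatives differ from cosine similarity, here is a rough sketch of Jaccard similarity and Pearson similarity. The function names and the example data are ours, and returning 0.0 for constant ratings is an assumption we make to avoid a division by zero.

```python
import math

def jaccard_similarity(items_a, items_b):
    """Share of items two users have in common: |A & B| / |A | B|."""
    a, b = set(items_a), set(items_b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def pearson_similarity(xs, ys):
    """Pearson correlation between two equally long rating lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # assumption: treat constant ratings as "no correlation"
    return cov / math.sqrt(var_x * var_y)

# One shared item out of three distinct items
print(jaccard_similarity(["violin", "guitar"], ["guitar", "midi_controller"]))
# Ratings that rise and fall together
print(pearson_similarity([1, 2, 3], [2, 4, 6]))
```

Jaccard ignores rating values and only looks at which items were consumed, while Pearson subtracts each user's mean rating first, so it is less affected by users who rate everything high or everything low.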
Content-Based Recommendation Systems
This type of recommendation system is focused, well, on content: it uses only the features of and information about the items to create recommendations for the user, and it does not take information from other users into account. Going back to our instrument shop example, if a user has bought a lot of guitars, this approach will recommend other guitar-related items. Sometimes this is done by hand-engineering the features of the items. Learning is then focused on finding out how closely each user is aligned with each item. One of the biggest advantages of this approach is that items usually have permanent features that don’t change over time, unlike users’ preferences. This approach is also good for performance, as you would usually have fewer items than users in the system. However, the main problem with this approach is that if we use only that one user’s information, it is very hard to extrapolate how the user would rate unseen items.
In general, the process of creating these recommendation systems looks something like this:
- Collect data and present it in tabular form, with one row per item.
- Calculate the similarity between items, thinking about items as multidimensional vectors. Calculate other measures besides similarity if the system requires them.
- Make a similarity matrix of items.
- Sort the list by similarity and pick the Top-N neighbors, meaning the items that have the highest similarity with the items that the user has already bought (watched, read…). It is beneficial to add a minimum similarity threshold to avoid recommending obscure items.
- Pick items that the selected user hasn’t consumed from the list of items. Weight them by the similarity score between items and normalize the ratings.
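The steps above can be sketched in a few lines of Python. This is a simplified illustration, not a full implementation: the item features are invented, `recommend_content_based` is a hypothetical helper, and instead of weighting and normalizing ratings it scores each candidate by its best similarity to anything the user already bought.

```python
import math

# Hypothetical item features: (classical, popular, has_strings)
item_features = {
    "violin": (1, 0, 1),
    "cello": (1, 0, 1),
    "guitar": (1, 1, 1),
    "midi_controller": (0, 1, 0),
    "synthesizer": (0, 1, 0),
}

def cosine(a, b):
    """Cosine similarity between two equally long feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def recommend_content_based(purchased, item_features, top_n=2, min_similarity=0.5):
    """Rank unseen items by their best similarity to a purchased item."""
    if not purchased:
        return []  # nothing to compare against
    scores = {}
    for candidate, features in item_features.items():
        if candidate in purchased:
            continue  # only recommend items the user has not consumed
        best = max(cosine(features, item_features[p]) for p in purchased)
        if best >= min_similarity:  # skip weakly related items
            scores[candidate] = best
    ranked = sorted(scores.items(), key=lambda pair: pair[1], reverse=True)
    return ranked[:top_n]

for name, score in recommend_content_based({"violin"}, item_features):
    print(name, round(score, 3))
```

For a user who bought a violin, the cello comes first and the guitar second, while the MIDI controller and the synthesizer fall below the similarity threshold and are left out.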
We will explore these types of recommendation systems in more detail in a separate article.
Collaborative Filtering Recommendation Systems
The biggest power of recommendation systems is that they can suggest items to users based on their own behavior on a certain platform or based on the behavior of other users of the same platform. For example, Netflix suggests your next series to binge based not only on the series you’ve previously watched, but also on the series watched by users who watched and liked the same content as you.
This type of system uses a complete interaction matrix and calculates similarities between the users and the items simultaneously. The major advantage is that the feature representations can be learned automatically and they don’t have to be hand-engineered. This approach is often based on matrix factorization.
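To make the idea of matrix factorization a bit more concrete, here is a minimal sketch that learns a short vector of factors for each user and each item with stochastic gradient descent, so that their dot product approximates the known ratings. The ratings, the hyperparameter values, and all function names are assumptions chosen for illustration; real systems typically rely on a dedicated library.

```python
import random

def predict(user_factors, item_factors, u, i):
    """Predicted rating: dot product of the user and item factor vectors."""
    return sum(p * q for p, q in zip(user_factors[u], item_factors[i]))

def factorize(ratings, n_users, n_items, n_factors=2,
              learning_rate=0.01, regularization=0.02, epochs=500, seed=0):
    """Learn factor vectors from a {(user_index, item_index): rating} dict."""
    rng = random.Random(seed)
    user_factors = [[rng.uniform(0.1, 0.5) for _ in range(n_factors)]
                    for _ in range(n_users)]
    item_factors = [[rng.uniform(0.1, 0.5) for _ in range(n_factors)]
                    for _ in range(n_items)]
    for _ in range(epochs):
        for (u, i), rating in ratings.items():
            error = rating - predict(user_factors, item_factors, u, i)
            for f in range(n_factors):
                pu, qi = user_factors[u][f], item_factors[i][f]
                # Move both factors a small step in the direction that reduces the error
                user_factors[u][f] += learning_rate * (error * qi - regularization * pu)
                item_factors[i][f] += learning_rate * (error * pu - regularization * qi)
    return user_factors, item_factors

def mean_squared_error(ratings, user_factors, item_factors):
    """Average squared difference between known and predicted ratings."""
    return sum((r - predict(user_factors, item_factors, u, i)) ** 2
               for (u, i), r in ratings.items()) / len(ratings)

# Hypothetical known ratings; every other cell of the 3x3 matrix is missing.
ratings = {(0, 0): 5, (0, 1): 4, (1, 1): 3, (1, 2): 5, (2, 0): 2}
users_f, items_f = factorize(ratings, n_users=3, n_items=3)
print("training error:", mean_squared_error(ratings, users_f, items_f))
print("predicted rating for user 0, item 2:", predict(users_f, items_f, 0, 2))
```

The learned factors play the role of the automatically discovered features mentioned above: nobody had to decide by hand what each dimension means.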
In general, the process of creating these recommendation systems looks something like this:
- Collect data and present it in tabular form, with a value for each user and item (a rating, or a boolean that marks consumption).
- Represent each user as a multidimensional array and calculate the similarity between two users using only the items that they have in common.
- Create a similarity matrix. Keep in mind that a similarity equal to 1 doesn’t necessarily mean that both users liked the same things; it can also mean that they both hated them (for example, both gave a rating of 1). It also doesn’t mean that they are 100% similar if the data is sparse, so it is good to require a minimum number of shared items.
- Sort the list by similarity and pick the Top-N neighbors, meaning the users that are most similar to the selected user. Add a minimum similarity threshold.
- Pick items that the selected user hasn’t consumed from the list of items that similar users have liked (sort by rating again). Always suggest what other users liked, never what they hated, and weight each suggestion by the similarity score between the users.
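A rough sketch of these steps for user-based collaborative filtering could look like the following. The ratings are invented, the function names are ours, and the thresholds (`min_shared`, `min_similarity`, `liked`) are arbitrary values picked for the example.

```python
import math

# Hypothetical ratings (1-5): user -> {item: rating}
ratings = {
    "ana": {"violin": 5, "cello": 4, "guitar": 2},
    "bob": {"violin": 5, "cello": 5, "guitar": 1, "viola": 5},
    "cleo": {"violin": 1, "guitar": 5, "midi_controller": 5},
}

def user_similarity(a, b, min_shared=2):
    """Cosine similarity computed only over items both users have rated."""
    shared = set(a) & set(b)
    if len(shared) < min_shared:
        return 0.0  # too few shared items to trust the score
    dot = sum(a[i] * b[i] for i in shared)
    norm = (math.sqrt(sum(a[i] ** 2 for i in shared))
            * math.sqrt(sum(b[i] ** 2 for i in shared)))
    return dot / norm if norm else 0.0

def recommend_user_based(user, ratings, top_n=2, min_similarity=0.8, liked=4):
    """Suggest items that similar users liked and `user` has not rated."""
    target = ratings[user]
    neighbors = sorted(
        ((user_similarity(target, other_ratings), other)
         for other, other_ratings in ratings.items() if other != user),
        reverse=True,
    )[:top_n]
    scores = {}
    for similarity, other in neighbors:
        if similarity < min_similarity:
            continue
        for item, rating in ratings[other].items():
            # Only suggest unseen items that the neighbor actually liked
            if item not in target and rating >= liked:
                scores[item] = scores.get(item, 0.0) + similarity * rating
    return sorted(scores, key=scores.get, reverse=True)

print(recommend_user_based("ana", ratings))
```

In this data set, plain cosine similarity on raw ratings still gives the two users with opposite tastes a score above 0.5, which is why the example uses a fairly high similarity threshold.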
We will explore these types of recommendation systems in more detail in a separate article.
Knowledge-Based Recommendation Systems
Often we have situations in which we don’t have enough data and cannot use either of the previous approaches. For example, only a small number of people buy instruments regularly, so we would model our recommendation system differently. This type of recommendation system is called knowledge-based because it uses explicit knowledge about the user’s preferences, the items, and/or the recommendation criteria. In this scenario, the recommendation system asks a user about their preferences and builds recommendations based on that feedback.
Hybrid Solutions
In reality, we might use a combination of these methods and get the benefits of all approaches. For example, we can develop several approaches and then use the appropriate one based on the type of data we have. If a user has rated a lot of items, we can use a content-based approach; if a user has rated only a few items, we use collaborative filtering; and if a user has rated no items, we use a knowledge-based approach. We could also use the outputs of each of these approaches as inputs to some more sophisticated recommendation system. In fact, research suggests that a hybrid approach that combines the outputs of multiple approaches into a single input can result in a more accurate recommendation system. Deep learning models can also be used when building a recommendation system, in combination with the previously mentioned techniques.
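The switching rule described above can be written down as a tiny function. This is only a sketch: the name `choose_strategy` and the cut-off of 20 rated items are assumptions, and a real system would pick the threshold from its own data.

```python
def choose_strategy(n_rated, many=20):
    """Pick an approach from the number of items a user has rated.

    `many` is an arbitrary, hypothetical cut-off for "a lot of items".
    """
    if n_rated == 0:
        return "knowledge-based"
    if n_rated < many:
        return "collaborative filtering"
    return "content-based"

for n_rated in (0, 3, 50):
    print(n_rated, "->", choose_strategy(n_rated))
```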
Recommendation System Challenges
There are many problems to be aware of when developing recommendation systems, so let’s mention a few. The main problem is that data can be sparse and skewed. We say that data is sparse when most values are missing. This can happen when we have a lot of users and a lot of items, and each user has rated only a fraction of the items. In this situation, we end up with a lot of zeros in the matrix, which consume computational power without adding information. We say that data is skewed when one item is far more popular than the others, or when some users just like everything. We need to be especially careful about how we handle this type of data.
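Both properties are easy to measure before building anything. The sketch below assumes the interaction matrix is a list of rows in which a missing rating is stored as `None` (our own convention, used here instead of zeros so that a missing value cannot be confused with a rating); the function names and the sample matrix are made up.

```python
def sparsity(matrix):
    """Fraction of cells that have no rating (None)."""
    total = sum(len(row) for row in matrix)
    missing = sum(cell is None for row in matrix for cell in row)
    return missing / total if total else 0.0

def item_popularity(matrix):
    """Number of ratings per item (column); a few very large counts suggest skew."""
    return [sum(cell is not None for cell in column) for column in zip(*matrix)]

# Hypothetical 3 users x 3 items matrix
matrix = [
    [5, 4, None],
    [None, 3, 5],
    [2, None, None],
]
print("sparsity:", round(sparsity(matrix), 3))
print("ratings per item:", item_popularity(matrix))
```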
One of the better-known problems with recommendation systems is the cold-start problem. It occurs when a new user arrives at our platform and we don’t have any information about what to recommend to her. It may also happen when we add a new item to our catalog: we don’t know how to model it because there are no ratings for it yet.
Other problems are more ethical in nature. For example, we can create so-called recommendation bubbles, where we never recommend anything outside the user’s current interests and thus keep her isolated from items that might potentially interest her. This is why we need to inject diversity into our system.
Conclusion
In this article, we started exploring the world of recommendation systems. We saw how we can calculate the similarity between different items and users in these systems, and we mentioned several types of recommendation systems. Apart from that, we covered some problems that these systems may have. In the next article, we are going to expand on this and explore metrics that we can use to measure the performance of these systems.
Thank you for reading!
Nikola M. Zivkovic
CAIO at Rubik's Code
Nikola M. Zivkovic is the CAIO at Rubik’s Code and the author of the book “Deep Learning for Programmers“. He loves knowledge sharing, and he is an experienced speaker. You can find him speaking at meetups, conferences and as a guest lecturer at the University of Novi Sad.
Rubik’s Code is a boutique data science and software service company with more than 10 years of experience in Machine Learning, Artificial Intelligence & Software development. Check out the services we provide.