In the previous article, we explored how to use BERT with ML.NET for the question answering NLP task. In this article, we explore another kind of NLP task: sentiment analysis. This type of analysis determines whether a piece of text is positive, negative, or neutral. It is a useful technique that helps businesses monitor feedback and better understand customer needs.
1. Dataset and Prerequisites
The dataset for this article comes from ‘From Group to Individual Labels using Deep Features’, Kotzias et al., KDD 2015, and is hosted at the UCI Machine Learning Repository (Dua, D. and Karra Taniskidou, E. (2017). UCI Machine Learning Repository). Concretely, the complete dataset for sentiment analysis can be downloaded here.
This dataset contains sentences labeled with positive or negative sentiment, in the format sentence | score, where the score is either 1 (positive) or 0 (negative). The sentences come from three websites: Yelp, IMDB, and Amazon. In this article, we use the reviews from IMDB. Here is what they look like:
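To make the format concrete, here is a minimal sketch of reading such lines in plain C#. It assumes the tab-separated sentence/score layout used by the UCI files; ReviewLineParser and its methods are hypothetical names, not part of the project built later in this article.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: parses lines shaped like "sentence<TAB>score",
// where score is 1 (positive) or 0 (negative).
public static class ReviewLineParser
{
    public static (string Sentence, bool IsPositive) Parse(string line)
    {
        // Split on the last tab so the score is always the final field.
        int tab = line.LastIndexOf('\t');
        if (tab < 0)
            throw new FormatException($"No tab separator found in: {line}");

        string sentence = line.Substring(0, tab).Trim();
        string score = line.Substring(tab + 1).Trim();

        return score switch
        {
            "1" => (sentence, true),
            "0" => (sentence, false),
            _ => throw new FormatException($"Unexpected score '{score}' in: {line}")
        };
    }

    public static List<(string Sentence, bool IsPositive)> ParseAll(IEnumerable<string> lines)
    {
        var result = new List<(string Sentence, bool IsPositive)>();
        foreach (string line in lines)
        {
            if (string.IsNullOrWhiteSpace(line))
                continue; // tolerate blank lines, e.g. at the end of the file
            result.Add(Parse(line));
        }
        return result;
    }
}
```

In practice, ParseAll can be fed directly with File.ReadLines(path) for the downloaded file.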
The implementation provided here is written in C# and uses .NET 5, so make sure that you have installed this SDK. If you are using Visual Studio, the SDK comes with version 16.8.3. Also, make sure that you have installed the following package:
You can do the same from the Package Manager Console:
Note that this will install the default Microsoft.ML package as well. You can do the same using Visual Studio’s Manage NuGet Packages option:
If you need to catch up with the basics of machine learning with ML.NET, check out this article.
2. What is Sentiment Analysis?
In the past couple of years, sentiment analysis has become one of the essential tools for monitoring and understanding customer feedback. It fully automates the detection of the emotional tone that messages and responses carry, which means that businesses can understand what customers need faster and better, and provide better products and services.
Sentiment analysis is, in a nutshell, the most common text classification tool. It is the process of analyzing pieces of text to determine whether their sentiment is positive, negative, or neutral. Understanding the social sentiment around your brand, product, or service while monitoring online conversations is essential for a modern business, and sentiment analysis is the first step towards that.
The applications of sentiment analysis are endless. For example, you can use this technique to automatically analyze a large number of reviews of your product, which could help you discover whether customers are happy with it. Or you might want to monitor social media responses in real time and automatically detect and contact unhappy customers.
Another cool thing is that sentiment analysis is the first step in feedback analysis. You can start with sentiment analysis and later extend your application with more advanced techniques such as intent analysis and contextual semantic search.
3. Types of Sentiment Analysis
In its most basic form, sentiment analysis detects two levels of emotional feedback: positive and negative. This is the type used in this tutorial. However, it is possible to go further and detect more specific emotions and intentions. For example, you can detect whether the customer is frustrated, happy, sad, interested, not interested, etc. In general, it all depends on what you want to detect and how you structure your training data.
Here are some of the most popular approaches to sentiment analysis:
- Emotions – If you have noticed the smileys that social media apps now suggest automatically while you type a post, this is exactly it. The sentiment analysis component of the system detects the underlying emotion of the text you are typing in real time, so it can predict whether you are angry or happy.
- Fine-Grained Feedback – Instead of just detecting whether the feedback is positive or negative, you can extend this to a scale from very negative to very positive, with everything in between.
- Intent Analysis – This provides a deeper understanding of the customer’s intention. You can predict whether a customer intends to buy a product or not. Over time, your system can track the intentions of a particular customer, form a pattern, and use it for marketing and advertising.
- Aspect-based – This type of analysis helps you understand how customers feel about specific attributes of a product. For example, how users feel about certain sections of your e-book.
4. Sentiment Analysis and ML.NET
The dataset used in this tutorial contains feedback examples that are either positive or negative. This means that we just need to perform binary classification, which is very cool because we can use the knowledge from the previous blog posts. We can even use more advanced techniques like SVMs and decision trees. However, the bigger challenge is how to prepare the data for this.
Computers don’t understand words; they understand numbers. So we need a mechanism that maps words to numbers. In the previous article, we used word embeddings for this. Essentially, word embeddings assign each word in the language a vector in some latent vector space. There are many available word embeddings, such as Word2Vec.
In this article, we use ML.NET’s default text features instead. We use the FeaturizeText method to do so. This method transforms a text column into a float vector of normalized counts of word n-grams and character n-grams (char-grams). Here is a quick example of how it can be used.
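To make the idea of word vectors concrete, here is a toy sketch in plain C#. The three-dimensional vectors are made up for illustration (real embeddings such as Word2Vec are learned from large corpora), and averaging word vectors is just one simple, assumed way of turning them into a sentence vector; ToyEmbeddings is a hypothetical name.

```csharp
using System;
using System.Collections.Generic;

// Toy illustration of word embeddings: every known word maps to a dense vector.
// The 3-dimensional values are made up; real embeddings are learned from large
// corpora and usually have hundreds of dimensions.
public static class ToyEmbeddings
{
    private const int Dimensions = 3;

    private static readonly Dictionary<string, float[]> Vectors = new Dictionary<string, float[]>
    {
        ["good"]  = new[] { 0.9f, 0.1f, 0.0f },
        ["great"] = new[] { 1.0f, 0.2f, 0.1f },
        ["bad"]   = new[] { -0.8f, 0.1f, 0.0f },
        ["movie"] = new[] { 0.0f, 0.9f, 0.3f },
    };

    // Represents a sentence as the average of the vectors of its known words.
    public static float[] SentenceVector(string sentence)
    {
        var sum = new float[Dimensions];
        int known = 0;

        foreach (string token in sentence.ToLowerInvariant().Split(' ', StringSplitOptions.RemoveEmptyEntries))
        {
            string word = token.Trim('.', ',', '!', '?');
            if (!Vectors.TryGetValue(word, out float[] vector))
                continue; // words without an embedding are ignored

            for (int i = 0; i < Dimensions; i++)
                sum[i] += vector[i];
            known++;
        }

        if (known > 0)
            for (int i = 0; i < Dimensions; i++)
                sum[i] /= known;

        return sum;
    }
}
```

Words with similar meaning, like good and great here, get nearby vectors, which is what lets a classifier generalize from one to the other.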
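A minimal sketch, assuming the Microsoft.ML package and the default FeaturizeText options; TextInput, TextFeatures, and FeaturizeTextDemo are hypothetical names. The transform first has to be fitted on some text so it can build its n-gram vocabulary, and a PredictionEngine then applies it to a single sentence:

```csharp
using Microsoft.ML;

// Input row: one piece of raw text.
public class TextInput
{
    public string Text { get; set; }
}

// Output row: the original text plus the vector produced by FeaturizeText.
public class TextFeatures : TextInput
{
    public float[] Features { get; set; }
}

public static class FeaturizeTextDemo
{
    public static float[] Featurize(string sentence)
    {
        var mlContext = new MLContext(seed: 0);

        // FeaturizeText must be fitted on data; a tiny in-memory corpus is enough here.
        var samples = new[]
        {
            new TextInput { Text = "This movie was great." },
            new TextInput { Text = "This movie was a waste of time." },
        };
        IDataView data = mlContext.Data.LoadFromEnumerable(samples);

        // Turn the "Text" column into a "Features" vector of normalized
        // word n-gram and char-gram counts.
        var pipeline = mlContext.Transforms.Text.FeaturizeText(
            outputColumnName: "Features", inputColumnName: nameof(TextInput.Text));
        ITransformer transformer = pipeline.Fit(data);

        // A PredictionEngine applies the fitted transformer to a single object.
        using var engine = mlContext.Model.CreatePredictionEngine<TextInput, TextFeatures>(transformer);
        return engine.Predict(new TextInput { Text = sentence }).Features;
    }
}
```

The length of the resulting vector is determined by the vocabulary learned during fitting, not by the sentence being featurized.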
5. Sentiment Analysis Implementation with ML.NET
5.1 High-Level Architecture
Before we dive deeper into the implementation, let’s consider its high-level architecture. In general, we want a solution that we can easily extend with new binary classification algorithms as ML.NET adds them, and hopefully with multiclass options in the future. That is why the folder structure of our solution looks like this:
The Data folder contains the .txt file with the input data, and the MachineLearning folder contains everything our algorithm needs to work. The architectural overview can be represented like this:
At the core of this solution, we have an abstract TrainerBase class. This class is located in the Common folder, and its main goal is to standardize the way the whole process is done. It is in this class that we process data and perform feature engineering. This class is also in charge of training the machine learning algorithm. The classes that implement this abstract class are located in the Trainers folder. There we can find multiple classes that use ML.NET algorithms; each of them defines which algorithm should be used. In this particular case, we have only one predictor, located in the Predictor folder.
5.2 Data Models
In order to load data from the dataset and use it with ML.NET algorithms, we need to implement classes that model this data. Two files can be found in the Data folder: SentimentData and SentimentPredictions. The SentimentData class models the input data and looks like this:
The SentimentPredictions class models output data:
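A minimal sketch of these two classes, assuming the sentence is in column 0 and the 0/1 score in column 1 of the file; the property names are assumptions. LoadColumn maps a file column to a property, and ColumnName maps a property to the column names that ML.NET binary trainers use by default (Label for input, PredictedLabel for output).

```csharp
using Microsoft.ML.Data;

// Input row: sentence in column 0, 0/1 score in column 1 (assumed layout).
public class SentimentData
{
    [LoadColumn(0)]
    public string SentimentText { get; set; }

    // Binary trainers look for a Boolean column named "Label" by default;
    // the text loader parses 0/1 into false/true.
    [LoadColumn(1), ColumnName("Label")]
    public bool Sentiment { get; set; }
}

// Output row: columns that a calibrated binary classifier adds to each row.
public class SentimentPredictions
{
    [ColumnName("PredictedLabel")]
    public bool Prediction { get; set; }

    public float Probability { get; set; }

    public float Score { get; set; }
}
```

The Probability column is only produced by calibrated trainers (SdcaLogisticRegression and FastTree, for example); for uncalibrated ones such as AveragedPerceptron, the output class should drop that property.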
5.3 TrainerBase and ITrainerBase
As we mentioned, this class is the core of this implementation. In essence, it has two parts. The first is the interface that describes the class, and the second is the abstract class that implements the interface methods and is extended by the concrete implementations. Here is the ITrainerBase interface:
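A sketch of what such an interface could look like, assuming Evaluate() returns ML.NET’s BinaryClassificationMetrics and that Save() writes to a path the trainer already knows; the exact member signatures in the original project may differ.

```csharp
using Microsoft.ML.Data;

// Hypothetical trainer contract: every trainer has a name and the same
// train / evaluate / save life cycle.
public interface ITrainerBase
{
    // Human-readable algorithm name, also usable for the model file name.
    string Name { get; }

    // Loads the data file, builds the pipeline, and trains the model.
    void Fit(string trainingFileName);

    // Scores the held-out test set and returns binary classification metrics.
    BinaryClassificationMetrics Evaluate();

    // Persists the trained model.
    void Save();
}
```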
The TrainerBase class implements this interface. However, it is abstract since we want to inject specific algorithms:
That is one large class. It controls the whole process. Let’s split it up and see what it is all about. First, let’s observe the fields and properties of this class:
The Name property is set by the class that inherits this one to provide the name of the algorithm. The ModelPath field defines where we store our model once it is trained. Note that the file name has the .mdl extension. Then we have our MlContext so we can use ML.NET functionalities. MLContext is the common context for all ML.NET operations, so every step in this class goes through the same instance. The _dataSplit field contains the loaded data, split into train and test datasets.
The _model field is set by the child classes, which define the machine learning algorithm it holds. The _trainedModel field is the resulting model that is evaluated and saved. In essence, the only job of a class that inherits this one is to define the algorithm to use, by assigning an instance of it to _model.
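Putting these fields together, here is a minimal sketch of the state such a base class could hold and how a child class plugs in its algorithm. It assumes ML.NET’s generic trainer types; the seed value, the model-file naming, and the SdcaLogisticRegressionTrainer child are illustrative assumptions rather than the original code, and the Fit(), Evaluate(), and Save() methods are left out.

```csharp
using System;
using System.IO;
using Microsoft.ML;
using Microsoft.ML.Calibrators;
using Microsoft.ML.Data;
using Microsoft.ML.Trainers;

// TParameters is the model-parameters type of the ML.NET binary trainer
// that a child class plugs in.
public abstract class TrainerBase<TParameters> where TParameters : class
{
    // Set by each child class, e.g. "Decision Tree".
    public string Name { get; protected set; }

    // One .mdl file per algorithm, named after it (naming is an assumption).
    protected string ModelPath => Path.Combine(AppContext.BaseDirectory, $"{Name}.mdl");

    // Common context for every ML.NET operation of this trainer.
    protected readonly MLContext MlContext;

    // Train/test split of the loaded data.
    protected DataOperationsCatalog.TrainTestData _dataSplit;

    // The untrained algorithm chosen by the child class.
    protected ITrainerEstimator<BinaryPredictionTransformer<TParameters>, TParameters> _model;

    // The fitted pipeline (featurization + algorithm) that is evaluated and saved.
    protected ITransformer _trainedModel;

    protected TrainerBase()
    {
        MlContext = new MLContext(seed: 111); // fixed seed for repeatable runs
    }
}

// A child class only names itself and picks the algorithm.
public class SdcaLogisticRegressionTrainer
    : TrainerBase<CalibratedModelParametersBase<LinearBinaryModelParameters, PlattCalibrator>>
{
    public SdcaLogisticRegressionTrainer()
    {
        Name = "SDCA Logistic Regression";
        _model = MlContext.BinaryClassification.Trainers.SdcaLogisticRegression(
            labelColumnName: "Label", featureColumnName: "Features");
    }
}
```

Making the base class generic over the model parameters keeps _model strongly typed while still letting each child choose any binary trainer.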
Cool, let’s now explore the Fit() method:
This method is the blueprint for training the algorithms. As an input parameter, it receives the path to the data file. After we confirm that the file exists, we call the private method LoadAndPrepareData, which loads the data into memory and splits it into train and test datasets. We store the returned value in _dataSplit because we need the test dataset for the evaluation phase. Then we call BuildDataProcessingPipeline().
This is the method that performs data pre-processing and feature engineering. This data doesn’t need any heavy processing; we just featurize the text. Here is the method:
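A minimal sketch of BuildDataProcessingPipeline() together with the Fit() flow that calls it, written as static helpers so it stands alone. It assumes a tab-separated file with the sentence in column 0 and the 0/1 label in column 1, and a 70/30 train/test split; the FitSketch name and the split ratio are illustrative assumptions.

```csharp
using System.IO;
using Microsoft.ML;
using Microsoft.ML.Data;

// Input row; column positions are assumptions about the data file.
public class SentimentData
{
    [LoadColumn(0)] public string SentimentText { get; set; }
    [LoadColumn(1), ColumnName("Label")] public bool Sentiment { get; set; }
}

public static class FitSketch
{
    // Featurization: raw text -> "Features" vector of normalized n-gram counts.
    public static IEstimator<ITransformer> BuildDataProcessingPipeline(MLContext mlContext) =>
        mlContext.Transforms.Text.FeaturizeText(
            outputColumnName: "Features",
            inputColumnName: nameof(SentimentData.SentimentText));

    // Loads the file and holds 30% of the rows back for evaluation.
    public static DataOperationsCatalog.TrainTestData LoadAndPrepareData(MLContext mlContext, string path)
    {
        IDataView data = mlContext.Data.LoadFromTextFile<SentimentData>(
            path, separatorChar: '\t', hasHeader: false);
        return mlContext.Data.TrainTestSplit(data, testFraction: 0.3);
    }

    // Fit(): check the file, load and split it, then train featurization + algorithm
    // on the training part only. The split is returned so evaluation can use the test part.
    public static ITransformer Fit(
        MLContext mlContext,
        string trainingFileName,
        IEstimator<ITransformer> algorithm,
        out DataOperationsCatalog.TrainTestData dataSplit)
    {
        if (!File.Exists(trainingFileName))
            throw new FileNotFoundException($"File {trainingFileName} doesn't exist.");

        dataSplit = LoadAndPrepareData(mlContext, trainingFileName);
        var trainingPipeline = BuildDataProcessingPipeline(mlContext).Append(algorithm);
        return trainingPipeline.Fit(dataSplit.TrainSet);
    }
}
```

The algorithm argument can be any binary trainer, for example mlContext.BinaryClassification.Trainers.SdcaLogisticRegression(), since IEstimator is covariant in its transformer type.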
Next is the Evaluate() method:
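A minimal sketch of such an Evaluate() step as a standalone helper, assuming a calibrated trainer (one that outputs a Probability column) and the default Label column; the Evaluation class name is hypothetical. For uncalibrated trainers, MLContext.BinaryClassification.EvaluateNonCalibrated is the counterpart.

```csharp
using Microsoft.ML;
using Microsoft.ML.Data;

public static class Evaluation
{
    // Runs the fitted pipeline over the held-out test set and computes
    // binary classification metrics from the resulting predictions.
    public static CalibratedBinaryClassificationMetrics Evaluate(
        MLContext mlContext, ITransformer trainedModel, IDataView testSet)
    {
        IDataView predictions = trainedModel.Transform(testSet);
        return mlContext.BinaryClassification.Evaluate(predictions, labelColumnName: "Label");
    }

    // A compact, printable summary of the most commonly reported metrics.
    public static string Describe(BinaryClassificationMetrics metrics) =>
        $"Accuracy: {metrics.Accuracy:P2}, AUC: {metrics.AreaUnderRocCurve:P2}, F1: {metrics.F1Score:P2}";
}
```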
It is a pretty simple method that uses _trainedModel to transform the test dataset into predictions. Then we use MlContext to compute binary classification metrics for them. Finally, let’s check out the Save() method:
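A minimal sketch of the Save() step as a standalone helper. It assumes the model is saved together with the schema of the training data, which lets a later load know what input the model expects; the ModelStorage name and the directory handling are assumptions.

```csharp
using System.IO;
using Microsoft.ML;

public static class ModelStorage
{
    // Writes the fitted pipeline and its input schema to one file, e.g. "Decision Tree.mdl".
    public static void Save(MLContext mlContext, ITransformer trainedModel, DataViewSchema inputSchema, string modelPath)
    {
        // Make sure the target folder exists; CreateDirectory is a no-op if it already does.
        string directory = Path.GetDirectoryName(Path.GetFullPath(modelPath));
        Directory.CreateDirectory(directory);

        mlContext.Model.Save(trainedModel, inputSchema, modelPath);
    }
}
```

Inside a TrainerBase-style class, this would be called with the trained model, the schema of the training set (for example _dataSplit.TrainSet.Schema), and ModelPath.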
This is another simple method that just uses MLContext to save the model into the defined path.
Thanks to all the heavy lifting done in the TrainerBase class, the Trainer classes are pretty simple and focus only on instantiating the ML.NET algorithm. We have ten classes that use ML.NET’s binary classifiers. Let’s take a look at one of them, the DecisionTreeTrainer class:
As you can see, this class is pretty simple. We override Name and set _model. We use the FastTree trainer from ML.NET’s BinaryClassification catalog. Notice how we use some of the hyperparameters that this algorithm provides; with them, we can create more experiments. The numberOfLeaves hyperparameter is the maximum number of leaves in each decision tree, while numberOfTrees is the number of trees in the ensemble. Remember, FastTree implements MART, a gradient boosting algorithm: it builds the trees one after another, each new tree fitted to the errors of the ones before it, and combines all of them to make a prediction. The learningRate hyperparameter scales the contribution of each new tree, which controls how fast the algorithm learns. The other classes are similar; some set hyperparameters and some don’t.
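A minimal sketch of how such a decision-tree trainer can be created with explicit hyperparameters, assuming the Microsoft.ML.FastTree package; the DecisionTreeFactory name and the example values are illustrative, not the ones used in the article.

```csharp
using Microsoft.ML;
using Microsoft.ML.Trainers.FastTree;

public static class DecisionTreeFactory
{
    // Creates ML.NET's FastTree (gradient-boosted decision trees) binary trainer.
    public static FastTreeBinaryTrainer Create(
        MLContext mlContext, int numberOfLeaves, int numberOfTrees, double learningRate)
    {
        return mlContext.BinaryClassification.Trainers.FastTree(
            labelColumnName: "Label",
            featureColumnName: "Features",
            numberOfLeaves: numberOfLeaves, // maximum leaves per tree
            numberOfTrees: numberOfTrees,   // trees in the boosted ensemble
            learningRate: learningRate);    // shrinkage applied to each new tree
    }

    // Several variations of the same algorithm, differing only in hyperparameters.
    public static FastTreeBinaryTrainer[] Variations(MLContext mlContext) => new[]
    {
        Create(mlContext, numberOfLeaves: 2, numberOfTrees: 10, learningRate: 0.1),
        Create(mlContext, numberOfLeaves: 5, numberOfTrees: 50, learningRate: 0.2),
        Create(mlContext, numberOfLeaves: 20, numberOfTrees: 100, learningRate: 0.2),
    };
}
```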
The Predictor class loads the saved model and runs predictions. Usually, this class is not part of the same microservice as the trainers: one microservice trains the model and saves it to a file, and another loads it from that file and runs predictions on user input. Here is what this class looks like:
In a nutshell, the model is loaded from the defined file, and predictions are made on the new sample. Note that we need to create a PredictionEngine to do so.
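A minimal sketch of such a Predictor, assuming the model file was produced by a calibrated binary trainer (so the output has a Probability column) and reusing the shapes of the data models; the property names are assumptions. A PredictionEngine is convenient for single predictions but is not thread-safe, so a web service would typically use PredictionEnginePool from the Microsoft.Extensions.ML package instead.

```csharp
using System.IO;
using Microsoft.ML;
using Microsoft.ML.Data;

public class SentimentData
{
    [LoadColumn(0)] public string SentimentText { get; set; }
    [LoadColumn(1), ColumnName("Label")] public bool Sentiment { get; set; }
}

public class SentimentPredictions
{
    [ColumnName("PredictedLabel")] public bool Prediction { get; set; }
    public float Probability { get; set; }
}

public class Predictor
{
    private readonly MLContext _mlContext = new MLContext(seed: 111);
    private readonly ITransformer _model;

    public Predictor(string modelPath)
    {
        if (!File.Exists(modelPath))
            throw new FileNotFoundException($"Model file {modelPath} doesn't exist.");

        // The saved input schema is not needed here, so it is discarded.
        _model = _mlContext.Model.Load(modelPath, out _);
    }

    public SentimentPredictions Predict(SentimentData newSample)
    {
        // Wraps the loaded pipeline so it can score one object at a time.
        using var engine = _mlContext.Model.CreatePredictionEngine<SentimentData, SentimentPredictions>(_model);
        return engine.Predict(newSample);
    }
}
```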
5.4 Usage and Results
Ok, let’s put all of this together.
Note the TrainEvaluatePredict() method. It does the heavy lifting here. We pass it an instance of a class that inherits TrainerBase and a new sample that we want to predict. It calls the Fit() method to train the algorithm, then calls the Evaluate() method and prints out the metrics. Finally, it saves the model. Once that is done, it creates an instance of Predictor, calls its Predict() method with the new sample, and prints out the prediction. In Main, we create a list of trainer objects and call TrainEvaluatePredict on each of them.
In the list of algorithms, we relied on the hyperparameters to create several variations of Decision Trees. Here are the results:
Awesome, so we got different predictions from different algorithms, along with different metrics. Note that many algorithms actually perform poorly and mark “This is awesome!” as a negative review. Apart from that, many algorithms have low confidence (probability) even when they mark the statement as positive. SDCA Logistic Regression gave the best result, with 76% confidence that the sentence is positive. This indicates that we need additional data preparation.
In this article, we covered a lot of ground. We learned how Sentiment Analysis works and which types of it are out there. As always, we implemented it all using ML.NET.
Thank you for reading!
Nikola M. Zivkovic
CAIO at Rubik's Code
Nikola M. Zivkovic is a CAIO at Rubik’s Code and the author of the book “Deep Learning for Programmers”. He loves knowledge sharing, and he is an experienced speaker. You can find him speaking at meetups, conferences, and as a guest lecturer at the University of Novi Sad.
Rubik’s Code is a boutique data science and software service company with more than 10 years of experience in Machine Learning, Artificial Intelligence & Software development. Check out the services we provide.