Performance testing FastAPI ML APIs with Locust

MLOps knowledge has become one of the most important skills a machine learning engineer can have. However, putting a machine learning model into production successfully is not easy. It requires a wide range of software development and DevOps skills in addition to an understanding of data science. In a nutshell, to increase your value as a machine learning engineer, you need to understand not only how to apply various Machine Learning and Deep Learning models to a specific problem, but also how to test, verify, and deploy them.

Having someone who can put Machine Learning models into production has become a major benefit for any business. One of the final challenges in doing so is verifying that the API serving the model performs well. In this article, we show how to train a model, deploy it with FastAPI, and test the performance of that API with Locust.

In this article we cover:

0. Prerequisites

1. Load Tests

2. Training and Deploying Machine Learning Model

3. Performance Tests with Locust

Prerequisites

To run the examples from this tutorial, you need Python 3.6 or higher; note that recent releases of FastAPI and Locust require newer Python versions. The easiest way to set this up is the Anaconda distribution, which comes with the other libraries used in this tutorial, such as Pandas, NumPy, scikit-learn, and Matplotlib. We use scikit-learn to train the model.

To install FastAPI and all its dependencies use the following command:

pip install "fastapi[all]"

This includes Uvicorn, an ASGI server that runs your code. If you are more comfortable with some other ASGI server like Hypercorn, that is also fine and you can use it for this tutorial.

Finally, to install the Locust testing framework used in this tutorial use this command:

pip install locust

Dataset

The data used in this article comes from the PalmerPenguins dataset, which was recently introduced as an alternative to the famous Iris dataset. It was created by Dr. Kristen Gorman and Palmer Station, Antarctica LTER. You can obtain this dataset here, or via Kaggle.

This dataset is composed of two datasets, each containing data on 344 penguins. Just like the Iris dataset, it contains three species; the penguins come from three islands in the Palmer Archipelago. The datasets also contain culmen dimensions for each penguin. The culmen is the upper ridge of a bird’s bill. In the simplified penguins data, culmen length and depth are named culmen_length_mm and culmen_depth_mm.

1. Load Tests

Load testing is a type of non-functional software testing in which an application’s performance is evaluated under a specific load. It shows how the application behaves when many people use it at the same time, or how an API performs when it receives a large number of requests. Load testing is used to identify performance bottlenecks and to ensure that the application runs stably and smoothly.

Before a machine learning API is released into production, load testing helps identify the following issues:

Each transaction’s response time
System component performance under varying loads
Database component performance under various loads
Network delays
Issues with software design
Configuration problems in web servers, application servers, database servers, and other servers
Hardware issues

Load tests show whether the system needs fine-tuning, or whether hardware and software changes are needed to improve performance.
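Response times are usually summarized as percentiles rather than averages, because a few slow requests can hide behind a good mean; Locust’s statistics, for example, include the median and high percentiles. As a minimal illustration (the helper name latency_summary and the sample latencies are made up for this sketch), such a summary can be computed with Python’s standard library:

```python
import statistics
from typing import Dict, Sequence


def latency_summary(latencies_ms: Sequence[float]) -> Dict[str, float]:
    """Summarize measured response times (in milliseconds) with percentiles."""
    # quantiles(n=100) returns the 99 cut points for percentiles 1..99;
    # "inclusive" treats the measurements as the whole population.
    cuts = statistics.quantiles(latencies_ms, n=100, method="inclusive")
    return {
        "median": statistics.median(latencies_ms),
        "p95": cuts[94],  # 95th percentile
        "p99": cuts[98],  # 99th percentile
        "max": max(latencies_ms),
    }


# Hypothetical measurements: 1 ms, 2 ms, ..., 100 ms.
summary = latency_summary(list(range(1, 101)))
print(summary)
```

Load-testing tools may bucket or interpolate percentiles slightly differently, so treat this as an approximation of what they report rather than their exact algorithm.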

2. Training and Deploying Machine Learning Model

In this tutorial, we use Random Forest to classify the PalmerPenguins dataset. We don’t go into depth about how this algorithm works or how it should be trained; this section only briefly runs through the concepts, because the focus is on performance testing. If you want to learn more, download the code here and check out our guide.

2.1 Training Classification Model

As mentioned, we use the PalmerPenguins dataset for the classification example. However, since we want to perform binary classification, we need to do some preparation first: we load the dataset, remove the features we don’t use in this article, and remove one of the three classes:

import pandas as pd

data = pd.read_csv('./data/penguins_size.csv')
data.head()

2.1.1 Preparing the data

We drop rows with missing values, remove all samples labeled with the class 'Chinstrap', and remove the features we don’t want to use (we keep only culmen_length_mm and culmen_depth_mm).

data = data.dropna()
data = data.drop(['sex', 'island', 'flipper_length_mm', 'body_mass_g'], axis=1)
data = data[data['species'] != 'Chinstrap']

Then we separate the input features and scale them:

from sklearn.preprocessing import StandardScaler

X = data.drop(['species'], axis=1)
X = X.values
ss = StandardScaler()
X = ss.fit_transform(X)
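StandardScaler standardizes each feature to zero mean and unit variance. The short check below, on a few made-up culmen measurements, shows that it is equivalent to subtracting each column’s mean and dividing by its population standard deviation. It also makes a serving-time rule concrete: new samples must be transformed with the scaler fitted on the training data, not with a freshly fitted one.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# A few made-up culmen measurements (length_mm, depth_mm), for illustration only.
X = np.array([[39.1, 18.7], [39.5, 17.4], [46.1, 13.2], [50.0, 16.3]])

scaled = StandardScaler().fit_transform(X)

# The same result by hand: subtract each column's mean and divide by its
# population standard deviation (ddof=0), which is what StandardScaler uses.
manual = (X - X.mean(axis=0)) / X.std(axis=0)
print(np.allclose(scaled, manual))  # True

# At prediction time, reuse the statistics learned from the training data:
# call transform() on the fitted scaler instead of fitting it again.
scaler = StandardScaler().fit(X)
new_sample = np.array([[45.0, 15.0]])
print(scaler.transform(new_sample))
```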

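After that, we extract the output values and encode them as -1 (Adelie) and 1 (Gentoo). Random Forest accepts any class labels, so this particular encoding is not required here; it is the convention an SVM would need.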

import numpy as np

y = data['species']
species_map = {'Adelie': -1, 'Gentoo': 1}
y = np.array([species_map[item] for item in y])

We also remove the sample at index 182 (the 183rd sample) from the dataset. Why? This sample lies between the two classes and blurs the separation we want to show, so we remove it and then split the data into training and test sets:

from sklearn.model_selection import train_test_split

# Remove the sample that lies between the two classes
X = np.delete(X, 182, axis=0)
y = np.delete(y, 182, axis=0)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=33)

That’s it. The dataset is ready, and here is what it looks like when we plot it:

import matplotlib.pyplot as plt

plt.figure(figsize=(11, 5))
# Adelie samples are labeled -1 and Gentoo samples 1
plt.scatter(X[y == -1][:, 0], X[y == -1][:, 1], color='orange', label='Adelie')
plt.scatter(X[y == 1][:, 0], X[y == 1][:, 1], color='gray', label='Gentoo')
plt.legend()
plt.show()

2.1.2 Training Random Forest

The scikit-learn library provides an implementation of the Random Forest algorithm, RandomForestClassifier. Here is how we use it:

from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score

rf_classifier = RandomForestClassifier(n_estimators=11, max_leaf_nodes=16, n_jobs=-1)
rf_classifier.fit(X_train, y_train)

rf_predictions = rf_classifier.predict(X_test)
print(accuracy_score(y_test, rf_predictions))

1.0
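To see what the two hyperparameters above control, the sketch below fits a forest with the same settings on synthetic data (make_blobs stands in for the penguin features) and inspects it. It also checks a documented property of scikit-learn forests: the forest’s class probabilities are the mean of its trees’ probabilities.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.ensemble import RandomForestClassifier

# Two synthetic clusters stand in for the two penguin classes.
X, y = make_blobs(n_samples=200, centers=2, n_features=2, random_state=0)

rf = RandomForestClassifier(n_estimators=11, max_leaf_nodes=16, random_state=0)
rf.fit(X, y)

# n_estimators sets how many decision trees the forest contains...
print(len(rf.estimators_))  # 11
# ...and max_leaf_nodes caps the number of leaves in each tree.
print(max(tree.get_n_leaves() for tree in rf.estimators_))

# The forest's class probabilities are the average of its trees' probabilities.
tree_mean = np.mean([tree.predict_proba(X) for tree in rf.estimators_], axis=0)
print(np.allclose(rf.predict_proba(X), tree_mean))  # True
```

One serving-related detail worth measuring: n_jobs=-1 also parallelizes predict() across trees, which can add overhead for the single-sample requests an API typically receives, so the load test is a good place to compare it against n_jobs=1.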

Let’s check out the classification diagram for Random Forest:

Our Random Forest created two areas, one for each class, based on the training data, and it appears to separate the two classes well.
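A diagram like this is typically drawn by predicting a class for every point of a fine grid over the two features and shading the result. Below is a minimal sketch of that approach, assuming a model with exactly two input features (as ours has); decision_grid is a hypothetical helper name, and synthetic data stands in for the penguins. The returned xx, yy and zz arrays can be passed to matplotlib’s plt.contourf(xx, yy, zz) to shade the regions.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.ensemble import RandomForestClassifier


def decision_grid(model, X, step=0.05, margin=0.5):
    """Predict a class for every point of a regular grid covering X."""
    x_min, x_max = X[:, 0].min() - margin, X[:, 0].max() + margin
    y_min, y_max = X[:, 1].min() - margin, X[:, 1].max() + margin
    xx, yy = np.meshgrid(np.arange(x_min, x_max, step),
                         np.arange(y_min, y_max, step))
    # Flatten the grid into an (n_points, 2) array, predict, then reshape back.
    zz = model.predict(np.c_[xx.ravel(), yy.ravel()]).reshape(xx.shape)
    return xx, yy, zz


# Synthetic stand-in data; with the article's data, pass rf_classifier and X.
X, y = make_blobs(n_samples=100, centers=2, random_state=1)
model = RandomForestClassifier(n_estimators=11, max_leaf_nodes=16, random_state=1)
model.fit(X, y)
xx, yy, zz = decision_grid(model, X)
```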

2.2. Deploying Machine Learning Model with FastAPI

Systems that use machine learning are often built with a microservices architecture. From the machine learning point of view, this means that one service trains the model and another one uses it, which gives a clear separation of concerns. The two services communicate through a file in which the trained model is stored.

2.2.1 Saving the Model into a File

Here is how we can save a scikit-learn model into a file using joblib. We add this at the end of the training file:

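import os
from joblib import dump

# Save the trained model into a file; ModelLoader loads it from this path
os.makedirs('./models', exist_ok=True)
dump(rf_classifier, './models/random_forest.joblib')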
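The file-based handoff between the training and serving services can be checked with a round trip: dump a fitted model, load it back as a separate service would, and compare predictions. A minimal sketch on synthetic data, writing to a temporary directory instead of ./models:

```python
import os
import tempfile

import numpy as np
from joblib import dump, load
from sklearn.datasets import make_blobs
from sklearn.ensemble import RandomForestClassifier

# Training side: fit a model (synthetic data here) and write it to a file.
X, y = make_blobs(n_samples=100, centers=2, random_state=0)
trained = RandomForestClassifier(n_estimators=11, random_state=0).fit(X, y)

with tempfile.TemporaryDirectory() as model_dir:
    path = os.path.join(model_dir, "random_forest.joblib")
    dump(trained, path)

    # Serving side: a different process would only need the file path.
    served = load(path)
    same = np.array_equal(trained.predict(X), served.predict(X))

print(same)  # True
```

Two caveats apply to joblib (pickle-based) files: load them with the same scikit-learn version that produced them, and only load files from sources you trust, since unpickling can execute arbitrary code.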

2.2.2 Rest API with FastAPI

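Let’s build an API that loads and serves this model with FastAPI. First, we build a class, ModelLoader, that is in charge of loading the trained model. Its constructor receives the name of the algorithm the user picked; since this article trains only a Random Forest, it always loads that model. The model from section 2.1 was trained on standardized culmen_length_mm and culmen_depth_mm values, so a new sample has to be built from the same two features and transformed with the scaler fitted during training. The code below assumes that scaler was saved with dump(ss, './models/scaler.joblib'); that file name is an illustrative assumption, and without the file the raw values reach the model unscaled, which only makes sense for a model trained on unscaled data. Here is what it looks like:

import os

import numpy as np
from joblib import load

from penguin_sample import PenguinSample

MODEL_PATH = './models/random_forest.joblib'
# Hypothetical location of the StandardScaler fitted during training
SCALER_PATH = './models/scaler.joblib'


class ModelLoader:
    '''Class for loading models into memory.'''

    def __init__(self, algorithm: str):
        # Only the Random Forest model exists in this article,
        # so the algorithm name is stored but not used yet.
        self.algorithm = algorithm
        self.model = load(MODEL_PATH)
        # Reuse the scaler fitted on the training data, if it was saved.
        self.scaler = load(SCALER_PATH) if os.path.exists(SCALER_PATH) else None

    def prepare_sample(self, raw_sample: PenguinSample) -> np.ndarray:
        '''Prepare a new sample so it can be processed by the model.'''
        # The model was trained on culmen length and depth only,
        # as a 2D array with one row per sample.
        sample = np.array([[raw_sample.culmenLength, raw_sample.culmenDepth]])

        if self.scaler is not None:
            # transform(), not fit_transform(): the statistics come from training.
            sample = self.scaler.transform(sample)

        return sample

    def predict(self, data: PenguinSample):
        '''Predict the class of the new sample.'''
        prepared_sample = self.prepare_sample(data)
        return self.model.predict(prepared_sample)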
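Keeping a separately saved scaler in sync with the model is easy to get wrong. An alternative, not the approach used in this article, is to wrap StandardScaler and RandomForestClassifier in a scikit-learn Pipeline and dump that single object, so the loader only has to build a one-row array of raw feature values. A minimal sketch on synthetic data:

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.ensemble import RandomForestClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Raw (unscaled) synthetic features stand in for culmen length and depth.
X_raw, y = make_blobs(n_samples=200, centers=2, random_state=3)

# The pipeline stores the fitted scaler together with the classifier.
pipeline = make_pipeline(
    StandardScaler(),
    RandomForestClassifier(n_estimators=11, max_leaf_nodes=16, random_state=0),
)
pipeline.fit(X_raw, y)

# At serving time, a raw request becomes a one-row 2D array; the pipeline
# applies the scaler's transform() (never fit) before calling the forest.
one_sample = X_raw[:1]
print(pipeline.predict(one_sample))
```

With this approach, dumping the pipeline to the model path saves the scaler and the forest together, and the serving code cannot accidentally skip or refit the scaling step.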

The most important file is main.py. It puts all the other pieces together and builds the REST API with FastAPI. Here is what it looks like:

from fastapi import FastAPI
from fastapi.middleware.cors import CORSMiddleware
from model_loader import ModelLoader
from train_parameters import TrainParameters
from penguin_sample import PenguinSample

# CORS origins are scheme + host + port only; entries with paths never match.
origins = [
    "http://localhost:8000",
    "http://127.0.0.1:8000",
    "http://localhost:4200"
]

app = FastAPI()
app.add_middleware(
    CORSMiddleware,
    allow_origins=origins,
    allow_credentials=True,
    allow_methods=["*"],
    allow_headers=["*"],
)

# Initially load the Random Forest model.
app.state.model = ModelLoader('random_forest')

# Labels used during training: -1 for Adelie, 1 for Gentoo.
species_map = {-1: 'Adelie', 1: 'Gentoo'}

@app.post("/load")
def load(params: TrainParameters):
    '''API Route for loading defined model into memory.'''
    print("Model Loading Started")
    app.state.model = ModelLoader(params.model.lower())
    return True

@app.post("/predict")
def predict(data: PenguinSample):
    '''API Route for making predictions based on the entered information.'''
    # A plain def (not async def) lets FastAPI run this blocking,
    # CPU-bound model call in its threadpool instead of on the event loop.
    species = app.state.model.predict(data)
    return species_map[int(species[0])]

The /predict route receives penguin data in the request body (for example, from the Predict tab of a web front end), calls the predict method of ModelLoader, and returns the predicted species as a JSON string. To run the app, execute the following command:

uvicorn main:app --port 8000

Once it is running, we can open the API in a browser and use the Swagger UI at http://localhost:8000/docs to test the /predict endpoint.

3. Performance Tests with Locust

Locust is a performance testing tool that is easy to use, scriptable, and scalable. Instead of being confined to a UI or a limited domain-specific language, you describe your users’ behavior in standard Python code, which makes Locust highly extensible and developer-friendly.

3.1 Locust Main Components

In essence, a Locust test is a Python application. This gives it a lot of flexibility, and it’s especially good at handling complicated user flows. There are three main components that we use to create tests:

@task – a decorator that marks a method as a task for the simulated users to execute. It accepts an optional weight, e.g. @task(3), which makes a task proportionally more likely to be picked.
HttpUser – the core of the solution. Our test class inherits from it. It gives each simulated user a client attribute, an instance of HttpSession, which is used to send HTTP requests to the system under test.
between(m, n) – assigned to the class’s wait_time; it makes each simulated user wait a random time between m and n seconds after each task it executes.

3.2 Writing Locust Test

Here is what the Locust test file looks like:

import json
from locust import HttpUser, task, between
from penguin_sample import PenguinSample

class PerformanceTests(HttpUser):
    wait_time = between(1, 3)

    @task(1)
    def testFastApi(self):
        sample = PenguinSample(island="Torgersen", 
                               culmenLength=39.1, 
                               culmenDepth=18.7, 
                               flipperLength=181, 
                               bodyMass=3750, 
                               sex="MALE", 
                               species="Adelie")
        headers = {'Accept': 'application/json', 'Content-Type': 'application/json'}
        self.client.post("/predict", data=json.dumps(sample.dict()), headers=headers)

Let’s break it down. After the necessary imports, we create a PerformanceTests class that inherits from HttpUser from the locust module, which gives it a client for sending requests to the API we implemented. Its wait_time makes each simulated user pause between 1 and 3 seconds between tasks. The class has one method, testFastApi, which is our test task.

This task creates a new PenguinSample for which we want a prediction. Then we define the headers for our POST request. Finally, we use the client to send the POST request. Note that we specify only the endpoint, i.e. “/predict”, and not the complete address of the API. The sample is serialized to JSON and sent as the request body.

This is a simple example with just one task, but we can define as many tasks as we want.

3.3 Running Locust Test with a UI

To run the file defined above, use the following command:

locust -f performance_tests.py

This command starts Locust on our machine. Its web UI is available in the browser at http://localhost:8089 (8089 is the default port). There, we tell Locust how to run the test. First, we define how many users we want to simulate. These simulated users, which Locust runs as lightweight greenlets rather than separate operating-system processes, send requests to the API.

Next, we define how fast those users are started (the spawn rate). Finally, we define the host, i.e. the address of the API, for example http://localhost:8000. Note that here you don’t specify the endpoint, only the address. In this example, I filled in the form as follows.

With these settings, Locust creates 100 users in total, at a rate of 1 user per second, and each user repeatedly runs the task defined in the PerformanceTests class. The test starts when we click the Start Swarming button. While it runs, several tabs let us observe its progress. The first is Statistics, which gives the overall picture of the test: how fast the API responded, the number of requests per second (RPS), and how many failures occurred.
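These settings also let us estimate what the Statistics tab should show. Each simulated user sends a request, waits for the response, and then sleeps for a random time drawn from between(1, 3), 2 seconds on average, so once all users are running, throughput is roughly users / (response time + mean wait). A back-of-the-envelope sketch, where the 50 ms response time is an assumed value rather than a measurement:

```python
def expected_rps(users: int, min_wait: float, max_wait: float,
                 response_time_s: float) -> float:
    """Rough steady-state requests per second for a closed-loop load test."""
    # One cycle per user: the request itself plus the mean of a uniform wait.
    mean_cycle_s = response_time_s + (min_wait + max_wait) / 2
    return users / mean_cycle_s


def ramp_up_seconds(users: int, spawn_rate: float) -> float:
    """Seconds until all users have been started."""
    return users / spawn_rate


# 100 users, wait_time = between(1, 3), assumed 50 ms responses.
print(round(expected_rps(100, 1, 3, 0.05), 1))  # 48.8
print(ramp_up_seconds(100, 1))  # 100.0
```

If the measured RPS stays well below such an estimate while response times climb, the API, rather than the test setup, is the likely bottleneck.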

The Charts tab shows the same metrics over the whole history of the test. It is the more interesting tab, because it shows how the number of requests per second grows with the number of users and how this affects the API’s response times.

There are also Failures and Exceptions tabs, which show why requests failed. We didn’t have any errors in this test, but when I recently tested another API whose statistics showed errors, the Failures tab is where I found their cause.

You can also download the Locust report as CSV or HTML from the Download Data tab.

Conclusion

In this article, we covered the complete process from training the model to deploying it in the API and testing the performance of that API with Locust.

Thank you for reading!

Nikola M. Zivkovic

CAIO at Rubik's Code

Nikola M. Zivkovic is the author of the book: Ultimate Guide to Machine Learning and Deep Learning for Programmers. He loves knowledge sharing, and he is an experienced speaker. You can find him speaking at meetups, conferences, and as a guest lecturer at the University of Novi Sad.
