In the previous articles, we covered some PyTorch basics. First, we explored tensors and gradients, and how these concepts can be used to write machine learning algorithms with this framework. Then we put that knowledge to work and used PyTorch for its main purpose – deep learning. We had a chance to see how to implement feedforward and convolutional neural networks for image classification. In this article, we cover TorchServe, a new way to deploy PyTorch models. This is still a young technology: its current version is 0.1 and it is highly experimental, but it is very promising. In essence, TorchServe removes the need for the additional serving code you would otherwise have to write manually. It is similar to TensorFlow Serving, and it makes building modern deep learning applications much easier.
TorchServe Architecture
The main goal of TorchServe and similar applications is to provide an API through which other parts of the system can communicate with the model. It exposes three types of APIs:
- API Description – Retrieves a description of the API using the OpenAPI 3.0 specification.
Call example: curl -X OPTIONS http://localhost:8080
- Health check API – Retrieves the status of TorchServe.
Call example: curl http://localhost:8080/ping
- Predictions API – Makes prediction calls to the models that are served with TorchServe.
Call example: curl -X POST http://localhost:8080/predictions/{model_name} -T {input_data}
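For instance, when TorchServe is running, the health check returns a small JSON status object, typically along these lines (the exact payload can differ between versions):
curl http://localhost:8080/ping
{
  "status": "Healthy"
}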
In order to provide these APIs, TorchServe is composed of several parts. Here is how it looks from a high-level perspective:
All API requests are routed through the so-called Frontend. Apart from handling all requests and responses coming from the client, this component of TorchServe is in charge of the models’ lifecycle. The instances of the models are hosted by Model Workers; these components run the actual inference for each model. All loadable models are stored within a directory (cloud or local) – the Model Store. In a nutshell, once you run TorchServe (we will see how that is done in a bit), you load the different models that are available in the Model Store. Then you can start each of these models, which creates a new Model Worker instance that exposes the interface to that specific model. Finally, you can send API calls, which are handled by the Frontend and routed to the correct Model Worker.
Installation
At the moment, TorchServe supports only Linux and macOS. If you are a Windows user, you can still use TorchServe with Docker. TorchServe also requires the Java 11 JDK, so make sure that you install that first. For Linux, run this command:
sudo apt-get install openjdk-11-jdk
For macOS, run this:
brew tap AdoptOpenJDK/openjdk
brew cask install adoptopenjdk11
Then you can install TorchServe with either pip:
pip install torch torchtext torchvision sentencepiece psutil future
pip install torchserve torch-model-archiver
Or with Conda:
conda create --name torchserve torchserve torch-model-archiver psutil \
future pytorch torchtext torchvision -c pytorch -c powerai
If you want to install the GPU version, use this:
conda create --name torchserve torchserve torch-model-archiver psutil \
future pytorch torchtext torchvision cudatoolkit=10.1 -c pytorch -c powerai
Note that these commands will install both TorchServe and the Torch Model Archiver.
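As a quick sanity check of the installation (not an official step, just a convenient habit), you can print the help of both command-line tools:
torchserve --help
torch-model-archiver --help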
Saving Trained Model
Before running and serving models with TorchServe, you first need to save them. Let’s do that with the feedforward neural network model we created in the previous article. Here is how that model looks:
import torch
import torch.nn as nn
import torch.nn.functional as F

class FFNN(nn.Module):
"""Simple Feed Forward Neural Network with n hidden layers"""
def __init__(self, input_size, num_hidden_layers, hidden_size, out_size, accuracy_function):
super().__init__()
self.accuracy_function = accuracy_function
# Create first hidden layer
self.input_layer = nn.Linear(input_size, hidden_size)
# Create remaining hidden layers
self.hidden_layers = nn.ModuleList()
for i in range(0, num_hidden_layers):
self.hidden_layers.append(nn.Linear(hidden_size, hidden_size))
# Create output layer
self.output_layer = nn.Linear(hidden_size, out_size)
def forward(self, input_image):
# Flatten image
input_image = input_image.view(input_image.size(0), -1)
# Utilize hidden layers and apply activation function
output = self.input_layer(input_image)
output = F.relu(output)
for layer in self.hidden_layers:
output = layer(output)
output = F.relu(output)
# Get predictions
output = self.output_layer(output)
return output
def training_step(self, batch):
# Load batch
images, labels = batch
# Generate predictions
output = self(images)
# Calculate loss
loss = F.cross_entropy(output, labels)
return loss
def validation_step(self, batch):
# Load batch
images, labels = batch
# Generate predictions
output = self(images)
# Calculate loss
loss = F.cross_entropy(output, labels)
# Calculate accuracy
acc = self.accuracy_function(output, labels)
return {'val_loss': loss, 'val_acc': acc}
def validation_epoch_end(self, outputs):
batch_losses = [x['val_loss'] for x in outputs]
# Combine losses and return mean value
epoch_loss = torch.stack(batch_losses).mean()
# Combine accuracies and return mean value
batch_accs = [x['val_acc'] for x in outputs]
epoch_acc = torch.stack(batch_accs).mean()
return {'val_loss': epoch_loss.item(), 'val_acc': epoch_acc.item()}
def epoch_end(self, epoch, result):
print("Epoch: {} - Validation Loss: {:.4f}, Validation Accuracy: {:.4f}".format( \
epoch, result['val_loss'], result['val_acc']))
After the training process (for more details check out here), we can save the model using the save() method and the model’s state dictionary. When you train a model using PyTorch, all of its weights and biases are stored within the parameters of torch.nn.Module; you can access them with model.parameters(). The state dictionary is a Python dictionary object that maps each layer of the model to its parameter tensors. You can access it using the state_dict() method of the model. Note that optimizer objects from torch.optim have a state_dict() as well, but it contains information about the used hyperparameters and the optimizer state. Here is how you can use state_dict to save the model:
torch.save(model.state_dict(), os.path.join('./models', 'ffnn.pth'))
Quite easy, isn’t it? Your model will be saved to the path you passed as the second parameter. As the output, you will find ffnn.pth located in the ./models folder.
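As a quick sanity check (a minimal sketch, not part of the serving workflow itself), you can load the saved weights back into a fresh instance of the same class with load_state_dict(). The constructor arguments below are placeholders and must match the values you used during training:
# Recreate the architecture and load the trained weights back in
model = FFNN(input_size=784, num_hidden_layers=2, hidden_size=32,
             out_size=10, accuracy_function=accuracy)  # placeholder hyperparameters
model.load_state_dict(torch.load(os.path.join('./models', 'ffnn.pth')))
model.eval()  # switch the model to inference mode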
Serving Model
Now we have all the necessary pieces for serving models with TorchServe. Let’s run the Model Archiver and add the model to the Model Store:
torch-model-archiver --model-name ffnn --version 1.0 --serialized-file ./models/ffnn.pth \
--export-path ./model_store --handler image_classifier
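Note that ffnn.pth holds only a state dictionary of an eager-mode model, so in practice torch-model-archiver usually also expects a --model-file argument pointing to a Python file that defines the model class. A sketch of that variant, where ffnn_model.py is a hypothetical file containing the FFNN class from the previous section:
torch-model-archiver --model-name ffnn --version 1.0 --model-file ./models/ffnn_model.py \
    --serialized-file ./models/ffnn.pth --export-path ./model_store --handler image_classifier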
Either way, this command packages the ffnn model we created in the previous section into an archive, and also gives it a version and a name. Make sure that you have created a folder for the Model Store before calling this command (in our example that is the ./model_store location). After this call, you will find the ffnn.mar file in ./model_store. Finally, we can start TorchServe:
torchserve --start --ncs --model-store model_store --models ffnn.mar
Our model is now available at localhost on port 8080, and we can use the APIs we covered previously. For example:
curl -X POST http://localhost:8080/predictions/ffnn -T data/sample_image.png
You can also acquire the list of available models through the management API, which listens on port 8081:
curl http://localhost:8081/models
{
"models": [
{
"modelName": "ffnn",
"modelUrl": "ffnn.mar"
}
]
}
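Similarly, the management API can describe a single registered model (the exact response fields depend on the TorchServe version):
curl http://localhost:8081/models/ffnn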
If you want to stop TorchServe, all you need to do is call this command:
torchserve --stop
Another cool thing is that TorchServe exposes a configuration file through which you can control the number of worker threads on CPU and GPU. This can be very useful if your server is under a heavy workload. For example, you might want to use number_of_gpu, which limits the number of GPUs used per model.
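A minimal sketch of such a configuration, conventionally kept in a config.properties file and passed to TorchServe with the --ts-config flag (the property names follow the TorchServe documentation; the values are placeholders):
# config.properties
inference_address=http://0.0.0.0:8080
management_address=http://0.0.0.0:8081
number_of_gpu=1
default_workers_per_model=2
You would then start the server with torchserve --start --ncs --model-store model_store --models ffnn.mar --ts-config config.properties.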
TorchServe and Docker
Another option for serving PyTorch models with TorchServe is to use it in combination with Docker. If you need more information about what Docker is, check out here. Just like for other tools, there is a TorchServe image available on Docker Hub, so you can pull it from there. To start the CPU-based image, run this command:
docker run --rm -it -p 8080:8080 -p 8081:8081 pytorch/torchserve:latest-cpu
Similarly, for the GPU-based image run:
docker run --rm -it --gpus '"device=1,2"' -p 8080:8080 -p 8081:8081 pytorch/torchserve:latest-gpu
However, if you want to create the .mar file, you need a few additional steps. After the Docker container is started, acquire the name of the container:
docker ps
Connect to its bash prompt:
docker exec -it <container_name> /bin/bash
Finally, run Model Archiver:
torch-model-archiver --model-name ffnn --version 1.0 --serialized-file /home/models/ffnn.pth \
--export-path /home/model-server/model-store --handler image_classifier
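For the serialized ffnn.pth file to be visible inside the container at a path like /home/models, you will typically bind-mount a host directory when starting the container. A hedged example (adjust the host path to your setup):
docker run --rm -it -p 8080:8080 -p 8081:8081 \
    -v $(pwd)/models:/home/models pytorch/torchserve:latest-cpu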
Don’t forget to take care of production parameters when you are deploying TorchServe in production with Docker. For example, you might want to use something like this:
docker run --rm --shm-size=2g \
--ulimit memlock=-1 \
--ulimit stack=67108864 \
-p8080:8080 \
-p8081:8081 \
--mount type=bind,source=path_to_model_store,target=/tmp/models <container> \
torchserve --model-store=/tmp/models
This way you can set the shared memory size, the user limits for system resources, and the exposed ports, and avoid potential problems on your server.
Conclusion
In this article, we covered the basics of deployment with PyTorch and TorchServe. We had a chance to explore the architecture and main components of TorchServe, and to see how we can prepare models for serving with this tool. Finally, we combined this tool with Docker.
Thanks for reading!
Nikola M. Zivkovic
CAIO at Rubik's Code
Nikola M. Zivkovic is a CAIO at Rubik’s Code and the author of the book “Deep Learning for Programmers“. He loves knowledge sharing and is an experienced speaker. You can find him speaking at meetups and conferences, and as a guest lecturer at the University of Novi Sad.
Rubik’s Code is a boutique data science and software service company with more than 10 years of experience in Machine Learning, Artificial Intelligence & Software development. Check out the services we provide.