In the previous articles, we covered some PyTorch basics. First, we explored tensors and gradients and how we can use these concepts to write machine learning algorithms with this framework. Then we applied that knowledge and used PyTorch for its main purpose: deep learning. We had a chance to see how we can implement feedforward and convolutional neural networks for image classification. In this article, we cover TorchServe, a new way to deploy PyTorch models. This is still a new technology. Its current version is 0.1 and it is highly experimental, but it is very promising. In essence, TorchServe removes the need to write a custom server around your model by hand. It is similar to TensorFlow Serving and it makes creating modern deep learning applications much easier.

TorchServe Architecture

The main goal of TorchServe and similar applications is to provide an API through which other parts of the system can communicate with the model. It exposes three types of API:

  • API description – Retrieves a description of the API in the OpenAPI 3.0 format.
    Call example: curl -X OPTIONS http://localhost:8080
  • Health check API – Retrieves the status of TorchServe.
    Call example: curl http://localhost:8080/ping
  • Predictions API – Makes prediction calls to the models that are served with TorchServe.
    Call example: curl -X POST http://localhost:8080/predictions/{model_name} -T {input_data}
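
The same three calls can also be made from Python. The following is a minimal sketch using only the standard library. It assumes TorchServe is listening on http://localhost:8080, as in the curl examples above. The helper names (describe_api_request, ping_request, prediction_request, send) are made up for this illustration and are not part of TorchServe.

```python
import urllib.request
from urllib.parse import quote

# Assumption: TorchServe listens on the same address as in the curl examples.
INFERENCE_URL = "http://localhost:8080"


def describe_api_request(base_url: str = INFERENCE_URL) -> urllib.request.Request:
    """Build the OPTIONS request that asks for the OpenAPI description."""
    return urllib.request.Request(base_url, method="OPTIONS")


def ping_request(base_url: str = INFERENCE_URL) -> urllib.request.Request:
    """Build the health check request."""
    return urllib.request.Request(f"{base_url}/ping", method="GET")


def prediction_request(
    model_name: str, payload: bytes, base_url: str = INFERENCE_URL
) -> urllib.request.Request:
    """Build a POST request that sends raw input bytes to a served model."""
    url = f"{base_url}/predictions/{quote(model_name, safe='')}"
    return urllib.request.Request(url, data=payload, method="POST")


def send(request: urllib.request.Request, timeout: float = 5.0) -> bytes:
    """Send a prepared request and return the raw response body."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read()


# Example (needs a running TorchServe instance):
#   print(send(ping_request()))
```

Building the request and sending it are kept separate so that the URLs and HTTP methods can be inspected without a running server.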

In order to provide these APIs, TorchServe is composed of several parts. Here is how it looks from a high-level perspective:

[Figure: high-level overview of the TorchServe architecture]

All API requests are routed through the so-called Frontend. Apart from handling all requests and responses coming from the client, this component of TorchServe is in charge of the model’s lifecycle. The instances of the models are hosted by Model Workers. These components run the actual inference of each model. All loadable models are stored within a directory (cloud or a local one) called the Model Store. In a nutshell, once you run TorchServe (we will see how that is done in a bit), you load different models that are available in the Model Store. Then you can start each of these models, which creates a new Model Worker instance that exposes the interface to that specific model. After that, you can send API calls that are handled by the Frontend and routed to the correct Model Worker.

Installation

At the moment, TorchServe supports only Linux and macOS. If you are a Windows user, you can still use TorchServe with Docker. TorchServe also requires Java 11 SDK, so make sure that you install that first. For Linux run this command:

For macOS run this:

Then you can install TorchServe with either pip:

Or with Conda:

If you want to install the GPU version, use this:

Note that these commands will install TorchServe and Torch Model Archiver.

Saving Trained Model

Before you can serve a model with TorchServe, you first need to save it. Let’s do that with the feed-forward neural network model we created in the previous article. Here is what that model looks like:

After the training process (see the previous article for details), we can save the model using the torch.save() function and the model’s state dictionary. When you train a model using PyTorch, all its weights and biases are stored as the parameters of the torch.nn.Module. You can access them by calling model.parameters(). The state dictionary is a Python dictionary object that maps each layer of the model to its parameter tensors. You can access it by calling the model’s state_dict() method. Note that optimizer objects from torch.optim have a state_dict as well, but it contains information about the hyperparameters used and the optimizer state. Here is how you can use state_dict to save the model:
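
As an illustration, saving through the state dictionary could look like the sketch below. The FFNN class here is a hypothetical stand-in, since the architecture from the previous article is not repeated in this text. Its layer sizes and the helper names save_model and load_model are assumptions made for this example.

```python
import os

import torch
from torch import nn


class FFNN(nn.Module):
    """Hypothetical stand-in for the feed-forward network from the previous article."""

    def __init__(self, input_size: int = 784, hidden_size: int = 128, num_classes: int = 10):
        super().__init__()
        self.hidden = nn.Linear(input_size, hidden_size)
        self.output = nn.Linear(hidden_size, num_classes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.output(torch.relu(self.hidden(x)))


def save_model(model: nn.Module, path: str = "./models/ffnn.pth") -> str:
    """Save only the state dictionary (weights and biases) to the given path."""
    os.makedirs(os.path.dirname(path) or ".", exist_ok=True)
    torch.save(model.state_dict(), path)
    return path


def load_model(path: str) -> FFNN:
    """Rebuild the architecture and load the saved parameters into it."""
    model = FFNN()
    model.load_state_dict(torch.load(path, map_location="cpu"))
    model.eval()  # switch to inference mode
    return model
```

Because only the parameters are saved, the class definition must be available again when the file is loaded. That is why load_model first constructs an FFNN and only then calls load_state_dict.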

Quite easy, isn’t it? Your model will be saved at the path you passed as the second argument. As the output, you will find ffnn.pth in the ./models folder.

Serving Model

Now we have all the necessary pieces for serving models using TorchServe. Let’s run the model archiver and add the model to the Model Store:

With this command, we packaged the ffnn model we saved in the previous section and placed it in the Model Store. We also gave it a version and a name. Make sure that you have created a folder for the Model Store before calling this command (in our example that is the ./model_store location). After this call, you will find the ffnn.mar file in ./model_store. Finally, we can start TorchServe:
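
For reference, an archiver call of this kind can be assembled and run from Python. This is a sketch under assumptions, not necessarily the exact command used for this article. The file names model.py and handler.py are hypothetical placeholders. The flags themselves (--model-name, --version, --serialized-file, --model-file, --handler, --export-path) are torch-model-archiver options. Since we saved a state dictionary, the archiver also needs the Python file that defines the model class, which is passed with --model-file.

```python
import os
import subprocess
from typing import List, Optional


def archiver_command(
    model_name: str,
    version: str,
    serialized_file: str,
    handler: str,
    export_path: str = "./model_store",
    model_file: Optional[str] = None,
) -> List[str]:
    """Build the torch-model-archiver argument list without running it."""
    cmd = [
        "torch-model-archiver",
        "--model-name", model_name,
        "--version", version,
        "--serialized-file", serialized_file,
        "--handler", handler,
        "--export-path", export_path,
    ]
    if model_file is not None:
        # Needed when the serialized file is a state_dict (eager mode model).
        cmd += ["--model-file", model_file]
    return cmd


def run_archiver(cmd: List[str]) -> None:
    """Create the Model Store folder if needed, then run the archiver."""
    export_path = cmd[cmd.index("--export-path") + 1]
    os.makedirs(export_path, exist_ok=True)
    subprocess.run(cmd, check=True)


# Example (model.py and handler.py are hypothetical file names):
#   run_archiver(archiver_command("ffnn", "1.0", "./models/ffnn.pth",
#                                 handler="handler.py", model_file="model.py"))
```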

Our model is available at localhost port 8080 and we can use the APIs we covered previously. For example:

You can also acquire the list of available models:
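
One way to get that list from Python is through the management API. The sketch below assumes the management API listens on its default address, http://localhost:8081, and that the response is a JSON object with a "models" list whose entries carry a "modelName" field. The helper names are made up for this example.

```python
import json
import urllib.request
from typing import List

# Assumption: the management API uses its default address.
MANAGEMENT_URL = "http://localhost:8081"


def list_models_request(base_url: str = MANAGEMENT_URL) -> urllib.request.Request:
    """Build the GET request that lists the registered models."""
    return urllib.request.Request(f"{base_url}/models", method="GET")


def model_names(response_body: bytes) -> List[str]:
    """Extract the model names from a 'list models' JSON response."""
    payload = json.loads(response_body)
    return [entry["modelName"] for entry in payload.get("models", [])]


# Example (needs a running TorchServe instance):
#   with urllib.request.urlopen(list_models_request(), timeout=5) as response:
#       print(model_names(response.read()))
```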

If you want to stop TorchServe, all you need to do is call the command:
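
The start and stop calls can be wrapped in the same way. This sketch only builds the argument lists for torchserve --start and torchserve --stop. The default values (./model_store, ffnn.mar) follow the example in this article, and the function names are illustrative.

```python
import subprocess
from typing import List, Optional, Sequence


def start_command(
    model_store: str = "./model_store",
    models: Sequence[str] = ("ffnn.mar",),
    config_file: Optional[str] = None,
) -> List[str]:
    """Build the command that starts TorchServe and loads the given archives."""
    cmd = ["torchserve", "--start", "--model-store", model_store, "--models", *models]
    if config_file is not None:
        cmd += ["--ts-config", config_file]  # optional config.properties file
    return cmd


def stop_command() -> List[str]:
    """Build the command that stops a running TorchServe instance."""
    return ["torchserve", "--stop"]


def run(cmd: List[str]) -> None:
    """Run a prepared command and raise if it exits with an error."""
    subprocess.run(cmd, check=True)


# Example:
#   run(start_command())
#   run(stop_command())
```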

Another cool thing is that TorchServe exposes a configuration through which you can set the number of worker threads on CPU and GPU. This can be very useful if your server is under a heavy workload. For example, you might want to use number_of_gpu, which limits the number of GPUs that TorchServe uses for inference.
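
These settings live in a config.properties file made of key=value lines. As a rough sketch, such a file could be generated like this. The values are placeholders chosen for illustration and should be adapted to your hardware. The function names are not part of TorchServe.

```python
from typing import Dict, Union


def render_config(settings: Dict[str, Union[str, int]]) -> str:
    """Render settings as key=value lines, the format of config.properties."""
    return "".join(f"{key}={value}\n" for key, value in settings.items())


def write_config(path: str, settings: Dict[str, Union[str, int]]) -> None:
    """Write the rendered settings to the given file."""
    with open(path, "w", encoding="utf-8") as config_file:
        config_file.write(render_config(settings))


# Placeholder values for illustration only.
EXAMPLE_SETTINGS = {
    "inference_address": "http://0.0.0.0:8080",
    "management_address": "http://0.0.0.0:8081",
    "number_of_gpu": 1,              # cap on GPUs used for inference
    "default_workers_per_model": 2,  # workers created for each loaded model
}

# Example:
#   write_config("config.properties", EXAMPLE_SETTINGS)
```

The resulting file can then be passed to TorchServe at startup with the --ts-config option.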

TorchServe and Docker

Another option when it comes to serving PyTorch models with TorchServe is to use it in combination with Docker. Just like for other tools, there is a TorchServe image available on Docker Hub, so you can pull it from there. To start the CPU-based image, run this command:

Similarly, for the GPU-based image run:

However, if you want to create a .mar file, you need to do some additional steps. After the Docker container is started, acquire the name of the container:

Connect to its bash prompt:

Finally, run Model Archiver:

Don’t forget to take care of production parameters when you are deploying TorchServe in production with Docker. For example, you might want to use something like this:

This way you can set the shared memory size and user limits for system resources, expose ports, and avoid potential problems on your server.
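
To make those parameters concrete, here is a sketch that assembles a docker run command with a shared memory size, two ulimit settings and the published ports. The specific values (1g, memlock=-1, stack=67108864) and the image tag pytorch/torchserve:latest are assumptions for this example, so tune them for your own server.

```python
import shlex
from typing import List, Sequence


def docker_run_command(
    image: str = "pytorch/torchserve:latest",
    shm_size: str = "1g",
    memlock: int = -1,
    stack: int = 67108864,
    ports: Sequence[int] = (8080, 8081),
) -> List[str]:
    """Build a docker run command with production-oriented limits."""
    cmd = [
        "docker", "run", "--rm",
        f"--shm-size={shm_size}",        # shared memory available to the container
        "--ulimit", f"memlock={memlock}",  # locked-memory limit (-1 means unlimited)
        "--ulimit", f"stack={stack}",      # stack size limit in bytes
    ]
    for port in ports:
        cmd += ["-p", f"{port}:{port}"]  # publish the port on the host
    cmd.append(image)
    return cmd


# Print the command as a single shell line for inspection.
print(shlex.join(docker_run_command()))
```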

Conclusion

In this article, we covered the basics of deployment with PyTorch and TorchServe. We explored the architecture and main components of TorchServe, and we saw how to prepare models for serving with this tool. Finally, we combined this tool with Docker.

Thanks for reading!

Nikola M. Zivkovic

CAIO at Rubik's Code

Nikola M. Zivkovic is the CAIO at Rubik’s Code and the author of the book “Deep Learning for Programmers”. He loves knowledge sharing, and he is an experienced speaker. You can find him speaking at meetups, conferences and as a guest lecturer at the University of Novi Sad.

Rubik’s Code is a boutique data science and software service company with more than 10 years of experience in Machine Learning, Artificial Intelligence & Software development. Check out the services we provide.