In the previous article, we started exploring the ways a deep learning model can be deployed. In that first experiment, we decided to run a simple Flask web app and expose a simple REST API that utilizes a deep learning model. However, this approach is not very efficient and has a few limitations. One of the flaws is that applications deployed like this are hard to scale. If we had more users, there would be no easy way to spawn more workers to handle the load. Another shortcoming is that a REST API is not always the best way to utilize deep learning models, though it is sometimes necessary. That is, however, a problem for another article. In this article, we focus on the scalability problem and explore model deployment with Docker and TensorFlow Serving. We also want to skip some of the work that we already did while building the Flask app. Let’s start with Docker and containers.
Docker and containers were born out of the need to better utilize servers. At one point in history, servers became very powerful in terms of processing power, yet we rarely ran processes that required and utilized all of that power; web applications certainly don’t need it. That is when the concepts of virtualization and virtual machines were born. These concepts gave servers the ability to run multiple applications on different operating systems at the same time. Big companies like Amazon then saw this as a business opportunity and provided cheap cloud solutions based on it. This is how the cloud was created. Over time, applications grew bigger in terms of dependencies. It became hard to develop and maintain them because a developer had to take care of all the external libraries, frameworks and operating systems. Docker and containers solved this problem by providing a means to run any application regardless of the operating system.
Docker and Containers
At this point, one might ask: what is a container? A container packages application code together with all of its dependencies. This way it becomes a unit that can run in any computing environment. Docker is a tool that helps us build and manage containers. The difference between a container and a virtual machine is in the way the hardware is utilized. In general, virtual machines are managed by hypervisors. However, hypervisors are only available on processors that support the virtual replication of hardware. This essentially means that virtual machines run software on real hardware while providing isolation from it. Containers, on the other hand, require an operating system with basic services and use virtual-memory support for isolation. To sum it up, a virtual machine provides an abstract machine (along with the device drivers required for that abstract machine), while a container provides an abstract operating system.
To install Docker, follow the instructions provided on this page. Docker comes with a UI, which we will not consider in this article. We utilize only the Docker CLI, which comes with the installation as well. You know, like the real hackers 🙂
There are three important Docker components that you should be aware of: the Docker container image, the Dockerfile and the Docker Engine. A Docker container image is a lightweight file system that includes everything the application needs to run: system tools and libraries, the runtime and the application code. An image is built from a Dockerfile with the docker build command, which will be explained in detail in a little while, and it is turned into a running container with the docker run command. The container itself is then run by the Docker Engine. There are a lot of pre-cooked Docker images available at Docker Hub. For example, if you need a Docker image for Ubuntu, you can find it at Docker Hub. A Docker image is obtained by initiating the command docker pull image_name. For the purpose of this article, we need three images, so let’s download them right away:
docker pull ubuntu
docker pull tensorflow/tensorflow
docker pull tensorflow/serving
Note that every Docker image can have multiple tags, each of which can be seen as a specific version or variant of that image.
Dockerfile is the file that defines what the container…well, contains. In this file, a user defines which Docker image should be used, which dependencies should be installed and which application should be run and how. In essence, a Docker image consists of read-only layers. Each of these layers is represented with one Dockerfile instruction.
Usually, we start a Dockerfile with the FROM instruction, which defines the base image we use. Then we use different instructions to describe the system and the application we want to run. We may use the COPY instruction to copy files from the local environment into the container, or the WORKDIR instruction to set the working directory. We can execute commands during the image build with the RUN instruction and define how the application is started with the CMD instruction. A full list of Dockerfile instructions and how they can be used can be found here.
Let’s observe one example of Dockerfile:
FROM ubuntu:18.04
COPY . /app
RUN make /app
CMD python /app/app.py
In this file, the first instruction – FROM – defines that we use the ubuntu:18.04 Docker image. The COPY instruction adds files to the container’s /app directory. RUN builds the application with make, and CMD runs the command python /app/app.py, which in turn starts the application.
Once the Dockerfile is ready, we can proceed with building and running the Docker container. We already mentioned some of the Docker commands that we can use, but let’s get into more detail. The Docker installation comes with a rich Docker CLI. Using this CLI, we can build, run and stop our containers. In this chapter, we explore some of the most important commands. First on the list is definitely the docker build command. When this command is issued, the current working directory becomes the so-called build context, and all files from this directory are sent to the Docker daemon as part of it. Docker assumes that the Dockerfile is located in this directory, but you can define a different location as well. Once this command finishes its job, the user can start the container with the docker run command.
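As a side note, everything the CLI does can also be scripted. If you prefer to drive Docker from Python, the official Docker SDK for Python (installed with pip install docker) mirrors the CLI commands. Here is a minimal sketch, assuming the SDK is installed, the Docker daemon is running and a Dockerfile sits in the current directory; the image tag my_image and the port mapping are just illustrative assumptions:
# Minimal sketch using the official Docker SDK for Python (pip install docker).
# It mirrors the docker build and docker run commands described above.
import docker

client = docker.from_env()

# Equivalent of `docker build -t my_image .` (the tag name is just an example)
image, build_logs = client.images.build(path=".", tag="my_image")

# Equivalent of `docker run -p 8000:8000 my_image`
container = client.containers.run("my_image", ports={"8000/tcp": 8000}, detach=True)
print(container.short_id, container.status)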
The Docker CLI has other useful commands. For example, you can check the list of running containers with the command docker ps. Also, you can stop a running container with the command docker container stop CONTAINER_ID. If you need to run some commands within the container’s bash, you can use docker exec -it CONTAINER_ID /bin/bash. Ok, this is all cool in theory, but let’s do something practical. We want to run the Flask application we created in the previous article within a Docker container.
Deploying Flask App with Docker
First, we regroup our files a little bit into this kind of structure:
└───app
    ├───static
    │   ├───css
    │   └───model
    └───templates
Because there are several requirements we installed with pipenv, we need to make sure that these requirements are installed in the container as well. In order to make our lives a little bit easier, we convert the requirements from the pipenv lock file into a .txt file. This is done with the command:
pipenv lock --requirements --keep-outdated > requirements.txt
Don’t forget to run pipenv sync -d before it, though. This will create a requirements.txt which looks something like this:
-i https://pypi.org/simple
absl-py==0.9.0
astor==0.8.1
cachetools==4.0.0
certifi==2019.11.28
chardet==3.0.4
click==7.0
flask==1.1.1
gast==0.2.2
google-auth-oauthlib==0.4.1
google-auth==1.11.0
google-pasta==0.1.8
grpcio==1.26.0
h5py==2.10.0
idna==2.8
itsdangerous==1.1.0
jinja2==2.11.1
keras-applications==1.0.8
keras-preprocessing==1.1.0
markdown==3.1.1
markupsafe==1.1.1
numpy==1.18.1
oauthlib==3.1.0
opt-einsum==3.1.0
protobuf==3.11.3
pyasn1-modules==0.2.8
pyasn1==0.4.8
requests-oauthlib==1.3.0
requests==2.22.0
rsa==4.0
six==1.14.0
tensorboard==2.0.2
tensorflow-estimator==2.0.1
tensorflow
termcolor==1.1.0
urllib3==1.25.8
werkzeug==0.16.1
wheel==0.34.2 ; python_version >= '3'
wrapt==1.11.2
Now we can create a Dockerfile within the root folder. Here is what that file looks like:
#Start from the latest Long Term Support (LTS) Ubuntu version
FROM ubuntu:18.04
#Install Python 3.6 and pip
RUN apt-get update
RUN apt-get install -y build-essential python3.6 python3.6-dev python3-pip python3.6-venv
#Create the working directory
RUN set -ex && mkdir /app
WORKDIR /app
#Copy only the relevant directories to the working directory
COPY app ./app
#Install Python dependencies
RUN set -ex && pip3 install -r app/requirements.txt
#Run the web server
EXPOSE 8000
ENV PYTHONPATH /app
CMD python3 ./app/app.py
In this file, we defined the following:
- Use the ubuntu:18.04 base image
- Install Python 3.6, pip and the build tools
- Create a working directory and copy over the necessary files
- Install everything from requirements.txt
- Run the web server
Now we can run the docker build command:
docker build -t dl_flask_rest_api -f Dockerfile .
And after that, we can run the docker run command, mapping port 8000 of the container to port 8000 of the host:
docker run -p 8000:8000 dl_flask_rest_api
Serving Flask app "app" (lazy loading)
Environment: production
WARNING: This is a development server. Do not use it in a production deployment.
Use a production WSGI server instead.
Debug mode: off
Running on http://0.0.0.0:8000/ (Press CTRL+C to quit)
Finally, we can go to http://localhost:8000 and see that our application is running just like in the previous article.
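We can also check the containerized REST API programmatically. The sketch below is only a hedged example: the route name /predict and the payload shape are assumptions, so adjust them to whatever the Flask app from the previous article actually defines:
# A quick sanity check of the dockerized Flask app.
# NOTE: the /predict route and the payload shape are assumptions -- adjust them
# to match the routes defined in the Flask app from the previous article.
import json
import requests

payload = {"data": [[7.7, 3.0, 6.1, 2.3]]}
response = requests.post("http://localhost:8000/predict",
                         data=json.dumps(payload),
                         headers={"content-type": "application/json"})
print(response.status_code, response.text)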
TensorFlow Serving
Guys from Google wanted to provide an easy way to deploy algorithms and run experiments, and that is why they created TensorFlow Serving. At the core of this tool is the concept of servables. In essence, a servable represents an object that clients use to perform some kind of computation. It can be of any type and interface: a lookup table, a single model, or multiple models. Typically, it is a TensorFlow model or a lookup table for embeddings. TensorFlow Serving runs these servables together and helps us combine them into one singular model.
The life cycle of a servable is handled by Loaders, Sources and Managers. Managers dictate loading, serving and unloading, while the loading and unloading itself is handled by Loaders. Sources are plugins that find and provide servables. The complete workflow looks something like this:
- The Source plugin creates a Loader for a specific version of the servable.
- The Loader contains all the information that is necessary to load the servable.
- The Loader is sent to the Manager.
- The Manager uses its configuration to determine the actions to take; it decides whether the new version of the servable should be used.
- Once it is ready, the Manager provides resources to the Loader and initiates the loading of the new version of the servable through it.
- When clients request the servable from the Manager, it returns a handle to it.
TensorFlow Serving and Docker
As with any popular tool, there are a lot of Docker images available, based on TensorFlow’s official Python binaries. To get these images, you can use this command:
docker pull tensorflow/tensorflow
Note that images built after May 20th, 2019 are based on Ubuntu 18.04. This image has multiple tags, which can be found here. There are several options that one might find interesting. For example, if you want to start a TensorFlow CPU container, you can do so like this:
docker run -it --rm tensorflow/tensorflow bash
For the GPU version (this requires the NVIDIA Container Toolkit on the host), you should run:
docker run --gpus all -it --rm tensorflow/tensorflow:latest-gpu bash
If you just want to run development with a Jupyter Notebook in it, you can run:
docker run -it --rm -v $(realpath ~/notebooks):/tf/notebooks \
-p 8888:8888 tensorflow/tensorflow:latest-jupyter
TensorFlow Serving has its own image as well. Get it like this:
docker pull tensorflow/serving
An example of a command for running the TensorFlow Serving image looks like this:
docker run -p 8501:8501 \
    --mount type=bind,source=/path/to/my_model/,target=/models/my_model \
    -e MODEL_NAME=my_model -t tensorflow/serving
This image will provide the following (a quick status check example follows the list):
- Port 8500 exposed for gRPC (More on this in the next article)
- Port 8501 exposed for the REST API
- Optional environment variable MODEL_NAME (defaults to model).
- Optional environment variable MODEL_BASE_PATH (defaults to /models)
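Once such a container is running, a quick way to confirm that the model is loaded is to query the model status endpoint of the REST API. Here is a minimal sketch, assuming the container above was started with MODEL_NAME=my_model and port 8501 published:
# Query TensorFlow Serving's model status endpoint.
# Assumes a container started with MODEL_NAME=my_model and port 8501 published.
import requests

response = requests.get("http://localhost:8501/v1/models/my_model")
print(response.json())  # lists the model versions and their state (e.g. AVAILABLE)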
In order to run a model within TensorFlow Serving, we need to save it in the SavedModel format. So here is the full code from the previous article, with an additional line for saving the model in the SavedModel format:
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
from tensorflow.keras.utils import to_categorical
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
COLUMN_NAMES = [
'SepalLength',
'SepalWidth',
'PetalLength',
'PetalWidth',
'Species'
]
data = pd.read_csv('data/iris_data.csv', names=COLUMN_NAMES, header=0)
output_data = data["Species"]
input_data = data.drop("Species",axis=1)
X_train, X_test, y_train, y_test = train_test_split(input_data, output_data, test_size=0.3, \
random_state=42)
y_train = to_categorical(y_train)
y_test = to_categorical(y_test)
model = Sequential()
model.add(Dense(10, input_dim=4, activation='relu'))
model.add(Dense(10, activation='relu'))
model.add(Dense(3, activation='softmax'))
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
model.fit(X_train, y_train, epochs=300, batch_size=10)
scores = model.evaluate(X_test, y_test)
print("\nAccuracy: %.2f%%" % (scores[1]*100))
tf.saved_model.save(model, './static/model/1')
For more information about this code that creates the model for Iris Flower predictions, read the previous article. The final line is used to save the model. This is what the saved model looks like on the file-system:
└───1
    │   saved_model.pb
    │
    ├───assets
    └───variables
            variables.data-00000-of-00001
            variables.index
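Before serving it, it doesn’t hurt to verify that the exported model loads back correctly and exposes a serving signature. A short, optional sanity check, assuming the path used in the script above:
# Optional sanity check: load the exported SavedModel and inspect its signatures.
import tensorflow as tf

loaded = tf.saved_model.load('./static/model/1')
print(list(loaded.signatures.keys()))  # typically ['serving_default']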
Ok, let’s put this all together and serve the saved model with TensorFlow Serving and Docker. In the local environment, we run this command:
docker run -p 8501:8501 \
    --mount type=bind,source="G:/deep_learning/deployment pt2/app/static/model/1/",target=/models/saved_model/1 \
    -e MODEL_NAME=saved_model -t tensorflow/serving
Note that you have to add the version of the model in the path (/models/saved_model/1). Now that our container is running, we can try to get predictions from the REST API it exposes. We use the data from the first instance of the test part of the dataset:
import requests
import numpy as np
import json
data = [[7.7, 3.0, 6.1, 2.3]]
json_data = json.dumps({"signature_name": "serving_default", "instances": data})
headers = {"content-type": "application/json"}
json_response = requests.post('http://localhost:8501/v1/models/saved_model:predict', \
data=json_data, headers=headers)
predictions = json.loads(json_response.text)['predictions']
print(json_response.text)
In essence, we got a prediction that the passed data belongs to Iris flower class 2, which is correct.
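If you want the class index directly instead of reading the probabilities, a tiny addition to the snippet above does the trick:
# Convert the returned probabilities into a class index.
predicted_class = int(np.argmax(predictions[0]))
print(predicted_class)  # 2 for this sample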
Conclusion
In this article, we covered a lot of ground and prepared some concepts for the articles to come. We saw how one can use Docker containers and how they can be applied to an already existing application. Also, we had a chance to see how one can use TensorFlow Serving in combination with Docker. In the next article, we will further explore the possibilities of TensorFlow Serving and gRPC.