In this article, the first in the series, we explore how to prepare a deep learning model for production and deploy it inside a Python Web application. This is just the first step in a long journey. In fact, deployment of Deep Learning models is an art in itself. The task goes beyond data science knowledge and engages a lot of software development and DevOps skills. Why should you care about all this? Well, at the moment one of the most valued roles in data science teams is the Machine Learning engineer. This role gathers the best of both worlds. These engineers have to know not only how to apply different Machine Learning and Deep Learning models to a suitable problem, but also how to test them, verify them and finally deploy them. A person who is able to put deep learning models into production is a huge asset to any company. This is, in general, the main type of service that Rubik’s Code provides. In order to become a successful Machine Learning engineer, you need a variety of skills that are not focused only on the data. If we compare the amount of code that is written for Machine Learning models with the rest of the code that supports testing and serving that model, it looks something like this:

Ok, let’s first create the model that will be served inside our Web App.

Creating and Saving Model

We need to create a simple neural network that will be used by the Web application. As we mentioned, the Iris flower dataset has 4 features and a label which represents the class of the Iris flower. In essence, we will create a network with two hidden layers of 10 neurons each. The input layer has 4 neurons, because the dataset has four features, and the output layer has 3 neurons, one per Iris class, since we are doing a simple multi-class classification.

Let’s first load necessary libraries:

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
from tensorflow.keras import utils
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

Now we can load and prepare data from the iris_data.csv file, which contains the mentioned information. We use the train_test_split method to split the data into training and testing sets:


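# Column names are an assumption (their definition is missing here);
# only "Species" is confirmed by the code below.
COLUMN_NAMES = ['SepalLength', 'SepalWidth', 'PetalLength', 'PetalWidth', 'Species']
data = pd.read_csv('data/iris_data.csv', names=COLUMN_NAMES, header=0)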

output_data = data["Species"]
input_data = data.drop("Species",axis=1)
X_train, X_test, y_train, y_test = train_test_split(input_data, output_data, test_size=0.3, random_state=42)

y_train = utils.to_categorical(y_train) 
y_test = utils.to_categorical(y_test)
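
To make the effect of to_categorical concrete, here is a minimal pure-Python sketch of one-hot encoding. It assumes the Species labels are already the integers 0, 1 and 2. The function name one_hot is hypothetical; it only mimics the idea and is not the Keras implementation.

```python
def one_hot(labels, num_classes):
    """Turn integer class labels into one-hot rows."""
    rows = []
    for label in labels:
        row = [0.0] * num_classes
        row[label] = 1.0  # mark the position of the true class
        rows.append(row)
    return rows

encoded = one_hot([0, 2, 1], num_classes=3)
print(encoded)
# [[1.0, 0.0, 0.0], [0.0, 0.0, 1.0], [0.0, 1.0, 0.0]]
```

Each row has exactly one 1.0, which is the target format that the categorical_crossentropy loss expects.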

Let’s create and compile the model:

model = Sequential()
model.add(Dense(10, input_dim=4, activation='relu'))
model.add(Dense(10, activation='relu'))
model.add(Dense(3, activation='softmax'))
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
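
As a quick sanity check on the architecture, the number of trainable parameters can be worked out by hand. The following is a minimal sketch in plain Python, assuming fully connected layers with one bias per neuron (the default behavior of Dense layers). The helper name dense_params is hypothetical and not part of Keras.

```python
def dense_params(n_inputs, n_units):
    """Weights plus biases of one fully connected layer."""
    return n_inputs * n_units + n_units

# Layer sizes taken from the model above: 4 inputs -> 10 -> 10 -> 3 outputs.
layer_sizes = [4, 10, 10, 3]

per_layer = [dense_params(n_in, n_out)
             for n_in, n_out in zip(layer_sizes, layer_sizes[1:])]
total_params = sum(per_layer)

print(per_layer)     # [50, 110, 33]
print(total_params)  # 193
```

With fewer than 200 parameters, this is a very small network, which is one reason it trains in seconds.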

Now we can train it:

model.fit(X_train, y_train, epochs=300, batch_size=10)
Train on 105 samples
Epoch 1/300
105/105 [==============================] - 0s 646us/sample - loss: 1.5356 - accuracy: 0.3333
Epoch 2/300
105/105 [==============================] - 0s 133us/sample - loss: 1.3580 - accuracy: 0.3333
Epoch 3/300
105/105 [==============================] - 0s 123us/sample - loss: 1.2116 - accuracy: 0.3333
Epoch 4/300
105/105 [==============================] - 0s 143us/sample - loss: 1.1196 - accuracy: 0.3714
Epoch 5/300
105/105 [==============================] - 0s 123us/sample - loss: 1.0318 - accuracy: 0.4571
Epoch 6/300
105/105 [==============================] - 0s 133us/sample - loss: 0.9732 - accuracy: 0.4857
Epoch 7/300
105/105 [==============================] - 0s 143us/sample - loss: 0.9200 - accuracy: 0.8286
Epoch 8/300
105/105 [==============================] - 0s 124us/sample - loss: 0.8709 - accuracy: 0.8190
Epoch 9/300
105/105 [==============================] - 0s 123us/sample - loss: 0.8261 - accuracy: 0.8190
Epoch 10/300
105/105 [==============================] - 0s 114us/sample - loss: 0.7865 - accuracy: 0.8095

Finally, we can evaluate it:

scores = model.evaluate(X_test, y_test)
print("\nAccuracy: %.2f%%" % (scores[1]*100))
45/45 [==============================] - 0s 444us/sample - loss: 0.0953 - accuracy: 0.9333

Accuracy: 93.33%

Cool, we got really high accuracy, which is expected since this is a really simple example. We can save the complete model into a file by calling the save method:

model.save('model.h5')

The model is saved into an .h5 file. This file is used by the Web App.


Before we start with the implementation of the Web application with Python and Flask, let’s first find out what a REST API is. I believe that you have seen this term once or twice in your life. The second part of the term, API, stands for application programming interface. Essentially, an API represents the set of rules that programs use to communicate with each other. For example, in a server-client architecture, the server side of the application is programmed in a way that exposes methods that can be called by the client side of the application. This means that the client can call a method on the server from its own code and get a certain result from it. REST stands for “Representational State Transfer”. It is a set of rules that developers should follow when they build their APIs. It defines what the API should look like, so that APIs are standardized.

One of the rules defines that data or resources can be retrieved when you request a specific URL. For example, you can request ‘’ and get the list of blogs as a response. The URL ‘ ‘ is called the request, and the list of blogs that you get back is called the response. Every request is composed of 4 parts:

  • The Endpoint (route) – This is the URL we mentioned previously. It is structured like this – “root-endpoint/?”. The root-endpoint is the starting point and it can be followed by the path and query parameters. The path defines what specific resource is required. For example, the root-endpoint of GitHub’s API is ‘’, while the full endpoint to the list of my repositories on GitHub is
  • The Method – There are five types of requests that can be sent, and the method defines this type:
    • GET – Used to get or read information.
    • POST – Used to create a new resource.
    • PUT and PATCH – They are used to update resources.
    • DELETE – Deletes a resource.
  • The Headers – The headers are used to provide additional information to both client and server in the form of property-value pairs. A list of valid headers is available in MDN’s HTTP Headers Reference.
  • The Body – This section contains information that the client sends to the server. It is not used in GET requests.
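
The four parts listed above can be seen side by side by building a request object with Python's standard library. This is only a sketch: the URL and the payload are hypothetical, and the request is constructed but never sent.

```python
import json
from urllib.request import Request

# Hypothetical endpoint, used only for illustration.
url = "http://localhost:8000/api/flowers?limit=10"

# The body: data the client sends to the server, here as JSON bytes.
payload = json.dumps({"petal_length": 1.4}).encode("utf-8")

req = Request(
    url,                                            # the endpoint
    data=payload,                                   # the body
    headers={"Content-Type": "application/json"},   # the headers
    method="POST",                                  # the method
)

print(req.get_method())                # POST
print(req.full_url)
print(req.get_header("Content-type"))  # application/json
print(req.data)
```

Note that urllib stores header names in capitalized form, which is why the lookup uses "Content-type".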

What we want to create in this article is a Web server that serves a model for Iris predictions. We want to build an API through which the client side of the application can get predictions from the model. That is done using the Python framework Flask.

Flask Basics

As already mentioned, Flask is a Python-based web framework with almost no dependencies on external libraries. This makes it very lightweight. Since we are creating a simple Web application, this makes Flask a perfect choice for our implementation. The standard folder structure of a Flask application looks like this:

>tree flask_app
Folder PATH listing for volume Data
Volume serial number is 42E0-3374
flask_app
├───static
│   ├───css
│   └───model
└───templates

Files that do not change, i.e. assets, are located in the static folder. This is the place for the model that we created, so we can copy it there. The templates folder is reserved for HTML files. Flask looks into this folder when it needs to render dynamic pages. We create base.html, which looks like this:

<!doctype html>

<html lang="en">
<head>
  <meta charset="utf-8">
  <link rel="stylesheet" href="{{url_for('static', filename='css/main.css')}}" />
  {% block head %}{% endblock %}
</head>

<body>
  {% block body %}{% endblock %}
</body>
</html>

This is just a basic HTML structure with some template blocks in curly brackets. This way we are able to quickly inherit this basic structure and implement custom pages. Notice that we added the main.css file in the css folder. Ok, let’s install Flask and TensorFlow using pipenv:

pipenv install
pipenv install flask
pipenv install tensorflow==2.0.0
pipenv sync -d

Once Flask is installed, its flask command-line script is installed as well. When you run the flask run command, it tells the Flask package to run the HTTP server using an app object. Where is the app object, one might ask? Let’s create a file in the root folder and define this object:

from flask import Flask

app = Flask(__name__)

@app.route('/')
def hello_world():
    return 'Hello, world!'

As you can see, the app object is an instance of the Flask class. This is our HTTP server. Routes are defined using the @app.route(‘endpoint’) decorator. In the example above, we define the only route, the root endpoint ‘/’. When we run this using flask run or using python, we get this in the browser:

Nice and easy, right? We have all the pieces: the model and the web server. All we have to do is put them together.
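
The decorator syntax may look like magic, so here is a toy sketch of the idea behind it in plain Python. This is not how Flask is implemented internally. The MiniApp class and its dispatch method are hypothetical and only show how a decorator can register a function under a URL path.

```python
class MiniApp:
    """Toy stand-in for a web app object (not Flask)."""

    def __init__(self):
        self.routes = {}  # maps a URL path to a handler function

    def route(self, path):
        def decorator(func):
            self.routes[path] = func  # remember the handler for this path
            return func               # leave the function itself unchanged
        return decorator

    def dispatch(self, path):
        handler = self.routes.get(path)
        if handler is None:
            return "404 Not Found"
        return handler()


mini_app = MiniApp()

@mini_app.route('/')
def hello_world():
    return 'Hello, world!'

print(mini_app.dispatch('/'))         # Hello, world!
print(mini_app.dispatch('/missing'))  # 404 Not Found
```

The key point is that decorating a function does not call it; it only stores it so the server can call it later when a matching request arrives.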

Putting it all Together

In the beginning, we focus on the UI, meaning we want to create an HTML page that handles this functionality. Thanks to the base.html we created previously, we can create a simple index.html page like this:

{% extends 'base.html' %}

{% block head%}

{% endblock %}

{% block body %}
    <h1>IRIS PREDICTOR</h1>

    <form action="/predict" method="POST">
            <p>Sepal Length</p>
            <input type="text" name = "spatial_length" id="spatial_length">
            <p>Sepal Width</p>
            <input type="text" name = "spatial_width" id="spatial_width">

            <p>Petal Length</p>
            <input type="text" name = "petal_length" id="petal_length">

            <p>Petal Width</p>
            <input type="text" name = "petal_width" id="petal_width">

        <input type="submit" value="Predict">
    </form>
    <h2>Predicted: {{pred}}</h2>
{% endblock %}

This page extends base.html. As you can see, we added one form to the body of the HTML page. It is quite simple: it has four text inputs for the four features and the Predict button. Note that this button sends a POST request to the ‘/predict’ endpoint with this information. At the end, the page displays whatever is stored in the pred parameter.

Also, we extended the main.css file with some styling and now it looks like this:
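
For a form like this one, browsers by default send the fields in the request body as URL-encoded key-value pairs. The sketch below uses only the standard library to show what such a body looks like and how it can be decoded again. The field values are made up for illustration.

```python
from urllib.parse import urlencode, parse_qs

# Field names match the form inputs; the values are illustrative.
form_fields = {
    "spatial_length": "5.1",
    "spatial_width": "3.5",
    "petal_length": "1.4",
    "petal_width": "0.2",
}

# What the browser puts into the POST body.
body = urlencode(form_fields)
print(body)
# spatial_length=5.1&spatial_width=3.5&petal_length=1.4&petal_width=0.2

# Decoding on the receiving side: parse_qs returns a list per field.
decoded = {name: values[0] for name, values in parse_qs(body).items()}
features = [float(decoded[name]) for name in form_fields]
print(features)  # [5.1, 3.5, 1.4, 0.2]
```

Every value arrives as a string, so it has to be converted to a number before it is passed to the model.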

body {
    margin: 0;
    padding: 0;
    margin-left: 20px;
    font-family: "Helvetica Neue", Helvetica, Arial, sans-serif;
    color: #444;
}

input[type=text] {
    border: 2px solid #E64C27;
    border-radius: 4px;
    width: 30%;
    padding: 12px 20px;
    margin: 8px 0;
}

input[type=button], input[type=submit], input[type=reset] {
    background-color: #E64C27;
    border: none;
    color: white;
    padding: 16px 32px;
    width: 32%;
    text-decoration: none;
    margin: 20px 2px;
    cursor: pointer;
}

Now, we focus on the file and the Flask app object. Importing the necessary libraries and creating the app object is the obvious first step:

from flask import Flask, request, jsonify, render_template
import tensorflow as tf
import numpy as np
from tensorflow.keras import backend
from tensorflow.keras.models import load_model

app = Flask(__name__)

Then we want to load the created model into the app object. We do that in the function load_model_to_app(), which has the decorator @app.before_first_request. This function is executed once, before the first request is handled. In it, we load the neural network from the file:

@app.before_first_request
def load_model_to_app():
    app.predictor = load_model('./static/model/model.h5')
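
Note that before_first_request was deprecated in Flask 2.2 and removed in Flask 2.3, so this decorator only works with older releases. One possible alternative, sketched here under assumptions, is to load the model lazily and cache it. The load_model_stub function is a hypothetical stand-in for load_model, so that the sketch runs without TensorFlow.

```python
from functools import lru_cache

load_calls = []  # records how often the expensive load runs

def load_model_stub(path):
    """Hypothetical stand-in for tensorflow.keras.models.load_model."""
    load_calls.append(path)
    return {"path": path}  # pretend this is the loaded model

@lru_cache(maxsize=1)
def get_predictor():
    """Load the model on first use and reuse it afterwards."""
    return load_model_stub('./static/model/model.h5')

first = get_predictor()
second = get_predictor()

print(first is second)  # True
print(len(load_calls))  # 1
```

In a real app, request handlers would call get_predictor() instead of reading app.predictor, so the file is read only once no matter how many requests arrive.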

Then we define the API method that is called when the root endpoint, ‘/’, is hit. For that we use the method index with the decorator @app.route(‘/’). This method returns the rendered template index.html. Here is what that looks like:

@app.route('/')
def index():
    return render_template('index.html', pred = 0)

Finally, we create the method predict() for the endpoint ‘/predict’. This method has a decorator that looks like this: @app.route(‘/predict’, methods=[‘POST’]), so the endpoint accepts POST requests. In this method, we prepare the data received from the form and get predictions from the model. Because we used softmax in the last layer of the network, these predictions come in the form of an array with a probability for every class, for example [0.1, 0.1, 0.8]. So, we need an additional step to get the index of the element in the predictions array that has the largest value. Here is what this method looks like:

@app.route('/predict', methods=['POST'])
def predict():
    data = [request.form['spatial_length'],
            request.form['spatial_width'],
            request.form['petal_length'],
            request.form['petal_width']]

    data = np.array([np.asarray(data, dtype=float)])

    predictions = app.predictor.predict(data)
    print('INFO Predictions: {}'.format(predictions))

    class_ = np.where(predictions == np.amax(predictions, axis=1))[1][0]

    return render_template('index.html', pred=class_)
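
The index lookup in the last step can be illustrated without NumPy. Here is a minimal sketch that picks the most probable class from the example probabilities mentioned above. The helper name argmax is hypothetical.

```python
def argmax(values):
    """Return the index of the largest value."""
    return max(range(len(values)), key=lambda i: values[i])

predictions = [0.1, 0.1, 0.8]  # example softmax output for one sample
class_index = argmax(predictions)
print(class_index)  # 2
```

The returned index is the predicted class label, which is what the page shows in the pred field.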

Here is the complete file:

from flask import Flask, request, jsonify, render_template
import tensorflow as tf
import numpy as np
from tensorflow.keras import backend
from tensorflow.keras.models import load_model

app = Flask(__name__)

@app.before_first_request
def load_model_to_app():
    app.predictor = load_model('./static/model/model.h5')

@app.route('/')
def index():
    return render_template('index.html', pred = 0)

@app.route('/predict', methods=['POST'])
def predict():
    data = [request.form['spatial_length'],
            request.form['spatial_width'],
            request.form['petal_length'],
            request.form['petal_width']]
    data = np.array([np.asarray(data, dtype=float)])

    predictions = app.predictor.predict(data)
    print('INFO Predictions: {}'.format(predictions))

    class_ = np.where(predictions == np.amax(predictions, axis=1))[1][0]

    return render_template('index.html', pred=class_)

def main():
    """Run the app."""
    # The host value is missing in the source; '0.0.0.0' is an assumption.
    app.run(host='0.0.0.0', port=8000, debug=False)  # nosec

if __name__ == '__main__':
    main()

As a test, let’s enter the values from the first instance of the test part of the dataset and see if we get the correct prediction:

Once we press the Predict button:

We can see that this is the correct value, meaning we successfully integrated the model into this simple Web App.


In this article, we learned the basics of deep learning model deployment. While this is a good solution for simple and small applications, in practice a solution this simple is rarely enough. Usually, a single worker cannot satisfy all the requests to a web app, so the app needs to be scaled. That is why in the next blog post we will explore how you can use Docker and TensorFlow Serving. However, this is a great starting point when it comes to the deployment of this type of model.

Thank you for reading!

Nikola M. Zivkovic

CAIO at Rubik's Code

Nikola M. Zivkovic is the CAIO at Rubik’s Code and the author of the book “Deep Learning for Programmers”. He loves knowledge sharing and is an experienced speaker. You can find him speaking at meetups and conferences, and as a guest lecturer at the University of Novi Sad.