Code that accompanies this article can be downloaded here.
In general, a lot of concepts in machine learning and deep learning can be abstracted using multi-dimensional arrays, called tensors. In math, tensors are described as geometric objects that describe linear relationships between other geometric objects. To keep things simple, we can think of tensors as multi-dimensional arrays of data, i.e. a generalization of matrices. When we treat them as n-dimensional arrays, we can apply matrix operations easily and effectively. That is what TensorFlow actually does. In this framework, the tensor is the primitive unit and we can perform various operations on tensors. For example, take a look at the code below:
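The snippet itself is not preserved in this text, so here is a minimal sketch of what adding two constants can look like. The input values are an assumption, picked only so that the sum matches the output shown below; the names const1, const2 and result are illustrative.

```python
import tensorflow as tf

# Two constant tensors of shape (2, 3). The values are assumed for illustration.
const1 = tf.constant([[1, 2, 3], [1, 2, 3]])
const2 = tf.constant([[3, 4, 5], [3, 4, 5]])

# Element-wise addition. TensorFlow 2.x executes eagerly, so no session is needed.
result = tf.add(const1, const2)
print(result)
```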
The output of the code snippet from above looks like this:
tf.Tensor( [[4 6 8] [4 6 8]], shape=(2, 3), dtype=int32)
As you can see, we have defined two constants and added one to the other. As a result, we got a Tensor object holding the sum. We can also see the shape of the output and its data type. If you are familiar with TensorFlow 1.0, you will notice that we didn’t have to create a session and then run this code in it. This is just one of the benefits of using the new version of TensorFlow.
Of course, we don’t just want to do simple arithmetic operations; we want to use this library for building predictors, classifiers, generative models, neural networks and so on. In general, when we are building such solutions, we have to go through several steps:
- Analysis and preprocessing of the data
- Building and training a model (machine learning model, neural network, …)
- Evaluating the model
- Making new predictions
Since training these models can be a long and expensive process, we might want to use different machines for it. Training on a CPU can take quite a long time, so using a GPU is usually a better option. The fastest option is often the tensor processing unit, or TPU. TPUs were introduced by Google back in 2016; they are AI accelerator application-specific integrated circuits (ASICs). However, they are still quite expensive. Apart from this, we want to deploy our model to different platforms, like the cloud or embedded systems (IoT), or to integrate it with other languages. That is why the TensorFlow ecosystem looks something like this:
We can see that all the major steps of developing these solutions are covered within this ecosystem. When it comes to Python, we usually analyze and handle data using libraries like NumPy and pandas. Then we feed this data into the model that we have built. This is a bit out of the scope of this article, and data analysis is a topic in itself. However, TensorFlow gives us some modules we can use for preprocessing and feature engineering. Apart from that, it provides ready-made datasets (the tensorflow_datasets package) that we can use for training some of our custom solutions and for research in general.
The most important part of TensorFlow is TensorFlow Hub. There we can find numerous modules and low-level APIs that we can use. On top of these, let’s say, core modules we can find the high-level API, Keras. We might say that the road to version 2.0 was paved in TensorFlow 1.10.0, when Keras was incorporated as the default high-level API. Before this, Keras was a separate library and the tensorflow.contrib module was used for this purpose. With TensorFlow 1.10.0 we got the news that the tensorflow.contrib module would soon be removed and that Keras was taking over. That was one of the main focuses of TensorFlow 2.0: to make the library easier to use and to clean up the API. In fact, many APIs from 1.0 were either moved or completely removed. For example, tf.app and tf.flags no longer exist, and some less used functions from tf.* were moved to other modules.
Apart from this high-level API, which we will use later in this article, there are several pre-trained models. These models are trained on some set of data and can be customized for your solution. This approach to developing a machine learning solution is called transfer learning. Transfer learning is gaining popularity among artificial intelligence engineers because it speeds up the process. Of course, you may choose to use these pre-trained models as out-of-the-box solutions.
There are several distribution options for TensorFlow as well, so we can choose the platform on which we want to train our models. This is decided during the installation of the framework, so we will look into it in the later chapters. We can choose a TensorFlow distribution that runs on CPU, GPU or TPU. Finally, once we have built our model, we can save it. The saved model can be incorporated into other applications on different platforms. Here we are entering the world of deployment. It is important to note that building a model is a completely different process from the rest of application development. In general, data scientists build these models and save them. Later, these models are called from the business logic components of the application.
TensorFlow provides APIs for a wide range of languages and is available for different operating systems. In this article, we are going to use Python on Windows 10, so only the installation process on this platform will be covered. TensorFlow is available only for Python 3.5 and above, so make sure that you have the correct version of Python installed on your system. In this article, we use Python 3.7. For other operating systems and languages you can check the official installation guide. As mentioned, in this step we need to decide which distribution we want to use. There are two options for installing TensorFlow:
- TensorFlow with CPU support
- TensorFlow with GPU support
If your system has an NVIDIA® GPU, you can install TensorFlow with GPU support. Of course, the GPU version is faster, but the CPU version is easier to install and configure.
If you are using Anaconda, TensorFlow can be installed by following these steps. First, you need to create a conda environment named “tensorflow” by running the command:
conda create -n tensorflow pip python=3.7
Activate the created environment using:
conda activate tensorflow
Finally, you need to call the command to install TensorFlow inside the created environment. For the CPU version run this:
pip install tensorflow==2.0.0-alpha0
If you want to use the GPU distribution, run the command:
pip install tensorflow-gpu==2.0.0-alpha0
Of course, you can install TensorFlow using “native pip”, too.
Now, there is an option if you don’t want to install TensorFlow at all. The folks at Google were kind enough to provide a virtual environment in which we can build and train our models directly in the browser: Colaboratory. It’s a Jupyter notebook environment that runs entirely in the cloud. These are the lines of code you need to add at the top of your Colab notebook:
from __future__ import absolute_import, division, print_function, unicode_literals
!pip install -q tensorflow==2.0.0-alpha0
Cool, now we have TensorFlow installed. Let’s see some of the cool things that TensorFlow 2.0 brings, and let’s solve some problems with it.
Keras – High-Level API
As mentioned previously, Keras is the default high-level API of TensorFlow. In this article, we will use this API to build a simple neural network later, so let’s explore a little bit how it works. Depending on the type of problem, we can use a variety of layers for the neural network that we want to build. Essentially, Keras provides different types of layers (tensorflow.keras.layers) which we need to connect into a meaningful graph that will solve our problem. There are several ways in which we can use this API when building deep learning models:
- Using Sequential class
- Using Functional API
- Model subclassing
The first approach is the simplest one. We use the Sequential class, which is essentially a container for layers, and we add layers in the order we want. We may choose this approach when we want to build a neural network as quickly as possible. There are many types of Keras layers to choose from, too. The most basic one, and the one we are going to use in this article, is called Dense. It has many options for setting the inputs, activation functions and so on. Apart from Dense, the Keras API provides different types of layers for convolutional neural networks, recurrent neural networks, etc., but these are out of the scope of this post. So, let’s see how one can build a neural network using Sequential and Dense.
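A minimal sketch of such a network is given here. The hidden layer size (3), input_dim=2 and the ‘relu’ activation come from the description that follows; the single-neuron output layer with a sigmoid activation is an assumption, since the text does not say what the last layer looks like.

```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

model = Sequential()

# Hidden layer with 3 neurons. input_dim=2 also defines a 2-neuron input layer.
model.add(Dense(3, input_dim=2, activation='relu'))

# Output layer. One neuron with a sigmoid activation is an assumed choice.
model.add(Dense(1, activation='sigmoid'))

model.summary()
```

Newer Keras versions may print a warning that suggests an explicit Input object instead of input_dim; the network that gets built is the same.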
First, we import Sequential and Dense. After that, we create an object of the Sequential class. Then we add the first layer to the neural network using the add method and the Dense class. The first parameter of the Dense constructor defines the number of neurons in that layer. What is specific about this layer is that we used the input_dim parameter. By doing so, we also defined an input layer for our network, with the number of neurons given by the input_dim parameter. Basically, with this one call we added two layers: the input layer with two neurons, and the hidden layer with three neurons.
Another important parameter, as you may notice, is the activation parameter. Using it, we define the activation function for all neurons in a specific layer. Here we used the value ‘relu’, which indicates that neurons in this layer will use the rectified linear unit (ReLU) activation function. Finally, we call the add method of the Sequential object once again and add another layer. Because we are not using the input_dim parameter this time, only one layer is added, and since it is the last layer we add to our neural network, it is also the output layer of the network.
The functional approach is similar, but it is more flexible. It is easy to understand, and we may want to choose it when building complex models with a lot of operations. Here is what the same network from above looks like when the functional API is used:
Finally, we may want to choose the model sub-classing approach. This approach is a favorite among people with a heavy software development background, especially engineers with object-oriented experience. In this approach, we inherit from the Model class and define our own forward pass. Again, here is what the simple neural network that we implemented with the previous approaches looks like:
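As a rough sketch, the functional version of a network with two inputs and a three-neuron hidden layer might look like this. The output layer (one neuron, sigmoid) is again an assumption, and the variable names are illustrative.

```python
from tensorflow.keras import Input, Model
from tensorflow.keras.layers import Dense

# Each layer is called on the output of the previous one, like a function.
inputs = Input(shape=(2,))
hidden = Dense(3, activation='relu')(inputs)
outputs = Dense(1, activation='sigmoid')(hidden)  # assumed output layer

# The model is defined by its input and output tensors.
model = Model(inputs=inputs, outputs=outputs)
model.summary()
```

Because every layer call returns a tensor that we hold in a variable, it becomes easy to branch, merge or reuse parts of the graph, which is where the extra flexibility comes from.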
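A minimal sketch of the sub-classing approach is shown here. The class name SimpleNetwork and the attribute names are made up for illustration, and the output layer is the same assumed single sigmoid neuron.

```python
import tensorflow as tf


class SimpleNetwork(tf.keras.Model):
    def __init__(self):
        super().__init__()
        # Layers are created once, in the constructor.
        self.hidden_layer = tf.keras.layers.Dense(3, activation='relu')
        self.output_layer = tf.keras.layers.Dense(1, activation='sigmoid')

    def call(self, inputs):
        # The forward pass: inputs -> hidden layer -> output layer.
        x = self.hidden_layer(inputs)
        return self.output_layer(x)


model = SimpleNetwork()

# Calling the model on a dummy batch of one sample with two features builds the weights.
prediction = model(tf.zeros((1, 2)))
print(prediction.shape)
```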
Iris Classification Neural Network
In this example, we will build a simple neural network that can predict the class of the Iris flower. For this purpose, we use the Iris Data Set. This data set is probably one of the best-known datasets to be found in the pattern recognition literature, along with the MNIST dataset. In essence, this dataset is used for “Hello World” examples for classification problems. We can take different approaches to this problem, but we will use a simple neural network. If you want to learn more about neural networks you can check our series of articles on the topic. There you can find information about the structure of neural networks, their main components and the ways they learn.
The dataset itself was first introduced by Ronald Fisher back in 1936. Fisher was a British statistician and biologist, and he used this example in his paper The use of multiple measurements in taxonomic problems. The dataset contains 3 classes of 50 instances each. Each class refers to one type of iris plant: Iris setosa, Iris virginica, and Iris versicolor. The first class is linearly separable from the other two, but the latter two are not linearly separable from each other. Each record has five attributes:
- Sepal length in cm
- Sepal width in cm
- Petal length in cm
- Petal width in cm
- Class (Iris setosa, Iris virginica, Iris versicolor)
The goal of the neural network we are going to create is to predict the class of the Iris flower based on the other features. This means we need to create a model, a neural network, which describes the relationship between the attribute values and the class.
In order to solve this problem, we are going to take steps we defined in one of the previous chapters:
- Analysis and preprocessing of the data
- Building and training a model
- Evaluating the model
- Making new predictions
Data Analysis and Preprocessing
Data analysis is a topic in itself. Here, we will not go deep into feature engineering and analysis, but we are going to go through some basic steps:
- Univariate Analysis – Analysing types and nature of every feature.
- Missing Data Treatment – Detecting missing data and making a strategy about it.
- Correlation Analysis – Comparing features among each other.
- Splitting Data – Because we have one set of information, we need to make one set of data for training the neural network and a separate set of data for evaluating it.
Using the information that we gather during this analysis we can take appropriate actions during the creation of the model itself. First, we import the data:
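The article’s own data file is not named in this text, so the sketch here loads the same dataset from the copy bundled with scikit-learn instead; that substitution, and the column names other than Species, are assumptions.

```python
import pandas as pd
from sklearn.datasets import load_iris

# Assumption: scikit-learn's bundled Iris data stands in for the article's data file.
iris = load_iris()

# Hypothetical column names for the four measurements (all in cm).
columns = ["SepalLength", "SepalWidth", "PetalLength", "PetalWidth"]
data = pd.DataFrame(iris.data, columns=columns)
data["Species"] = iris.target  # integer class codes 0, 1 and 2

# Print the first five rows.
print(data.head())
```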
As you can see, we use the pandas library for this, and we also print out the first five rows of data. Here is what that looks like:
Once this is done, we want to see what the nature of every feature is. For that we can use pandas as well:
Output looks like this:
As we can see, Species, the output, has type int64. However, this is not what we want. We want this feature to be a categorical variable. This means we need to modify the data a little bit, again using pandas:
Once this is done, we check whether there is any missing data in our data set. This is done using this function:
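The type inspection and the conversion described above could be sketched like this. As before, loading the data from scikit-learn and the measurement column names are assumptions.

```python
import pandas as pd
from sklearn.datasets import load_iris

# Assumption: the data is rebuilt from scikit-learn's bundled copy of the Iris dataset.
iris = load_iris()
columns = ["SepalLength", "SepalWidth", "PetalLength", "PetalWidth"]
data = pd.DataFrame(iris.data, columns=columns)
data["Species"] = iris.target

# Inspect the type of every column; Species starts out as an integer column.
print(data.dtypes)

# Mark Species as a categorical variable instead of a plain number.
data["Species"] = data["Species"].astype("category")
print(data.dtypes)
```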
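One common way to do this check with pandas is to count the null values per column. This is a sketch of that idea, not necessarily the exact call the article used; the data loading and the column names are assumptions.

```python
import pandas as pd
from sklearn.datasets import load_iris

# Assumption: the data is rebuilt from scikit-learn's bundled copy of the Iris dataset.
iris = load_iris()
columns = ["SepalLength", "SepalWidth", "PetalLength", "PetalWidth"]
data = pd.DataFrame(iris.data, columns=columns)
data["Species"] = iris.target

# isnull() marks every missing cell; sum() then counts them per column.
missing_per_column = data.isnull().sum()
print(missing_per_column)
```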
Output of this call is:
Missing data can be a problem for our neural network. If there is missing data in our dataset, we need to define a strategy for handling it. One approach is to replace missing values with the average value of the feature or with its max value. However, there is no silver bullet, and sometimes one strategy gives better results than the others. Ok, off to the correlation analysis. During this step, we check how features relate to each other. Using the pandas and Seaborn modules, we can get an image that shows a matrix with the levels of dependency between the features, the correlation matrix:
Here is what that matrix looks like:
We wanted to find the relationship between Species and the other features using this correlation matrix. The values, as you can see, are between -1 and 1. We are looking for pairs with a value close to 1 or -1, which means that these features have too much in common, i.e. they are strongly correlated. In that situation, it is often suggested to provide just one of those features to the model. This way we avoid the situation in which our model gives overly optimistic (or plain wrong) predictions. However, in this dataset we have little information to begin with, so if we removed all dependent features, we would have no data left 🙂
Finally, let’s split the data into training and testing sets. Because a client will usually give us one large chunk of data, we need to leave some of it for testing. Usually, this ratio is 80:20. In this article, we will use 70:30, just to play around. For this purpose we use a function from the scikit-learn library:
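The numbers behind such an image can be computed with pandas alone, as sketched here. The data loading and column names are assumptions, and the plotting step is left out; the resulting matrix could be handed to Seaborn’s heatmap function to draw it.

```python
import pandas as pd
from sklearn.datasets import load_iris

# Assumption: the data is rebuilt from scikit-learn's bundled copy of the Iris dataset.
iris = load_iris()
columns = ["SepalLength", "SepalWidth", "PetalLength", "PetalWidth"]
data = pd.DataFrame(iris.data, columns=columns)
data["Species"] = iris.target  # kept numeric here so it takes part in the correlation

# Pairwise Pearson correlation between all columns, with values between -1 and 1.
correlation_matrix = data.corr()
print(correlation_matrix.round(2))
```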
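A sketch of the 70:30 split with scikit-learn’s train_test_split is shown here. The variable names and the fixed random_state are assumptions added for illustration and reproducibility.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

# Assumption: features and labels come from scikit-learn's bundled Iris dataset.
iris = load_iris()
X, y = iris.data, iris.target

# test_size=0.3 keeps 30% of the samples for testing (the 70:30 ratio).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

print(X_train.shape, X_test.shape)
```

With 150 records in total, this leaves 105 samples for training and 45 for testing.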
In the end, we have four variables that contain input data for training and testing, and output data for training and testing. We can build our model now.
Building and Training a Model
We need quite a simple neural network for this classification. Here, we use the model sub-classing approach, but you may try out the other approaches as well. Here is what the IrisClassifier class looks like:
It is a small neural network, with two layers of 10 neurons each. The final layer has 3 neurons because there are 3 classes of Iris flower. Also, the final layer uses softmax as its activation function. This means that we get the output in the form of probabilities. Let’s train this neural network. For this, we use the fit method and pass it the prepared training data:
The number of epochs defines how many times the whole training set is passed through the network. Training can last for a couple of minutes, and the output looks like this:
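The original class is not preserved in this text, so what follows is a sketch of one possible IrisClassifier: two hidden layers of 10 neurons and a 3-neuron softmax output. The ReLU activations, the attribute names and the compile settings (Adam optimizer, sparse categorical cross-entropy on integer labels) are assumptions.

```python
import tensorflow as tf


class IrisClassifier(tf.keras.Model):
    def __init__(self):
        super().__init__()
        # Two hidden layers with 10 neurons each; ReLU is an assumed activation.
        self.hidden1 = tf.keras.layers.Dense(10, activation="relu")
        self.hidden2 = tf.keras.layers.Dense(10, activation="relu")
        # One output neuron per Iris class; softmax turns scores into probabilities.
        self.output_layer = tf.keras.layers.Dense(3, activation="softmax")

    def call(self, inputs):
        x = self.hidden1(inputs)
        x = self.hidden2(x)
        return self.output_layer(x)


model = IrisClassifier()

# Assumed training configuration; integer class labels are expected.
model.compile(
    optimizer="adam",
    loss="sparse_categorical_crossentropy",
    metrics=["accuracy"],
)

# A dummy sample with four features gives three class probabilities.
probabilities = model(tf.zeros((1, 4)))
print(probabilities)
```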
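To make the training step concrete, here is a self-contained sketch that repeats the data preparation and the model definition and then calls fit. The number of epochs (50), the compile settings, the hidden activations and the data loading from scikit-learn are all assumptions, not the article’s exact values.

```python
import tensorflow as tf
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

# Assumption: data comes from scikit-learn's bundled Iris dataset, split 70:30.
iris = load_iris()
X = iris.data.astype("float32")
y = iris.target
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)


class IrisClassifier(tf.keras.Model):
    def __init__(self):
        super().__init__()
        self.hidden1 = tf.keras.layers.Dense(10, activation="relu")
        self.hidden2 = tf.keras.layers.Dense(10, activation="relu")
        self.output_layer = tf.keras.layers.Dense(3, activation="softmax")

    def call(self, inputs):
        return self.output_layer(self.hidden2(self.hidden1(inputs)))


model = IrisClassifier()
model.compile(
    optimizer="adam",
    loss="sparse_categorical_crossentropy",
    metrics=["accuracy"],
)

EPOCHS = 50  # assumed value; each epoch is one full pass over the training set

# fit returns a History object with the loss and metrics recorded per epoch.
history = model.fit(X_train, y_train, epochs=EPOCHS, verbose=0)
print(history.history["loss"][-1])
```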
And we are done. We created a model and trained it. Now, we have to evaluate it and see if we have good results.
Evaluation and New Predictions
Evaluation is done by calling the evaluate method. We provide testing data to it, and it runs predictions for every sample and compares them with the real results:
In this particular case, we got an accuracy of 95.56%:
45/45 [==============================] - 0s 756us/step
Finally, let’s get some predictions:
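The evaluation and prediction calls can be sketched end to end as follows. To keep the example short, an equivalent Sequential model stands in for the IrisClassifier class, and the data loading, epoch count and compile settings are assumptions, so the accuracy you get will differ from run to run and from the number quoted above.

```python
import numpy as np
import tensorflow as tf
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

# Assumption: data comes from scikit-learn's bundled Iris dataset, split 70:30.
iris = load_iris()
X = iris.data.astype("float32")
y = iris.target
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

# Stand-in for IrisClassifier: same layer sizes, built with Sequential for brevity.
model = tf.keras.Sequential([
    tf.keras.layers.Dense(10, activation="relu"),
    tf.keras.layers.Dense(10, activation="relu"),
    tf.keras.layers.Dense(3, activation="softmax"),
])
model.compile(
    optimizer="adam",
    loss="sparse_categorical_crossentropy",
    metrics=["accuracy"],
)
model.fit(X_train, y_train, epochs=50, verbose=0)

# evaluate returns the loss followed by the compiled metrics.
loss, accuracy = model.evaluate(X_test, y_test, verbose=0)
print("Test accuracy:", accuracy)

# predict returns one row of class probabilities per sample;
# argmax picks the most likely class for each row.
probabilities = model.predict(X_test, verbose=0)
predicted_classes = np.argmax(probabilities, axis=1)
print(predicted_classes[:10], y_test[:10])
```

With 45 test samples, an accuracy of 95.56% corresponds to 43 correct predictions, since 43/45 ≈ 0.9556.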
Here are the results that we got in comparison with real results:
Results this good would be fishy if we were working on some other dataset with real data. We could suspect that overfitting had happened. However, on this simple dataset, we will accept these results as good.
We covered a lot of ground in this article. We had a chance to see what the TensorFlow ecosystem looks like in general and how we can install TensorFlow. Also, we checked some of the major changes from version 1.0 to 2.0. We saw how to analyze data and how to build neural networks using the high-level Keras API. Finally, we solved one problem using these techniques.
Thank you for reading!
Read more posts from the author at Rubik’s Code.