The code that accompanies this article can be downloaded here.
The open-source library TensorFlow.js was introduced about a year ago, but I didn't manage to try it out until now. In this article, we are going to learn how to use this technology, and we are going to do it on a real-world classification problem. The idea is to use the capabilities of TensorFlow.js to build and run our machine learning and deep learning models in a browser or under Node.js. To be honest, I was a bit skeptical at first; however, this turned out to be a cool way to bring web developers and data scientists closer together.
TensorFlow.js includes the Keras API and exposes it as its high-level API. This is very nice, and it eases the process of building machine learning and deep learning models. It also includes a lower-level API, previously called deeplearn.js, which can be used for linear algebra and automatic differentiation. Eager execution is supported as well. Underneath, TensorFlow.js uses WebGL to accelerate computation in the browser.
In this article, we are going to build a simple neural network using TensorFlow.js which will solve a simple classification problem. However, before that let’s see how we can install TensorFlow.js.
There are several ways in which we can use TensorFlow.js. The first one, of course, is simply adding a script tag inside our main HTML file:
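For example (the version number below is illustrative; check for the latest release):

```html
<!-- The version number is illustrative; use the latest @tensorflow/tfjs release. -->
<script src="https://cdn.jsdelivr.net/npm/@tensorflow/tfjs@1.0.0/dist/tf.min.js"></script>
```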
You can also install it using npm or yarn to set it up under Node.js:
npm install @tensorflow/tfjs
yarn add @tensorflow/tfjs
As you may remember from previous posts, TensorFlow has GPU support for higher performance. You can install it like this:
npm install @tensorflow/tfjs-node-gpu
yarn add @tensorflow/tfjs-node-gpu
Use this option only if your system has a CUDA-capable NVIDIA GPU. Otherwise, install the CPU version of the Node.js bindings:
npm install @tensorflow/tfjs-node
yarn add @tensorflow/tfjs-node
Wine Quality Classification Problem
If you have read some of our previous articles, you may have noticed that we like using this dataset. That is because it is really good for simple classification analysis, yet it comes from the real world. Our goal is to predict the quality of a wine based on the provided chemical data. The data itself describes Vinho Verde, a unique product from the Minho region of Portugal. This product accounts for up to 15% of total Portuguese wine production, and Portugal is the tenth biggest wine producer in the world.
The information was collected from May 2004 to February 2007, and due to privacy and logistic issues, only physicochemical and sensory variables are available; price and origin are not provided in the dataset. The dataset contains two .csv files, one for red wine (1599 samples) and one for white wine (4898 samples). For the purpose of this article, we will use only the white wine samples.
Every sample contains these features:
- Fixed acidity
- Volatile acidity
- Citric acid
- Residual sugar
- Chlorides
- Free sulfur dioxide
- Total sulfur dioxide
- Density
- pH
- Sulphates
- Alcohol
- Quality (score between 0 and 10)
The dataset presented in tabular form looks something like this:
I know that at this point you probably want to jump straight into the code, but first we need to analyze the data. Data analysis itself consists of several sub-steps:
- Univariate Analysis – Analysing types and nature of every feature.
- Missing Data Treatment – Detecting missing data and deciding on a strategy for handling it.
- Outlier Detection – Detecting anomalies in the data. Outliers are samples that diverge from an overall pattern in some data.
- Correlation Analysis – Comparing features among each other.
During the univariate analysis, we noticed that the output, quality, is actually an integer, not a category. This will be handled during implementation.
During the missing data treatment phase, we noticed that some samples have the fixed acidity feature empty. Our strategy is to replace these missing values with the mean value of that feature. Other options are available too, such as replacing missing values with the maximum feature value or with some default value. Let's check the quality distribution and detect outliers:
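Before moving on, the imputation strategy described above can be sketched in plain JavaScript. The `imputeWithMean` helper and the property name `fixedAcidity` are hypothetical, not taken from the article's code:

```javascript
// Hypothetical sketch: replace missing (null/undefined) values of a
// feature with the mean of the values that are present.
function imputeWithMean(samples, feature) {
  const present = samples
    .map(s => s[feature])
    .filter(v => v !== null && v !== undefined);
  const mean = present.reduce((a, b) => a + b, 0) / present.length;
  return samples.map(s =>
    s[feature] === null || s[feature] === undefined
      ? { ...s, [feature]: mean }
      : s
  );
}

const data = [
  { fixedAcidity: 7.0 },
  { fixedAcidity: null },
  { fixedAcidity: 9.0 },
];
// The missing value is replaced by the mean of 7.0 and 9.0, i.e. 8.0.
console.log(imputeWithMean(data, 'fixedAcidity'));
```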
From the picture above, we can see that most of the wines fall between quality 5 and 6. This means that most of the wines are average, and we have just a few wines with high or low quality. Finally, let's check the correlation between the features:
As you can see, we cannot detect a single feature that affects quality too much. The only thing that raises suspicion is the high correlation between the residual sugar feature and the density feature. However, we will leave both features in the game and see where we land.
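The correlation measure behind such an analysis is the Pearson coefficient, which can be computed in a few lines of plain JavaScript. The `pearson` helper below is an illustration, not part of the article's code:

```javascript
// Minimal Pearson correlation coefficient (hypothetical helper).
// Returns a value in [-1, 1]; values near +/-1 mean strong correlation.
function pearson(xs, ys) {
  const n = xs.length;
  const meanX = xs.reduce((a, b) => a + b, 0) / n;
  const meanY = ys.reduce((a, b) => a + b, 0) / n;
  let cov = 0, varX = 0, varY = 0;
  for (let i = 0; i < n; i++) {
    const dx = xs[i] - meanX;
    const dy = ys[i] - meanY;
    cov += dx * dy;
    varX += dx * dx;
    varY += dy * dy;
  }
  return cov / Math.sqrt(varX * varY);
}

// Perfectly linearly related series give a coefficient of 1.
console.log(pearson([1, 2, 3, 4], [2, 4, 6, 8])); // 1
```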
The whole code that accompanies this blog post can be found here.
The dataset itself comes in .csv file format, so the first thing we had to do was convert it into a JSON file and upload it. You can find the newly created JSON file here. In general, every sample from the .csv file becomes one JSON object containing the features listed above.
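For illustration, a single converted sample might look like the object below. The exact property names are assumptions, since the original JSON file is linked rather than reproduced here; the values come from the first white wine row of the dataset, abridged:

```javascript
// Hypothetical shape of one converted sample (property names assumed).
const sample = {
  fixedAcidity: 7.0,
  volatileAcidity: 0.27,
  citricAcid: 0.36,
  residualSugar: 20.7,
  freeSulfurDioxide: 45.0,
  totalSulfurDioxide: 170.0,
  quality: 6,
};

console.log(Object.keys(sample).length); // 7
```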
As you can see, we added the mentioned script tag for TensorFlow.js and an additional one for tfjs-vis, the visualization library:
This function essentially reveals our workflow. First, we get the data from the remote location; this is achieved simply by using the fetch method. After that, we use displayData to plot some interesting graphs:
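A minimal sketch of such a data-loading step could look like this. The URL is a placeholder, not the article's actual data location:

```javascript
// Sketch of loading the converted JSON dataset with fetch.
// DATA_URL is a placeholder, not the article's actual location.
const DATA_URL = 'https://example.com/wine-quality.json';

async function getData() {
  const response = await fetch(DATA_URL);
  // Parse the response body as JSON: an array of sample objects.
  const data = await response.json();
  return data;
}
```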
Note that in the gist above, the singlePlot function is presented as well. This method wraps tfjs-vis functionality and displays a single graph. The displayData function utilizes this method to plot three graphs. Here they are:
Here we can see the distribution of quality by different features. Once we have visualized the data, we can create our model. This is done in the createModel function:
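The original gist is not reproduced here, but a createModel function in TensorFlow.js might look roughly like the sketch below. The layer sizes are illustrative assumptions, not the article's exact architecture:

```javascript
const tf = require('@tensorflow/tfjs');

// Sketch of a small classifier; layer sizes are illustrative assumptions.
function createModel(numFeatures, numClasses) {
  const model = tf.sequential();
  // Hidden layers read the normalized chemical features.
  model.add(tf.layers.dense({
    inputShape: [numFeatures],
    units: 64,
    activation: 'relu',
  }));
  model.add(tf.layers.dense({ units: 32, activation: 'relu' }));
  // Softmax output gives one probability per quality class.
  model.add(tf.layers.dense({ units: numClasses, activation: 'softmax' }));
  return model;
}
```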
The main goal here was not to generate a perfect model for this problem, but to try out some of the TensorFlow.js possibilities. If you are familiar with building neural network models with Keras, this API will be easy to understand. However, if you want to learn more about neural networks, you can check our huge series on them here.
Let's have a quick overview. We use the Keras-style layers API to define the network layer by layer.
Awesome! Now we prepare the data for the model itself. Our model will not work with JSON objects, or with plain arrays for that matter; we need to create tensor objects. This is done in the prepareData function:
In this function, we first convert the JSON objects into simple arrays and split the data into inputs and outputs. In this particular example, we haven't split the data into train and test sets, which is something that could be improved. Once this is done, we convert the arrays into tensors. Finally, we normalize the data, meaning we put it on the same scale; this is something we noticed was necessary during the data analysis phase. Also, note that we use one-hot encoding for the quality output, since we treat it as a category.
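The two transformations described above can be illustrated in plain JavaScript; in the actual workflow the resulting arrays would then be turned into tensors with tf.tensor2d. The helper names below are hypothetical:

```javascript
// Min-max normalization: put every value on the 0..1 scale.
function normalize(values) {
  const min = Math.min(...values);
  const max = Math.max(...values);
  return values.map(v => (v - min) / (max - min));
}

// One-hot encoding: quality score -> vector with a single 1.
function oneHot(label, numClasses) {
  const vec = new Array(numClasses).fill(0);
  vec[label] = 1;
  return vec;
}

console.log(normalize([0, 5, 10])); // [0, 0.5, 1]
console.log(oneHot(6, 10));         // 1 at index 6, zeros elsewhere
```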
So, we have done all the preparation steps, and we can train our model using the trainModel function:
Once again, you can notice that TensorFlow.js kept an API similar to the TensorFlow API in Python. We compile our model with the Adam optimizer and categorical crossentropy. Then we run the training process with the fit method:
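A sketch of such a training step is shown below; it assumes `tf` has been imported, and the epoch count and batch size are assumptions rather than the article's exact settings:

```javascript
// Sketch of compiling and training the model; epochs and batchSize
// are assumptions, not the article's exact hyperparameters.
async function trainModel(model, inputs, outputs) {
  model.compile({
    optimizer: tf.train.adam(),
    loss: 'categoricalCrossentropy',
    metrics: ['accuracy'],
  });
  return model.fit(inputs, outputs, {
    epochs: 50,
    batchSize: 32,
    shuffle: true,
  });
}
```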
Finally, we use the evaluateModel method to evaluate the accuracy of our neural network. This method is just a wrapper for the evaluate method of the created model:
The output of evaluation is printed in the console:
We got an accuracy of just 51%, meaning there is a lot of room for improvement in our model. However, we were able to do all of this in the browser, which is awesome.
Thank you for reading!
Read more posts from the author at Rubik’s Code.