A couple of months ago, Microsoft released the integration of Jupyter Notebooks into Visual Studio Code. This on its own was a big deal for me, because VS Code is my favorite IDE and I often run Machine Learning and Data Science experiments in Jupyter Notebooks with Python. This way I can use the full power of VS Code and still have the flexibility and interactivity of Jupyter Notebooks.

Additionally, Microsoft integrated .NET Interactive into this whole picture. Now, this seems like a game-changer to me. While .NET is an awesome tool, when it comes to doing things quickly, and especially when it comes to running Data Science experiments, it is still far behind languages like Python. In my opinion, .NET Interactive Notebooks close this gap a bit and are a refreshing way of using this technology. In this article, we explore just that: how we can use various .NET technologies in a new way. We also explore how we can use these notebooks for machine learning with ML.NET.

In this article, we explore:

1. Installation

2. Using .NET Interactive Notebooks

3. Data Visualization with .NET Interactive Notebooks

4. Machine learning with ML.NET and .NET Interactive Notebooks

1. Installation

To use .NET Interactive Notebooks, you need to install Visual Studio Code. You also need to install the .NET Interactive Notebooks extension:

.NET Interactive Notebooks Extension

These notebooks support several languages: C#, F#, PowerShell, JavaScript, HTML, SQL, etc. This is a very wide range of languages, so technically you can use the notebooks for a lot of things. In this article, we focus on C# and ML.NET.

The data that we use in this article comes from the PalmerPenguins dataset. This dataset was recently introduced as an alternative to the famous Iris dataset. It was created by Dr. Kristen Gorman and the Palmer Station, Antarctica LTER. You can obtain this dataset here, or via Kaggle.

This dataset is essentially composed of two datasets, each containing data on 344 penguins. Just like in the Iris dataset, there are 3 different species, in this case penguins coming from 3 islands in the Palmer Archipelago. These datasets also contain culmen dimensions for each species. The culmen is the upper ridge of a bird’s bill. In the simplified penguin data, culmen length and depth are stored in the variables culmen_length_mm and culmen_depth_mm.

2. Using .NET Interactive Notebooks

The .NET Interactive Notebooks are just like Jupyter Notebooks, except that in them you can run code written in C#, F#, etc. They even have the same file extension: .ipynb. However, in order to run them, you need to select the proper kernel, i.e. the .NET Interactive kernel. You can do it like this:

.NET Interactive Notebooks Select Kernel

Just like Jupyter Notebooks, .NET Interactive Notebooks combine code with other rich media, such as narrative text, images, mathematical equations, and so on. In essence, it is a single document that can hold all the information you need, like a full explanation of your project, along with necessary images and equations, and with the interactive code that you can run and see the results immediately.

The Notebook is composed of cells and kernels. There are two types of cells:

  • A markdown cell contains text formatted using Markdown. These cells contain text, images, and equations that we talked about. This is a “documentation part” of Jupyter Notebook.
  • A code cell contains the code which is executed by the kernel. Once the code is run, the notebook displays the output below the code cell that generated it.

Here is how you can create a markdown cell:

.NET Interactive Notebooks: Markdown cell

These cells can contain all the useful information about your project. The code cells, on the other hand, can be run against the selected kernel:

.NET Interactive Notebooks: Code cell

It might feel counterintuitive and unusual to use C# in this script-like way at the beginning. However, you will find it very useful for quick checks. As you can see, variables are shared between cells.
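
To make the shared state concrete, here is a minimal sketch of two cells, written as one block with comments marking where each cell would start. The variable names and values are illustrative and are not taken from the screenshot above.

```csharp
using System;
using System.Linq;

// --- Cell 1: declare a variable ---
var bodyMasses = new[] { 3750, 3800, 3250 };

// --- Cell 2: a cell executed later can still read it ---
var total = bodyMasses.Sum();
Console.WriteLine(total);
```

Running the second cell again after editing the first one picks up the new value only once the first cell has been re-executed, which is the same behavior you would expect from a Python notebook.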

3. Data Visualization with .NET Interactive Notebooks

Ok, now we know how to run very simple code within .NET Interactive Notebooks. Let’s see how we can use these notebooks for data visualization. For this purpose, we can use Microsoft SandDance, which is a part of the ExtensionLab NuGet package.

In order to do so, we need to do two things: download the PalmerPenguins dataset and put the penguins_size.csv file in the root folder, and install the ExtensionLab NuGet package. To install any NuGet package, you need a code cell that starts with the #r directive, followed by the NuGet package name and version. The *-* wildcard used here asks for the latest available version, pre-releases included. In this case:

#r "nuget:Microsoft.DotNet.Interactive.ExtensionLab,*-*"

Once the package is installed, you can create a cell with the usings. DataFrame lives in the Microsoft.Data.Analysis namespace:

using Microsoft.Data.Analysis;

Finally, you can load the data and run SandDance:

var data = DataFrame.LoadCsv("penguins_size.csv");
data.ExploreWithSandDance().Display();

Overall, the output looks something like this, and from this point you can use the various SandDance options:

.NET Interactive Notebooks: Data Visualization

4. Machine Learning with ML.NET and .NET Interactive Notebooks

Finally, let’s see how we can utilize .NET Interactive Notebooks together with ML.NET. The goal of this tutorial is to create an ML.NET machine learning model that is able to classify the PalmerPenguins data. This model predicts the species of the penguin based on the rest of the data. First, we need to install all the necessary NuGet packages:

#r "nuget:Microsoft.Data.Analysis,*-*"
#r "nuget:Microsoft.ML,*-*"
#r "nuget:Microsoft.DotNet.Interactive.ExtensionLab,*-*"
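
The *-* wildcard in these directives resolves to the latest available version, pre-releases included, so the same notebook can pull different packages over time. If you would rather have a repeatable setup, the directive also accepts an exact version. The version number below is only an illustration and an assumption on my side; check NuGet for the release you actually want.

```csharp
// Pin an exact package version instead of using the *-* wildcard.
#r "nuget:Microsoft.ML,2.0.0"
```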

Then we need to add the usings:

using Microsoft.Data.Analysis;
using Microsoft.ML;
using Microsoft.ML.Data;

using System.IO;
using System.Text;

4.1 Data Model

Ok, now we can fit ML.NET code into this. In order to load data from the dataset and use it with ML.NET algorithms, we need to implement classes that model this data. So, we create a cell that implements two classes: PalmerPenguinsData and PalmerPenguinsPrediction. These classes model the input and output data. The output is the species of the penguin (the Label column), while the rest of the data is the input.

public class PalmerPenguinsData
{
    [LoadColumn(0)]
    public string Label { get; set; }

    [LoadColumn(1)]
    public string Island { get; set; }

    [LoadColumn(2)]
    public float CulmenLength { get; set; }

    [LoadColumn(3)]
    public float CulmenDepth { get; set; }

    [LoadColumn(4)]
    public float FliperLength { get; set; }

    [LoadColumn(5)]
    public float BodyMass { get; set; }

    [LoadColumn(6)]
    public string Sex { get; set; }
}

public class PalmerPenguinsPrediction
{
    [ColumnName("PredictedLabel")]
    public string PredictedLabel { get; set; }
}

4.2 Load Data

Once data model classes are created, we can use them to load the data. To do so, we create a new cell:

var mlContext = new MLContext();

var trainingDataView = mlContext.Data.LoadFromTextFile<PalmerPenguinsData>("penguins_size.csv", hasHeader: true, separatorChar: ',');

var dataSplit = mlContext.Data.TrainTestSplit(trainingDataView, testFraction: 0.3);

In this cell, we have done more than just load the data: we have also initialized ML.NET by creating an MLContext object. The core of ML.NET revolves around two types, the MLContext class and the IDataView interface. An MLContext object is usually created once and shared across the whole workflow, and it provides access to most of the ML.NET functionality, like the various machine learning algorithms, which are called trainers in the context of ML.NET.
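
TrainTestSplit assigns rows to the two sets randomly, so the split can change between runs. The following is a minimal sketch, not the article's own code, of how a fixed seed makes the split repeatable. The Row class, the ctx variable, and the 100 in-memory rows are hypothetical stand-ins so that the cell runs without the CSV file.

```csharp
#r "nuget:Microsoft.ML,*-*"

using System;
using System.Linq;
using Microsoft.ML;

// Hypothetical one-column row type, used only for this illustration.
public class Row
{
    public float Value { get; set; }
}

var ctx = new MLContext();

// 100 in-memory rows stand in for the CSV file.
var rows = Enumerable.Range(0, 100).Select(i => new Row { Value = i });
IDataView all = ctx.Data.LoadFromEnumerable(rows);

// A fixed seed makes the random assignment of rows repeatable.
var split = ctx.Data.TrainTestSplit(all, testFraction: 0.3, seed: 42);

// Count rows by materializing each IDataView as typed objects.
Func<IDataView, int> count = view =>
    ctx.Data.CreateEnumerable<Row>(view, reuseRowObject: false).Count();

Console.WriteLine($"train: {count(split.TrainSet)}, test: {count(split.TestSet)}");
```

The two counts always add up to the number of input rows, but testFraction is applied row by row, so expect roughly 30 test rows here rather than exactly 30.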

The dataSplit variable contains the loaded data, split into train and test datasets (the TrainSet and TestSet properties). The notebook can render this data as a table, so we can actually see how it looks:

.NET Interactive Table

Now, here you can see the benefits of .NET Interactive Notebooks. This is an easy way to actually see the data within the C# variable. This is very cool!
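
If you want to peek at a few rows from code rather than through the table view, one option is to materialize part of an IDataView as typed objects. This is a hedged sketch under my own assumptions: PenguinRow, view, and the two sample rows are hypothetical and exist only so that the cell runs on its own. In the notebook you would pass dataSplit.TrainSet and PalmerPenguinsData instead.

```csharp
#r "nuget:Microsoft.ML,*-*"

using System;
using System.Linq;
using Microsoft.ML;

// Hypothetical miniature row type, used only for this illustration.
public class PenguinRow
{
    public string Species { get; set; }
    public float BodyMass { get; set; }
}

var context = new MLContext();

// Illustrative in-memory rows instead of the CSV file.
var sampleRows = new[]
{
    new PenguinRow { Species = "Adelie", BodyMass = 3750f },
    new PenguinRow { Species = "Gentoo", BodyMass = 5000f },
    new PenguinRow { Species = "Chinstrap", BodyMass = 3700f }
};
IDataView view = context.Data.LoadFromEnumerable(sampleRows);

// Read the first two rows back as strongly typed objects.
var preview = context.Data
    .CreateEnumerable<PenguinRow>(view, reuseRowObject: false)
    .Take(2)
    .ToList();

foreach (var row in preview)
{
    Console.WriteLine($"{row.Species}: {row.BodyMass}");
}
```

Setting reuseRowObject to false gives every row its own object, which is what you want when you keep the rows in a list.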

4.3 Initialize and Train Machine Learning ML.NET Model

Now to the important and fun bit. We want to initialize and train the model. In fact, we want to create a complete training pipeline, which pre-processes the data, trains the model, and saves the model. Here is how we do it:

var model = mlContext.MulticlassClassification.Trainers.SdcaNonCalibrated(labelColumnName: "Label", featureColumnName: "Features");

// Pre-processing: map the string label to a key, featurize the text columns,
// gather everything into one "Features" vector, and normalize it.
var pipeline = mlContext.Transforms.Conversion.MapValueToKey(inputColumnName: nameof(PalmerPenguinsData.Label), outputColumnName: "Label")
    .Append(mlContext.Transforms.Text.FeaturizeText(inputColumnName: "Sex", outputColumnName: "SexFeaturized"))
    .Append(mlContext.Transforms.Text.FeaturizeText(inputColumnName: "Island", outputColumnName: "IslandFeaturized"))
    .Append(mlContext.Transforms.Concatenate("Features",
        "IslandFeaturized",
        nameof(PalmerPenguinsData.CulmenLength),
        nameof(PalmerPenguinsData.CulmenDepth),
        nameof(PalmerPenguinsData.BodyMass),
        nameof(PalmerPenguinsData.FliperLength),
        "SexFeaturized"))
    .Append(mlContext.Transforms.NormalizeMinMax("Features", "Features"))
    // Train the classifier, then map the predicted key back to the species name.
    .Append(model)
    .Append(mlContext.Transforms.Conversion.MapKeyToValue("PredictedLabel"));

var trainedModel = pipeline.Fit(dataSplit.TrainSet);

mlContext.Model.Save(trainedModel, dataSplit.TrainSet.Schema, "model.mdl");

First, we create an SdcaNonCalibrated trainer. This object is the machine learning algorithm we use for this problem. Essentially, it is a linear classifier, closely related to logistic regression, that is trained with the Stochastic Dual Coordinate Ascent (SDCA) method. The algorithm can be scaled because it’s a streaming training algorithm as described in a KDD best paper.

Then we create a training pipeline. This pipeline first does some pre-processing of the data and then uses the machine learning model mentioned above. Then we call the Fit method on this pipeline. With this, we initiate the training process. Finally, we save the model into the model.mdl file.

4.4 Evaluate the Model

To evaluate the model, we use the Evaluate method with the test data:

var testSetTransform = trainedModel.Transform(dataSplit.TestSet);
var metrics = mlContext.MulticlassClassification.Evaluate(testSetTransform);

The output is the metrics variable, which contains some useful information about our model. For example, we can print out Macro Accuracy:

metrics.MacroAccuracy
0.991869918699187
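
Macro accuracy is the average of the per-class accuracies, so every species counts equally no matter how many test rows it has, while micro accuracy is simply the fraction of all rows predicted correctly. The sketch below computes both by hand to show how they can differ. The labels are made up for illustration and are not the real test set.

```csharp
using System;
using System.Linq;

// Made-up ground truth and predictions: 4 Adelie, 2 Chinstrap, 4 Gentoo.
string[] actual =
{
    "Adelie", "Adelie", "Adelie", "Adelie",
    "Chinstrap", "Chinstrap",
    "Gentoo", "Gentoo", "Gentoo", "Gentoo"
};
string[] predicted =
{
    "Adelie", "Adelie", "Adelie", "Adelie",
    "Chinstrap", "Adelie",
    "Gentoo", "Gentoo", "Gentoo", "Chinstrap"
};

var pairs = actual.Zip(predicted, (a, p) => (Actual: a, Predicted: p)).ToList();

// Micro accuracy: correct predictions over all rows (8 of 10).
double microAccuracy = pairs.Count(x => x.Actual == x.Predicted) / (double)pairs.Count;

// Macro accuracy: accuracy per actual class (1.0, 0.5, 0.75), then the plain average.
double macroAccuracy = pairs
    .GroupBy(x => x.Actual)
    .Select(g => g.Count(x => x.Actual == x.Predicted) / (double)g.Count())
    .Average();

Console.WriteLine($"micro: {microAccuracy}, macro: {macroAccuracy}");
```

Here the small Chinstrap class drags the macro value below the micro value, which is why macro accuracy is the more telling number when classes are imbalanced.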

4.5 Using the Model for Prediction

Here is how we can use the model that is saved in the file, to run predictions on new samples:

var newSample = new PalmerPenguinsData
{
    Island = "Torgersen",
    CulmenDepth = 18.7f,
    CulmenLength = 39.3f,
    FliperLength = 180,
    BodyMass = 3700,
    Sex = "MALE"
};

using (var stream = new FileStream("model.mdl", FileMode.Open, FileAccess.Read, FileShare.Read))
{
    var loadedModel = mlContext.Model.Load(stream, out _);
    var predictionEngine = mlContext.Model.CreatePredictionEngine<PalmerPenguinsData, PalmerPenguinsPrediction>(loadedModel);

    var prediction = predictionEngine.Predict(newSample);

    Console.WriteLine($"Prediction: {prediction.PredictedLabel}");
}

Conclusion

In this article, we explored how we can use .NET Interactive Notebooks in combination with ML.NET for modern machine learning with the .NET technology stack.

Nikola M. Zivkovic

Nikola M. Zivkovic is the author of the books Ultimate Guide to Machine Learning and Deep Learning for Programmers. He loves knowledge sharing, and he is an experienced speaker. You can find him speaking at meetups, conferences, and as a guest lecturer at the University of Novi Sad.