Machine Learning and Deep Learning (AI, in general) are no longer just buzzwords. They have become an integral part of our businesses and startups. This affects software development too, and it goes even further: we can’t treat machine learning components as just another part of the ecosystem, because they are the part of the system that makes decisions. These components also shift our focus to data, which requires a different mindset when it comes to infrastructure. Because of all this, building machine learning-based applications is not an easy task. There are several areas where data scientists, software developers and DevOps engineers need to work together in order to make a high-quality product.
In this article, we cover 18 machine learning practices that we think will help you build high-quality machine learning applications. These practices are divided into 5 sections, each composed of several tips and tricks. Here is what is covered in this article:
Objective and Metrics Best Practices
In this section, we consider the business aspects of machine learning applications. This step is arguably the most important one. At the beginning of every project, we need to define a business problem we are trying to solve. This will drive everything, from the features of your application to the infrastructure and steps when it comes to gathering the data. Here are some of the things you should pay special attention to during this process:
1. Start with a Business Problem Statement and Objective
As we mentioned, making a business problem statement is crucial when it comes to building machine learning applications. However, since it is not techy and exciting, a lot of people de-prioritize and overlook it. So, the advice is: spend some time on your problem, think about it, and think about what you are trying to achieve. Define how the problem affects the profitability of your company. Don’t just look at it from the perspective of “I want more clicks on my website“ or “I want to earn more money“. A well-defined problem looks something like this: “What helps me sell more e-books?“. Based on this, you should be able to define the objective. The objective is a metric that you are trying to optimize. It is of great importance to establish the right success metric because it will give you a sense of progress. Also, the objective might (and probably will) change over time, as you learn more about your data.
2. Gather Historical Data From Existing Systems
Sometimes the requirements are not that clear, so you are not really able to come up with the proper objective straight away. This is often the case when working with legacy systems and introducing machine learning into them. Before you go into the nuances of what your application will do and which role machine learning plays in it, gather as much data as possible from the current system. This historical data can help you with the task at hand. It can also give you indications of where optimization is necessary and which actions will give the best results.
3. Use Simple Metric for First Objective
Making a successful machine learning project is an incremental process. In order to get to the final goal, be ready to iterate through several solutions. That is why it is important to start small. Your first objective should be a simple metric that is easily observable and attributable. For example, user behavior is the easiest feature to observe, with questions like “Was the recommended item marked as spam?“. You should avoid modeling indirect effects, at least in the beginning. Indirect effects can give your business enormous value later on; however, they require complicated metrics.
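As a sketch, such a simple, directly observable metric could be computed like this (the event format here is hypothetical, just for illustration):

```python
def spam_flag_rate(events):
    """Fraction of recommended items that users marked as spam.

    `events` is a list of dicts with a boolean "marked_as_spam" key --
    a made-up log format for recommendation feedback.
    """
    if not events:
        return 0.0
    flagged = sum(1 for e in events if e["marked_as_spam"])
    return flagged / len(events)
```

A metric like this is easy to compute from raw logs, easy to explain, and directly attributable to user behavior, which is exactly what you want from a first objective.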
Infrastructure Best Practices
Infrastructure has multiple roles when it comes to machine learning applications. One of the major tasks is to define how we gather, process and receive new data. After that, we need to decide how we train our models and version them. Finally, deploying the model to production is a topic that we need to consider as well. In all these tasks infrastructure plays a crucial role. In fact, chances are that you will spend more time working on the infrastructure of your system than on the machine learning model itself.
Here are some tips and tricks that you need to consider when building it.
4. Infrastructure is Testable without Model
Complete infrastructure should be independent of the machine learning model. In essence, you should strive to create an end-to-end solution where each aspect of the system is self-sufficient. The machine learning model should be encapsulated so that the rest of the system does not depend on it. This way you are able to manipulate and restructure the rest of the system fairly easily if necessary. By isolating the parts of the system that gather and pre-process the data, train the model, test the model, serve the model and so on, you will be able to mock and replace parts of the system with more ease. It is like practicing the Single Responsibility Principle on a higher level of abstraction.
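One way to sketch this separation in Python (all names here are illustrative, not a real framework API):

```python
class Model:
    """Minimal interface that the rest of the system depends on."""
    def predict(self, features):
        raise NotImplementedError

class StubModel(Model):
    """A stand-in model that lets the pipeline be tested without training."""
    def __init__(self, constant=0.0):
        self.constant = constant

    def predict(self, features):
        # Always returns the same value -- no real model required.
        return self.constant

def serve(model, raw_input, preprocess):
    """The serving pipeline only ever sees the Model interface."""
    features = preprocess(raw_input)
    return model.predict(features)
```

Because `serve` depends only on the `Model` interface, the preprocessing and serving logic can be exercised end-to-end with `StubModel` before any real model exists, and the real model can be swapped in later without touching the rest of the system.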
5. Deploy Model only After it Passes Sanity Checks
Tests are an important barrier that separates you from problems in the system. In order to provide the best experience to the users of your machine learning application, make sure that you run tests and sanity checks before deploying your model. This can be automated too. For example, you train your model and perform tests on the test dataset. You can check whether the metrics you have chosen for your model are providing good results, using standard metrics like accuracy, F1 score and recall. Only if the model provides satisfying results should it be deployed to production.
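A minimal sketch of such an automated gate (the metric names and thresholds below are made up for illustration):

```python
def passes_sanity_checks(metrics, thresholds):
    """True only if every tracked metric meets its minimum threshold."""
    return all(metrics.get(name, 0.0) >= minimum
               for name, minimum in thresholds.items())

def deploy_if_healthy(metrics, thresholds, deploy_fn):
    """Deploy only when the model clears every sanity check."""
    if passes_sanity_checks(metrics, thresholds):
        deploy_fn()
        return True
    return False
```

Hooked into a CI/CD pipeline, a gate like this ensures a model that regresses on accuracy or F1 score never reaches production.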
6. On-Premise or Cloud
Hardware or cloud? Bare metal or someone else’s servers? An age-old question 🙂 The benefit of choosing the cloud is that it saves time, it is easier to scale and it has a low financial barrier to entry. You have the support of the provider too. Speaking of providers, there are many options out there, including big players like Microsoft Azure, AWS and GCP. On the other hand, on-premise hardware is a one-time investment on which you can run as many experiments as you like without affecting the costs. Today, there are many pre-built deep learning servers available, such as Nvidia workstations and Lambda Labs.
7. Separate Services for Model Training and Model Serving
This one is a sort of conclusion you can draw from points 4 and 5; however, it is really important, so it is worth mentioning separately. In general, you should always strive to separate the model training component from the model serving component. This will give you the ability to test your infrastructure and model independently. Apart from that, you will have greater control over your model in production.
8. Use Containers and Kubernetes in Deployment
A microservices architecture can help you achieve the previous points. Combined with technologies like Docker and Kubernetes, it lets you encapsulate separate parts of the system. This way you can make incremental improvements in each of them and replace each component if necessary. Also, scaling with Kubernetes is a painless process.
Data Best Practices
All of these “Software 2.0” solutions would not be possible without data. Data can come in many shapes and forms, and we often need to work really hard to distill information from it. In this chapter, we cover some of the best practices when it comes to data gathering and pre-processing.
9. Data Quantity
In order to make good predictions or detect patterns, you need a lot of data. That is why it is important to set up a proper component in your system that will gather data for you. If you have no data, it is worth investing in an existing dataset and then improving the model over time with the data gathered from your system. Finally, you can sometimes short-circuit the initial lack of data with transfer learning. For example, if you are working on an object detection app, you can start from a pre-trained model like YOLO.
10. Data Quality and Transformations
Real-world data is messy. Sometimes it is incomplete and sparse, other times it is noisy or inconsistent. In order to make it better, it is necessary to invest in data pre-processing and feature engineering. If you have properly encapsulated it, you can get the data from the data gathering component and apply the necessary transformations (like imputation, scaling, etc.) in the transformation component. This component has a twofold purpose: it prepares the training data, and it applies the same transformations to the new data samples that come into your system. In essence, it creates features that are extracted from the raw inputs.
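The key property is that transformations are fitted on training data once and then reused unchanged at serving time. A minimal, dependency-free sketch of imputation plus scaling (a real project would typically use a library such as scikit-learn):

```python
class MeanImputingScaler:
    """Fills missing values with the training mean, then standardizes."""

    def fit(self, values):
        present = [v for v in values if v is not None]
        n = len(present)
        self.mean = sum(present) / n
        variance = sum((v - self.mean) ** 2 for v in present) / n
        self.std = variance ** 0.5 or 1.0  # avoid division by zero
        return self

    def transform(self, values):
        # Impute with the mean learned during fit, then standardize.
        imputed = (self.mean if v is None else v for v in values)
        return [(v - self.mean) / self.std for v in imputed]

# Fit once on training data, then reuse on new serving samples.
scaler = MeanImputingScaler().fit([1.0, 2.0, None, 3.0])
```

Because the same fitted object serves both training and serving, there is no risk of the two code paths drifting apart (a common source of training/serving skew).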
11. Document each Feature and Assign Owner
Machine learning systems can become large, and datasets can have many features. Also, features can sometimes be created from other features. It is good to assign each feature to one team member. This team member will know why a certain transformation has been applied and what the feature represents. Another good approach is to create a document with a detailed description of each feature.
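One lightweight way to keep such documentation next to the code is a small feature registry; the structure and names below are just a sketch:

```python
from dataclasses import dataclass, field

@dataclass
class FeatureDoc:
    """Documentation record for a single feature."""
    name: str
    description: str
    owner: str
    derived_from: list = field(default_factory=list)

FEATURE_REGISTRY = {}

def register_feature(doc):
    FEATURE_REGISTRY[doc.name] = doc

# Example entry (hypothetical feature and owner):
register_feature(FeatureDoc(
    name="avg_session_length",
    description="Mean session duration over the last 30 days, in seconds.",
    owner="alice",
    derived_from=["session_start", "session_end"],
))
```

Keeping the registry in code means it is versioned with the features themselves, and the `derived_from` field makes it easy to trace which raw inputs a derived feature depends on.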
12. Plan to Launch and Iterate
Don’t be afraid to get into it and get better over time. Your features and models will change over time, so it is important to have this in mind. Also, it might happen that the UI of your application is changed and you are now able to get more data from the user behavior. In general, it is good to keep an open mind about this and be ready to start small and improve over iterations.
Model Best Practices
Even though a machine learning application revolves around the power of its machine learning models, they are usually neatly tucked behind large infrastructure components. There is a good reason for this: in order to actually utilize that power, those other components are necessary as well. Still, they are useless without a good machine learning model to pull it all together. Here are some tips and tricks that you should keep in mind while working with machine learning and deep learning models.
13. Starting with an Interpretable Model
Keep the first model simple and get the infrastructure right. Don’t start with complicated neural network architectures right away. Maybe try to solve the problem with a simple Decision Tree first. There are multiple reasons for this. The first one is that building the complete system takes time, and getting the other components and data right in the beginning will give you the ability to extend your experiments later. The second reason is that business stakeholders will understand what is happening in an interpretable model, which can give you more confidence and trust to continue with fancier models.
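To illustrate just how simple an interpretable first model can be, here is a one-feature decision stump written from scratch (in practice you would likely reach for something like scikit-learn's `DecisionTreeClassifier` instead):

```python
def train_stump(xs, ys):
    """Fit a one-split rule for a 1-D binary classification problem.

    Returns (threshold, label_below, label_above) -- a rule anyone can read.
    """
    best_rule, best_errors = None, float("inf")
    for threshold in sorted(set(xs)):
        for below, above in ((0, 1), (1, 0)):
            errors = sum(1 for x, y in zip(xs, ys)
                         if (below if x < threshold else above) != y)
            if errors < best_errors:
                best_rule, best_errors = (threshold, below, above), errors
    return best_rule

def predict(rule, x):
    threshold, below, above = rule
    return below if x < threshold else above
```

The fitted rule reads as a plain sentence ("predict 1 when the value is at least the threshold"), which is exactly the kind of model a business stakeholder can sanity-check by eye.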
14. Use Checkpoints
Probably the best advice that one can give you when working with machine learning models is to use checkpoints. A checkpoint is an intermediate dump of a model’s internal state (parameters and hyperparameters). Using checkpoints, machine learning frameworks can resume training from that point whenever needed. This gives you the ability to train the model incrementally and make a good trade-off between performance and training time. Also, this way you are more resilient to hardware or cloud failures.
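Frameworks like TensorFlow and PyTorch provide checkpointing out of the box; the idea itself can be sketched with nothing but the standard library:

```python
import os
import pickle
import tempfile

def save_checkpoint(state, path):
    """Dump the model's internal state (parameters, epoch, etc.) to disk."""
    with open(path, "wb") as f:
        pickle.dump(state, f)

def load_checkpoint(path):
    """Restore a previously saved state so training can resume from it."""
    with open(path, "rb") as f:
        return pickle.load(f)

# Sketch: save after every epoch so a crash loses at most one epoch of work.
ckpt_path = os.path.join(tempfile.gettempdir(), "demo_model.ckpt")
save_checkpoint({"epoch": 3, "weights": [0.1, 0.2]}, ckpt_path)
restored = load_checkpoint(ckpt_path)
```

In a real training loop you would call `save_checkpoint` at regular intervals and, on startup, resume from the latest checkpoint if one exists instead of training from scratch.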
15. Performance over Fancy Metrics
A lot of times, data scientists can lose themselves in various metrics. This may lead to trying actions that improve various vanity metrics but lower the performance of the system as a whole. This is, of course, a bad approach; the performance of the complete system should always come first. Thus, if there is some change that improves log loss but degrades the performance of the system, look for another feature. If this starts happening often, it is time to rethink the objective of the model.
16. Production Data to Training Data
The best way to improve your model over time is to use the data seen at serving time for the next training iteration. This way you will move your model towards the real-world scenario and improve the correctness of your predictions. The best approach is to automate this: store every new sample that comes through the serving model and then use it for training.
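A sketch of the capture step (the storage here is an in-memory list; a real system would write to a database or data lake):

```python
class ServingSampleLogger:
    """Records every sample seen at serving time for the next training run."""

    def __init__(self):
        self.samples = []

    def log(self, features, prediction, label=None):
        # The true label often arrives later (e.g. "did the user click?").
        self.samples.append({"features": features,
                             "prediction": prediction,
                             "label": label})

    def training_data(self):
        """Only samples whose true outcome is known are usable for training."""
        return [(s["features"], s["label"])
                for s in self.samples if s["label"] is not None]
```

The split between logging and `training_data` reflects a practical detail: predictions are made immediately, but ground-truth labels usually arrive with a delay, so the logger must tolerate missing labels.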
Code Best Practices
All that math, planning, and design need to be coded. It is the piece that holds it all together. It is important to focus on your code if you want to make a long-lasting solution. In this chapter, we share several tips and tricks that you should pay attention to when it comes to the code of your project.
17. Write Clean Code
Learn how to write code properly. Name your variables and functions like a grown-up, add comments, and pay attention to the structure. You can choose to use object-oriented programming or functional programming and write a lot of tests. Even if you are working alone on the project, make sure you nail this, because sooner or later you will work in a team. Clean code helps all members of the team be in sync and on the same page. Never forget that the team is larger than the individual and clean code is one of the tools for building a great team.
18. Write a lot of Tests
Automate as many tests as possible. They are the guards of continuous progress. There are several levels of tests that one can write when building a machine learning application. In general, you should write a lot of unit tests to verify the functionality of each component of the system. For this, you can use the Test Driven Development approach. Integration tests are good for testing how components work with each other. Finally, system tests are there to test your solution end-to-end. Additional tests for the model are also part of this: don’t save your model, or put it in production, if it is not passing the sanity checks. Performance tests can help you with this.
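For instance, a small transformation function and its unit tests might look like this (using the standard `unittest` module; the function is just an example of a testable pre-processing step):

```python
import unittest

def normalize(values):
    """Scale a list of non-negative counts so they sum to 1."""
    total = sum(values)
    return [v / total for v in values] if total else list(values)

class TestNormalize(unittest.TestCase):
    def test_sums_to_one(self):
        self.assertAlmostEqual(sum(normalize([1, 2, 3])), 1.0)

    def test_empty_input_unchanged(self):
        self.assertEqual(normalize([]), [])
```

Run with `python -m unittest` as part of your CI pipeline, so every change to a pre-processing component is verified automatically.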
In this article, we covered some of the best practices when it comes to creating a machine learning application. We focused on the technical and business aspects and learned how to set objectives. Apart from that, we shared some tips and tricks for handling infrastructure and code. Finally, we talked about what you can do from the perspective of data and models.
Thank you for reading!
Nikola M. Zivkovic
CAIO at Rubik's Code
Nikola M. Zivkovic is CAIO at Rubik’s Code and the author of the book “Deep Learning for Programmers“. He loves knowledge sharing and is an experienced speaker. You can find him speaking at meetups and conferences, and as a guest lecturer at the University of Novi Sad.
Rubik’s Code is a boutique data science and software service company with more than 10 years of experience in Machine Learning, Artificial Intelligence & Software development. Check out the services we provide.