Last week at the Build conference, Microsoft presented so many new ideas, improvements, and platforms that it will take us a couple of months to digest all of it. One of the most exciting announcements (at least for the bunch of us at Rubik’s Code) is that Azure will from now on provide endpoints for Hugging Face models.

While this feature was already available on AWS, we are really happy that we can now deploy and scale state-of-the-art Transformer models on Microsoft Azure with just a couple of clicks. Combined with the news that Meta and Microsoft are joining forces on AI in the Metaverse, and that Hugging Face reached a $2B valuation in its master plan to become the GitHub of Machine Learning, this got our minds racing.

Ultimate Guide to Machine Learning with Python

This bundle of e-books is specially crafted for beginners.
Everything from Python basics to the deployment of Machine Learning algorithms to production in one place.
Become a Machine Learning Superhero 

In this article, we cover:

  • Problems with Deploying Hugging Face Transformers
  • Azure Hugging Face Endpoint Features
  • How to Create an Azure Hugging Face Endpoint

1. Problems with Deploying Hugging Face Models

Hugging Face started out a couple of years ago by providing the transformers library and the Hugging Face Hub, with a bunch of pre-trained and fine-tuned models. They single-handedly brought transfer learning for NLP into the mainstream.

Today you can use more than 40,000 models to perform NLP tasks like Article Summarization, Text Classification, Named Entity Recognition, etc. Recently, Hugging Face went beyond NLP. Today, you can find not just Transformer models but also models for other kinds of problems in the Hugging Face Hub.

The mission of Hugging Face is to democratize good machine learning. We’re striving to help every developer and organization build high-quality, ML-powered applications that have a positive impact on society and businesses. With Hugging Face Endpoints, we’ve made it simpler than ever to deploy state-of-the-art models, and we can’t wait to see what Azure customers will build with them!

Clément Delangue

Co-Founder and CEO, Hugging Face

While Hugging Face provides a bunch of models, putting those models into production and scaling them still remains a challenging task. A popular approach is to create APIs with FastAPI and host them with Docker and Kubernetes. However, you still need to set all of that up yourself.

Another big problem is that you need to take care of scaling and security yourself. That is why this new Azure service focuses on quickly creating managed endpoints. This way, you can easily create an API that hosts one of the models from the Hub and run inference against it.

2. Azure Hugging Face Endpoint Features

Azure Hugging Face Endpoint follows the “path” of Hugging Face development. First, support for all NLP tasks available in the Hugging Face pipeline API is now available through this Azure service. Basically, any NLP task, like classification, summarization, translation, named entity recognition, etc., can now be performed this way. Image and audio task types are not yet available; they are planned for later. Apart from that, all public PyTorch models from the Hugging Face Hub are also supported.

The cool thing about this service is that you can run the inference on different Azure instance types, meaning you can play around with CPUs and GPUs and hit the right price-performance tradeoff. It also means that you can use Azure Autoscale and scale the service easily. Finally, you can rely on Azure security and compliance. In short, with this service you can create fully managed, scalable, and secure NLP inference endpoints.


To sum it up, Hugging Face Endpoint in Azure currently supports:

  • All NLP tasks from the pipeline API
  • PyTorch models from Hugging Face Hub
  • Inference on a wide range of CPU and GPU Azure instance types
  • Automatic scaling with Azure Autoscale
  • Azure security and compliance

According to representatives from both Hugging Face and Microsoft, this is just the beginning of the collaboration between the two companies.

This is the start of the Hugging Face and Azure collaboration we are announcing today as we work together to bring our solutions, our machine learning platform, and our models accessible and make it easy to work with on Azure. Hugging Face Endpoints on Azure is our first solution available on the Azure Marketplace, but we are working hard to bring more Hugging Face solutions to Azure. We have recognized [the] roadblocks for deploying machine learning solutions into production and started to collaborate with Microsoft to solve the growing interest in a simple off-the-shelf solution.

Clément Delangue

Co-Founder and CEO, Hugging Face

3. How to Create an Azure Hugging Face Endpoint

Ok, let’s see how we can create an Azure Hugging Face Endpoint. Note that this feature is still in preview.

Create new resource

The first thing you need to do is add a new resource. Just search for “Hugging Face” and start creating it.


Then you need to configure the basics. First, select the Subscription and the Resource Group. Also, configure the region where the instance will be created and give the endpoint a name. Finally, select the model and the task. In our example below, we selected Token Classification, meaning we are creating an inference endpoint for Named Entity Recognition.

For the Model ID, head over to the Hugging Face Hub and pick the model that best fits your use case. We picked this one.

Configure Instance

In this step, you need to configure which Azure instance type will be used to host your endpoint. You can also configure autoscaling here.

In our example, we set the minimal number of instances to 1 and the maximal to 2. We also set the Scale Target Requests Per Minute to 1000. This means that when the number of requests per minute to our inference endpoint reaches 1000, a new instance will be created.
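The scaling rule above can be sketched in a few lines. Note this is only our rough model of the behavior (one extra instance per scale-target-worth of traffic, clamped to the configured range); the exact Azure Autoscale semantics may differ:

```python
import math


def desired_instances(requests_per_minute: int,
                      target_rpm: int = 1000,
                      min_instances: int = 1,
                      max_instances: int = 2) -> int:
    """Rough sketch of the scale-out rule described above: roughly one
    instance per `target_rpm` requests/minute, clamped to the configured range."""
    if requests_per_minute <= 0:
        return min_instances
    needed = math.ceil(requests_per_minute / target_rpm)
    return max(min_instances, min(max_instances, needed))
```

With our settings, 500 requests per minute keeps a single instance running, while crossing the 1000-request threshold brings up the second (and, because of the maximum of 2, last) instance.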

Review and Deploy

Finally, we need to review and start the deployment.
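Once the deployment finishes, you can call the endpoint like any HTTP API. The sketch below is hedged: the URL and key are placeholders you copy from the Azure portal, and we assume the endpoint accepts the usual Hugging Face `{"inputs": ...}` payload convention:

```python
import json

import requests

ENDPOINT_URL = "https://<your-endpoint-url>"  # placeholder: copy from the Azure portal
API_KEY = "<your-api-key>"                    # placeholder: copy from the Azure portal


def build_request(text: str) -> dict:
    # Standard Hugging Face inference payload shape.
    return {"inputs": text}


def query(text: str) -> list:
    """Send one text to the deployed Token Classification endpoint."""
    response = requests.post(
        ENDPOINT_URL,
        headers={
            "Authorization": f"Bearer {API_KEY}",
            "Content-Type": "application/json",
        },
        data=json.dumps(build_request(text)),
    )
    response.raise_for_status()
    return response.json()  # list of recognized entities


if __name__ == "__main__":
    print(query("Satya Nadella spoke at Build in Seattle."))
```

For a Named Entity Recognition model, the response is a list of detected entities with their labels, scores, and character offsets.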


This service is really neat, and we are looking forward to new collaborations between Hugging Face and Microsoft. Finally, check out the full presentation from the Build conference to get a better feel for what we can expect from this combo.

Hugging Face has been on a mission to democratize good machine learning. With their Transformers open source library and the Hugging Face Hub, they are enabling the global AI community to build state-of-the-art machine learning models and applications in an open and collaborative way. Every day, over 100,000 people all over the world download more than 1 million models and datasets to solve real business problems with AI. I’m excited we’re bringing together the best of Hugging Face and the Azure platform and to offer to our customers new integrated experiences that build on the secure, compliant, and responsible AI foundation we have in AzureML, our MLops platform.

Eric Boyd

Corporate Vice President, Microsoft AI Platform


Nikola M. Zivkovic

CAIO at Rubik's Code

Nikola M. Zivkovic is the author of the books Ultimate Guide to Machine Learning and Deep Learning for Programmers. He loves knowledge sharing, and he is an experienced speaker. You can find him speaking at meetups, conferences, and as a guest lecturer at the University of Novi Sad.