A Beginner's Guide To Deploying Hugging Face Models on SageMaker: Unlocking AI Capabilities for NLP, CV, and Gen AI

Learn how to set up a SageMaker endpoint for any Hugging Face model so you can serve your GenAI or traditional machine learning models.

Sai Sharanya Nalla

May. 16, 24 · Tutorial


Hugging Face is a technology startup with an active open-source community that drove the worldwide adoption of transformer-based models thanks to its eponymous Transformers library. Hugging Face and AWS collaborated to enable you to train and deploy over 10,000 pre-trained models on Amazon SageMaker. For more information on training Hugging Face models at scale on SageMaker, refer to "AWS and Hugging Face collaborate to simplify and accelerate adoption of Natural Language Processing models" and the sample notebooks.

In this post, we discuss different methods to create a SageMaker endpoint for a Hugging Face model.

Overview

If you’re unfamiliar with transformer-based models and their place in the natural language processing (NLP) landscape, here is an overview. A lot of use cases in NLP can be modeled as supervised learning tasks. The classic supervised learning scenario is based on learning in isolation, where a model is trained on a specific dataset for a specific task. Any change in the dataset or task requires training a new model. This scenario becomes challenging in the absence of sufficient labeled data to train a task-specific model.

Transfer learning alleviates this challenge by first pre-training (using vast amounts of data to build knowledge in an unsupervised manner) and then fine-tuning, namely transferring that knowledge, supplemented by a labeled dataset, to adapt to a downstream task. Although transfer learning has been a part of NLP over the past decade, the field had a major breakthrough in 2017 with the transformer architecture proposed by Vaswani et al. in "Attention Is All You Need." Since then, adaptations of the transformer architecture in models such as BERT, RoBERTa, GPT-2, and DistilBERT have pushed the boundaries for state-of-the-art NLP models on a wide range of tasks, such as text classification, question answering, summarization, and text generation. Hugging Face enables you to develop NLP applications for such tasks without the need to train state-of-the-art transformer models from scratch, which could be expensive in terms of computation, cost, and time.

The Hugging Face Deep Learning Containers (DLCs) make it easier not only to train Hugging Face transformer models on SageMaker, but also to deploy them, which simplifies the management of inference infrastructure. The Hugging Face Inference Toolkit for SageMaker is an open-source library for serving Hugging Face Transformers models on SageMaker. It uses the SageMaker Inference Toolkit to start the model server, which is responsible for handling inference requests.

You can deploy models with Hugging Face DLCs on SageMaker in the following ways:

A fully managed method to deploy the model to a SageMaker endpoint without writing any custom inference functions. These models can be either:
1. Fine-tuned models based on your use case
2. Pre-trained models from the Hugging Face Hub
A module that provides more customization through an inference script and allows you to override the default methods of the HuggingFaceHandlerService. This module consists of a model_fn() to override the default method for loading the model. After the model is loaded, predictions are obtained either by implementing a transform_fn() or by implementing input_fn(), predict_fn(), and output_fn() to override the default preprocessing, prediction, and post-processing methods, respectively.

One of the benefits of using the Hugging Face SDK is that it handles inference containers on your behalf, so you don’t need to manage Dockerfiles or Docker registries. For more information, refer to Deep Learning Containers Images.

In the following sections, we walk through the three methods to deploy endpoints.

Create a SageMaker Endpoint With a Trained Model

To deploy a SageMaker-trained Hugging Face model from Amazon Simple Storage Service (Amazon S3), make sure that all required files, including the tokenizer, are saved in a model.tar.gz file, and use model_data to point to your saved model file in Amazon S3. See the following code:

import sagemaker
from sagemaker.huggingface import HuggingFaceModel

# IAM role with permissions to create an endpoint.
# get_execution_role() assumes this runs inside SageMaker (notebook or Studio);
# elsewhere, set role to the ARN of a suitable IAM role.
role = sagemaker.get_execution_role()

# create Hugging Face Model Class
huggingface_model = HuggingFaceModel(
    model_data="s3://bucket/model.tar.gz",  # S3 path to your trained SageMaker model
    role=role,  # IAM role with permissions to create an endpoint
    transformers_version="4.6",  # transformers version used
    pytorch_version="1.7",  # PyTorch version used
    py_version="py36",  # Python version of the DLC
)
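The model.tar.gz archive mentioned above can be assembled in several ways. The following is a minimal sketch using only the Python standard library; the helper name package_model and the local directory layout are assumptions for illustration, not part of the SageMaker SDK. It stores files relative to the model directory so that the weights and tokenizer files sit at the root of the archive.

```python
import tarfile
from pathlib import Path


def package_model(model_dir: str, output_path: str = "model.tar.gz") -> str:
    """Bundle every file under model_dir into a gzip-compressed tar archive.

    Hypothetical helper: files are stored relative to model_dir, so they
    appear at the archive root rather than under an extra parent folder.
    Write output_path outside model_dir so the archive does not include itself.
    """
    root = Path(model_dir)
    with tarfile.open(output_path, "w:gz") as archive:
        for path in sorted(root.rglob("*")):
            if path.is_file():
                archive.add(str(path), arcname=path.relative_to(root).as_posix())
    return output_path
```

The resulting file can then be uploaded to Amazon S3, and its S3 URI passed as model_data.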

Create a SageMaker Endpoint With a Model From the Hugging Face Hub

You shouldn’t use this feature in production for loading large models; models over 10 GB aren’t supported with this feature.

To deploy a model directly from the Hub to SageMaker, you need to initialize the following environment variables:

HF_MODEL_ID – Defines the model ID, which is automatically loaded from Hugging Face when creating a SageMaker endpoint; the Hub provides over 10,000 models, all available through this environment variable
HF_TASK – Defines the task for the used Transformers pipeline; for a full list of tasks, see Pipelines

The value of HF_TASK can be one of the following:

"feature-extraction", "text-classification", "token-classification", "table-question-answering", "question-answering", "fill-mask", "summarization", "translation", "text2text-generation", "text-generation", "zero-shot-classification", or "conversational"

The following is a code snippet showing the steps:

import sagemaker
from sagemaker.huggingface import HuggingFaceModel

# IAM role with permissions to create an endpoint.
# get_execution_role() assumes this runs inside SageMaker (notebook or Studio);
# elsewhere, set role to the ARN of a suitable IAM role.
role = sagemaker.get_execution_role()

# Hub Model configuration. https://huggingface.co/models
hub = {
    "HF_MODEL_ID": "distilbert-base-uncased-distilled-squad",  # model_id from hf.co/models
    "HF_TASK": "question-answering",  # NLP task you want to use for predictions
}

# create Hugging Face Model Class
huggingface_model = HuggingFaceModel(
    env=hub,
    role=role,  # IAM role with permissions to create an endpoint
    transformers_version="4.6",  # transformers version used
    pytorch_version="1.7",  # PyTorch version used
    py_version="py36",  # Python version of the DLC
)
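A typo in HF_TASK only surfaces once the endpoint is starting up, so it can help to check the value before creating the model. The sketch below is one possible way to do that; build_hub_env and SUPPORTED_TASKS are hypothetical names, and the task set simply mirrors the list given earlier in this post, which may not match what newer toolkit versions accept.

```python
# Task names copied from the list earlier in this post (an assumption that
# this list is complete for the container version you deploy).
SUPPORTED_TASKS = frozenset({
    "feature-extraction", "text-classification", "token-classification",
    "table-question-answering", "question-answering", "fill-mask",
    "summarization", "translation", "text2text-generation",
    "text-generation", "zero-shot-classification", "conversational",
})


def build_hub_env(model_id: str, task: str) -> dict:
    """Return the env dict for HuggingFaceModel, rejecting unknown tasks."""
    if task not in SUPPORTED_TASKS:
        raise ValueError(f"Unsupported HF_TASK: {task!r}")
    return {"HF_MODEL_ID": model_id, "HF_TASK": task}


hub = build_hub_env("distilbert-base-uncased-distilled-squad", "question-answering")
```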

Next, you deploy the Hugging Face model to SageMaker and specify the initial instance count and instance type. For more information about the various supported instance types, see Amazon SageMaker Pricing.

deploy returns a Predictor object, which you can use to do inference on the endpoint hosting your Hugging Face model. Each Predictor provides a predict method, which can do inference with NumPy arrays or Python lists. See the following code:

# deploy model to SageMaker Inference
predictor = huggingface_model.deploy(
    initial_instance_count=1,
    instance_type="ml.m5.xlarge",
)

predict returns the result of inference against your model. By default, the request is serialized to JSON and the response is deserialized from JSON. See the following code:

# example request, you always need to define "inputs"
data = {
    "inputs": {
        "question": "Which name is also used to describe the Amazon rainforest in English?",
        "context": "The Amazon rainforest (Portuguese: Floresta Amazônica or Amazônia; Spanish: Selva Amazónica, Amazonía or usually Amazonia; French: Forêt amazonienne; Dutch: Amazoneregenwoud), also known in English as Amazonia or the Amazon Jungle, is a moist broadleaf forest that covers most of the Amazon basin of South America."
    }
}
result = predictor.predict(data)
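For a question-answering task, the deserialized result is typically a dictionary with score, start, end, and answer keys, as produced by the Transformers question-answering pipeline. The following sketch shows one way to read it; extract_answer and the min_score threshold are hypothetical, and the sample values are illustrative rather than real endpoint output.

```python
def extract_answer(result, min_score=0.0):
    """Return the highest-scoring answer text, or None if it scores below min_score.

    Accepts either a single result dict or a list of them, since the
    pipeline can return several candidates when asked to.
    """
    candidates = result if isinstance(result, list) else [result]
    if not candidates:
        return None
    best = max(candidates, key=lambda item: item["score"])
    return best["answer"] if best["score"] >= min_score else None


# Illustrative response shape only; the numbers are made up.
sample = {"score": 0.9, "start": 201, "end": 208, "answer": "Amazonia"}
answer = extract_answer(sample, min_score=0.5)
```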

Create a SageMaker Endpoint Using a Custom Inference Script

The Hugging Face Inference Toolkit allows you to override the default methods of HuggingFaceHandlerService by specifying a custom inference.py with model_fn and optionally input_fn, predict_fn, output_fn, or transform_fn. To do so, create a folder named code/ inside the model archive and place an inference.py file in it. For example:

model.tar.gz/
  |- pytorch_model.bin
  |- ....
  |- code/
    |- inference.py
    |- requirements.txt
In this example, pytorch_model.bin is the model file saved from training, inference.py is the custom inference module, and requirements.txt is an optional requirements file, placed next to inference.py in code/, for installing additional dependencies. The custom module can override the model_fn, input_fn, predict_fn, output_fn, or transform_fn methods. For more information, see the GitHub repo.
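To make the override points concrete, here is a minimal sketch of what code/inference.py might look like for a question-answering model. The function names and argument order follow the toolkit's documented hooks, but the bodies are assumptions for illustration: the JSON-only input handling, the choice of pipeline task, and the lazy import are ours, and newer toolkit versions may pass additional arguments.

```python
import json


def model_fn(model_dir):
    """Load the model once at container startup from the unpacked archive."""
    # Imported here so the other hooks can be exercised without transformers installed.
    from transformers import pipeline

    return pipeline("question-answering", model=model_dir, tokenizer=model_dir)


def input_fn(input_data, content_type):
    """Deserialize the request body; this sketch accepts JSON only."""
    if content_type != "application/json":
        raise ValueError(f"Unsupported content type: {content_type}")
    payload = json.loads(input_data)
    if "inputs" not in payload:
        raise ValueError('Request body must contain an "inputs" key')
    return payload


def predict_fn(data, model):
    """Run the loaded model on the deserialized request."""
    return model(data["inputs"])


def output_fn(prediction, accept):
    """Serialize the prediction; this sketch always responds with JSON."""
    return json.dumps(prediction)
```

Implementing transform_fn instead would replace the last three functions with a single one that handles the whole request.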

Clean Up

Make sure you delete the SageMaker endpoints to avoid unnecessary costs:

predictor.delete_endpoint()
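Deleting the endpoint does not remove the SageMaker model resource that was created during deployment. If you want to remove both, a small helper like the following sketch can do it; clean_up is a hypothetical name, and calling delete_model() before delete_endpoint() is an assumption about the safest order, since the model lookup relies on the endpoint configuration still existing.

```python
def clean_up(predictor):
    """Delete the backing model first, then the endpoint and its configuration."""
    predictor.delete_model()
    predictor.delete_endpoint()
```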

Documentation and Code Samples To Get Started


Published at DZone with permission of Sai Sharanya Nalla. See the original article here.
