Model as a Service in the Generative AI Era

Learn about model as a service, a cloud-based offering in which machine learning or generative AI models are hosted and made easily available for consumption through chat-based APIs.

Bhala Ranganathan

Feb. 28, 24 · Analysis

The generative AI space has advanced rapidly in recent times. AI models keep getting better at tasks such as text summarization, question answering, and chat. For example, Bing Copilot has improved in several ways by taking advantage of GPT-4, and Google has announced its Gemini models and Bard chatbot. However, training and fine-tuning such models requires massive computing infrastructure and is expensive. This is a huge barrier to AI adoption, because few players in the market can build large language models from scratch. Building models or applications on top of an existing foundational (base) model can solve this problem: businesses don’t have to create a foundational model themselves but can take an existing one and either fine-tune it to suit their needs or consume it directly.

MaaS and MaaP

Model as a service (MaaS) refers to a cloud-based service in which machine learning or generative AI models are hosted in the cloud and made available for consumption through simple chat-based APIs. The ease of use and gentle learning curve of these services have accelerated their adoption. In short, MaaS simplifies model consumption.

Model as a platform (MaaP) differs from MaaS: here, model providers get access to the underlying infrastructure offered by the cloud provider, rather than simply exposing their model for direct consumption. In such cases, the model provider takes care of building, deploying, and managing its machine learning applications on top of the cloud infrastructure. MaaP lets organizations create end-to-end ML solutions.

MaaS Building Blocks

Three parties are involved in this process: the model provider, the model publisher, and the model consumer. The model provider is typically the one who creates the model, which can be open or closed source (e.g., OpenAI, Hugging Face). The model publisher could be a cloud provider that accepts the model from a model provider and makes it available to consumers (e.g., Amazon, Microsoft). Model consumers, such as chat applications and bots, consume the models made available by the model publisher.

For a model provider to publish a model for consumption as a service, a few fundamental steps are usually involved. The details depend on the cloud provider they choose:

Model registry: Model providers may use the registry service to upload the model's artifacts and associated metadata, such as weights, parameters, .bin files, and safetensors files.
Model catalog: A repository of models available for consumption. One may expose the foundational model directly or a model built on top of it.
Model endpoints: Model providers may specify the compute they want to provision as part of their model deployment, and the cloud service may expose an endpoint that consumers can use to access the model.
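
The three steps above can be sketched as a toy, in-memory flow. This is only an illustration under stated assumptions: the names `RegisteredModel`, `ModelRegistry`, `ModelCatalog`, `Endpoint`, and `deploy_endpoint`, the field names, and the `maas.example.com` URL are all hypothetical and do not correspond to any particular cloud provider's API.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class RegisteredModel:
    """Artifacts and metadata a model provider uploads (hypothetical fields)."""
    name: str
    version: str
    artifacts: list[str]               # e.g. weight files (*.safetensors, *.bin)
    params: dict = field(default_factory=dict)
    base_model: Optional[str] = None   # set when built on a foundational model


class ModelRegistry:
    """Step 1: the provider registers a model version with its files."""

    def __init__(self) -> None:
        self._models = {}

    def register(self, model: RegisteredModel) -> None:
        self._models[(model.name, model.version)] = model

    def get(self, name: str, version: str) -> RegisteredModel:
        # Raises KeyError if the model version was never registered.
        return self._models[(name, version)]


class ModelCatalog:
    """Step 2: the publisher lists registered models for consumers."""

    def __init__(self, registry: ModelRegistry) -> None:
        self._registry = registry
        self._published = []

    def publish(self, name: str, version: str) -> None:
        self._registry.get(name, version)  # only registered models can be listed
        self._published.append((name, version))

    def list_models(self) -> list[str]:
        return [f"{name}:{version}" for name, version in self._published]


@dataclass
class Endpoint:
    model: str
    compute: str
    url: str


def deploy_endpoint(catalog: ModelCatalog, name: str, version: str,
                    compute: str) -> Endpoint:
    """Step 3: provision compute for a published model and expose a URL."""
    key = f"{name}:{version}"
    if key not in catalog.list_models():
        raise ValueError(f"{key} is not published in the catalog")
    # Hypothetical URL scheme; real services define their own.
    url = f"https://maas.example.com/{name}/{version}/generate"
    return Endpoint(model=key, compute=compute, url=url)
```

In this sketch a provider would call `register`, the publisher would call `publish`, and `deploy_endpoint` would return the URL that a consumer sends requests to. Real services add authentication, quotas, and billing around each step.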

MaaS Interface

Typically, MaaS offerings are chat-based interfaces that generate text in response to an input prompt. One can also send sampling parameters such as temperature, repetition penalty, top-k, and max tokens in the request. Below is a hello world example of a request sent to the facebook/opt-125m model hosted on localhost as a service. The request includes a few sampling parameters that the model accepts (max_tokens, temperature, and top_p), and the model responds with generated text as output.

Shell

    curl http://0.0.0.0:5001/generate -H "Content-Type: application/json" -d '{
      "model": "facebook/opt-125m",
      "prompt": "Hibiscus is a beautiful",
      "max_tokens": 20,
      "temperature": 0.8,
      "top_p": 0.95
    }'

JSON

    {"text":["Hibiscus is a beautiful plant.  It will grow and live for years to come."]}
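
The same request can also be sent from application code. The following is a minimal Python sketch that uses only the standard library. It assumes the service from the example above is reachable at `http://localhost:5001/generate` and returns JSON of the form `{"text": [...]}`. The helper names `build_payload`, `parse_generations`, and `generate` are made up for illustration, and the range checks follow common conventions for these sampling parameters rather than anything the server is confirmed to enforce.

```python
import json
import urllib.request

# Assumed address of the locally hosted service from the curl example.
GENERATE_URL = "http://localhost:5001/generate"


def build_payload(prompt, max_tokens=20, temperature=0.8, top_p=0.95,
                  model="facebook/opt-125m"):
    """Build the JSON body, rejecting obviously invalid sampling values."""
    if max_tokens < 1:
        raise ValueError("max_tokens must be at least 1")
    if temperature < 0:
        raise ValueError("temperature must be non-negative")
    if not 0 < top_p <= 1:
        raise ValueError("top_p must be in the range (0, 1]")
    return {
        "model": model,
        "prompt": prompt,
        "max_tokens": max_tokens,
        "temperature": temperature,
        "top_p": top_p,
    }


def parse_generations(body):
    """Extract the list of generated strings from a response body."""
    data = json.loads(body)
    return list(data["text"])


def generate(prompt, url=GENERATE_URL, timeout=30.0, **sampling):
    """POST the prompt to the service and return the generated texts."""
    payload = build_payload(prompt, **sampling)
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return parse_generations(response.read())
```

With the server running, `generate("Hibiscus is a beautiful")` would return a list of generated strings. Because sampling is random at a temperature of 0.8, the text will generally differ from one call to the next.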

Conclusion

Model as a service (MaaS) offers a convenient and efficient way to leverage pre-trained machine learning models for specific tasks without the overhead of model development. It enables organizations to focus on their core applications instead of building models on their own. Google, Microsoft, and Amazon are notable cloud providers that offer MaaS, and the list of models they support is expected to grow as new model providers emerge. The cloud infrastructure behind the scenes must also scale well to support these models.

Opinions expressed by DZone contributors are their own.