How to Utilize Python Machine Learning Models
Learn how to serve and deploy machine learning models built in Python locally, on the cloud, and Kubernetes with an open-source framework.
Join the DZone community and get the full member experience.
Join For FreeEver trained a new model and just wanted to use it through an API straight away? Sometimes you don't want to bother writing Flask code or containerizing your model and running it in Docker. If that sounds like you, you definitely want to check out MLServer. It's a Python-based inference server that recently went GA, and what's really neat about it is that it's a highly-performant server designed for production environments. That means that, by serving models locally, you are running in the exact same environment as they will be in when they get to production.
This blog walks you through how to use MLServer by using a couple of image models as examples.
Dataset
The dataset we're going to work with is the Fashion MNIST dataset. It contains 70,000 images of clothing in greyscale 28x28 pixels across 10 different classes (top, dress, coat, trousers, etc......…).
If you want to reproduce the code from this blog, make sure you download the files and extract them into a folder named data
. They have been omitted from the GitHub repo because they are quite large.
Training the Scikit-learn Model
First up, we're going to train a support vector machine (SVM) model using the scikit-learn framework. We'll then save the model to a file named Fashion-MNIST.joblib
.
import pandas as pd
from sklearn import svm
import time
import joblib
#Load Training Data
train = pd.read_csv('../../data/fashion-mnist_train.csv', header=0)
y_train = train['label']
X_train = train.drop(['label'], axis=1)
classifier = svm.SVC(kernel="poly", degree=4, gamma=0.1)
#Train Model
start = time.time()
classifier.fit(X_train.values, y_train.values)
end = time.time()
exec_time = end-start
print(f'Execution time: {exec_time} seconds')
#Save Model
joblib.dump(classifier, "Fashion-MNIST.joblib")
Note: The SVM algorithm is not particularly well suited to large datasets because of its quadratic nature. Depending on your hardware, the model in this example will take a couple of minutes to train.
Serving the Scikit-learn Model
Ok, so we've now got a saved model file Fashion-MNIST.joblib
. Let's take a look at how we can serve that using MLServer...
First up, we need to install MLServer.
pip install mlserver
The additional runtimes are optional but make life really easy when serving models; we'll install the Scikit-Learn and XGBoost ones too
pip install mlserver-sklearn mlserver-xgboost
You can find details on all of the inference runtimes here.
Once we've done that, all we need to do is add two configuration files:
settings.json
- This contains the configuration for the server itself.model-settings.json
- As the name suggests, this file contains the configuration for the model we want to run.
For our settings.json
file, it's enough to just define a single parameter:
{
"debug": "true"
}
The model-settings.json
file requires a few more bits of info as it needs to know about the model we're trying to serve:
{
"name": "fashion-sklearn",
"implementation": "mlserver_sklearn.SKLearnModel",
"parameters": {
"uri": "./Fashion_MNIST.joblib",
"version": "v1"
}
}
The name
the parameter should be self-explanatory. It gives MLServer a unique identifier which is particularly useful when serving multiple models (we'll come to that in a bit). The implementation
defines which pre-built server, if any, to use. It is heavily coupled to the machine learning framework used to train your model. In our case, we trained the model using scikit-learn, so we're going to use the scikit-learn implementation for MLServer. For the model, parameters
we just need to provide the location of our model file as well as a version number.
That's it, two small config files, and we're ready to serve our model using the command:
mlserver start .
Boom, we've now got our model running on a production-ready server locally. It's now ready to accept requests over HTTP and gRPC (default ports 8080
and 8081
respectively).
Testing the Model
Now that our model is up and running. Let's send some requests to see it in action.
To make predictions on our model, we need to send a POST request to the following URL:
http://localhost:8080/v2/models/<MODEL_NAME>/versions/<VERSION>/infer
That means to access the scikit-learn model that we trained earlier; we need to replace the MODEL_NAME
with fashion-sklearn
and VERSION
with v1
.
The code below shows how to import the test data, make a request to the model server and then compare the result with the actual label:
import pandas as pd
import requests
#Import test data, grab the first row and corresponding label
test = pd.read_csv('../../data/fashion-mnist_test.csv', header=0)
y_test = test['label'][0:1]
X_test = test.drop(['label'],axis=1)[0:1]
#Prediction request parameters
inference_request = {
"inputs": [
{
"name": "predict",
"shape": X_test.shape,
"datatype": "FP64",
"data": X_test.values.tolist()
}
]
}
endpoint = "http://localhost:8080/v2/models/fashion-sklearn/versions/v1/infer"
#Make request and print response
response = requests.post(endpoint, json=inference_request)
print(response.text)
print(y_test.values)
When running the test.py
code above, we get the following response from MLServer:
{
"model_name": "fashion-sklearn",
"model_version": "v1",
"id": "31c3fa70-2e56-49b1-bcec-294452dbe73c",
"parameters": null,
"outputs": [
{
"name": "predict",
"shape": [
1
],
"datatype": "INT64",
"parameters": null,
"data": [
0
]
}
]
}
You'll notice that MLServer has generated a request id and automatically added metadata about the model and version that was used to serve our request. Capturing this kind of metadata is super important once our model gets to production; it allows us to log every request for audit and troubleshooting purposes.
You might also notice that MLServer has returned an array for outputs
. In our request, we only sent one row of data, but MLServer also handles batch requests and returns them together. You can even use a technique called adaptive batching to optimize the way multiple requests are handled in production environments.
In our example above, the model's prediction can be found in outputs[0].data
which shows that the model has labeled this sample with the category 0
(The value 0 corresponds to the category t-shirt/top
). The true label for that sample was a 0
too so the model got this prediction correct!
Training the XGBoost Model
Now that we've seen how to create and serve a single model using MLServer, let's take a look at how we'd handle multiple models trained in different frameworks.
We'll be using the same Fashion MNIST dataset, but this time, we'll train an XGBoost model instead.
import pandas as pd
import xgboost as xgb
import time
#Load Training Data
train = pd.read_csv('../../data/fashion-mnist_train.csv', header=0)
y_train = train['label']
X_train = train.drop(['label'], axis=1)
dtrain = xgb.DMatrix(X_train.values, label=y_train.values)
#Train Model
params = {
'max_depth': 5,
'eta': 0.3,
'verbosity': 1,
'objective': 'multi:softmax',
'num_class' : 10
}
num_round = 50
start = time.time()
bstmodel = xgb.train(params, dtrain, num_round, evals=[(dtrain, 'label')], verbose_eval=10)
end = time.time()
exec_time = end-start
print(f'Execution time: {exec_time} seconds')
#Save Model
bstmodel.save_model('Fashion_MNIST.json')
The code above, used to train the XGBoost model, is similar to the code we used earlier to train the scikit-learn model, but this time, our model has been saved in an XGBoost-compatible format as Fashion_MNIST.json
.
Serving Multiple Models
One of the cool things about MLServer is that it supports multi-model serving. This means that you don't have to create or run a new server for each ML model you want to deploy. Using the models we built above, we'll use this feature to serve them both at once.
When MLServer starts up, it will search the directory (and any sub-directories) for model-settings.json
files. If you've got multiple model-settings.json
files, then it'll automatically serve them all.
Note: you still only need a single settings.json
(server config) file in the root directory
Here's a breakdown of my directory structure for reference:
.
├── data
│ ├── fashion-mnist_test.csv
│ └── fashion-mnist_train.csv
├── models
│ ├── sklearn
│ │ ├── Fashion_MNIST.joblib
│ │ ├── model-settings.json
│ │ ├── test.py
│ │ └── train.py
│ └── xgboost
│ ├── Fashion_MNIST.json
│ ├── model-settings.json
│ ├── test.py
│ └── train.py
├── README.md
├── settings.json
└── test_models.py
Notice that there are two model-settings.json
files - one for the scikit-learn model and one for the XGBoost model.
We can now just run mlserver start .
and it will start handling requests for both models.
[mlserver] INFO - Loaded model 'fashion-sklearn' succesfully.
[mlserver] INFO - Loaded model 'fashion-xgboost' succesfully.
Testing Accuracy of Multiple Models
With both models now up and running on MLServer, we can use the samples from our test set to validate how accurate each of our models is.
The following code sends a batch request (containing the full test set) to each of the models and then compares the predictions received to the true labels. Doing this across the whole test set gives us a reasonably good measure for each model's accuracy, which gets printed at the end.
import pandas as pd
import requests
import json
#Import the test data and split the data from the labels
test = pd.read_csv('./data/fashion-mnist_test.csv', header=0)
y_test = test['label']
X_test = test.drop(['label'],axis=1)
#Build the inference request
inference_request = {
"inputs": [
{
"name": "predict",
"shape": X_test.shape,
"datatype": "FP64",
"data": X_test.values.tolist()
}
]
}
#Send the prediction request to the relevant model, compare responses to training labels and calculate accuracy
def infer(model_name, version):
endpoint = f"http://localhost:8080/v2/models/{model_name}/versions/{version}/infer"
response = requests.post(endpoint, json=inference_request)
#calculate accuracy
correct = 0
for i, prediction in enumerate(json.loads(response.text)['outputs'][0]['data']):
if y_test[i] == prediction:
correct += 1
accuracy = correct / len(y_test)
print(f'Model Accuracy for {model_name}: {accuracy}')
infer("fashion-xgboost", "v1")
infer("fashion-sklearn", "v1")
The results show that the XGBoost model slightly outperforms the SVM scikit-learn one:
Model Accuracy for fashion-xgboost: 0.8953
Model Accuracy for fashion-sklearn: 0.864
Summary
Hopefully, you've gained an understanding of how easy it is to serve models using MLServer. For further info, it's worth reading the docs and taking a look at the examples for different frameworks.
For MLFlow users, you can now serve models directly in MLFlow using MLServer, and if you're a Kubernetes user, you should definitely check out Seldon Core — an open-source tool that deploys models to Kubernetes (it uses MLServer under the covers).
All of the code from this example can be found here.
Published at DZone with permission of Ed Shee. See the original article here.
Opinions expressed by DZone contributors are their own.
Comments