Create a Complete Computer Vision App in Minutes With Just Two Python Functions
Discover Pipeless, an open-source framework for computer vision development, and how you can create complete apps with just a few code functions.
This article starts with an overview of what a typical computer vision application requires. Then, it introduces Pipeless, an open-source framework that offers a serverless development experience for embedded computer vision. Finally, you will find a detailed step-by-step guide on the creation and execution of a simple object detection app with just a couple of Python functions and a model.
Inside a Computer Vision Application
"The art of identifying visual events via a camera interface and reacting to them"
That is what I would answer if someone asked me to describe what computer vision is in one sentence. But it is probably not what you want to hear. So let's dive into how computer vision applications are typically structured and what is required in each subsystem.
- Really fast frame processing: Note that to process a stream of 60 FPS in real time, you only have about 16.7 ms (1000 ms / 60) to process each frame. This is achieved, in part, via multi-threading and multi-processing. In many cases, you want to start processing a frame even before the previous one has finished.
- An AI model to run inference on each frame and perform object detection, segmentation, pose estimation, etc.: Luckily, there are more and more open-source models that perform pretty well, so we don't have to create our own from scratch. You usually just fine-tune the parameters of a model to match your use case (we will not deep dive into this today).
- An inference runtime: The inference runtime takes care of loading the model and running it efficiently on the different available devices (GPUs or CPUs).
- A GPU: To run inference fast enough, you usually need a GPU. This is because GPUs can handle orders of magnitude more parallel operations than a CPU, and a model, at the lowest level, is just a huge bunch of mathematical operations. You also need to manage the memory where the frames are located: they can be in GPU memory or in CPU memory (RAM), and because frames are large, copying them between the two is a very heavy operation that slows down your processing.
- Multimedia pipelines: These are the pieces that allow you to take streams from sources, split them into frames, provide them as input to the models, and, sometimes, make modifications and rebuild the stream to forward it.
- Stream management: You may want the application to tolerate stream interruptions and reconnections, to add and remove streams dynamically, to process several of them at the same time, etc.
All those systems need to be created or incorporated into your project, so they become code that you need to maintain. The problem is that you end up maintaining a huge amount of code that is not specific to your application but belongs to the subsystems around the actual case-specific code.
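To make the timing constraint from the first bullet concrete, here is a minimal sketch that computes the per-frame time budget for a few frame rates. The function name frame_budget_ms is hypothetical and used only for illustration.

```python
def frame_budget_ms(fps: float) -> float:
    """Return the time available to process one frame, in milliseconds."""
    if fps <= 0:
        raise ValueError("fps must be positive")
    return 1000.0 / fps


if __name__ == "__main__":
    # Print the budget for some common frame rates.
    for fps in (24, 30, 60):
        print(f"{fps} FPS -> {frame_budget_ms(fps):.2f} ms per frame")
```

Everything in the list above (decoding, pre-processing, inference, post-processing, and any memory copies) has to fit inside that budget, unless frames are processed in parallel.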
The Pipeless Framework
To avoid having to build all the above from scratch, you can use Pipeless. It is an open-source framework for computer vision that allows you to provide a few functions specific to your case and it takes care of everything else.
Pipeless splits the application's logic into "stages," where a stage is like a micro app for a single model. A stage can include pre-processing, running inference with the pre-processed input, and post-processing the model output to take any action. Then, you can chain as many stages as you want to compose the full application even with several models.
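The stage idea can be pictured with a small conceptual sketch. This is not how Pipeless is implemented internally; the names Stage, run_stage, and run_frame_path are hypothetical, and the "model" is a toy function. Only the frame_data keys (original, inference_input, inference_output, modified) are taken from the example later in this article.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Optional

FrameData = Dict[str, Any]
Hook = Callable[[FrameData, dict], None]


@dataclass
class Stage:
    """A hypothetical stage: optional pre-process hook, model, post-process hook."""
    name: str
    pre_process: Optional[Hook] = None
    model: Optional[Callable[[Any], Any]] = None
    post_process: Optional[Hook] = None


def run_stage(stage: Stage, frame_data: FrameData, context: dict) -> None:
    """Run the three steps of one stage on a single frame."""
    if stage.pre_process is not None:
        stage.pre_process(frame_data, context)
    if stage.model is not None:
        frame_data["inference_output"] = stage.model(frame_data.get("inference_input"))
    if stage.post_process is not None:
        stage.post_process(frame_data, context)


def run_frame_path(stages: List[Stage], frame_data: FrameData) -> FrameData:
    """Chain stages in order; each stage gets its own context dictionary."""
    for stage in stages:
        run_stage(stage, frame_data, context={})
    return frame_data


# Toy hooks: a "frame" is just a list of pixel intensities here.
def pre(frame_data: FrameData, context: dict) -> None:
    frame_data["inference_input"] = [p / 255 for p in frame_data["original"]]


def toy_model(values: List[float]) -> float:
    return sum(values) / len(values)  # mean brightness stands in for inference


def post(frame_data: FrameData, context: dict) -> None:
    frame_data["modified"] = {"mean_brightness": frame_data["inference_output"]}


if __name__ == "__main__":
    result = run_frame_path([Stage("toy", pre, toy_model, post)], {"original": [0, 255, 255, 0]})
    print(result["modified"])
```

The point of the sketch is the division of labor: the only application-specific pieces are the hooks, while the loop that calls them belongs to the framework.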
To provide the logic of each stage, you simply add a code function that is very specific to your application, and Pipeless takes care of calling it when required. This is why you can think about Pipeless as a framework that provides a serverless-like development experience for embedded computer vision. You provide a few functions and you don't have to worry about all the surrounding systems that are required.
Another great feature of Pipeless is that you can add, remove, and update streams dynamically via a CLI or a REST API to fully automate your workflows. You can even specify restart policies that indicate when the processing of a stream should be restarted, whether it should be restarted after an error, etc.
Finally, to deploy Pipeless you just need to install it and run it along with your code functions on any device, whether in a cloud VM, in a container, or directly on an edge device such as an NVIDIA Jetson or a Raspberry Pi.
Creating an Object Detection Application
Let's dive into how to create a simple application for object detection using Pipeless.
The first thing we have to do is to install it. Thanks to the installation script, it is very simple:
curl https://raw.githubusercontent.com/pipeless-ai/pipeless/main/install.sh | bash
Now, we have to create a project. A Pipeless project is a directory that contains stages. Every stage is under a sub-directory, and inside each sub-directory, we create the files containing hooks (our specific code functions). The name that we provide to each stage folder is the stage name that we have to indicate to Pipeless later when we want to run that stage for a stream.
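To visualize that layout, the following sketch creates an empty stage directory with Python's pathlib. It is only an illustration of the structure: in practice the pipeless CLI creates the project, the helper name scaffold_stage is hypothetical, and the three file names are the ones used by the example stage later in this article.

```python
import tempfile
from pathlib import Path

# Hook and configuration file names used by the example stage in this article.
STAGE_FILES = ("pre-process.py", "process.json", "post-process.py")


def scaffold_stage(project_dir: Path, stage_name: str) -> Path:
    """Create <project_dir>/<stage_name>/ containing empty hook files."""
    stage_dir = project_dir / stage_name
    stage_dir.mkdir(parents=True, exist_ok=True)
    for file_name in STAGE_FILES:
        (stage_dir / file_name).touch()
    return stage_dir


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        stage = scaffold_stage(Path(tmp) / "my-project", "onnx-yolo")
        for path in sorted(stage.iterdir()):
            print(path.relative_to(tmp))
```

The directory name (onnx-yolo here) is what you later pass to Pipeless as the stage name.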
pipeless init my-project --template empty
cd my-project
Here, the empty template tells the CLI to just create the directory. If you do not provide a template, the CLI will ask you several questions to create the stage interactively.
As mentioned above, we now need to add a stage to our project. Let's download an example stage from GitHub with the following command:
wget -O - https://github.com/pipeless-ai/pipeless/archive/main.tar.gz | tar -xz --strip=2 "pipeless-main/examples/onnx-yolo"
That will create a stage directory, onnx-yolo, that contains our application functions.
Let's check the content of each of the stage files; i.e., our application hooks.
We have the pre-process.py file, which defines a function (hook) taking a frame and a context. The function prepares the input data from the received frame so that it matches the format that the model expects. That data is stored in frame_data['inference_input'], which is what Pipeless will pass to the model.
import cv2
import numpy as np

def hook(frame_data, context):
    frame = frame_data["original"].view()
    yolo_input_shape = (640, 640, 3)  # h,w,c
    frame = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
    frame = resize_rgb_frame(frame, yolo_input_shape)
    frame = cv2.normalize(frame, None, 0.0, 1.0, cv2.NORM_MINMAX)
    frame = np.transpose(frame, axes=(2, 0, 1))  # Convert to c,h,w
    inference_inputs = frame.astype("float32")
    frame_data['inference_input'] = inference_inputs

# ... (some other auxiliary functions that we call from the hook function)
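The resize_rgb_frame helper is not shown in the article. One common way to fit a frame into a fixed square input while preserving its aspect ratio is letterboxing: scale the frame, then pad the remaining space. The sketch below only computes the geometry of that operation in plain Python; whether the example stage resizes frames this way is an assumption, and the function name letterbox_geometry is hypothetical.

```python
from typing import Tuple


def letterbox_geometry(
    src_w: int, src_h: int, dst_w: int, dst_h: int
) -> Tuple[float, Tuple[int, int], Tuple[int, int]]:
    """Return (scale, (new_w, new_h), (pad_x, pad_y)) for an aspect-preserving resize.

    The source frame is scaled by `scale` to (new_w, new_h) and then centered
    in the destination, leaving pad_x / pad_y pixels on the left / top.
    """
    if min(src_w, src_h, dst_w, dst_h) <= 0:
        raise ValueError("all dimensions must be positive")
    scale = min(dst_w / src_w, dst_h / src_h)
    new_w = round(src_w * scale)
    new_h = round(src_h * scale)
    pad_x = (dst_w - new_w) // 2
    pad_y = (dst_h - new_h) // 2
    return scale, (new_w, new_h), (pad_x, pad_y)


if __name__ == "__main__":
    # A 1280x720 webcam frame fitted into a 640x640 model input.
    print(letterbox_geometry(1280, 720, 640, 640))
```

Keeping the scale and padding around is useful because the post-processing step needs them to map predicted boxes back onto the original frame.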
We also have the process.json file, which tells Pipeless which inference runtime to use (in this case, the ONNX Runtime), where to find the model that it should load, and some optional parameters for it, such as the execution_provider to use, e.g., CPU, CUDA, TensorRT, etc.
{
  "runtime": "onnx",
  "model_uri": "https://pipeless-public.s3.eu-west-3.amazonaws.com/yolov8n.onnx",
  "inference_params": {
    "execution_provider": "tensorrt"
  }
}
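The example configuration selects the tensorrt execution provider, which requires a machine with a suitable NVIDIA GPU. As a sketch, the snippet below switches that field with Python's json module. The helper name with_execution_provider is hypothetical, and the lowercase value "cpu" is an assumption modeled on the "tensorrt" spelling in the example, so check the Pipeless documentation for the accepted values.

```python
import json

# The process.json content shown in this article.
EXAMPLE_CONFIG = """
{
  "runtime": "onnx",
  "model_uri": "https://pipeless-public.s3.eu-west-3.amazonaws.com/yolov8n.onnx",
  "inference_params": { "execution_provider": "tensorrt" }
}
"""


def with_execution_provider(config_text: str, provider: str) -> str:
    """Return the configuration as JSON text with a different execution provider."""
    config = json.loads(config_text)
    # Create the inference_params object if the configuration lacks one.
    config.setdefault("inference_params", {})["execution_provider"] = provider
    return json.dumps(config, indent=2)


if __name__ == "__main__":
    # "cpu" is an assumed value; see the note above.
    print(with_execution_provider(EXAMPLE_CONFIG, "cpu"))
```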
Finally, the post-process.py file defines a function similar to the one in pre-process.py. This time, it takes the inference output that Pipeless stored in frame_data["inference_output"] and parses it into bounding boxes. It then draws the bounding boxes over the frame and assigns the modified frame to frame_data['modified']. With that, Pipeless will forward the stream that we provide, but with the modified frames that include the bounding boxes.
def hook(frame_data, _):
    frame = frame_data['original']
    model_output = frame_data['inference_output']
    yolo_input_shape = (640, 640, 3)  # h,w,c
    # Parse the raw model output into boxes, confidence scores, and class ids
    boxes, scores, class_ids = parse_yolo_output(model_output, frame.shape, yolo_input_shape)
    class_labels = [yolo_classes[class_id] for class_id in class_ids]
    # Draw every detection over the original frame
    for i in range(len(boxes)):
        draw_bbox(frame, boxes[i], class_labels[i], scores[i])
    frame_data['modified'] = frame

# ... (some other auxiliary functions that we call from the hook function)
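The parse_yolo_output helper is not shown in the article. Object detectors of this kind typically emit many overlapping candidate boxes, and a common post-processing step is non-maximum suppression (NMS), which keeps the highest-scoring box in each overlapping group. The following plain-Python sketch illustrates that technique; it is an assumption that the example stage does something equivalent, and the function names are hypothetical.

```python
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def non_max_suppression(
    boxes: Sequence[Box], scores: Sequence[float], iou_threshold: float = 0.5
) -> List[int]:
    """Return indices of the boxes to keep, highest score first."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    kept: List[int] = []
    for i in order:
        # Keep a box only if it does not overlap too much with any kept box.
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in kept):
            kept.append(i)
    return kept


if __name__ == "__main__":
    candidate_boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (50, 50, 60, 60)]
    candidate_scores = [0.9, 0.8, 0.7]
    print(non_max_suppression(candidate_boxes, candidate_scores))
```

In the sample data, the first two boxes overlap heavily, so only the higher-scoring one survives together with the distant third box.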
The final step is to start Pipeless and provide a stream. To start Pipeless, simply run the following command from the my-project directory:
pipeless start --stages-dir .
Once running, let's provide a stream from the webcam (v4l2) and show the output directly on the screen. Note that we have to provide the list of stages that the stream should execute, in order; in our case, it is just the onnx-yolo stage:
pipeless add stream --input-uri "v4l2" --output-uri "screen" --frame-path "onnx-yolo"
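If you want to script this step, the sketch below builds the same command as an argument list that could be handed to subprocess.run. The flags are the ones shown in the command above; the helper name build_add_stream_command is hypothetical, and joining several stage names with commas is an assumption, so verify the expected --frame-path format in the Pipeless documentation before relying on it.

```python
import shlex
from typing import List, Sequence


def build_add_stream_command(input_uri: str, output_uri: str, stages: Sequence[str]) -> List[str]:
    """Build the argument list for `pipeless add stream`."""
    if not stages:
        raise ValueError("at least one stage is required")
    # Assumption: multiple stages are passed as a comma-separated list, in order.
    frame_path = ",".join(stages)
    return [
        "pipeless", "add", "stream",
        "--input-uri", input_uri,
        "--output-uri", output_uri,
        "--frame-path", frame_path,
    ]


if __name__ == "__main__":
    command = build_add_stream_command("v4l2", "screen", ["onnx-yolo"])
    # Print the command instead of running it, so the sketch works without Pipeless installed.
    print(shlex.join(command))
```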
And that's all!
Conclusion
We have described how creating a computer vision application is a complex task because of the many subsystems that we have to implement around it. With a framework like Pipeless, getting up and running takes just a few minutes, and you can focus on writing the code for your specific use case. Furthermore, Pipeless stages are highly reusable and easy to maintain, so you will be able to iterate very fast.
If you want to get involved with Pipeless and contribute to its development, you can do so through its GitHub repository.
Published at DZone with permission of Miguel Angel Cabrera. See the original article here.