Face Detection + Gender Classification: Pipelining two models on Hailo devices

User Guide: Model Pipelining with DeGirum PySDK

Model pipelining is a versatile approach in AI, allowing multiple models to work in sequence. The output of one model is used as the input to another, enabling more sophisticated applications by combining specialized tasks. This guide introduces the concept using a practical example of face detection followed by gender classification.


Example: Face Detection and Gender Classification

In this example, we use two models:

  1. Face Detection Model: Detects faces in a video stream and generates bounding boxes around them.
  2. Gender Classification Model: Classifies the gender of each detected face.

The models are combined into a pipeline where the face detection model processes the input video, and its outputs (cropped face regions) are passed to the gender classification model for further analysis.


Code Reference

import degirum as dg, degirum_tools

# choose inference host address
inference_host_address = "@cloud"
# inference_host_address = "@local"

# choose zoo_url
zoo_url = "degirum/models_hailort"
# zoo_url = "../models"

# set token
token = degirum_tools.get_token()
# token = '' # leave empty for local inference

face_det_model_name = "yolov8n_relu6_face--640x640_quant_hailort_hailo8l_1"
gender_cls_model_name = "yolov8n_relu6_fairface_gender--256x256_quant_hailort_hailo8l_1"
video_source = "../assets/faces_and_gender.mp4"

# Load the face detection and gender classification models
face_det_model = dg.load_model(
    model_name=face_det_model_name,
    inference_host_address=inference_host_address,
    zoo_url=zoo_url,
    token=token,
    overlay_color=[(255, 255, 0), (0, 255, 0)],  # colors to differentiate detections in the overlay
)

gender_cls_model = dg.load_model(
    model_name=gender_cls_model_name,
    inference_host_address=inference_host_address,
    zoo_url=zoo_url,
    token=token,
)

# Create a compound cropping model with 30% crop extent
crop_model = degirum_tools.CroppingAndClassifyingCompoundModel(
    face_det_model, 
    gender_cls_model, 
    30.0
)

# Run AI inference on the video stream and display the results
# Press 'x' or 'q' to stop
with degirum_tools.Display("Faces and Gender") as display:
    for inference_result in degirum_tools.predict_stream(crop_model, video_source):
        display.show(inference_result)

How It Works

  1. Model Loading:

    • The face detection and gender classification models are loaded using dg.load_model.
    • The overlay_color parameter is used to differentiate detected faces visually.
  2. Pipeline Creation:

    • A compound model is created using degirum_tools.CroppingAndClassifyingCompoundModel, which combines face detection and gender classification.
    • The crop_extent parameter (30% in this example) ensures that the face region is appropriately cropped for the second model.
  3. Inference Execution:

    • The predict_stream function runs the compound model on the input video source.
    • Cropped face regions are passed from the detection model to the classification model.
  4. Result Display:

    • Detected faces and their classified genders are displayed in a dedicated window. Use the ‘x’ or ‘q’ keys to stop the display. (A sketch for inspecting the raw results programmatically follows this list.)
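
If you need the raw predictions rather than just the rendered overlay, each inference result exposes a results list of detection dictionaries. A minimal sketch, assuming the standard PySDK result keys "bbox", "label", and "score":

# Sketch: print each detected face's classified gender alongside its bbox.
# Assumes inference_result.results is a list of dicts with "bbox", "label",
# and "score" keys, as in standard PySDK detection results.
for inference_result in degirum_tools.predict_stream(crop_model, video_source):
    for obj in inference_result.results:
        print(f"{obj['label']}: score={obj['score']:.2f}, bbox={obj['bbox']}")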

Applications

  • Video analytics
  • Security and surveillance
  • Retail and customer insights
  • Personalized user experiences

This pipelining approach allows you to extend workflows to include additional tasks, such as emotion detection or age estimation, making it a scalable and modular solution for complex AI applications.

Hi, where can I get information about degirum tools like CroppingAndClassifyingCompoundModel?

Hi @kyurrii
We are currently working on comprehensive docs for degirum_tools. In the meantime, you can look at the code and the code examples, as the repo is public: DeGirum/degirum_tools: Utilities for use with PySDK. Please let us know if you have any specific questions.

OK. I looked through the examples, but it would really be good to have a list of tools with specifications. So I'm looking forward to the appearance of the docs you mentioned.

Should we stop any development we are doing with the RPi5 examples that use GStreamer and start learning the PySDK if we plan to do multi-model processing?

Hi @user116
Both the GStreamer-based pipelines and DeGirum PySDK are built on top of HailoRT. The PySDK APIs are simpler for application development, and we try to provide working examples for many common scenarios. But it is up to the end user to choose the framework most suitable for their needs. Hope this helps.

@kyurrii
Happy to let you know that we have an initial version of the documentation for degirum_tools. It still needs work, but the current version should already serve as a good starting point.

@shashi, thanks for the notification. Will check it out ASAP.

I can’t find any multi-model documentation for GStreamer, so I’m assuming we should switch to PySDK, as all the examples for multi-model processing point in that direction. Thank you @shashi

Hi @user116
Multi-model applications are possible with GStreamer as well, but they are hard to develop and debug. PySDK makes such use cases simpler.

@shashi I’m liking the PySDK stuff I’ve been reading for the last few hours. I’ve got hailo_examples installed and running with Jupyter, but I haven’t been able to get the RPi camera running in any of those examples. All the examples use static images or video files, so I’m still reading up on how to use the live camera.

Hi @user116
Please see this example: hailo_examples/examples/016_custom_video_source.ipynb at main · DeGirum/hailo_examples · GitHub
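
If you just want a quick live test before working through that notebook, predict_stream also accepts a camera index or a stream URL in place of a file path. A minimal sketch, assuming a camera that OpenCV can open at index 0 (the linked notebook covers custom sources such as the RPi camera):

# Sketch: run the same compound model on a live camera instead of a file.
# Camera index 0 is an assumption; an RTSP URL string works the same way.
with degirum_tools.Display("Live Camera") as display:
    for inference_result in degirum_tools.predict_stream(crop_model, 0):
        display.show(inference_result)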

Hi @shashi,
first of all, fantastic work you have done with DeGirum. It really makes deploying and running models on Hailo hardware that much simpler.
However, I have a question regarding model pipelining, specifically a pipeline of an object detection + classification model. In my use case I only want to classify a few of the bbox predictions, but as far as I can tell from your documentation, CroppingAndClassifyingCompoundModel takes all bboxes of the OD model and feeds them into the classification model. Is there a way to make it so only a selected subset of bboxes is used for the classification model?

Hi @Simon_Keilbach
Thank you for the kind words. You are right that the CroppingAndClassifyingCompoundModel processes all the detected boxes. However, it is not too difficult to process only a select few boxes. Do you have a selection criterion that can be passed as a function or is easily parametrizable? For example, bbox area? Score? Top-5 boxes? If you let us know, we can design and implement a mechanism that allows you to do this. Even now, you can filter out boxes and run the classifier only on some detections, but performance might not be optimal.

Hi @shashi
thanks for the lightning-fast reply.
My main criterion would be the predicted label: e.g., if the label is either “Car” or “Bus”, I would like to do a fine-grained analysis using the classification model (e.g., to determine a sub-class like “sports car”, “van”, or “sedan”). If the label is something else, e.g. “tyre”, I don't need any fine-grained analysis, meaning I don't want to feed its bbox into the classification model. Hope this makes my goal clear.
With your last sentence, do you mean to first run the object detection, then filter and crop the bboxes, and then load the classification model and perform classification?

Hi @Simon_Keilbach
Thanks for clarifying your goal. Your use case is much cleaner, and it should be relatively easy to support. I will discuss internally and get back to you on the best way to achieve your goal.

Yes. This is what the compound model does internally, with the added advantage of pipelining everything optimally for best performance. However, it currently processes all detection results and runs the classifier on them.

@Simon_Keilbach ,

To achieve this, you may use the class label filtering feature of the model object.
If detector_model is your detector model object, then you may do this:

detector_model.output_class_set = {"Car", "Bus"}

This assignment will limit detector_model results to only those having the "Car" or "Bus" class label.
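
For example, applied before building the compound model (a sketch; classifier_model is a stand-in name for whatever classification model object you loaded):

# Sketch: restrict the detector's output classes so that only "Car" and
# "Bus" boxes are produced and therefore reach the classification stage.
detector_model.output_class_set = {"Car", "Bus"}
crop_model = degirum_tools.CroppingAndClassifyingCompoundModel(
    detector_model, classifier_model, 30.0
)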

@Vlad_Klimov
Thanks for the input. That is a very useful feature.
However, I am unsure if this is suited for my needs as I not only want to predict “Car”, “Bus” but also other classes, e.g. like “Tyre” or “Headlights” with the same object detection model. The only workaround would be to queue the model twice (once without any output_class_set set and the other time with the output_class_set = {“Car”, “Bus”}. This pipeline takes ~250ms (without the additional overhead from the classification model) to perform inference on both models (~100ms per model and 50ms for loading the model onto the hailo chip). If there is no other way at the moment to adress this issue I might stick with it; I am not trying to perform inference on video frames meaning sub 300ms inference time would be something I could live with

@Simon_Keilbach
We will provide a code snippet that can do what you need. We need to update our degirum_tools package for that purpose. We will keep you posted.

@Simon_Keilbach ,

A quick fix would be to extend the CroppingAndClassifyingCompoundModel class with your own implementation of the queue_result1 method, which does the cropping. You just add the filtering logic there. Below is one possible implementation of such a class:

class MyCroppingAndClassifyingCompoundModel(
    degirum_tools.CroppingAndClassifyingCompoundModel
):
    allowed_labels = ["person"]  # list of labels to crop

    def queue_result1(self, result1):
        # called for each detection result of the first model;
        # only detections with allowed labels are cropped and queued
        image_sz = degirum_tools.image_tools.image_size(result1.image)
        for idx, obj in enumerate(result1.results):
            if obj["label"] in self.allowed_labels:
                # enlarge the bbox by the configured crop extent
                adj_bbox = self._adjust_bbox(obj["bbox"], image_sz)
                obj["bbox"] = adj_bbox
                cropped_img = degirum_tools.image_tools.crop_image(
                    result1.image, adj_bbox
                )
                # hand the crop to the second (classification) model
                self.queue.put(
                    (cropped_img, degirum_tools.compound_models.FrameInfo(result1, idx))
                )
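
A usage sketch (assuming detector_model and classifier_model are loaded model objects, as earlier in the thread):

# Sketch: use the filtering subclass in place of the stock compound model.
MyCroppingAndClassifyingCompoundModel.allowed_labels = ["Car", "Bus"]
crop_model = MyCroppingAndClassifyingCompoundModel(
    detector_model, classifier_model, 30.0
)
for inference_result in degirum_tools.predict_stream(crop_model, video_source):
    print(inference_result)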