I am trying to understand how to use HailoRT to run inference on pretrained models with the Hailo8L on my Raspberry Pi 5.
While I did see the Raspberry Pi 5 examples, they are quite complicated, with streaming etc., and I want to start with a simple example where I take a model and an image, pass the image through the model, and look at the output.
I did see in a few places (including via the hailo tutorial command) something like working example code, but I couldn't get a real prediction out of it.
# Running on HailoRT v4.19.0, Raspberry Pi 5 AI HAT (Hailo8, Python 3.10)
import numpy as np
import hailo_platform as hpf

hef_path = './retinaface_mobilenet_v1.hef'
hef = hpf.HEF(hef_path)

with hpf.VDevice() as target:
    # Configure the device for the PCIe interface and grab the network group
    configure_params = hpf.ConfigureParams.create_from_hef(hef, interface=hpf.HailoStreamInterface.PCIe)
    network_group = target.configure(hef, configure_params)[0]
    network_group_params = network_group.create_params()

    # Static metadata (name, shape, format) of the first input/output vstreams
    input_vstream_info = hef.get_input_vstream_infos()[0]
    output_vstream_info = hef.get_output_vstream_infos()[0]

    # Let HailoRT handle (de)quantization and expose FLOAT32 tensors
    input_vstreams_params = hpf.InputVStreamParams.make_from_network_group(
        network_group, quantized=False, format_type=hpf.FormatType.FLOAT32)
    output_vstreams_params = hpf.OutputVStreamParams.make_from_network_group(
        network_group, quantized=False, format_type=hpf.FormatType.FLOAT32)

    input_shape = input_vstream_info.shape
    output_shape = output_vstream_info.shape
    print(f"Input shape: {input_shape}, Output shape: {output_shape}")

    with network_group.activate(network_group_params):
        with hpf.InferVStreams(network_group, input_vstreams_params, output_vstreams_params) as infer_pipeline:
            for _ in range(10):
                # Dummy input: vstream shape is (H, W, C), so add a batch dimension
                random_input = np.random.rand(*input_shape).astype(np.float32)
                input_data = {input_vstream_info.name: np.expand_dims(random_input, axis=0)}
                results = infer_pipeline.infer(input_data)
                output_data = results[output_vstream_info.name]
                print(f"Inference output: {output_data}")
The issues I'm having are:
I'm not even sure which output I should take. In the example(s) they do output_vstream_info = hef.get_output_vstream_infos()[0], but I don't think that is the last layer I should actually read from. I also saw output_vstream_info = hef.get_output_vstream_infos()[-1], which 'feels' more like a last layer, but I really can't tell.
How do I post-process the raw output to get the results I want? In the Pi 5 Hailo examples they use a compiled post-process in the inference pipeline, but I want more control (and also to learn how things work), so I'd rather do something like the example code above, where I take the inference result and process it myself.
If anyone has a working 'real' example (one where, if I pass a training/real image, the results make sense) for any model, that would be amazing.
Hi Dvir,
The vstream_info objects just hold the static info of the vstreams, such as shapes etc. The actual output tensors (the logits) are received from the infer command.
I think your current issue is that the simple pipelines you've seen make use of compiled networks that have the post-processing integrated in the HEF. In that case, they don't need to apply any bbox decoding or NMS post-processing.
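For example (a minimal sketch reusing the HEF path from your snippet; the names and number of outputs depend on the specific HEF), you can compare the static vstream info with what infer() actually returns:

import hailo_platform as hpf

hef = hpf.HEF('./retinaface_mobilenet_v1.hef')

# Static metadata only: one entry per compiled output vstream
for info in hef.get_output_vstream_infos():
    print(info.name, info.shape)

# The actual tensors come back from infer(), keyed by those same names:
#   results = infer_pipeline.infer(input_data)
#   for name, tensor in results.items():
#       print(name, tensor.shape)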
While PySDK looks really cool and simple, the list of supported models is limited and it doesn't have the models I'm looking for (face classification mainly, but I'm also not sure I want YOLOv8 for face detection).
Thanks though, I'll keep it for future reference; maybe with more models supported I would move to it :)
I’m not sure what you mean.
I think I understand better now: all of the outputs are supposed to be used, and there isn't a single first/last layer I should pick.
My problem is that the HEF model's outputs are different from the original model's (in the original PyTorch repo). In the PyTorch repo there are three outputs for bbox, classification, and keypoints, but here (in the line results = infer_pipeline.infer(input_data)) I'm getting technically nine outputs, so I'm not sure how to deal with that.
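For reference, this is roughly how I'm inspecting those nine outputs at the moment. I'm assuming (not sure) that they are the bbox/class/landmark heads emitted separately at three feature-map scales, so I just group the tensors returned by infer() by their channel count:

from collections import defaultdict

def group_outputs_by_channels(results):
    # results is the dict returned by infer_pipeline.infer(input_data)
    # in my snippet above: {vstream_name: numpy array}
    groups = defaultdict(list)
    for name, tensor in results.items():
        groups[tensor.shape[-1]].append((name, tensor.shape))
    return dict(groups)

# If my assumption is right, this should show three groups of three tensors,
# one group per head type across the three scales:
# for channels, members in group_outputs_by_channels(results).items():
#     print(channels, members)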
Maybe I missed something, but looking at the PySDK docs (hailo_model_zoo.md) and at the DeGirum AI Hub models, I can't find my models.
Currently I'm using retinaface_mobilenet_v1/scrfd_10g for face detection, and arcface_mobilefacenet for classification.
Thanks
@dvir.itzko
Thanks for letting us know the models you need. We will let you know as soon as these are integrated into PySDK. Also, just curious: why do you not want YOLOv8 for face detection?
TBH it is mainly that I already checked, and at least theoretically those models suit me in terms of performance and runtime. But since you don't have a face classification model, I'll have to figure out how to work with the Hailo models myself.
@dvir.itzko
The retinaface_mobilenet_v1/scrfd_10g/scrfd_2.5g/scrfd_500m models for face detection and the arcface_mobilefacenet model for face embedding are now added to the zoo. Please try them and let us know if you need any help integrating them into your applications.
@shashi
Thanks for the reply!
So first of all, the retinaface model does work for me! That's great news. Now I'm just wondering what magic you did to post-process it. If you used the same HEF file I'm using, what was I missing?
I'm also wondering where I can get the files (like in the tutorial) so I can run it offline?
Hi @dvir.itzko
AI accelerators like the Hailo8/Hailo8L run the compute-heavy portions of ML models (like the conv layers), but the final tensor outputs need to go through a post-processor to convert them into human-readable outputs like bounding boxes, scores, and landmark coordinates. The postprocessing logic varies from model to model. For popular models like YOLOv5 and YOLOv8, the Hailo team has already integrated these postprocessors so that the final results are in the desired format. For other models, someone needs to write and integrate these postprocessors.
At DeGirum, we have developed a framework where the preprocessor, ML inference, and postprocessor are effectively pipelined to provide a seamless inference API that can take a user image and provide the final detection results. This pipeline also includes resizing the image to the size expected by the model and rescaling the outputs to the original image size. This enables end users to develop AI applications easily without writing this type of boilerplate code over and over.
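To give you an idea, loading and running one of these models with PySDK looks something like the snippet below (the model name is just a placeholder; the exact name and your access token come from the AI Hub):

import degirum as dg

model = dg.load_model(
    model_name="retinaface_mobilenet_v1",    # placeholder: copy the exact name from the AI Hub
    inference_host_address="@local",         # run on the local Hailo device
    zoo_url="degirum/hailo",                 # cloud model zoo with Hailo-compiled models
    token="<your AI Hub token>",
)

result = model("face_image.jpg")             # preprocessing, inference, and postprocessing are pipelined
print(result.results)                        # detections: bounding boxes, scores, landmarks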
I am not sure I understand what you mean by running offline. If you use our cloud model zoo and run locally, the model assets are downloaded automatically (the postprocessor Python file, the model HEF file, the labels file, and the model JSON). Alternatively, you can register for our AI Hub and manually download these files. Please let me know if you encounter any problems.
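As a rough sketch (the path and model name below are placeholders), pointing zoo_url at a local folder that contains those downloaded files would look like:

import degirum as dg

model = dg.load_model(
    model_name="retinaface_mobilenet_v1",    # placeholder: must match the downloaded model's name
    inference_host_address="@local",
    zoo_url="/home/pi/degirum_models",       # hypothetical local folder with the JSON, HEF, labels, and postprocessor
)
result = model("face_image.jpg")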
We are preparing a user guide to explain all these steps and I will keep you posted once the guide is available.
@shashi
Thanks again!
You provided exactly what I was looking for—I found the model in your AI Hub. Your infrastructure is truly impressive, but I believe the documentation and guides could use some enhancement. The information you shared here is crucial, yet I couldn’t find it anywhere else. Thanks once more for your help!
@dvir.itzko
Glad you found it useful. We are working on our documentation now and hope to release comprehensive user guides for all PySDK features by mid-February. I will keep you posted. In the meantime, please feel free to reach out if we can help in any way.