How can I compile yolov5m and my own yolov5 onnx to hef without quantization, optimization and nms

I can compile hailo yolov5m.onnx and my own i3yolov5m onnx to hef with the nms layer, and get detection results. But when I integrate the hailo detection into my test application and run it over the test cases, the accuracy drops noticeably compared with the tensorRT model running on the GPU (the tensorRT engine is built from the same i3yolov5m.onnx).

Following the tutorial, I’m trying to build the onnx into a hef without quantization and optimization, but it seems I cannot. I use the following code to build:

```python
import os
import numpy as np
from hailo_sdk_client import ClientRunner

onnx_model_name = 'i3yolov5m23'
onnx_path = '…/models/i3yolov5m23.onnx'
assert os.path.isfile(onnx_path), 'Please provide valid path for ONNX file'

# Initialize a new client runner (any other hw_arch can be used as well)
runner = ClientRunner(hw_arch='hailo8')

# Translate YOLO model from ONNX
runner.translate_onnx_model(onnx_path, onnx_model_name,
                            end_node_names=['Conv_344', 'Conv_309', 'Conv_274'],
                            net_input_shapes={'images': [1, 3, 640, 640]})

calib_dataset = np.load('i3calib_set_640_640.npy')

# Call optimize to perform the optimization process
# (skipped here, since I want to compile without quantization/optimization)

# Save the result state to a quantized HAR file
quantized_model_har_path = f'{onnx_model_name}_quantized_model_nms.har'

hef = runner.compile()

file_name = f'{onnx_model_name}.hef'
with open(file_name, 'wb') as f:
    f.write(hef)
```
I found I have to invoke “runner.optimize(calib_dataset)”, otherwise when I compile to hef I get a warning that there are no weights.

The hef I got runs on hailo8, and I get 3 output streams. But when I tested it, the output boxes were messy. I also tried compiling the yolov5m from the hailo website and got the same wrong boxes. I found the qp_zp of the yolov5s output streams is about 181, 182 or 184. The yolov5m_vehicles model I downloaded from the hailo website works correctly.
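One common cause of messy boxes with a raw 3-output hef: the uint8 stream values have to be dequantized with the stream’s scale and zero point before decoding them as yolov5 logits. A minimal sketch of that affine dequantization (the actual per-stream scale and zero point are reported by the runtime; the values below are only illustrative):

```python
import numpy as np

def dequantize(q_values, qp_scale, qp_zp):
    """Map raw uint8 stream output back to float: f = (q - zp) * scale."""
    return (q_values.astype(np.float32) - qp_zp) * qp_scale

# Illustrative values; a zero point around 182 matches what was observed above
raw = np.array([182, 200, 150], dtype=np.uint8)
print(dequantize(raw, qp_scale=0.1, qp_zp=182.0))  # ≈ [0.0, 1.8, -3.2]
```

If the raw integers are fed straight into the sigmoid/anchor decode, every box comes out wrong.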

Can you please let me know how to compile yolov5m and my own yolo from onnx to hef without quantization and optimization? And could you please upload a yolov5m onnx without the nms layer for me to test?


Quantizing/optimizing a network is an essential part of running a model on the Hailo-8 AI accelerator. Neural networks are developed and trained in floating point because it keeps the process simple; there is no need to worry about data ranges. However, floating-point hardware uses a larger silicon area and more power. One thing that makes the Hailo-8 so power efficient is that it computes values using integer hardware. Hailo-8 supports 8-bit (the default) as well as 4-bit and 16-bit.

Nothing in life is free, so there will be some accuracy degradation when quantizing a network. The goal is to stay in the single-digit percentage range, and in many cases below 1%.
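The source of that degradation can be illustrated with a toy affine quantize/dequantize roundtrip (this is the general idea only, not Hailo’s actual quantization scheme):

```python
import numpy as np

def quantize_roundtrip(x, num_bits=8):
    """Affine-quantize x to num_bits integers and back; return reconstruction and max abs error."""
    qmax = 2 ** num_bits - 1
    scale = (x.max() - x.min()) / qmax
    zp = -x.min() / scale
    q = np.clip(np.round(x / scale + zp), 0, qmax)
    x_hat = (q - zp) * scale
    return x_hat, np.abs(x - x_hat).max()

weights = np.random.default_rng(0).normal(size=1000).astype(np.float32)
_, err8 = quantize_roundtrip(weights, 8)
_, err4 = quantize_roundtrip(weights, 4)
print(err8, err4)  # 4-bit rounding error is far larger than 8-bit
```

The rounding error is bounded by half the quantization step, which is why 8-bit usually costs little accuracy while 4-bit needs more careful optimization.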

This may require using more advanced optimization algorithms. For these, the Hailo Dataflow Compiler will need a GPU.

You can run your model after parsing in the emulator at full precision to confirm that it was parsed correctly and still works as expected.

Thanks for your reply, Klausk!

I do have a GPU on my computer to run the tensorRT engine mentioned above. I can compile my own yolov5m to hef and run it on hailo8, but when I compare the accuracy with the tensorRT engine compiled from the same yolov5m, the accuracy drops. So I will go on to test the more advanced optimization algorithms. My GPU is a GTX 1050 Ti; can I run the optimization in WSL?

I don’t recommend compiling yolov5m or your own yolo from onnx to hef without quantization and optimization.

I’m using yolov5 without NMS layers. In my experience, the NMS layer supported by Hailo was the cause of processing-speed and performance degradation.

Use the NMS source code from the yolov5 GitHub repo, or write your own custom NMS for your custom yolov5 model.
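For reference, here is a minimal single-class greedy NMS in NumPy, in the spirit of the yolov5 repo’s implementation (simplified sketch; boxes in [x1, y1, x2, y2] format):

```python
import numpy as np

def nms(boxes, scores, iou_thresh=0.45):
    """Greedy NMS. boxes: (N, 4) array of [x1, y1, x2, y2]; returns kept indices by descending score."""
    x1, y1, x2, y2 = boxes.T
    areas = (x2 - x1) * (y2 - y1)
    order = scores.argsort()[::-1]
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(int(i))
        # IoU of the current top-scoring box with the remaining boxes
        xx1 = np.maximum(x1[i], x1[order[1:]])
        yy1 = np.maximum(y1[i], y1[order[1:]])
        xx2 = np.minimum(x2[i], x2[order[1:]])
        yy2 = np.minimum(y2[i], y2[order[1:]])
        inter = np.clip(xx2 - xx1, 0, None) * np.clip(yy2 - yy1, 0, None)
        iou = inter / (areas[i] + areas[order[1:]] - inter)
        order = order[1:][iou <= iou_thresh]
    return keep

boxes = np.array([[0, 0, 10, 10], [1, 1, 10, 10], [20, 20, 30, 30]], float)
scores = np.array([0.9, 0.8, 0.7])
print(nms(boxes, scores))  # [0, 2] — the second box heavily overlaps the first
```

For multi-class yolov5 output you would run this per class (or offset boxes by class index, as the yolov5 repo does).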

That’s a very important suggestion. Thank you!

I tried compiling my onnx to a hailo8 hef without nms, but I cannot get correct results from the output. The boxes I extract from the 3 output streams are not correct. Do you have any clue?

Many thanks!

I would recommend first checking the normalization. It should be added to the model in the model script, and then you should not normalize the input data yourself, otherwise you will apply it twice.
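For example, the Model Zoo yolov5 model scripts add normalization with an .alls line like this (quoted from memory of the zoo’s pattern, not from this thread, so double-check against your script):

```
normalization1 = normalization([0.0, 0.0, 0.0], [255.0, 255.0, 255.0])
```

If such a line is in the model script, feed raw 0–255 pixel values at runtime; dividing by 255 in your preprocessing as well would normalize twice.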

How do you check the result? On the hardware or the emulator?

I checked the result on hardware.

I tried a few models from the hailo website. After some modification of the yolo sample code, I can get correct boxes from yolov5m_vehicles.hef, which has no nms. But when I compile my own yolo or hailo’s yolov5m, they output boxes that are not correct.

Did you try the yolov5s.hef from the hailo website? If you get correct results with it, then I may have written wrong code for extracting boxes from the output streams.

Another thing to check is the anchors for the yolov5 model.

Have a look at the alls script for the yolov5 model

GitHub Yolov5m alls script

It has the following line where it points to a JSON file:

nms_postprocess("../../../postprocess_config/yolov5m_nms_config.json", yolov5, engine=cpu)

This JSON file you can get from the Model Zoo. It is part of the “Pretrained” download. It uses the default anchors for the model. These can change during retraining.

GitHub - Hailo Model Zoo - Object detection - Hailo-8

You will need to extract the anchors from your model and adapt the JSON file. You can find the instructions here: Training Yolov5
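If the anchors are wrong, every decoded box will be off. For reference, the standard yolov5 head decodes each cell’s raw prediction like this (NumPy sketch; the anchor pair 30×61 and stride 16 below are the default yolov5 P4 values, which may differ after retraining):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def decode_cell(t, grid_x, grid_y, anchor_w, anchor_h, stride):
    """Decode one yolov5 prediction t = [tx, ty, tw, th] into a center-format box in pixels."""
    tx, ty, tw, th = t
    cx = (sigmoid(tx) * 2.0 - 0.5 + grid_x) * stride
    cy = (sigmoid(ty) * 2.0 - 0.5 + grid_y) * stride
    w = (sigmoid(tw) * 2.0) ** 2 * anchor_w
    h = (sigmoid(th) * 2.0) ** 2 * anchor_h
    return cx, cy, w, h

# At t == 0 the box sits at the cell center with exactly the anchor size:
print(decode_cell(np.zeros(4), grid_x=10, grid_y=10,
                  anchor_w=30, anchor_h=61, stride=16))
# -> center (168.0, 168.0), size (30.0, 61.0)
```

Decoding with the wrong anchor table produces boxes with plausible centers but wrong sizes, which matches the “messy boxes” symptom.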

I found that both the yolov5m.hef and yolov5s.hef I downloaded from the URL are already compiled with nms. But yolov5m_vehicles.hef is compiled without nms, so its output has 3 feature maps.

I double checked the anchors and they are all correct, but I still get wrong boxes.

Here is the yolov5s_no_nms.hef I compiled from yolov5s.onnx from hailo website.


Can you please help me check the yolov5s_no_nms.hef? Did I compile it wrongly?

Or, can you please send me a compiled yolov5m.hef or yolov5s.hef without nms that outputs 3 feature maps?

Thanks a lot!