Difficulties Retraining YOLOv5

Hello!
I am using a Hailo-8L on a Raspberry Pi 5.
I followed the steps mentioned here (Retrain YOLOv5 on a custom dataset), except I trained the model on Google Colab and exported it as ONNX.

I tried running the following command in the /cfg/networks folder after making changes to the yaml file:
hailomz compile --ckpt ~/Documents/hailo/modifiedyolov5s.onnx --calib-path ~/Documents/hailo/calib/ --yaml modifiedyolov5s.yaml --hw-arch hailo8l

and got the following error:
hailo_sdk_client.model_translator.exceptions.MisspellNodeError: Unable to find end node names: ['Conv_234', 'Conv_218', 'Conv_202'], please verify and try again.

Prior to this, I was able to successfully obtain a hef file by running the following:
hailo parser onnx modifiedyolov5s.onnx --hw-arch hailo8l
hailo optimize modifiedyolov5s.har --hw-arch hailo8l --use-random-calib-set
hailo compiler modifiedyolov5s_optimized.har --hw-arch hailo8l
However, when I ran this HEF on the RPi 5, it did not detect anything at all. I tried providing a calibration folder (for the hailo optimize command) but was faced with the following:
sample_file = next(iter(glob.iglob(glob_path)))
StopIteration

Thanks in advance!

Hey @limzhiyong2002

I understand you’re facing challenges compiling and optimizing your custom YOLOv5 model for Hailo8L on Raspberry Pi 5. Let’s address the main issues:

  1. MisspellNodeError: Unable to find end node names

This error means the parser could not find the end node names it was given (Conv_234, Conv_218, Conv_202) in your ONNX model.

Solution:

  • Use Netron (https://netron.app/) to visualize your ONNX model and identify the correct end node names.
  • Update your modifiedyolov5s.yaml file with the correct names:
start_node_names:
  - 'input'
end_node_names:
  - 'actual_output_node_name1'
  - 'actual_output_node_name2'
  2. Calibration Issue: StopIteration Error

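The traceback shows next() being called on an empty glob.iglob() result, which means no file under the calibration path matched the pattern the tool was looking for.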

Solution:

  • Verify the calibration dataset path (~/Documents/hailo/calib/) is correct and contains valid files.
  • Try using random calibration:
    hailo optimize modifiedyolov5s.har --hw-arch hailo8l --use-random-calib-set
    
  • Or create a proper calibration set in TFRecord format.
  3. No Detections in Inference

This could be due to improper calibration or missing post-processing steps.

Solution:

  • Ensure proper calibration or adjust quantization.
  • Implement post-processing, including Non-Maximum Suppression (NMS):
# yolov5_inference and apply_nms are placeholder names for your own decode and NMS helpers
boxes, scores, classes = yolov5_inference(model_output)
filtered_boxes = apply_nms(boxes, scores, classes)
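
In case it helps, here is a minimal pure-Python sketch of the NMS step. It assumes boxes are already decoded to (x1, y1, x2, y2) corner coordinates; the name apply_nms mirrors the placeholder above and is not a Hailo API, and the 0.25 / 0.45 thresholds are only illustrative defaults.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def apply_nms(boxes, scores, classes, score_thr=0.25, iou_thr=0.45):
    """Greedy per-class NMS. Returns kept (box, score, class_id) tuples, best first."""
    # Drop low-confidence detections, then visit the rest from highest score down.
    order = sorted(
        (i for i, s in enumerate(scores) if s >= score_thr),
        key=lambda i: scores[i],
        reverse=True,
    )
    keep = []
    for i in order:
        # Keep a box unless it overlaps too much with a kept box of the same class.
        if all(
            classes[i] != classes[j] or iou(boxes[i], boxes[j]) <= iou_thr
            for j in keep
        ):
            keep.append(i)
    return [(boxes[i], scores[i], classes[i]) for i in keep]


boxes = [(10, 10, 110, 110), (12, 12, 112, 112), (200, 200, 260, 260), (10, 10, 110, 110)]
scores = [0.9, 0.8, 0.7, 0.1]
classes = [0, 0, 1, 0]
print(apply_nms(boxes, scores, classes))
# [((10, 10, 110, 110), 0.9, 0), ((200, 200, 260, 260), 0.7, 1)]
```

The second box is suppressed because it overlaps the first one heavily, and the fourth is dropped by the score threshold. If the raw outputs are empty or near zero before this step, NMS will not help and the problem is more likely upstream (calibration or end nodes).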

Let me know if you need more clarification on any of these points!

Hi @omria,

Thank you for your response!

  1. I have attached a screenshot of my ONNX file below and tried putting 'output0' as the end node; however, it does not work. Am I identifying the end nodes wrongly?

  2. My directory structure is:

    ~/Documents/hailo
    |- hailo_model_zoo
    |- modifiedyolov5s.onnx
    |- venv
    |- calib
       |- 000001.jpg
       |- 000002.jpg
       |- 000003.jpg
       |- more jpgs

    From the hailo directory, I ran the following command:
    hailo optimize modifiedyolov5s.har --hw-arch hailo8l --calib-set-path ./calib/
    It gave me a StopIteration error. I believe the path is correct, because otherwise I would get “ValueError: Couldn’t detect CalibrationDataType”.

Using a random calibration set does work; however, I then run into issue 3, where there are no detections in inference.

Thank you!

Hi @limzhiyong2002

It seems you are encountering several issues with retraining and deploying your YOLOv5 model on Hailo8L. Let’s go through the potential solutions step-by-step:


1. MisspellNodeError: Unable to Find End Node Names

This error occurs when the end node names in your ONNX model do not match what the Hailo SDK expects.

Solution:

  • Use Netron (https://netron.app) to inspect the structure of your ONNX model.
  • From the screenshot you shared, it looks like the final output node is labeled output0. Try using this node name in your YAML configuration.

Here’s the updated YAML configuration:

start_node_names:
  - 'images'
end_node_names:
  - 'output0'

If there are additional output nodes, identify them using Netron and update the YAML accordingly.


2. StopIteration Error During Calibration

This error suggests that the calibration dataset may not be accessible or recognized properly.
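
To make the traceback concrete: next(iter(glob.iglob(glob_path))) raises StopIteration whenever the glob pattern matches no files, even if the directory exists and is readable. The sketch below reproduces that with a temporary folder of .jpg files. The pattern the tool actually uses is not visible in the traceback, so "*.npy" here is only an assumption for illustration, and first_match and count_by_extension are hypothetical helper names.

```python
import glob
import os
import tempfile


def first_match(directory, pattern):
    """Mimic the failing line: return the first file matching directory/pattern."""
    glob_path = os.path.join(directory, pattern)
    return next(iter(glob.iglob(glob_path)))


def count_by_extension(directory):
    """Count the entries in a directory by lower-cased file extension."""
    counts = {}
    for name in os.listdir(directory):
        ext = os.path.splitext(name)[1].lower()
        counts[ext] = counts.get(ext, 0) + 1
    return counts


with tempfile.TemporaryDirectory() as calib:
    # Stand-in for a calibration folder that holds only JPEG files.
    for i in range(1, 4):
        open(os.path.join(calib, f"{i:06d}.jpg"), "wb").close()

    print(count_by_extension(calib))
    print("matched:", os.path.basename(first_match(calib, "*.jpg")))
    try:
        first_match(calib, "*.npy")  # assumed pattern; nothing matches
    except StopIteration:
        print("StopIteration: no file matched the pattern")
```

Running count_by_extension on the real ./calib/ folder is a quick way to confirm which file types are actually present.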

Solutions:

  1. Check File Permissions:

    • Ensure all images in the ./calib/ directory have the correct read permissions.
  2. Validate Image Format:

    • Make sure your calibration images are in a valid format, like JPEG or PNG. If needed, try with fewer images first to test the setup.
  3. Use Random Calibration as a Temporary Fix:

    hailo optimize modifiedyolov5s.har --hw-arch hailo8l --use-random-calib-set
    

    If this works but later leads to inference issues, it indicates that your calibration dataset might need adjustments.

  4. Create a TFRecord Calibration Dataset:

    • If using TensorFlow, create the calibration dataset in TFRecord format as recommended by Hailo.
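
If TFRecord feels like too much work, another option to try is packing the JPEGs into a single NumPy array. This is a sketch under several assumptions that should be checked against the Dataflow Compiler documentation: that the optimizer accepts a .npy calibration set, that the model input is 640x640 RGB, and that unnormalized uint8 pixels in (N, H, W, 3) layout are what it expects. It uses a plain resize rather than the letterbox padding YOLOv5 uses in training, and build_calib_array is a hypothetical helper name. It needs numpy and Pillow installed.

```python
from pathlib import Path

import numpy as np
from PIL import Image


def build_calib_array(image_dir, width=640, height=640, limit=64):
    """Load up to `limit` .jpg files and stack them into an (N, H, W, 3) uint8 array."""
    paths = sorted(Path(image_dir).glob("*.jpg"))[:limit]
    if not paths:
        raise FileNotFoundError(f"no .jpg files found in {image_dir}")
    frames = []
    for path in paths:
        with Image.open(path) as img:
            # Force three channels and the assumed network input size.
            resized = img.convert("RGB").resize((width, height))
            frames.append(np.asarray(resized, dtype=np.uint8))
    return np.stack(frames)
```

For example, np.save("calib_set.npy", build_calib_array("calib")) writes the array to disk, and the resulting file can then be tried as the calibration path in place of the image folder.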

3. No Detections in Inference

This issue may result from improper calibration or missing post-processing steps like Non-Maximum Suppression (NMS).

Solutions:

  1. Apply NMS to the Model Output:
    After inference, use NMS to filter overlapping detections:

    # yolov5_inference and apply_nms are placeholder names for your own decode and NMS helpers
    boxes, scores, classes = yolov5_inference(model_output)
    filtered_boxes = apply_nms(boxes, scores, classes)
    
  2. Ensure Calibration Completeness:
    If random calibration works but gives no detections, it might be a quantization issue. Try calibrating again with:

    hailo optimize modifiedyolov5s.har --hw-arch hailo8l --calib-set-path ./calib/
    

    Confirm that the YAML file correctly matches your input and output nodes.

  3. Test with a Pre-Trained Model:
    To ensure the issue isn’t with the model itself, try running a pre-trained YOLOv5 model from the Hailo Model Zoo and confirm it performs as expected.


Next Steps

  1. Update YAML: Use the correct node name (output0), or validate with Netron if other nodes are required.
  2. Fix Calibration: Verify your calibration dataset or try using fewer images to test access.
  3. Post-Processing: Ensure NMS is applied to filter detections effectively.

Let me know if these steps help resolve your issues, or if you need further assistance!

Hi @omria,
Thanks for the update.

  1. For this YAML configuration, are you referring to the yaml files located in hailo_model_zoo/hailo_model_zoo/cfg/networks? And if so, is there a particular line where these changes should go?

  2. I have checked the file permissions and they do indeed have read permissions. The files are all in JPEG format, so I doubt that is the issue as well. Using random calibration does work, however it leads to inferencing issues. I have yet to try creating a TFRecord for the calibration dataset as I deem it to be quite troublesome.

  3. How can I generate the .hef file of the pre-trained YOLOv5 model from the HMZ? Do I simply run hailomz optimize and compile on it?

An additional question:
What is the difference between using hailo DFC and hailomz to parse, optimize and compile the models into hef? Do they go through the same processes?