Pose Estimation HEF Conversion – Worse Performance with More Calibration Images

Hi everyone,
I’m currently studying how to convert a pose estimation model to the HEF format, and I’ve run into something puzzling: providing a folder with 100 images for optimization apparently yields better results than using 200 or even 1000 images.

Specifically, I analyzed the outputs of different models (all starting from the same ONNX file, with the only difference being the dataset used for optimization — initially 100 random images, then 200 supposedly more suitable ones, and finally 1000), and tested them on the same input images.

Surprisingly, the model optimized with 100 images performs better — the keypoints appear noticeably more accurate to the eye. Even worse, the model optimized with 1000 images fails to detect any keypoints at all.

At this point, I’m wondering what I might be doing wrong. How is this behavior possible? What am I missing about the optimization process?

I also noticed that in the final stage, the model optimized with the first two datasets (100 and 200 images) was split into 2 or 3 contexts, while the one optimized with 1000 images — the one that fails — was split into 6 contexts. Could this be related?

Hey @Simone_Tortorella,

Why More Calibration Images Can Sometimes Hurt Accuracy

  1. Optimization Level Depends on Dataset Quality, Not Just Size

    • Hailo’s optimize() API enables more aggressive quantization strategies at higher optimization_level settings (e.g., levels 2 and 4). These levels expect a dataset of at least 1024 diverse, well-prepared images.
    • If you used ~1000 images but didn’t explicitly set a higher optimization_level, those extra samples may not have been fully utilized.
    • Worse, if optimization_level=2 or higher was used, but the dataset had issues (e.g., not representative of inference conditions), the optimization might miscalibrate and reduce accuracy.
  2. Overfitting During Calibration

    • Quantization depends on activation statistics gathered during calibration.
    • A large but unbalanced or redundant dataset can skew these stats and effectively overfit the quantization to the calibration data: the model looks fine on those images but degrades during actual inference (see the toy sketch after this list).
  3. Image Format or Preprocessing Mismatch

    • If your calibration images don’t match the input preprocessing used at inference (e.g., wrong color space, inconsistent normalization), quantization becomes misaligned, hurting accuracy — especially for sensitive outputs like keypoints.
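
To make point 2 concrete, here is a toy sketch (plain NumPy, invented numbers, not Hailo code) of how the activation range a calibrator derives for a layer can shift when the calibration set over-represents one kind of input:

import numpy as np

rng = np.random.default_rng(0)

# Pretend these are one layer's activations collected over two calibration sets.
balanced = rng.normal(loc=0.0, scale=1.0, size=10_000)    # representative inputs
skewed = np.concatenate([
    rng.normal(loc=0.0, scale=1.0, size=1_000),           # a few representative samples
    rng.normal(loc=4.0, scale=0.2, size=9_000),           # many redundant/atypical samples
])

def quant_range(acts, low_pct=0.01, high_pct=99.99):
    # Percentile-based range a calibrator might assign to this layer
    lo, hi = np.percentile(acts, [low_pct, high_pct])
    return lo, hi

print("balanced range:", quant_range(balanced))   # roughly symmetric around 0
print("skewed range:  ", quant_range(skewed))     # shifted upward

With the skewed range, most of the quantization grid is spent on values that real inference inputs rarely produce, which is one way accuracy can drop even though "more" calibration data was used.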

What You Can Do

  1. Preprocess and Normalize Your Dataset

    • Make sure all calibration images are uniformly preprocessed to match your inference pipeline (resize, color format, normalization, etc.); a rough sketch follows this list.

    • Then use a larger set (e.g., your 1000 images plus 24 handpicked edge cases, so you reach the 1024-image minimum mentioned above) and run with:

      optimize(..., optimization_level=4)
      

      This takes longer but produces significantly better results when the dataset is well-prepared.

  2. Inspect Optimization Logs in the Hailo Profiler
    Look for:

    • Unusual activation ranges or sudden spikes.
    • Quantization histograms that look skewed or clipped.
    • Problematic context boundary placements, especially near layers that affect output structure (e.g., keypoint heads).
  3. Use analyze_noise()
    This tool shows you which layers are most affected by quantization errors. It’s especially useful if you’re seeing missing or degraded outputs like vanishing keypoints after optimization.
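
As a rough starting point for step 1, here is a minimal calibration-set builder. It assumes OpenCV and NumPy, a 640x640x3 input (substitute your model's actual input shape), and an inference pipeline that feeds RGB images with no extra normalization; the folder and file names are placeholders, and every step should mirror what you actually do at inference time:

from pathlib import Path

import cv2
import numpy as np

INPUT_H, INPUT_W = 640, 640  # must match the model's input shape

def build_calib_set(image_dir, max_images=1024):
    images = []
    for path in sorted(Path(image_dir).glob("*.jpg"))[:max_images]:
        img = cv2.imread(str(path))                  # BGR, arbitrary resolution
        if img is None:
            continue                                 # skip unreadable files
        img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)   # match the color order used at inference
        img = cv2.resize(img, (INPUT_W, INPUT_H))    # match the resize used at inference
        images.append(img.astype(np.float32))        # keep dtype/normalization consistent too
    return np.stack(images)                          # shape: (N, 640, 640, 3)

calib_data = build_calib_set("calib_images")
np.save("calib_set.npy", calib_data)                 # reusable input for the optimization step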

Thank you.
Do you have any suggestions regarding the optimization process?

I have trained a CNN for keypoint detection of a single person (each image contains only one individual). Should the dataset used for optimization consist exclusively of images containing just the person, or is it acceptable to include background elements, as is typically the case in the validation dataset?

Additionally, since I always use the same camera, the input format is consistent. Should I ensure that all images used for optimization have the same resolution, such as 512x384?

Finally, I understand that in Hailo Compile it is possible to specify the input format. Should I explicitly define it? And if I do, am I required to provide all input images at exactly that resolution?

One last, unrelated question: I annotate keypoints for a single person, but in the validation dataset there are images with not only backgrounds but also multiple people. I believe this might be incorrect—training on single-person images and then validating on multi-person scenes seems inconsistent and could negatively affect evaluation.
Would you agree?

Thanks,
Simone

Hi Omnia,
I’m currently using this command to convert my model:

hailomz compile --ckpt yolov8s_pose_custom.onnx --calib-path val2017/ --yaml hailo_model_zoo/hailo_model_zoo/cfg/networks/yolov8s_pose.yaml

Alternatively, I also tried the three-step flow: hailomz parse, optimize, and compile.
However, when I run hailomz optimize --help, I don’t see any way to provide custom settings for the optimization process.

Here’s the issue:
The original .pt model gives correct outputs, but the resulting .hef file performs very poorly — the keypoints are wrong.

For calibration/fine-tuning, I used the COCO2017 val2017 dataset, which contains both people and background images.

Any suggestions on how to improve the optimization? Or how to pass more precise control parameters during the process?

Thanks in advance!

Hey @Simone_Tortorella,

You’re absolutely right that hailomz optimize --help doesn’t show much for fine-tuning, but there are definitely some ways to get better optimization results and reduce that keypoint degradation you’re seeing when going from .pt to .hef.

Use a custom YAML config file

You can control the optimization level and other parameters by checking out the YAML files in our model zoo: hailo_model_zoo/hailo_model_zoo/cfg at master · hailo-ai/hailo_model_zoo · GitHub

For optimization level, you can set:

optimization:
  optimize_level: 3  # Options: 1 (basic), 2 (standard), 3 (aggressive), 4 (max)
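
If you’d rather not edit the network YAML, the optimization level is also commonly set in the model script (.alls) that the YAML’s alls_script entry points to, or in a script loaded from a DFC Python flow. The outline below is only a sketch: it assumes your DFC version supports the model_optimization_flavor() model-script command and ClientRunner.load_model_script(), and the file names are illustrative:

import numpy as np
from hailo_sdk_client import ClientRunner

# HAR produced by the parse step (illustrative file name)
runner = ClientRunner(har="yolov8s_pose_custom.har")

# Set the optimization level via a model script instead of the YAML
runner.load_model_script("model_optimization_flavor(optimization_level=2)\n")

calib_data = np.load("calib_set.npy")   # preprocessed calibration images, shape (N, H, W, 3)
runner.optimize(calib_data)
runner.save_har("yolov8s_pose_custom_optimized.har")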

Let me know how this works out for you!

Hi, sorry but I need some clarification.

The yolov8s_pose.yaml file from hailo_model_zoo/cfg/networks/ contains the following lines (see below).
Could you please explain what additional options are available, where they can be added, and what their functions are?
I would really appreciate a detailed explanation of how to customize or extend this YAML configuration file.

base:
- base/yolov8_pose.yaml
network:
  network_name: yolov8s_pose
paths:
  alls_script: yolov8s_pose.alls
  network_path:
  - models_files/PoseEstimation/yolov8/yolov8s/pretrained/2023-06-11/yolov8s_pose.onnx
  url: https://hailo-model-zoo.s3.eu-west-2.amazonaws.com/PoseEstimation/yolov8/yolov8s/pretrained/2023-06-11/yolov8s_pose.zip
parser:
  nodes:
  - null
  - - /model.22/cv2.2/cv2.2.2/Conv
    - /model.22/cv3.2/cv3.2.2/Conv
    - /model.22/cv4.2/cv4.2.2/Conv
    - /model.22/cv2.1/cv2.1.2/Conv
    - /model.22/cv3.1/cv3.1.2/Conv
    - /model.22/cv4.1/cv4.1.2/Conv
    - /model.22/cv2.0/cv2.0.2/Conv
    - /model.22/cv3.0/cv3.0.2/Conv
    - /model.22/cv4.0/cv4.0.2/Conv
info:
  task: pose estimation
  input_shape: 640x640x3
  output_shape: 20x20x64, 20x20x1, 20x20x51, 40x40x64, 40x40x1, 40x40x51, 80x80x64,
    80x80x1, 80x80x51
  operations: 30.2G
  parameters: 11.6M
  framework: pytorch
  training_data: coco keypoints train2017
  validation_data: coco keypoints val2017
  eval_metric: mAP
  full_precision_result: 59.2
  source: https://github.com/ultralytics/ultralytics
  license_url: https://github.com/ultralytics/ultralytics/blob/main/LICENSE
  license_name: AGPL-3.0

Thank you in advance for your help!

Hey @Simone_Tortorella,

For all the available options, I’d suggest checking out the Hailo Model Zoo and DFC documentation at https://hailo.ai/developer-zone/documentation/

You’ll want to look at:

  • Model Zoo: Pages 17-23
  • DFC: Pages 24, 27, 41, and 72

Those sections should cover everything you’re looking for!

Maybe I should try this:

"Note: If problems are encountered with VRAM allocation during stages other than Adaround, it is possible attempt to resolve the issue by disabling the memory growth flag.
To do this, set the following environment variable:

HAILO_SET_MEMORY_GROWTH=false

By doing so, the default memory allocation method for tensorflow GPU will be modified, and the entire VRAM will be allocated and managed internally."

But I found nothing about YAML options or flags in either guide.

The command did not work: the RAM saturates and the optimization process crashes. I don’t know what to do.

Hey @Simone_Tortorella,

Try lowering the optimization level to 1 or 2. If it compiles successfully, go ahead and test the output.

Hi, I did that, and then I ran into this problem: Optimization level 3 for custom YOLOv8s Pose model leads to high RAM usage and crash - General - Hailo Community

Hi, is there a way to change the predefined number of epochs and dataset images?

Hey @Simone_Tortorella,

Yep, you can definitely adjust both the training epochs and dataset size for the AdaRound algorithm through our DFC API or CLI. By default, we set epochs to 320 and dataset_size to 1024, but those are completely customizable.

Python API approach

If you’re using the Python API, just pass your values into the post_quantization_optimization call. Say you want to run 100 epochs with 256 samples:

from hailo_sdk_client import ClientRunner
# ... your usual setup ...
runner.post_quantization_optimization(
    adaround,
    policy='enabled',
    epochs=100,         # instead of default 320
    dataset_size=256    # instead of default 1024
)

CLI approach

For the command line, use the same parameters:

alls post_quantization_optimization \
    adaround \
    --policy enabled \
    --epochs 100 \
    --dataset-size 256
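
If your DFC version doesn’t accept these as a direct Python call or as a standalone CLI, the same post_quantization_optimization command can usually be placed in a model script instead, using the same mechanism as the model_optimization_flavor sketch earlier in this thread. A minimal outline with illustrative file names, assuming ClientRunner.load_model_script() is available:

import numpy as np
from hailo_sdk_client import ClientRunner

runner = ClientRunner(har="yolov8s_pose_custom.har")   # HAR from the parse step (illustrative)
runner.load_model_script(
    "post_quantization_optimization(adaround, policy=enabled, epochs=100, dataset_size=256)\n"
)
runner.optimize(np.load("calib_set.npy"))              # preprocessed calibration images
runner.save_har("yolov8s_pose_custom_adaround.har")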

You can tweak other parameters too, like batch_size and warmup; the DFC guide has the complete parameter list.

For the full breakdown, check out the Hailo Dataflow Compiler User Guide (v3.30.0):

  • Page 113 - covers the AdaRound algorithm with usage examples and how to override defaults
  • Page 114 - has the parameter table showing both dataset_size and epochs settings plus all the other options you can tune

Hope that helps!