The accuracy drops significantly after converting an ONNX model to a HAR model.

The ONNX model reaches 61% accuracy on the test set, but after converting it to HAR the accuracy drops to 54.5%. Below is my code for converting the ONNX model to HAR and then quantizing the HAR model into a HEF:

from hailo_sdk_client import ClientRunner
import os
import cv2
import numpy as np

input_size = 512  # model input size
chosen_hw_arch = "hailo8l"  # target Hailo hardware architecture (Hailo-8L here)
onnx_model_name = "ms_unet_512"  # model name
file_path = "/home/zengzixuan/hailo-convert/checkpoint/ms_unet_512"
onnx_path = os.path.join(file_path, "ms_unet_512.onnx")  # model path
hailo_model_har_path = os.path.join(file_path, f"{onnx_model_name}.har")  # save path for the translated model
hailo_quantized_har_path = os.path.join(file_path, f"{onnx_model_name}quantized.har")  # save path for the quantized model
hailo_model_hef_path = os.path.join(file_path, f"{onnx_model_name}.hef")  # save path for the compiled model
images_path = "/home/zengzixuan/hailo-convert/Test/images"  # dataset image path
CALIB_SAMPLE_NUM = 396

# Translate the ONNX model to HAR

runner = ClientRunner(hw_arch=chosen_hw_arch)
hn, npz = runner.translate_onnx_model(
    model=onnx_path,
    net_name=onnx_model_name,
    start_node_names=["modelInput"],
    # end_node_names recommended in the log (copied here for reference)
    # end_node_names=[
    #     '/encoder1/transformer/Squeeze_1',
    #     '/encoder1/shortcut/Conv',
    #     '/encoder1/local_feat/local_feat.5/Mul'
    # ]
)
runner.save_har(hailo_model_har_path)

# Prepare the calibration dataset (preprocessing identical to training)

# Note: os.path.splitext keeps the leading dot, so ".bmp" (not "bmp") is needed here,
# otherwise .bmp files are silently skipped.
images_list = [img_name for img_name in os.listdir(images_path)
               if os.path.splitext(img_name)[1] in [".jpg", ".jpeg", ".png", ".bmp"]][:CALIB_SAMPLE_NUM]
calib_dataset = np.zeros((len(images_list), input_size, input_size, 3), dtype=np.float32)

# ImageNet normalization parameters (same as during training)

mean = np.array([0.485, 0.456, 0.406], dtype=np.float32)
std = np.array([0.229, 0.224, 0.225], dtype=np.float32)

for idx, img_name in enumerate(sorted(images_list)):
    # 1. Read the image (BGR format)
    img = cv2.imread(os.path.join(images_path, img_name))

    # 2. Convert BGR -> RGB (important: training used RGB)
    img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)

    # 3. Resize
    resized = cv2.resize(img, (input_size, input_size))

    # 4. Normalize to [0, 1]
    img_normalized = resized.astype(np.float32) / 255.0

    # 5. ImageNet standardization (exactly as during training)
    img_normalized = (img_normalized - mean) / std

    calib_dataset[idx, :, :, :] = img_normalized


# Make sure the calibration set is not empty

if len(images_list) == 0:
    raise ValueError(f"Calibration set is empty! Check:\n"
                     f"1. Image path: {images_path}\n"
                     f"2. Whether that path contains .jpeg/.jpg/.png/.bmp images")

# Quantize the model (using correctly preprocessed calibration data)

runner = ClientRunner(har=hailo_model_har_path)
alls_lines = [
    'model_optimization_flavor(optimization_level=1, compression_level=0)',  # raise the optimization level for better accuracy
    'resources_param(max_control_utilization=0.6, max_compute_utilization=0.6, max_memory_utilization=0.6)',
    'performance_param(fps=1)',
]
runner.load_model_script('\n'.join(alls_lines))
runner.optimize(calib_dataset)
runner.save_har(hailo_quantized_har_path)

# Compile to HEF

runner = ClientRunner(har=hailo_quantized_har_path)
compiled_hef = runner.compile()
with open(hailo_model_hef_path, "wb") as f:
    f.write(compiled_hef)

Hey @user366,

That accuracy drop (from 61% to 54.5%) is already happening before quantization, at the HAR / SDK_NATIVE / SDK_FP_OPTIMIZED stage. So the first thing to do is:

1. Check HAR accuracy before quantization
Run the HAR model in SDK_FP_OPTIMIZED with the same preprocessing and test set as your ONNX model. This helps catch common issues like input format or normalization mismatches early.

from hailo_sdk_client import ClientRunner, InferenceContext

runner = ClientRunner(har=hailo_model_har_path)

with runner.infer_context(InferenceContext.SDK_FP_OPTIMIZED) as ctx:
    out = runner.infer(ctx, your_preprocessed_batch_np)  # must match ONNX preprocessing

If the HAR already performs worse here, then it’s not about quantization or HEF — it’s likely preprocessing or I/O parsing.

2. Double-check your preprocessing
Make sure the preprocessing used during calibration (and inference) matches exactly what the ONNX model expects:

  • BGR → RGB
  • Resize to 512×512
  • /255 scaling
  • ImageNet mean/std in RGB

If your ONNX model expects inputs in a different format (e.g., already-normalized inputs, or NCHW layout instead of NHWC), this can easily throw off accuracy.

A good sanity check: run one image through your full preprocessing and compare ONNX vs SDK_NATIVE outputs. They should match closely.
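Once you have both output tensors for the same preprocessed image, the comparison itself is just NumPy. A minimal sketch (the function name and the metrics are illustrative, not a Hailo API; if the ONNX output is NCHW, transpose it to NHWC before calling this so both arrays have the same layout):

```python
import numpy as np

def compare_outputs(onnx_out: np.ndarray, hailo_out: np.ndarray) -> float:
    """Report max absolute difference and cosine similarity between two outputs."""
    a = onnx_out.astype(np.float32).ravel()
    b = hailo_out.astype(np.float32).ravel()
    assert a.shape == b.shape, "outputs differ in size -- check the end nodes"
    max_abs = float(np.max(np.abs(a - b)))
    cos = float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))
    print(f"max abs diff: {max_abs:.6f}, cosine similarity: {cos:.6f}")
    return max_abs
```

For SDK_NATIVE the max absolute difference should be tiny (float rounding only); a large gap here means preprocessing or translation, not quantization.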

3. Input/output nodes
In your translate_onnx_model, you’re only specifying start_node_names=["modelInput"]. That’s fine if modelInput is the true input and Hailo correctly picks the output node. But if it picks an intermediate tensor, you’ll see a drop right at the HAR level.

Check the real I/O nodes using Netron or similar, and pass both start_node_names and end_node_names explicitly if needed.

4. Once FP accuracy matches ONNX, then check quantized
After matching HAR FP accuracy to ONNX:

  • Use a decent calibration set (~1024 images is ideal; 396 is a bit low).

  • Evaluate the quantized model in SDK_QUANTIZED and compare to SDK_FP_OPTIMIZED.

  • If the quantized model still underperforms, try:

    • Adjusting optimization_level or compression_level.
    • Using 16-bit precision for sensitive layers.
    • Enabling clipping or fine-tuning with QAT.
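Those knobs all live in the model script you pass to load_model_script. A hedged sketch of what the extra lines could look like (command names follow the Dataflow Compiler model-script syntax as I recall it, and conv20 is a placeholder layer name — verify both against your SDK version's documentation):

```
# Stronger optimization effort, no weight compression.
model_optimization_flavor(optimization_level=2, compression_level=0)
# Keep a quantization-sensitive layer at 16-bit activations and weights.
quantization_param(conv20, precision_mode=a16_w16)
# Fine-tune weights after quantization (QAT-style).
post_quantization_optimization(finetune, policy=enabled)
```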

Next step for you:
Run your HAR in SDK_FP_OPTIMIZED with the same exact inputs and preprocessing as ONNX. If that already gives 54.5%, then focus on preprocessing or model translation. If it gives 61%, then the problem is in the quantization step.

Let us know what you find — that’ll help narrow it down.