Non-Deterministic HEF Results on Hardware Despite Perfect HAR Emulation
I’m seeing non-deterministic failures when running a quantized ConvNeXt model on Hailo-8 hardware, even though both the FP32 and quantized HAR files produce correct, deterministic results in SDK emulation. The same input image yields a completely different prediction on every run, which suggests memory corruption or uninitialized buffers in the HEF runtime.
System Information
Development Machine (Compilation):
- Hailo Dataflow Compiler: v3.31.0
- Python: 3.10
- OS: Ubuntu 24.04 LTS
Target Hardware:
- Hailo-8 AI Module (26 TOPS)
- HailoRT: v4.19.0
- Platform: Raspberry Pi 5
- Interface: PCIe
Model:
- Architecture: ConvNeXt Small with SE blocks
- Input: 224x224x3 RGB images
- Output: 67 classes
- Framework: PyTorch → ONNX (opset 11) → Hailo
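For reference, the ONNX export is roughly the following (a sketch only; the model class and file names are illustrative, not my exact code):
import torch

# Illustrative export step - model class and paths are placeholders
model = ConvNeXtSmallSE(num_classes=67)              # hypothetical model class
model.load_state_dict(torch.load('weights.pth', map_location='cpu'))
model.eval()

dummy = torch.randn(1, 3, 224, 224)                  # NCHW, matching the 224x224x3 input
torch.onnx.export(model, dummy, 'model.onnx',
                  opset_version=11,
                  input_names=['input'],
                  output_names=['output'])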
The Issue
HAR Emulation Works Perfectly
Both FP32 and quantized HAR files produce deterministic, correct results in SDK emulation:
FP32 HAR (SDK_NATIVE):
Run 1: Class 1 at 99.95%
Run 2: Class 1 at 99.95%
Run 3: Class 1 at 99.95%
Quantized HAR (SDK_QUANTIZED):
Run 1: Class 1 at 99.97%
Run 2: Class 1 at 99.97%
Run 3: Class 1 at 99.97%
HEF on Hardware Completely Fails
The compiled HEF produces non-deterministic, incorrect results on Hailo-8 hardware:
Same image, multiple runs:
Run 1: Class 17 at 19.0%
Run 2: Class 0 at 3.5%
Run 3: Class 2 at 7.4%
Run 4: Class 34 at 58.4%
Run 5: Class 19 at 10.5%
Expected result: Class 1 at ~99.97% (matches ONNX and HAR emulation)
Reproducible Test Code
HAR Emulation Test (Works ✅)
import numpy as np
import cv2
from hailo_sdk_client import ClientRunner, InferenceContext
# Preprocess image
image = cv2.imread('test_image.jpg')
rgb = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
resized = cv2.resize(rgb, (224, 224))
input_data = np.expand_dims(resized.astype(np.float32), axis=0)
# Test quantized HAR
runner = ClientRunner(har='model_quantized.har')
with runner.infer_context(InferenceContext.SDK_QUANTIZED) as ctx:
    output = runner.infer(ctx, input_data, batch_size=1)
# Results: Class 1 at 99.97% ✅
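For completeness, the class/confidence numbers quoted in this post come from a small helper roughly like this (assuming the output array holds raw logits, so softmax is applied on the host):
def top1(output_array):
    # Flatten to (num_classes,), softmax on the host, report the argmax
    logits = np.asarray(output_array).reshape(-1)
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    cls = int(np.argmax(probs))
    return cls, float(probs[cls] * 100.0)

cls, conf = top1(output)
print(f'Class {cls} at {conf:.2f}%')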
HEF Hardware Test (Fails ❌)
import numpy as np
import cv2
import hailo_platform as hpf
# Preprocess image
image = cv2.imread('test_image.jpg')
rgb = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
resized = cv2.resize(rgb, (224, 224))
input_image = resized.astype(np.float32) # [0-255] for model script normalization
# Load HEF
hef = hpf.HEF('model_quantized.hef')
with hpf.VDevice() as target:
    configure_params = hpf.ConfigureParams.create_from_hef(
        hef, interface=hpf.HailoStreamInterface.PCIe
    )
    network_group = target.configure(hef, configure_params)[0]
    network_group_params = network_group.create_params()

    input_vstream_info = hef.get_input_vstream_infos()[0]
    output_vstream_info = hef.get_output_vstream_infos()[0]

    input_vstreams_params = hpf.InputVStreamParams.make_from_network_group(
        network_group, quantized=False, format_type=hpf.FormatType.FLOAT32
    )
    output_vstreams_params = hpf.OutputVStreamParams.make_from_network_group(
        network_group, quantized=False, format_type=hpf.FormatType.FLOAT32
    )

    with network_group.activate(network_group_params):
        with hpf.InferVStreams(network_group, input_vstreams_params, output_vstreams_params) as infer_pipeline:
            input_data = {input_vstream_info.name: np.expand_dims(input_image, axis=0)}
            results = infer_pipeline.infer(input_data)
            output = results[output_vstream_info.name]
# Results: Non-deterministic random classes ❌
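To separate "wrong but stable" from genuinely non-deterministic, I also repeat the exact same inference several times and compare the raw output tensors (reusing infer_pipeline, input_data and output_vstream_info from the test above; nothing else changed):
# Still inside the activate()/InferVStreams context from the test above
outputs = []
for _ in range(5):
    results = infer_pipeline.infer(input_data)          # identical input every iteration
    outputs.append(results[output_vstream_info.name].copy())

for i in range(1, len(outputs)):
    print(f'Run {i + 1} identical to run 1: {np.array_equal(outputs[0], outputs[i])}')
If the raw tensors differ between iterations, the problem is genuine run-to-run non-determinism rather than a fixed preprocessing or quantization mismatch.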
Model Script Configuration
I’m using a model script for on-chip ImageNet normalization:
imagenet_normalization.alls:
# ImageNet normalization converted to Hailo format
# PyTorch: mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]
# Hailo: mean * 255, std * 255
input_normalization = normalization([123.675, 116.28, 103.53], [58.395, 57.12, 57.375])
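The values in the model script are just the PyTorch ImageNet statistics scaled to the [0, 255] input range; a quick check:
import numpy as np

mean = np.array([0.485, 0.456, 0.406]) * 255   # -> [123.675, 116.28, 103.53]
std = np.array([0.229, 0.224, 0.225]) * 255    # -> [58.395, 57.12, 57.375]
print(mean, std)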
Compilation command:
hailo parser onnx model.onnx --har model.har
hailo optimize model.har --calib-set-path calibration.npy \
--model-script imagenet_normalization.alls \
--output-har-path model_quantized.har
hailo compiler model_quantized.har
Calibration data: Raw [0-255] FLOAT32 pixels (1500 samples)
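For reference, the calibration set was built roughly like this (a sketch; the image directory is illustrative) - raw [0-255] float32 NHWC pixels with no host-side normalization, since normalization is supposed to happen in the model script:
import glob
import cv2
import numpy as np

frames = []
for path in sorted(glob.glob('calib_images/*.jpg'))[:1500]:     # illustrative directory
    img = cv2.cvtColor(cv2.imread(path), cv2.COLOR_BGR2RGB)
    img = cv2.resize(img, (224, 224))
    frames.append(img.astype(np.float32))                       # keep raw [0-255] pixels

calib = np.stack(frames)                                        # shape (1500, 224, 224, 3)
np.save('calibration.npy', calib)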
What I’ve Ruled Out
- API usage - Tested with proper context managers and network_group_params
- VStream parameters - Tested all combinations of quantized=True/False and FLOAT32/UINT8 (sweep sketch after this list)
- Model script - Build logs confirm the input_normalization layer is compiled and connected
- Calibration data - Verified correct [0-255] range and proper shape (1500, 224, 224, 3)
- ONNX model - Produces perfect deterministic results on CPU
- Quantization - HAR emulation shows quantization is correct
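Roughly how I swept the vstream parameter combinations (a sketch, reusing network_group, network_group_params, input_vstream_info, output_vstream_info and resized from the hardware test above; the host buffer dtype is matched to the requested format):
import itertools

for quantized, fmt in itertools.product([False, True],
                                        [hpf.FormatType.FLOAT32, hpf.FormatType.UINT8]):
    in_params = hpf.InputVStreamParams.make_from_network_group(
        network_group, quantized=quantized, format_type=fmt)
    out_params = hpf.OutputVStreamParams.make_from_network_group(
        network_group, quantized=quantized, format_type=fmt)

    # Match the host buffer dtype to the requested stream format
    dtype = np.uint8 if fmt == hpf.FormatType.UINT8 else np.float32
    frame = np.expand_dims(resized.astype(dtype), axis=0)

    with network_group.activate(network_group_params):
        with hpf.InferVStreams(network_group, in_params, out_params) as pipeline:
            out = pipeline.infer({input_vstream_info.name: frame})[output_vstream_info.name]
            print(quantized, fmt, out.flatten()[:5])
Every combination behaves the same way for me (outputs change from run to run), which is why I ruled the vstream parameters out.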
Build Logs
No errors or warnings during compilation. The input_normalization layer is present in the compiled HEF:
training_convnext_small_se_balanced_context_0:
input_layer1, input_normalization, conv1, ...
Layer connections:
input_layer1 → input_normalization → conv1_defuse_width_feature_reshape
Questions
- Why does HAR emulation work perfectly while the HEF fails on hardware? The quantized HAR produces a deterministic Class 1 at 99.97% in SDK_QUANTIZED emulation, but the compiled HEF produces random, non-deterministic results on the actual device.
- Is this a known issue with ConvNeXt models or LayerNorm layers? The model uses LayerNorm extensively, which might have compatibility issues.
- Could this be a DFC v3.31.0 bug?
- Is the model script normalization actually executing during HEF inference? Despite being compiled into the HEF, my diagnostic tests suggest the normalization layer may not be running on hardware (probe sketch below).
- What debugging steps can I take? Is there a way to inspect HEF execution at runtime or verify that preprocessing layers are actually running?
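Regarding question 4, the probe I have in mind is to feed the same image once as raw [0-255] pixels and once already normalized on the host; if the on-chip normalization layer were running, the two inputs should not produce the same prediction. This is only a sketch - run_hef() is a hypothetical wrapper around the InferVStreams hardware code above:
mean = np.array([123.675, 116.28, 103.53], dtype=np.float32)
std = np.array([58.395, 57.12, 57.375], dtype=np.float32)

raw = resized.astype(np.float32)          # [0-255], what the model script expects
pre_normalized = (raw - mean) / std       # already normalized on the host

out_raw = run_hef(raw)                    # hypothetical wrapper around the hardware pipeline above
out_pre = run_hef(pre_normalized)
# If on-chip normalization runs: out_raw should be correct and out_pre clearly wrong.
# With the current non-determinism neither is stable, which makes this probe hard to interpret.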
Additional Evidence
I created comprehensive diagnostic scripts that prove:
- ONNX baseline: Deterministic Class 1 at 99.97%
- FP32 HAR emulation: Deterministic Class 1 at 99.95%
- Quantized HAR emulation: Deterministic Class 1 at 99.97%
- HEF on hardware: Non-deterministic random classes
The non-determinism (different results on identical inputs) strongly suggests memory corruption or uninitialized buffers in the HEF runtime.
Thank you for your help!