Different output from .har (Ubuntu) vs .hef (Hailo8L on Raspberry Pi)

Hi, I'm using the Hailo8L architecture. The issue I'm facing: when I use the SDK native or FP-optimized contexts to run inference on the .har model, I get the same output as the original .pt model, but the .hef running on the device does not match.

This is the code I use to run inference on the .har model.

This is the native context:

from hailo_sdk_client import ClientRunner, InferenceContext

model_name = "9conv_hailo_model"
har_path = f"{model_name}.har"
runner = ClientRunner(har=har_path)

# Run native (full-precision) emulation on the calibration dataset
with runner.infer_context(InferenceContext.SDK_NATIVE) as ctx:
    results = runner.infer(ctx, calib_dataset)

This is the FP-optimized context:

from hailo_sdk_client import ClientRunner, InferenceContext

model_name = "9conv_quantized_model"
har_path = f"{model_name}.har"
runner = ClientRunner(har=har_path)

# Run FP-optimized emulation on the calibration dataset
with runner.infer_context(InferenceContext.SDK_FP_OPTIMIZED) as ctx:
    results = runner.infer(ctx, calib_dataset)
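If it helps, a quick way to confirm the two emulation modes agree (a minimal sketch, assuming both result arrays are kept under the hypothetical names native_results and fp_results):

import numpy as np

# Hypothetical names: native_results / fp_results hold the outputs of the two runs above
diff = np.max(np.abs(np.asarray(native_results) - np.asarray(fp_results)))
print(f"max abs difference between native and FP-optimized outputs: {diff}")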

The preprocessing I'm using is the same for both the .har and the .hef, and it is based on the code below:

(preprocessing code shared as a Google Colab notebook link)

Problem

  1. I'm getting an input buffer size that is different from what is expected:

[HailoRT] [error] CHECK failed - Input buffer size 9830400 is different than expected 1228800 for input 'net_9conv/input_layer1'

This is because the actual buffer size works out to: Buffer size = 1 × 640 × 640 × 3 × 8 = 9,830,400 bytes, which matches the buffer size in the error. The mismatch arises because the data being sent uses 8 bytes per element (float64), whereas the expected 1,228,800 bytes corresponds to 1 byte per element (uint8).
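As a quick sanity check on these numbers (a minimal sketch, assuming a 1 × 640 × 640 × 3 input):

import numpy as np

shape = (1, 640, 640, 3)
for dtype in (np.uint8, np.float32, np.float64):
    buf = np.zeros(shape, dtype=dtype)
    # uint8 -> 1,228,800 bytes, float32 -> 4,915,200 bytes, float64 -> 9,830,400 bytes
    print(dtype.__name__, buf.nbytes)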

  2. If I fix this by converting the input to uint8, the buffer size becomes 1,228,800, matching the input layer, but I still don't get the same output as from the .har file.

As far as I know, after optimization the only thing I did was convert to .hef using this code:

# Compile the quantized model and save the HEF
hef = runner.compile()
file_name = f"{model_name}.hef"
with open(file_name, "wb") as f:
    f.write(hef)

Does the model structure change when it is compiled? I have tried every method I could think of to check the .har and the .hef, but this still seems odd. Hope someone can address my issue.

Hey @SAN

It sounds like you’re experiencing a discrepancy between the outputs generated by the .har file (used on Ubuntu with native and FP optimized SDKs) and the .hef file (used on the Hailo8L with the Raspberry Pi). This issue may stem from a few key factors related to model optimization, buffer size handling, and the nature of the .hef file conversion.

1. Difference Between .har and .hef Models

The .har file is a Hailo Archive file used for inference emulation on a host machine (like your Ubuntu system), whereas the .hef file is a Hailo Executable File compiled specifically for the Hailo hardware, such as the Hailo8L accelerator on the Raspberry Pi. The key difference lies in the hardware-specific optimization applied when the .hef file is compiled.

Does the model structure change when compiling to .hef?

Yes, when you compile a .har file to a .hef using the runner.compile() method, the model is optimized specifically for the Hailo hardware. This may involve:

  • Quantization: If your model was floating-point-based during training, it may be quantized to INT8 when compiled to .hef to run efficiently on Hailo’s architecture.
  • Layer Fusion: Some layers may be fused or optimized during compilation, which could alter the structure of the model slightly.
  • Memory Optimization: The memory handling and buffer sizes may be adjusted for efficiency on the Hailo chip.

These optimizations can lead to slight variations in the model’s behavior and output when compared to the original .har file.
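The usual flow is to quantize the model with optimize() (driven by a calibration dataset) and then compile the quantized model into the HEF. A minimal sketch, assuming calib_dataset is available:

from hailo_sdk_client import ClientRunner

runner = ClientRunner(har="9conv_hailo_model.har")

# Quantization happens here, based on the calibration data
runner.optimize(calib_dataset)

# Compilation maps the quantized model onto the Hailo8L and produces the HEF
hef = runner.compile()
with open("9conv_quantized_model.hef", "wb") as f:
    f.write(hef)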

2. Input Buffer Size Mismatch

The error you’re encountering regarding input buffer size:

[HailoRT] [error] CHECK failed - Input buffer size 9830400 is different than expected 1228800 for input 'net_9conv/input_layer1'

This indicates that the expected buffer size for the input layer differs between your .har and .hef setups. The root cause here might be due to the data type and the quantization of the model.

  • For .har (native/FP emulation), you're feeding floating-point data: float32 is 4 bytes per element, and float64 (the NumPy default) is 8 bytes per element; the 9,830,400-byte buffer in your error corresponds to 8 bytes per element.
  • For .hef, the model is quantized to UINT8, where each element is only 1 byte. This explains why the expected input buffer size (1,228,800 bytes) is smaller for the .hef (see the sketch below for how to check what the compiled .hef expects).
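If you want to confirm what the compiled .hef actually expects, you can inspect its input vstream infos with the HailoRT Python API (a minimal sketch; exact attribute names can vary slightly between HailoRT versions):

from hailo_platform import HEF

hef = HEF("9conv_quantized_model.hef")
for info in hef.get_input_vstream_infos():
    # Expected input name, shape, and data type (e.g. UINT8) of the compiled model
    print(info.name, info.shape, info.format.type)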

Solution:

To resolve the buffer size issue, ensure that:

  • You preprocess the input for .hef to be UINT8 instead of float32. In your preprocessing step, convert the input image data to uint8 before feeding it into the .hef model.

For example:

input_data = (image * 255).astype(np.uint8)  # Assuming image is already normalized
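A slightly fuller version of that conversion, including a check that the buffer matches the 1,228,800 bytes the .hef expects (assuming the hypothetical `image` is already resized to 640×640×3 and normalized to [0, 1]):

import numpy as np

input_data = (image * 255.0).astype(np.uint8)    # hypothetical `image`: 640x640x3, values in [0, 1]
input_data = np.expand_dims(input_data, axis=0)  # add batch dimension -> (1, 640, 640, 3)
assert input_data.nbytes == 1 * 640 * 640 * 3    # 1,228,800 bytes, 1 byte per element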

3. Output Differences

The reason why the output differs between .har and .hef files is likely due to the quantization applied during the conversion to .hef. In quantized models, there is a known difference in output behavior compared to floating-point models due to precision loss during quantization.

  • Floating-point (FP) models retain a higher level of accuracy, but they only run in emulation on the host; the Hailo8L itself executes quantized networks.
  • Quantized (INT8) models are faster but may result in slightly different outputs due to reduced precision.

You can try quantizing the .har model using the Hailo SDK and comparing its output to the .hef version to see if the differences persist.
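For example, assuming the quantized .har from your optimization step is available, you can run the quantized emulation context and compare its output with what the device gives you (a sketch; SDK_QUANTIZED is the quantized-emulation context in the Dataflow Compiler):

from hailo_sdk_client import ClientRunner, InferenceContext

runner = ClientRunner(har="9conv_quantized_model.har")

# Quantized emulation applies the same integer quantization the compiled .hef uses
with runner.infer_context(InferenceContext.SDK_QUANTIZED) as ctx:
    quantized_results = runner.infer(ctx, calib_dataset)

# Compare quantized_results against the outputs collected from the .hef on the Hailo8L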

4. Ensure Consistent Preprocessing

To minimize differences in output, ensure that you're applying the same preprocessing steps for both the .har and .hef models (one way to enforce this is sketched after the list):

  • Resize and normalize the images in the same way.
  • Ensure that the input format (e.g., RGB/BGR) is consistent across both models.
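A simple approach is to share one preprocessing function between the host emulation and the device, changing only the final data type (a sketch assuming OpenCV and a 640×640 RGB model input; adapt to your actual pipeline):

import cv2
import numpy as np

def preprocess(image_bgr, size=(640, 640), as_uint8=False):
    # Resize, convert BGR -> RGB, and add a batch dimension
    img = cv2.resize(image_bgr, size)
    img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
    img = np.expand_dims(img, axis=0)  # (1, 640, 640, 3)
    if as_uint8:
        return img.astype(np.uint8)             # for the .hef on the Hailo8L
    return img.astype(np.float32) / 255.0       # for native/FP emulation with the .har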

Conclusion:

  1. Model Structure: The model structure can change when converting .har to .hef due to quantization and hardware optimization.
  2. Input Buffer Size: Ensure that the input buffer is adjusted for the data type (uint8 for .hef models). This should resolve the buffer size mismatch error.
  3. Output Differences: These are likely due to quantization effects. You may want to compare the quantized .har model with the .hef output to understand how quantization affects your results.

Let me know if you need further assistance, and I’ll be happy to help!

Best regards,
Omri