Hey @SAN
It sounds like you’re experiencing a discrepancy between the outputs generated by the `.har` file (used on Ubuntu with the native and FP-optimized SDK flows) and the `.hef` file (used on the Hailo-8L with the Raspberry Pi). This issue may stem from a few key factors related to model optimization, buffer size handling, and the nature of the `.hef` conversion.
1. Difference Between `.har` and `.hef` Models
The `.har` file is a Hailo Archive file used during model conversion and host-side emulation (like on your Ubuntu system), whereas the `.hef` file is a Hailo Executable Format file compiled specifically for Hailo hardware, such as the Hailo-8L accelerator on the Raspberry Pi. The key difference lies in the hardware optimization applied when compiling the `.hef` file.
Does the model structure change when compiling to `.hef`?
Yes. When you compile a `.har` file to a `.hef` using the `runner.compile()` method, the model is optimized specifically for the Hailo hardware. This may involve:
- Quantization: If your model was floating-point during training, it is quantized (typically to 8-bit) so it can run efficiently on Hailo’s architecture. Strictly speaking, the quantization happens in the `runner.optimize()` step, and `runner.compile()` then maps the quantized model onto the hardware.
- Layer fusion: Some layers may be fused or otherwise optimized during compilation, which can slightly alter the model’s structure.
- Memory optimization: Memory handling and buffer sizes may be adjusted for efficiency on the Hailo chip.
These optimizations can lead to slight variations in the model’s behavior and output compared to the original `.har` file.
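For reference, the typical Dataflow Compiler flow looks roughly like this (a sketch: the file names and calibration data are placeholders, and the 640×640×3 input shape is an assumption based on the buffer sizes in your error below):

```python
import numpy as np
from hailo_sdk_client import ClientRunner

# Load the parsed model from the Hailo Archive (placeholder file name)
runner = ClientRunner(har='net_9conv.har')

# Quantize using a calibration set of NHWC float images (placeholder data)
calib_dataset = np.random.rand(64, 640, 640, 3).astype(np.float32)
runner.optimize(calib_dataset)
runner.save_har('net_9conv_quantized.har')  # keep the quantized HAR for later comparison

# Compile the optimized model down to a HEF for the Hailo-8L
hef = runner.compile()
with open('net_9conv.hef', 'wb') as f:
    f.write(hef)
```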
2. Input Buffer Size Mismatch
The error you’re encountering regarding input buffer size:
[HailoRT] [error] CHECK failed - Input buffer size 9830400 is different than expected 1228800 for input 'net_9conv/input_layer1'
This indicates that the buffer size expected by the input layer differs between your `.har` and `.hef` setups. The root cause here is the data type and quantization of the model:
- For `.har` (native/FP emulation), inference runs in floating point, so inputs are float arrays.
- For `.hef`, the model is quantized and the input layer expects UINT8, where each element is only 1 byte. That is why the expected input buffer size is smaller: 1228800 is exactly 640×640×3 at 1 byte per element.
As a side note, 9830400 is 8× the expected size, i.e. 8 bytes per element, which points to NumPy’s default float64 rather than float32 (float32 would give 4915200). Either way, casting to `uint8` as shown below fixes both the dtype and the size.
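You can sanity-check this arithmetic directly (a sketch; the 640×640×3 shape is inferred from the sizes in your error message, not confirmed from your model):

```python
import numpy as np

shape = (640, 640, 3)            # inferred HWC input shape
elements = int(np.prod(shape))   # 1,228,800 elements

print(elements * np.dtype(np.uint8).itemsize)    # 1228800 -> what the .hef expects
print(elements * np.dtype(np.float32).itemsize)  # 4915200
print(elements * np.dtype(np.float64).itemsize)  # 9830400 -> what your buffer suggests
```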
Solution:
To resolve the buffer size issue, ensure that:
- You preprocess the input for the `.hef` to be UINT8 instead of float. In your preprocessing step, convert the input image data to `uint8` before feeding it into the `.hef` model.
For example:

```python
import numpy as np

# Assuming `image` is a float array already normalized to [0, 1]
input_data = (image * 255).astype(np.uint8)
```
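If you want to confirm the expected shape and data type straight from the compiled model, HailoRT’s Python API can report them (a sketch; I’m assuming the `hailo_platform` package from pyhailort and a placeholder file name):

```python
from hailo_platform import HEF

hef = HEF('net_9conv.hef')
for info in hef.get_input_vstream_infos():
    # Expect something like: net_9conv/input_layer1 (640, 640, 3) UINT8
    print(info.name, info.shape, info.format)
```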
3. Output Differences
The reason the output differs between the `.har` and `.hef` files is likely the quantization applied during conversion to `.hef`. Quantized models are known to behave somewhat differently from floating-point models because of the precision lost during quantization.
- Floating-point (FP) models retain a higher level of accuracy, but they run in host emulation rather than on the Hailo-8L itself.
- Quantized (INT8) models run fast on the device but may produce slightly different outputs due to reduced precision.
You can quantize the `.har` model using the Hailo SDK and compare its emulated output to the `.hef` version to see whether the differences persist, as in the sketch below.
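One way to run that comparison is with the SDK’s host emulation contexts (a sketch; the `InferenceContext` values are from the Hailo Dataflow Compiler API, and the file name and input data are placeholders):

```python
import numpy as np
from hailo_sdk_client import ClientRunner, InferenceContext

runner = ClientRunner(har='net_9conv_quantized.har')  # the quantized HAR
dataset = np.random.rand(1, 640, 640, 3).astype(np.float32)  # placeholder input

# Emulate the quantized model on the host
with runner.infer_context(InferenceContext.SDK_QUANTIZED) as ctx:
    quantized_out = runner.infer(ctx, dataset)

# If quantized_out matches the .hef output but not your native/FP runs,
# quantization is the source of the difference.
```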
4. Ensure Consistent Preprocessing
To minimize differences in output, ensure that you’re applying the same preprocessing steps for both the `.har` and `.hef` models:
- Resize and normalize the images in the same way.
- Ensure that the input format (e.g., RGB vs. BGR) is consistent across both models; a shared helper like the sketch below makes this easy to enforce.
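For example, a single helper used by both pipelines keeps preprocessing consistent (a sketch; the 640×640 RGB assumption comes from the buffer math above, and OpenCV is just one option for loading):

```python
import cv2
import numpy as np

def preprocess(path: str, size=(640, 640)) -> np.ndarray:
    """Load and prepare an image identically for the .har and .hef paths."""
    img = cv2.imread(path)                      # OpenCV loads images as BGR
    img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)  # convert if the model expects RGB
    img = cv2.resize(img, size)
    return img.astype(np.uint8)                 # HWC uint8, ready for the .hef

# For the .har native/FP emulation path, cast to float instead, e.g.:
# preprocess('frame.jpg').astype(np.float32) / 255.0  (if the model expects [0, 1])
```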
Conclusion:
- Model structure: The model structure can change when converting `.har` to `.hef` due to quantization and hardware optimization.
- Input buffer size: Make sure the input buffer matches the data type the model expects (uint8 for quantized `.hef` models); this should resolve the buffer size mismatch error.
- Output differences: These are likely quantization effects. Compare the quantized `.har` model’s emulated output with the `.hef` output to understand how quantization affects your results.
Let me know if you need further assistance, and I’ll be happy to help!
Best regards,
Omri