Translating, optimizing, and compiling a resnet18 model

Hi,

I'm working on a project that involves 68-point face landmark detection based on a resnet18 model:

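import torch.nn as nn
from torchvision import models

class Network(nn.Module):
    def __init__(self, num_classes=None):
        super().__init__()
        self.model_name = 'resnet18'
        self.model = models.resnet18()
        # Replace the first conv so the network accepts 1-channel (grayscale) input
        self.model.conv1 = nn.Conv2d(1, 64, kernel_size=7, stride=2, padding=3, bias=False)
        # Replace the classifier head with a regression head of size num_classes
        self.model.fc = nn.Linear(self.model.fc.in_features, num_classes)

    def forward(self, x):
        x = self.model(x)
        return x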

I exported it to ONNX format, tested the ONNX model, and it works fine. Parsing it to a HAR file:

bedrock@bedrock-pc:~/Desktop/project/AlertWatch/testing-for-forum$ hailo parser onnx --hw-arch hailo8 face-landmarks-detection.onnx 
[info] Current Time: 11:30:29, 11/18/24
[info] CPU: Architecture: x86_64, Model: AMD Ryzen 7 7840HS w/ Radeon 780M Graphics, Number Of Cores: 16, Utilization: 0.8%
[info] Memory: Total: 14GB, Available: 11GB
[info] System info: OS: Linux, Kernel: 6.8.0-48-generic
[info] Hailo DFC Version: 3.29.0
[info] HailoRT Version: 4.19.0
[info] PCIe: 0000:04:00.0: Number Of Lanes: 4, Speed: 8.0 GT/s PCIe
[info] Running `hailo parser onnx --hw-arch hailo8 face-landmarks-detection.onnx`
[info] Translation started on ONNX model face-landmarks-detection
[info] Restored ONNX model face-landmarks-detection (completion time: 00:00:00.14)
[info] Extracted ONNXRuntime meta-data for Hailo model (completion time: 00:00:00.44)
[info] Start nodes mapped from original model: 'input': 'face-landmarks-detection/input_layer1'.
[info] End nodes mapped from original model: '/model/fc/Gemm'.
[info] Translation completed on ONNX model face-landmarks-detection (completion time: 00:00:00.55)
[info] Saved HAR to: /home/bedrock/Desktop/project/AlertWatch/testing-for-forum/face-landmarks-detection.har

When testing the unoptimized HAR file, the accuracy seemed to drop a bit and I had some issues interpreting the outputs, but the accuracy seemed sufficient.
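
For reference, this is roughly how I interpret the outputs. It is only a minimal sketch: it assumes the fc layer emits 136 values (68 landmarks times 2 coordinates) laid out as x0, y0, x1, y1, and so on. The helper name to_landmarks is just for illustration, and the real layout depends on how the training labels were flattened.

```python
def to_landmarks(flat_output, num_points=68):
    """Pair up a flat output vector into (x, y) landmark tuples.

    Assumes an interleaved layout: x0, y0, x1, y1, ...
    """
    values = [float(v) for v in flat_output]
    if len(values) != 2 * num_points:
        raise ValueError(
            f"expected {2 * num_points} values, got {len(values)}"
        )
    # Even indices are x coordinates, odd indices are y coordinates
    return list(zip(values[0::2], values[1::2]))


# Dummy stand-in for one model output (not real inference data)
demo_output = list(range(136))
points = to_landmarks(demo_output)
print(points[0], points[-1])
```

If the labels were instead stored as all x values followed by all y values, the pairing would need to be zip(values[:68], values[68:]).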

For the optimization calibration set, I took ~6000 images from my dataset as a numpy array. I'm not proficient in the field of AI, so model scripts are kind of black magic to me, and I didn't use any:

bedrock@bedrock-pc:~/Desktop/project/AlertWatch/testing-for-forum$ hailo optimize --hw-arch hailo8 --calib-set-path calib_set.npy face-landmarks-detection.har 
[info] Current Time: 11:41:04, 11/18/24
[info] CPU: Architecture: x86_64, Model: AMD Ryzen 7 7840HS w/ Radeon 780M Graphics, Number Of Cores: 16, Utilization: 0.9%
[info] Memory: Total: 14GB, Available: 11GB
[info] System info: OS: Linux, Kernel: 6.8.0-48-generic
[info] Hailo DFC Version: 3.29.0
[info] HailoRT Version: 4.19.0
[info] PCIe: 0000:04:00.0: Number Of Lanes: 4, Speed: 8.0 GT/s PCIe
[info] Running `hailo optimize --hw-arch hailo8 --calib-set-path calib_set.npy face-landmarks-detection.har`
[info] Starting Model Optimization
[warning] Reducing optimization level to 0 (the accuracy won't be optimized and compression won't be used) because there's no available GPU
[warning] Running model optimization with zero level of optimization is not recommended for production use and might lead to suboptimal accuracy results
[info] Model received quantization params from the hn
[info] Starting Mixed Precision
[info] Mixed Precision is done (completion time is 00:00:00.10)
[info] LayerNorm Decomposition skipped
[info] Starting Statistics Collector
[info] Using dataset with 64 entries for calibration
Calibration: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 64/64 [00:05<00:00, 11.26entries/s]
[info] Statistics Collector is done (completion time is 00:00:05.96)
[info] Starting Fix zp_comp Encoding
[info] Fix zp_comp Encoding is done (completion time is 00:00:00.00)
[info] Matmul Equalization skipped
[info] No shifts available for layer face-landmarks-detection/conv18/conv_op, using max shift instead. delta=0.1326
[info] No shifts available for layer face-landmarks-detection/conv18/conv_op, using max shift instead. delta=0.0663
[info] Finetune encoding skipped
[info] Bias Correction skipped
[info] Adaround skipped
[info] Quantization-Aware Fine-Tuning skipped
[info] Layer Noise Analysis skipped
[info] Model Optimization is done
[info] Saved HAR to: /home/bedrock/Desktop/project/AlertWatch/testing-for-forum/face-landmarks-detection_optimized.har
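
For context, here is a minimal sketch of how a calibration array like calib_set.npy could be put together. It rests on assumptions that may not match the actual setup: that the images start out in PyTorch's (N, C, H, W) layout, that the input is 224x224 with one channel, and that the calibration set should be channels-last (N, H, W, C) float32 with the same preprocessing the model saw during training. The helper name build_calib_set is hypothetical. Note that the log above reports only 64 entries being used for calibration.

```python
import numpy as np


def build_calib_set(images_nchw, max_entries=None):
    """Convert a (N, C, H, W) batch into a (N, H, W, C) float32 array."""
    arr = np.asarray(images_nchw, dtype=np.float32)
    if arr.ndim != 4:
        raise ValueError(f"expected a 4-D batch, got shape {arr.shape}")
    # Move the channel axis to the end: NCHW -> NHWC
    nhwc = np.transpose(arr, (0, 2, 3, 1))
    if max_entries is not None:
        nhwc = nhwc[:max_entries]
    # Make the memory layout contiguous before saving
    return np.ascontiguousarray(nhwc)


# Random stand-in data (assumed shape, not my real dataset)
rng = np.random.default_rng(0)
fake_images = rng.random((8, 1, 224, 224), dtype=np.float32)
calib = build_calib_set(fake_images)
print(calib.shape, calib.dtype)
# np.save("calib_set.npy", calib)  # uncomment to write the file
```

Whatever normalization was applied at training time (scaling to [0, 1], mean/std, etc.) would have to be applied consistently here and at inference time, otherwise the collected statistics will not match the real inputs.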

I don't have a GPU on my system, and I don't really need the model to be highly optimized, so an optimization level of 0 is OK for me.

Testing the optimized model showed some decrease in accuracy, but I stuck with it.
Compiling:

bedrock@bedrock-pc:~/Desktop/project/AlertWatch/testing-for-forum$ hailo compiler --hw-arch hailo8 face-landmarks-detection_optimized.har 
[info] Current Time: 11:43:39, 11/18/24
[info] CPU: Architecture: x86_64, Model: AMD Ryzen 7 7840HS w/ Radeon 780M Graphics, Number Of Cores: 16, Utilization: 0.6%
[info] Memory: Total: 14GB, Available: 11GB
[info] System info: OS: Linux, Kernel: 6.8.0-48-generic
[info] Hailo DFC Version: 3.29.0
[info] HailoRT Version: 4.19.0
[info] PCIe: 0000:04:00.0: Number Of Lanes: 4, Speed: 8.0 GT/s PCIe
[info] Running `hailo compiler --hw-arch hailo8 face-landmarks-detection_optimized.har`
[info] Compiling network
[info] To achieve optimal performance, set the compiler_optimization_level to "max" by adding performance_param(compiler_optimization_level=max) to the model script. Note that this may increase compilation time.
[info] Loading network parameters
[info] Starting Hailo allocation and compilation flow
[info] Using Single-context flow
[info] Resources optimization guidelines: Strategy -> GREEDY Objective -> MAX_FPS
[info] Resources optimization params: max_control_utilization=75%, max_compute_utilization=75%, max_compute_16bit_utilization=75%, max_memory_utilization (weights)=75%, max_input_aligner_utilization=75%, max_apu_utilization=75%
[info] Using Single-context flow
[info] Resources optimization guidelines: Strategy -> GREEDY Objective -> MAX_FPS
[info] Resources optimization params: max_control_utilization=75%, max_compute_utilization=75%, max_compute_16bit_utilization=75%, max_memory_utilization (weights)=75%, max_input_aligner_utilization=75%, max_apu_utilization=75%

Validating context_0 layer by layer (100%)

 +  +  +  +  +  +  +  +  +  +  +  +  +  +  +  +  +  +  +  + 
 +  +  +  +  +  +  +  +  +  +  +  +  +  +  +  +  +  +  +  + 
 +  +  +  +  +  +  +  +  +  +  +  +  +  +  +  +  +  +  +  + 
 +  +  +  +  +  +  +  +  +  +  +  +  +  +  +  +  +  +  +  + 

● Finished                                                            

[info] Solving the allocation (Mapping), time per context: 59m 59s
Context:0/0 Iteration 4: Trying parallel mapping...  
          cluster_0  cluster_1  cluster_2  cluster_3  cluster_4  cluster_5  cluster_6  cluster_7  prepost 
 worker0  V          V          V          V          X          V          V          V          V       
 worker1  V          V          V          V          V          V          V          V          V       
 worker2  V          X          V          V          V          V          V          V          V       
 worker3  V          V          V          V          V          V          V          V          V       

  00:55
Reverts on cluster mapping: 0
Reverts on inter-cluster connectivity: 0
Reverts on pre-mapping validation: 0
Reverts on split failed: 0

[info] Iterations: 4
Reverts on cluster mapping: 1
Reverts on inter-cluster connectivity: 0
Reverts on pre-mapping validation: 0
Reverts on split failed: 0
[info] +-----------+---------------------+---------------------+--------------------+
[info] | Cluster   | Control Utilization | Compute Utilization | Memory Utilization |
[info] +-----------+---------------------+---------------------+--------------------+
[info] | cluster_0 | 100%                | 75%                 | 67.2%              |
[info] | cluster_1 | 50%                 | 56.3%               | 86.7%              |
[info] | cluster_2 | 56.3%               | 70.3%               | 44.5%              |
[info] | cluster_3 | 81.3%               | 100%                | 54.7%              |
[info] | cluster_4 | 100%                | 93.8%               | 35.9%              |
[info] | cluster_5 | 31.3%               | 28.1%               | 52.3%              |
[info] | cluster_6 | 93.8%               | 100%                | 66.4%              |
[info] | cluster_7 | 75%                 | 82.8%               | 96.9%              |
[info] +-----------+---------------------+---------------------+--------------------+
[info] | Total     | 73.4%               | 75.8%               | 63.1%              |
[info] +-----------+---------------------+---------------------+--------------------+
[info] Successful Mapping (allocation time: 1m 35s)
[info] Compiling context_0...
[info] Bandwidth of model inputs: 0.382812 Mbps, outputs: 0.00103760 Mbps (for a single frame)
[info] Bandwidth of DDR buffers: 0.0 Mbps (for a single frame)
[info] Bandwidth of inter context tensors: 0.0 Mbps (for a single frame)
[info] Building HEF...
[info] Successful Compilation (compilation time: 5s)
[info] Compilation complete
[info] Saved HEF to: /home/bedrock/Desktop/project/AlertWatch/testing-for-forum/face-landmarks-detection.hef
[info] Saved HAR to: /home/bedrock/Desktop/project/AlertWatch/testing-for-forum/face-landmarks-detection_compiled.har

Running the compiled HEF, I just get generic landmarks inferred on the image, like a static face with no tracking of the actual face landmarks.
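
To put a number on "static", a check along these lines could be run on the raw outputs collected over several different frames. This is only a sketch under assumptions: the outputs are stacked into a (num_frames, 136) array, and the tolerance tol is an arbitrary value that would need tuning to the output scale. The function names are hypothetical.

```python
import numpy as np


def output_spread(outputs):
    """Per-value standard deviation across frames.

    outputs: array-like of shape (num_frames, num_values).
    """
    arr = np.asarray(outputs, dtype=np.float64)
    if arr.ndim != 2 or arr.shape[0] < 2:
        raise ValueError("need a 2-D array with at least two frames")
    return arr.std(axis=0)


def looks_static(outputs, tol=1e-3):
    """True if no output value varies by more than tol across frames."""
    return bool(np.all(output_spread(outputs) < tol))


# Synthetic examples (not real inference results)
static = np.tile(np.linspace(0.0, 1.0, 136), (5, 1))   # same output every frame
moving = static + np.arange(5)[:, None] * 0.05          # output shifts per frame
print(looks_static(static), looks_static(moving))
```

If the HEF outputs look static while the ONNX outputs for the same frames do not, that would point at the input preprocessing or the quantization step rather than at the model itself.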

I'd appreciate some help, as I want to test this on hailo8 and eventually run it on our SolidRun Hailo15 board, and the project's time frame is a bit tight.

Thanks,
Lior