Hi,
I'm working on a project that involves detecting 68 face landmarks with a ResNet-18-based model:
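import torch.nn as nn
from torchvision import models

class Network(nn.Module):
    def __init__(self, num_classes=None):
        super().__init__()
        self.model_name = 'resnet18'
        self.model = models.resnet18()
        # Single-channel (grayscale) input instead of the default 3-channel RGB
        self.model.conv1 = nn.Conv2d(1, 64, kernel_size=7, stride=2, padding=3, bias=False)
        # Replace the classification head with num_classes outputs
        self.model.fc = nn.Linear(self.model.fc.in_features, num_classes)

    def forward(self, x):
        x = self.model(x)
        return x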
I exported it to ONNX and tested the ONNX model, and it works fine. Then I parsed it to a HAR file:
bedrock@bedrock-pc:~/Desktop/project/AlertWatch/testing-for-forum$ hailo parser onnx --hw-arch hailo8 face-landmarks-detection.onnx
[info] Current Time: 11:30:29, 11/18/24
[info] CPU: Architecture: x86_64, Model: AMD Ryzen 7 7840HS w/ Radeon 780M Graphics, Number Of Cores: 16, Utilization: 0.8%
[info] Memory: Total: 14GB, Available: 11GB
[info] System info: OS: Linux, Kernel: 6.8.0-48-generic
[info] Hailo DFC Version: 3.29.0
[info] HailoRT Version: 4.19.0
[info] PCIe: 0000:04:00.0: Number Of Lanes: 4, Speed: 8.0 GT/s PCIe
[info] Running `hailo parser onnx --hw-arch hailo8 face-landmarks-detection.onnx`
[info] Translation started on ONNX model face-landmarks-detection
[info] Restored ONNX model face-landmarks-detection (completion time: 00:00:00.14)
[info] Extracted ONNXRuntime meta-data for Hailo model (completion time: 00:00:00.44)
[info] Start nodes mapped from original model: 'input': 'face-landmarks-detection/input_layer1'.
[info] End nodes mapped from original model: '/model/fc/Gemm'.
[info] Translation completed on ONNX model face-landmarks-detection (completion time: 00:00:00.55)
[info] Saved HAR to: /home/bedrock/Desktop/project/AlertWatch/testing-for-forum/face-landmarks-detection.har
When testing the unoptimized HAR file, the accuracy seemed to drop a bit and I had some issues interpreting the outputs, but the accuracy was still sufficient.
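For context on interpreting the outputs, here is a minimal sketch of how a flat output vector could be turned into 68 (x, y) points. It assumes the head has 136 outputs ordered x0, y0, x1, y1, ... and that the training targets were normalized to [-0.5, 0.5] relative to the image size. Both are assumptions for illustration, as are the names `decode_landmarks` and `offset`; the actual ordering and normalization depend on how the dataset was prepared.

```python
import numpy as np

NUM_LANDMARKS = 68  # 68 face landmarks, so 136 output values are assumed


def decode_landmarks(raw_output, image_width, image_height, offset=0.5):
    """Turn a flat (136,) output vector into (68, 2) pixel coordinates.

    Assumption: targets were stored as x0, y0, x1, y1, ... and normalized
    to [-0.5, 0.5] relative to the image size. Adjust `offset` otherwise.
    """
    flat = np.asarray(raw_output, dtype=np.float32).reshape(-1)
    if flat.size != NUM_LANDMARKS * 2:
        raise ValueError(f"expected {NUM_LANDMARKS * 2} values, got {flat.size}")
    points = flat.reshape(NUM_LANDMARKS, 2)
    # Undo the assumed normalization: shift to [0, 1], then scale to pixels.
    scale = np.array([image_width, image_height], dtype=np.float32)
    return (points + offset) * scale
```

With this convention, an all-zero output maps every landmark to the image center, which is a quick sanity check for the decoding step.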
For the optimization calibration set, I took ~6000 images from my dataset as a NumPy array. I'm not proficient in the field of AI, so model scripts are kind of black magic to me and I didn't use any:
bedrock@bedrock-pc:~/Desktop/project/AlertWatch/testing-for-forum$ hailo optimize --hw-arch hailo8 --calib-set-path calib_set.npy face-landmarks-detection.har
[info] Current Time: 11:41:04, 11/18/24
[info] CPU: Architecture: x86_64, Model: AMD Ryzen 7 7840HS w/ Radeon 780M Graphics, Number Of Cores: 16, Utilization: 0.9%
[info] Memory: Total: 14GB, Available: 11GB
[info] System info: OS: Linux, Kernel: 6.8.0-48-generic
[info] Hailo DFC Version: 3.29.0
[info] HailoRT Version: 4.19.0
[info] PCIe: 0000:04:00.0: Number Of Lanes: 4, Speed: 8.0 GT/s PCIe
[info] Running `hailo optimize --hw-arch hailo8 --calib-set-path calib_set.npy face-landmarks-detection.har`
[info] Starting Model Optimization
[warning] Reducing optimization level to 0 (the accuracy won't be optimized and compression won't be used) because there's no available GPU
[warning] Running model optimization with zero level of optimization is not recommended for production use and might lead to suboptimal accuracy results
[info] Model received quantization params from the hn
[info] Starting Mixed Precision
[info] Mixed Precision is done (completion time is 00:00:00.10)
[info] LayerNorm Decomposition skipped
[info] Starting Statistics Collector
[info] Using dataset with 64 entries for calibration
Calibration: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 64/64 [00:05<00:00, 11.26entries/s]
[info] Statistics Collector is done (completion time is 00:00:05.96)
[info] Starting Fix zp_comp Encoding
[info] Fix zp_comp Encoding is done (completion time is 00:00:00.00)
[info] Matmul Equalization skipped
[info] No shifts available for layer face-landmarks-detection/conv18/conv_op, using max shift instead. delta=0.1326
[info] No shifts available for layer face-landmarks-detection/conv18/conv_op, using max shift instead. delta=0.0663
[info] Finetune encoding skipped
[info] Bias Correction skipped
[info] Adaround skipped
[info] Quantization-Aware Fine-Tuning skipped
[info] Layer Noise Analysis skipped
[info] Model Optimization is done
[info] Saved HAR to: /home/bedrock/Desktop/project/AlertWatch/testing-for-forum/face-landmarks-detection_optimized.har
I don't have a GPU on my system, and I don't really need the model to be super optimized, so optimization level 0 is OK for me.
Testing the optimized model showed some further decrease in accuracy, but I stuck with it.
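For reference, a minimal sketch of how a calibration array like `calib_set.npy` could be assembled. It assumes the images are held in the PyTorch-style (N, 1, H, W) layout and that the Dataflow Compiler expects channels-last (N, H, W, C) data, which is my understanding of its documentation rather than something confirmed here. The helper name `build_calib_set` and the 224x224 size are hypothetical. Because no model script is used, the values presumably need the same preprocessing that the ONNX model saw during training.

```python
import numpy as np


def build_calib_set(images_nchw, max_entries=None):
    """Convert an (N, C, H, W) array to a contiguous (N, H, W, C) float32 array."""
    images = np.asarray(images_nchw, dtype=np.float32)
    if images.ndim != 4:
        raise ValueError(f"expected a 4-D array, got shape {images.shape}")
    # Move the channel axis last: NCHW -> NHWC (assumed DFC layout).
    calib = np.transpose(images, (0, 2, 3, 1))
    if max_entries is not None:
        calib = calib[:max_entries]
    return np.ascontiguousarray(calib)


if __name__ == "__main__":
    # Stand-in data; real use would load preprocessed dataset images instead.
    fake_images = np.random.rand(8, 1, 224, 224).astype(np.float32)
    print(build_calib_set(fake_images).shape)
```

The result can be written with `np.save("calib_set.npy", calib)`. Note that the optimize log above reports 64 entries used for calibration, so it is worth checking the saved array's shape against what was intended.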
Compiling:
bedrock@bedrock-pc:~/Desktop/project/AlertWatch/testing-for-forum$ hailo compiler --hw-arch hailo8 face-landmarks-detection_optimized.har
[info] Current Time: 11:43:39, 11/18/24
[info] CPU: Architecture: x86_64, Model: AMD Ryzen 7 7840HS w/ Radeon 780M Graphics, Number Of Cores: 16, Utilization: 0.6%
[info] Memory: Total: 14GB, Available: 11GB
[info] System info: OS: Linux, Kernel: 6.8.0-48-generic
[info] Hailo DFC Version: 3.29.0
[info] HailoRT Version: 4.19.0
[info] PCIe: 0000:04:00.0: Number Of Lanes: 4, Speed: 8.0 GT/s PCIe
[info] Running `hailo compiler --hw-arch hailo8 face-landmarks-detection_optimized.har`
[info] Compiling network
[info] To achieve optimal performance, set the compiler_optimization_level to "max" by adding performance_param(compiler_optimization_level=max) to the model script. Note that this may increase compilation time.
[info] Loading network parameters
[info] Starting Hailo allocation and compilation flow
[info] Using Single-context flow
[info] Resources optimization guidelines: Strategy -> GREEDY Objective -> MAX_FPS
[info] Resources optimization params: max_control_utilization=75%, max_compute_utilization=75%, max_compute_16bit_utilization=75%, max_memory_utilization (weights)=75%, max_input_aligner_utilization=75%, max_apu_utilization=75%
[info] Using Single-context flow
[info] Resources optimization guidelines: Strategy -> GREEDY Objective -> MAX_FPS
[info] Resources optimization params: max_control_utilization=75%, max_compute_utilization=75%, max_compute_16bit_utilization=75%, max_memory_utilization (weights)=75%, max_input_aligner_utilization=75%, max_apu_utilization=75%
Validating context_0 layer by layer (100%)
+ + + + + + + + + + + + + + + + + + + +
+ + + + + + + + + + + + + + + + + + + +
+ + + + + + + + + + + + + + + + + + + +
+ + + + + + + + + + + + + + + + + + + +
● Finished
[info] Solving the allocation (Mapping), time per context: 59m 59s
Context:0/0 Iteration 4: Trying parallel mapping...
cluster_0 cluster_1 cluster_2 cluster_3 cluster_4 cluster_5 cluster_6 cluster_7 prepost
worker0 V V V V X V V V V
worker1 V V V V V V V V V
worker2 V X V V V V V V V
worker3 V V V V V V V V V
00:55
Reverts on cluster mapping: 0
Reverts on inter-cluster connectivity: 0
Reverts on pre-mapping validation: 0
Reverts on split failed: 0
[info] Iterations: 4
Reverts on cluster mapping: 1
Reverts on inter-cluster connectivity: 0
Reverts on pre-mapping validation: 0
Reverts on split failed: 0
[info] +-----------+---------------------+---------------------+--------------------+
[info] | Cluster | Control Utilization | Compute Utilization | Memory Utilization |
[info] +-----------+---------------------+---------------------+--------------------+
[info] | cluster_0 | 100% | 75% | 67.2% |
[info] | cluster_1 | 50% | 56.3% | 86.7% |
[info] | cluster_2 | 56.3% | 70.3% | 44.5% |
[info] | cluster_3 | 81.3% | 100% | 54.7% |
[info] | cluster_4 | 100% | 93.8% | 35.9% |
[info] | cluster_5 | 31.3% | 28.1% | 52.3% |
[info] | cluster_6 | 93.8% | 100% | 66.4% |
[info] | cluster_7 | 75% | 82.8% | 96.9% |
[info] +-----------+---------------------+---------------------+--------------------+
[info] | Total | 73.4% | 75.8% | 63.1% |
[info] +-----------+---------------------+---------------------+--------------------+
[info] Successful Mapping (allocation time: 1m 35s)
[info] Compiling context_0...
[info] Bandwidth of model inputs: 0.382812 Mbps, outputs: 0.00103760 Mbps (for a single frame)
[info] Bandwidth of DDR buffers: 0.0 Mbps (for a single frame)
[info] Bandwidth of inter context tensors: 0.0 Mbps (for a single frame)
[info] Building HEF...
[info] Successful Compilation (compilation time: 5s)
[info] Compilation complete
[info] Saved HEF to: /home/bedrock/Desktop/project/AlertWatch/testing-for-forum/face-landmarks-detection.hef
[info] Saved HAR to: /home/bedrock/Desktop/project/AlertWatch/testing-for-forum/face-landmarks-detection_compiled.har
Running the compiled HEF, I just get generic landmarks inferred on the image: a static face that does not track the actual face landmarks.
I'd appreciate some help, as I want to test this on the Hailo-8 and eventually run it on our SolidRun Hailo-15 board, and the project's time frame is a bit tight.
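To put a number on the "static face" symptom, here is a minimal sketch that measures how much the outputs change across different input frames, and how far one set of outputs is from another (for example, ONNX outputs versus HEF outputs for the same images). The function names are hypothetical and the code only assumes the outputs have already been collected into NumPy arrays with one row per image. It does not call any Hailo API.

```python
import numpy as np


def output_variation(outputs):
    """Mean per-coordinate standard deviation across frames.

    `outputs` is an (N, D) array: one row of model output per input image.
    A value near zero means the output barely depends on the input.
    """
    stacked = np.asarray(outputs, dtype=np.float64)
    if stacked.ndim != 2 or stacked.shape[0] < 2:
        raise ValueError("need a 2-D array with at least two frames")
    return float(stacked.std(axis=0).mean())


def mean_abs_difference(reference, candidate):
    """Mean absolute difference between two output arrays of the same shape."""
    ref = np.asarray(reference, dtype=np.float64)
    cand = np.asarray(candidate, dtype=np.float64)
    if ref.shape != cand.shape:
        raise ValueError(f"shape mismatch: {ref.shape} vs {cand.shape}")
    return float(np.abs(ref - cand).mean())
```

If the variation is close to zero on the HEF but not on the ONNX model for the same frames, that would point at the input side (layout, dtype, or normalization of what is fed to the device) rather than at the landmark decoding.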
Thanks,
Lior