@omria
- Here is the .alls file I used for yolov10m (the other YOLO models use the default files provided):
normalization1 = normalization([0.0, 0.0, 0.0], [255.0, 255.0, 255.0])
quantization_param([conv64, conv77, conv89], force_range_out=[0.0, 1.0])
model_optimization_flavor(optimization_level=2)
nms_postprocess("/home/lauretta/quang/rain_detector/model_script/yolov10m_nms_config.json", meta_arch=yolov8, engine=cpu)
- The NMS config I am using for the model:
{
    "nms_scores_th": 0.001,
    "nms_iou_th": 0.45,
    "image_dims": [640, 640],
    "max_proposals_per_class": 100,
    "classes": 2,
    "regression_length": 16,
    "background_removal": false,
    "background_removal_index": 0,
    "bbox_decoders": [
        {
            "name": "bbox_decoder45",
            "stride": 8,
            "reg_layer": "conv61",
            "cls_layer": "conv64"
        },
        {
            "name": "bbox_decoder56",
            "stride": 16,
            "reg_layer": "conv74",
            "cls_layer": "conv77"
        },
        {
            "name": "bbox_decoder66",
            "stride": 32,
            "reg_layer": "conv86",
            "cls_layer": "conv89"
        }
    ]
}
- I provided a calibration path, which is a directory containing .jpg images. The inference pipeline resizes the input images to the model input size (640x640) and uses the np.uint8 type.
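For reference, the calibration preprocessing can be sketched roughly like this (a minimal NumPy-only sketch; `preprocess_calibration_image` is a hypothetical name, and nearest-neighbor indexing stands in for cv2.resize):

```python
import numpy as np

def preprocess_calibration_image(img, size=(640, 640)):
    # Resize an HxWx3 image to the model input size using nearest-neighbor
    # sampling (a stand-in for cv2.resize) and cast to np.uint8, matching
    # the 640x640 uint8 input the calibration pipeline produces.
    h, w = img.shape[:2]
    rows = np.arange(size[0]) * h // size[0]
    cols = np.arange(size[1]) * w // size[1]
    return img[rows[:, None], cols[None, :]].astype(np.uint8)
```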
- The inference pipeline is similar to this script:
import cv2
from time import time

# HailoDetector and COCO_CLASSES come from the project's own modules.

if __name__ == "__main__":
    cap1 = cv2.VideoCapture("sprinkler.mp4")
    detector = HailoDetector("/home/lauretta/quang/yolov10m.hef")
    fourcc = cv2.VideoWriter_fourcc(*'mp4v')
    used_size = (int(cap1.get(cv2.CAP_PROP_FRAME_WIDTH)), int(cap1.get(cv2.CAP_PROP_FRAME_HEIGHT)))
    writer = cv2.VideoWriter("yolov10_sprinkler.mp4", fourcc, 20, used_size)
    caps = [cap1]
    start_time = time()
    current_time = start_time
    frame_proc = 0
    n = len(caps)
    while True:
        ret1, frame1 = cap1.read()
        if not ret1:
            break
        frame_proc += n
        # print(f"Current FPS: {n / (time() - current_time)}")
        current_time = time()
        # print(f"Average FPS: {frame_proc / (current_time - start_time)}")
        frames = [frame1]
        detections = detector(frames, 0.0)
        for i, cam_dets in enumerate(detections):
            vis_frame = frames[i]
            frame_shape = vis_frame.shape
            for det in cam_dets:
                y1, x1, y2, x2, conf, cls = det
                if conf > 0.01:
                    print("something")
                    # Detections are normalized; scale back to pixel coordinates.
                    x1 = x1 * frame_shape[1]
                    y1 = y1 * frame_shape[0]
                    x2 = x2 * frame_shape[1]
                    y2 = y2 * frame_shape[0]
                    label = f"{COCO_CLASSES[int(cls)]}: {conf:.2f}"
                    cv2.rectangle(vis_frame, (int(x1), int(y1)), (int(x2), int(y2)), (0, 255, 0), 2)
                    cv2.putText(vis_frame, label, (int(x1), int(y1) - 10), cv2.FONT_HERSHEY_SIMPLEX, 0.5, (0, 255, 0), 2)
            # cv2.imshow(f"Camera {i}", vis_frame)
            writer.write(vis_frame)
        if cv2.waitKey(1) & 0xFF == ord('q'):
            break
    cap1.release()
    writer.release()
    cv2.destroyAllWindows()
The HailoDetector handles resizing the images to the expected input size as well as batching them.
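What the detector is described to do internally can be sketched as a small helper (a hypothetical `batch_frames`, using NumPy nearest-neighbor indexing as a stand-in for cv2.resize):

```python
import numpy as np

def batch_frames(frames, input_size=(640, 640)):
    # Sketch of the described HailoDetector behavior: resize every frame to
    # the model input size and stack them into a single uint8 batch with
    # shape (N, 640, 640, 3), ready to feed to the HEF.
    out = np.empty((len(frames), *input_size, 3), dtype=np.uint8)
    for i, f in enumerate(frames):
        h, w = f.shape[:2]
        rows = np.arange(input_size[0]) * h // input_size[0]
        cols = np.arange(input_size[1]) * w // input_size[1]
        out[i] = f[rows[:, None], cols[None, :]]
    return out
```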
A snippet of the optimization log (first and last steps) is as follows:
[info] No GPU chosen, Selected GPU 0
<Hailo Model Zoo INFO> Start run for network yolov10m ...
<Hailo Model Zoo INFO> Initializing the hailo8 runner...
[info] Translation started on ONNX model yolov10m
[info] Restored ONNX model yolov10m (completion time: 00:00:00.15)
[info] Extracted ONNXRuntime meta-data for Hailo model (completion time: 00:00:00.94)
[info] NMS structure of yolov8 (or equivalent architecture) was detected.
[info] In order to use HailoRT post-processing capabilities, these end node names should be used: /model.23/one2one_cv3.0/one2one_cv3.0.2/Conv /model.23/one2one_cv2.0/one2one_cv2.0.2/Conv /model.23/one2one_cv2.1/one2one_cv2.1.2/Conv /model.23/one2one_cv3.1/one2one_cv3.1.2/Conv /model.23/one2one_cv2.2/one2one_cv2.2.2/Conv /model.23/one2one_cv3.2/one2one_cv3.2.2/Conv.
[info] Start nodes mapped from original model: 'images': 'yolov10m/input_layer1'.
[info] End nodes mapped from original model: '/model.23/one2one_cv2.0/one2one_cv2.0.2/Conv', '/model.23/one2one_cv3.0/one2one_cv3.0.2/Conv', '/model.23/one2one_cv2.1/one2one_cv2.1.2/Conv', '/model.23/one2one_cv3.1/one2one_cv3.1.2/Conv', '/model.23/one2one_cv2.2/one2one_cv2.2.2/Conv', '/model.23/one2one_cv3.2/one2one_cv3.2.2/Conv'.
[info] Translation completed on ONNX model yolov10m (completion time: 00:00:02.31)
[info] Appending model script commands to yolov10m from string
[info] Added nms postprocess command to model script.
[info] Saved HAR to: /home/lauretta/quang/rain_detector/yolov10m.har
<Hailo Model Zoo INFO> Preparing calibration data...
[info] Loading model script commands to yolov10m from /home/lauretta/quang/rain_detector/model_script/yolov10m.alls
[info] Loading model script commands to yolov10m from string
[info] The activation function of layer yolov10m/conv64 was replaced by a Sigmoid
[info] The activation function of layer yolov10m/conv77 was replaced by a Sigmoid
[info] The activation function of layer yolov10m/conv89 was replaced by a Sigmoid
[info] Found model with 3 input channels, using real RGB images for calibration instead of sampling random data.
[info] Starting Model Optimization
[info] Model received quantization params from the hn
[info] MatmulDecompose skipped
[info] Starting Mixed Precision
[info] Model Optimization Algorithm Mixed Precision is done (completion time is 00:00:00.42)
[info] LayerNorm Decomposition skipped
[info] Starting Statistics Collector
[info] Using dataset with 64 entries for calibration
[info] Model Optimization Algorithm Statistics Collector is done (completion time is 00:00:22.07)
[info] Starting Fix zp_comp Encoding
[info] Model Optimization Algorithm Fix zp_comp Encoding is done (completion time is 00:00:00.00)
[info] Starting Matmul Equalization
[info] Model Optimization Algorithm Matmul Equalization is done (completion time is 00:00:00.02)
[info] Starting MatmulDecomposeFix
[info] Model Optimization Algorithm MatmulDecomposeFix is done (completion time is 00:00:00.00)
[info] activation fitting started for yolov10m/reduce_sum_softmax1/act_op
[info] No shifts available for layer yolov10m/conv44/conv_op, using max shift instead. delta=2.1318
[info] No shifts available for layer yolov10m/conv44/conv_op, using max shift instead. delta=1.0659
[info] No shifts available for layer yolov10m/conv84/conv_op, using max shift instead. delta=0.6651
[info] No shifts available for layer yolov10m/conv84/conv_op, using max shift instead. delta=0.3325
[info] No shifts available for layer yolov10m/conv89/conv_op, using max shift instead. delta=0.1925
[info] No shifts available for layer yolov10m/conv89/conv_op, using max shift instead. delta=0.0963
[info] Finetune encoding skipped
[info] Bias Correction skipped
[info] Adaround skipped
[info] Starting Quantization-Aware Fine-Tuning
[warning] Dataset is larger than expected size. Increasing the algorithm dataset size might improve the results
[info] Using dataset with 1024 entries for finetune
Epoch 1/4
  1/128 ━━━━━━━━━━━━━━━━━━━━ 3:26:01 97s/step - _distill_loss_yolov10m/conv57: 0.3000 - _distill_loss_yolov10m/conv61: 0.1377 - _distill_loss_yolov10m/conv64: 0.3819 - _distill_loss_yolov10m/conv70: 0.2914 - _distill_loss_yolov10m/conv74: 0.2233 - _distill_loss_yolov10m/conv77: 0.0323 - _distill_loss_yolov10m/conv83: 0.3711 - _distill_loss_yolov10m/conv86: 0.2080 - _distill_loss_yolov10m/conv89: 1.0000 - total_distill_loss: 2.9458
128/128 ━━━━━━━━━━━━━━━━━━━━ 59s 459ms/step - _distill_loss_yolov10m/conv57: 0.2392 - _distill_loss_yolov10m/conv61: 0.1138 - _distill_loss_yolov10m/conv64: 0.2686 - _distill_loss_yolov10m/conv70: 0.2289 - _distill_loss_yolov10m/conv74: 0.1382 - _distill_loss_yolov10m/conv77: 0.4470 - _distill_loss_yolov10m/conv83: 0.2096 - _distill_loss_yolov10m/conv86: 0.1196 - _distill_loss_yolov10m/conv89: 1.0000 - total_distill_loss: 2.7651
[info] Model Optimization Algorithm Quantization-Aware Fine-Tuning is done (completion time is 00:05:36.00)
[info] Starting Layer Noise Analysis
[info] Model Optimization Algorithm Layer Noise Analysis is done (completion time is 00:01:14.04)
[info] Model Optimization is done
[info] Saved HAR to: /home/lauretta/quang/rain_detector/yolov10m.har