Sorry for the delayed reply and thank you for your response to my question.
I’m attaching the hailort.log for your review, and I have also included a description of the output from the hef in the user app.
hailort.log
I’ve attached the hailort.log from my application runs. The content is identical for both the first and second runs, with the exception of the timestamps. I couldn’t find any significant errors or warnings in the logs that would point to abnormal behavior.
Don’t worry about the weird timestamps. That’s just a quirk of the embedded Linux we’re using.
[2023-09-19 16:58:47.524] [1187] [HailoRT] [info] [device.cpp:46] [Device] OS Version: Linux 5.15.5-rt22+g9b1463aa0ee6 #1 SMP PREEMPT_RT Mon Jun 12 12:31:27 UTC 2023 aarch64
[2023-09-19 16:58:47.539] [1187] [HailoRT] [info] [control.cpp:100] [control__parse_identify_results] firmware_version is: 4.18.0
[2023-09-19 16:58:48.193] [1187] [HailoRT] [info] [internal_buffer_manager.cpp:204] [print_execution_results] Planned internal buffer memory: CMA memory 0, user memory 4917760. memory to edge layer usage factor is 0.7497463
[2023-09-19 16:58:48.193] [1187] [HailoRT] [info] [internal_buffer_manager.cpp:212] [print_execution_results] Default Internal buffer planner executed successfully
[2023-09-19 16:58:48.419] [1187] [HailoRT] [info] [device_internal.cpp:57] [configure] Configuring HEF took 234.223892 milliseconds
[2023-09-19 16:58:48.432] [1187] [HailoRT] [info] [edge_elements.cpp:52] [create] Created (HwWriteEl0yolov7/input_layer1 | hw_frame_size: 1228800)
[2023-09-19 16:58:48.437] [1187] [HailoRT] [info] [queue_elements.cpp:255] [create] Created (PushQEl0yolov7/input_layer1 | timeout: 10s)
[2023-09-19 16:58:48.437] [1187] [HailoRT] [info] [filter_elements.cpp:101] [create] Created (PreInferEl0yolov7/input_layer1 | Reorder - src_order: NHWC, src_shape: (640, 640, 3), dst_order: NHCW, dst_shape: (640, 640, 3))
[2023-09-19 16:58:48.437] [1187] [HailoRT] [info] [vstream.cpp:754] [InputVStreamImpl] Creating yolov7/input_layer1...
[2023-09-19 16:58:48.437] [1187] [HailoRT] [info] [vstream_builder.cpp:102] [create_inputs] Input pipeline 'yolov7/input_layer1': (PreInferEl0yolov7/input_layer1 | Reorder - src_order: NHWC, src_shape: (640, 640, 3), dst_order: NHCW, dst_shape: (640, 640, 3)) >> (PushQEl0yolov7/input_layer1 | timeout: 10s) >> (HwWriteEl0yolov7/input_layer1 | hw_frame_size: 1228800) >> HW
[2023-09-19 16:58:48.439] [1187] [HailoRT] [info] [multi_io_elements.cpp:159] [create] Created (NmsPPMuxEl0YOLOv5-Post-Process | Op YOLOV5, Name: YOLOv5-Post-Process, Score threshold: 0.200, IoU threshold: 0.60, Classes: 80, Cross classes: false, Max bboxes per class: 80, Image height: 640, Image width: 640)
[2023-09-19 16:58:48.439] [1187] [HailoRT] [info] [edge_elements.cpp:287] [create] Created (HwReadEl5yolov7/conv70_121 | hw_frame_size: 1632000)
[2023-09-19 16:58:48.439] [1187] [HailoRT] [info] [queue_elements.cpp:613] [create] Created (PullQEl_nms5yolov7/conv70_121 | timeout: 10s)
[2023-09-19 16:58:48.439] [1187] [HailoRT] [info] [edge_elements.cpp:287] [create] Created (HwReadEl1yolov7/conv82_121 | hw_frame_size: 408000)
[2023-09-19 16:58:48.439] [1187] [HailoRT] [info] [queue_elements.cpp:613] [create] Created (PullQEl_nms1yolov7/conv82_121 | timeout: 10s)
[2023-09-19 16:58:48.439] [1187] [HailoRT] [info] [edge_elements.cpp:287] [create] Created (HwReadEl0yolov7/conv92_121 | hw_frame_size: 122400)
[2023-09-19 16:58:48.439] [1187] [HailoRT] [info] [queue_elements.cpp:613] [create] Created (PullQEl_nms0yolov7/conv92_121 | timeout: 10s)
[2023-09-19 16:58:48.439] [1187] [HailoRT] [info] [queue_elements.cpp:756] [create] Created (UserBufQEl_post_infer0yolov7/yolov5_nms_postprocess | timeout: 10s)
[2023-09-19 16:58:48.439] [1187] [HailoRT] [info] [vstream.cpp:1107] [OutputVStreamImpl] Creating yolov7/yolov5_nms_postprocess...
[2023-09-19 16:58:48.439] [1187] [HailoRT] [info] [vstream_builder.cpp:627] [create_output_post_process_nms] Output pipeline 'yolov7/yolov5_nms_postprocess': HW >> (HwReadEl5yolov7/conv70_121 | hw_frame_size: 1632000) >> (PullQEl_nms5yolov7/conv70_121) >> (HwReadEl1yolov7/conv82_121 | hw_frame_size: 408000) >> (PullQEl_nms1yolov7/conv82_121) >> (HwReadEl0yolov7/conv92_121 | hw_frame_size: 122400) >> (PullQEl_nms0yolov7/conv92_121) >> (NmsPPMuxEl0YOLOv5-Post-Process | Op YOLOV5, Name: YOLOv5-Post-Process, Score threshold: 0.200, IoU threshold: 0.60, Classes: 80, Cross classes: false, Max bboxes per class: 80, Image height: 640, Image width: 640) >> (UserBufQEl_post_infer0yolov7/yolov5_nms_postprocess | timeout: 10s)
[2023-09-19 16:58:48.439] [1187] [HailoRT] [info] [core_op.cpp:156] [activate] Activating yolov7 took 0.29264 milliseconds. Note that the function is asynchronous and thus the network is not fully activated yet.
[2023-09-19 16:58:48.462] [1187] [HailoRT] [info] [device.cpp:46] [Device] OS Version: Linux 5.15.5-rt22+g9b1463aa0ee6 #1 SMP PREEMPT_RT Mon Jun 12 12:31:27 UTC 2023 aarch64
[2023-09-19 16:58:48.462] [1187] [HailoRT] [info] [control.cpp:100] [control__parse_identify_results] firmware_version is: 4.18.0
[2023-09-19 16:58:48.611] [1187] [HailoRT] [info] [internal_buffer_manager.cpp:204] [print_execution_results] Planned internal buffer memory: CMA memory 0, user memory 4917760. memory to edge layer usage factor is 0.7497463
[2023-09-19 16:58:48.611] [1187] [HailoRT] [info] [internal_buffer_manager.cpp:212] [print_execution_results] Default Internal buffer planner executed successfully
[2023-09-19 16:58:48.768] [1187] [HailoRT] [info] [device_internal.cpp:57] [configure] Configuring HEF took 164.90879 milliseconds
[2023-09-19 16:58:48.768] [1187] [HailoRT] [info] [edge_elements.cpp:52] [create] Created (HwWriteEl0yolov7/input_layer1 | hw_frame_size: 1228800)
[2023-09-19 16:58:48.768] [1187] [HailoRT] [info] [queue_elements.cpp:255] [create] Created (PushQEl0yolov7/input_layer1 | timeout: 10s)
[2023-09-19 16:58:48.768] [1187] [HailoRT] [info] [filter_elements.cpp:101] [create] Created (PreInferEl0yolov7/input_layer1 | Reorder - src_order: NHWC, src_shape: (640, 640, 3), dst_order: NHCW, dst_shape: (640, 640, 3))
[2023-09-19 16:58:48.768] [1187] [HailoRT] [info] [vstream.cpp:754] [InputVStreamImpl] Creating yolov7/input_layer1...
[2023-09-19 16:58:48.768] [1187] [HailoRT] [info] [vstream_builder.cpp:102] [create_inputs] Input pipeline 'yolov7/input_layer1': (PreInferEl0yolov7/input_layer1 | Reorder - src_order: NHWC, src_shape: (640, 640, 3), dst_order: NHCW, dst_shape: (640, 640, 3)) >> (PushQEl0yolov7/input_layer1 | timeout: 10s) >> (HwWriteEl0yolov7/input_layer1 | hw_frame_size: 1228800) >> HW
[2023-09-19 16:58:48.768] [1187] [HailoRT] [info] [multi_io_elements.cpp:159] [create] Created (NmsPPMuxEl0YOLOv5-Post-Process | Op YOLOV5, Name: YOLOv5-Post-Process, Score threshold: 0.200, IoU threshold: 0.60, Classes: 80, Cross classes: false, Max bboxes per class: 80, Image height: 640, Image width: 640)
[2023-09-19 16:58:48.768] [1187] [HailoRT] [info] [edge_elements.cpp:287] [create] Created (HwReadEl5yolov7/conv70_121 | hw_frame_size: 1632000)
[2023-09-19 16:58:48.768] [1187] [HailoRT] [info] [queue_elements.cpp:613] [create] Created (PullQEl_nms5yolov7/conv70_121 | timeout: 10s)
[2023-09-19 16:58:48.768] [1187] [HailoRT] [info] [edge_elements.cpp:287] [create] Created (HwReadEl1yolov7/conv82_121 | hw_frame_size: 408000)
[2023-09-19 16:58:48.768] [1187] [HailoRT] [info] [queue_elements.cpp:613] [create] Created (PullQEl_nms1yolov7/conv82_121 | timeout: 10s)
[2023-09-19 16:58:48.768] [1187] [HailoRT] [info] [edge_elements.cpp:287] [create] Created (HwReadEl0yolov7/conv92_121 | hw_frame_size: 122400)
[2023-09-19 16:58:48.768] [1187] [HailoRT] [info] [queue_elements.cpp:613] [create] Created (PullQEl_nms0yolov7/conv92_121 | timeout: 10s)
[2023-09-19 16:58:48.769] [1187] [HailoRT] [info] [queue_elements.cpp:756] [create] Created (UserBufQEl_post_infer0yolov7/yolov5_nms_postprocess | timeout: 10s)
[2023-09-19 16:58:48.769] [1187] [HailoRT] [info] [vstream.cpp:1107] [OutputVStreamImpl] Creating yolov7/yolov5_nms_postprocess...
[2023-09-19 16:58:48.769] [1187] [HailoRT] [info] [vstream_builder.cpp:627] [create_output_post_process_nms] Output pipeline 'yolov7/yolov5_nms_postprocess': HW >> (HwReadEl5yolov7/conv70_121 | hw_frame_size: 1632000) >> (PullQEl_nms5yolov7/conv70_121) >> (HwReadEl1yolov7/conv82_121 | hw_frame_size: 408000) >> (PullQEl_nms1yolov7/conv82_121) >> (HwReadEl0yolov7/conv92_121 | hw_frame_size: 122400) >> (PullQEl_nms0yolov7/conv92_121) >> (NmsPPMuxEl0YOLOv5-Post-Process | Op YOLOV5, Name: YOLOv5-Post-Process, Score threshold: 0.200, IoU threshold: 0.60, Classes: 80, Cross classes: false, Max bboxes per class: 80, Image height: 640, Image width: 640) >> (UserBufQEl_post_infer0yolov7/yolov5_nms_postprocess | timeout: 10s)
[2023-09-19 16:58:48.769] [1187] [HailoRT] [info] [core_op.cpp:156] [activate] Activating yolov7 took 0.30256 milliseconds. Note that the function is asynchronous and thus the network is not fully activated yet.
Output from the HEF
I believe what you were asking for with “output from the HEF” is the output tensor from the yolov7.hef after it’s been loaded onto the Hailo-8.
To investigate, I saved the output arrays to a text file with 6 decimal places using fprintf(). The outputs vary on each run, so I’ve included a couple of specific cases to illustrate the issue.
NOTE: I’ve consistently passed the same image to the Hailo-8 on both the first and second runs after a reboot.
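A note on the dump format: six decimal places are not always enough to tell two float32 values apart, so runs that look identical at that precision can still differ in the low bits. The sketch below is one way to write the values so that any difference shows up. It assumes the output buffer is read as an array of float32; the helper names `float_bits` and `dump_floats_exact` are made up for illustration and are not part of HailoRT.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Return the raw IEEE-754 bit pattern of a float32 (memcpy avoids aliasing issues). */
static uint32_t float_bits(float value)
{
    uint32_t bits;
    memcpy(&bits, &value, sizeof bits);
    return bits;
}

/*
 * Write one value per line: index, decimal value, raw bits in hex.
 * "%.9g" prints 9 significant digits, which is enough to distinguish
 * any two different float32 values; the hex column makes diffs unambiguous.
 */
static void dump_floats_exact(FILE *fp, const float *data, size_t count)
{
    for (size_t i = 0; i < count; i++) {
        fprintf(fp, "%zu %.9g 0x%08" PRIx32 "\n",
                i, (double)data[i], float_bits(data[i]));
    }
}
```

Calling `dump_floats_exact(fp, output_buffer, element_count)` in place of the existing fprintf() loop would let a plain text diff of two runs show every differing element, including ones that agree to six decimals.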
Case 1: Lower Score on the Second Run
In this case, the second run after a reboot shows a lower confidence score than the first run. The bounding box coordinates also show subtle differences.
- First Run (Frame 2): 1.000000, 0.326564, 0.641075, 0.360249, 0.655543, 0.305880, 0.000000, 0.000000, 0.000000, 0.000000, 0.000000, 0.000000, 0.000000, 1.000000, 0.274725, 0.476800, 0.441451, 0.681533, 0.901959, 0.000000 …
- Second Run (Frame 2): 1.000000, 0.325678, 0.640553, 0.360743, 0.655280, 0.341173, 0.000000, 0.000000, 0.000000, 0.000000, 0.000000, 0.000000, 0.000000, 1.000000, 0.273461, 0.469207, 0.443990, 0.685891, 0.792153, 0.000000 …
I used a video that shows a single boat on the water, along with a few very small people on the boat. YOLOv7 identifies ‘boat’ as the 9th class (index 8), and while it seems to detect the boat in both runs, the results are inconsistent.
The score for the boat detection dropped from 90.2% on the first run to 79.2% on the second run.
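The values above read consistently as a per-class NMS buffer: for each class, one float holding the number of boxes, followed by five floats per box. Read that way, the first run has one class-0 box with score 0.305880, no boxes for classes 1 to 7, and one class-8 (boat) box with score 0.901959. Below is a minimal sketch of pulling a class score out of such a buffer. The layout and the field order inside a box (y_min, x_min, y_max, x_max, score) are assumptions inferred from the dump, and `best_score_for_class` is a hypothetical helper, not a HailoRT function.

```c
#include <stddef.h>

#define FLOATS_PER_BOX 5  /* assumed order: y_min, x_min, y_max, x_max, score */

/*
 * Walk an NMS-style buffer laid out as, for each class:
 *   [box_count][box 0: 5 floats][box 1: 5 floats]...
 * Return the highest score found for target_class, or -1.0f if that class
 * has no boxes or the buffer does not fit the assumed layout.
 */
static float best_score_for_class(const float *buf, size_t len,
                                  size_t num_classes, size_t target_class)
{
    size_t pos = 0;
    for (size_t cls = 0; cls < num_classes && pos < len; cls++) {
        float raw_count = buf[pos++];
        /* Reject NaN, negative counts, and counts that overrun the buffer. */
        if (!(raw_count >= 0.0f) ||
            raw_count > (float)((len - pos) / FLOATS_PER_BOX)) {
            return -1.0f;
        }
        size_t count = (size_t)raw_count;
        if (cls == target_class) {
            float best = -1.0f;
            for (size_t i = 0; i < count; i++) {
                float score = buf[pos + i * FLOATS_PER_BOX + 4];
                if (score > best) {
                    best = score;
                }
            }
            return best;
        }
        pos += count * FLOATS_PER_BOX;  /* skip this class's boxes */
    }
    return -1.0f;
}
```

With the 20 values listed for Frame 2, `best_score_for_class(buf, 20, 80, 8)` gives the boat score for each run, which makes it straightforward to log one number per frame and compare runs.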
Case 2: Detection Failure on the Second Run
Here, the second run after a reboot fails to detect an object that was successfully detected on the first run.
- First Run (Frame 3): 1.000000, 0.324733, 0.641238, 0.359335, 0.655380, 0.290193, 0.000000, 0.000000, 0.000000, 0.000000, 0.000000, 0.000000, 0.000000, 1.000000, 0.275083, 0.473771, 0.440701, 0.682305, 0.894115, 0.000000 …
- Second Run (Frame 3): 0.000000, 0.000000, 0.000000, 0.000000, 0.000000, 0.000000, 0.000000, 0.000000, 0.000000, 0.000000, 0.000000, 0.000000, 0.000000, 0.000000, 0.000000, 0.000000, 0.000000, 0.000000, 0.000000, 0.000000 …
Additional Details
1. I was initially concerned about the input data, so I saved the src_data array values to a file immediately before calling hailo_vstream_write_raw_buffer(). The input has been identical on every single run.
2. The problem occurs when the second chip recognized on the PCIe bus is involved. By “the second chip,” I mean the second device obtained from the list returned by hailo_scan_devices().
When I use only the first Hailo-8 chip recognized on the PCIe bus (without using the second chip), the output (viewed to 6 decimal places) is identical across all runs for every frame of the given video. The detection rate is 100%, meaning the expected object was detected in every frame of the video.
However, when using the second chip alone (without using the first chip), the expected object was still detected in every frame of the video, but the confidence scores and bounding box coordinates were not exactly the same across runs and varied slightly.
The problem intensifies notably when I use two Hailo-8 devices in parallel. Even on the first run after a reboot, there are a few frames on both devices where the object isn’t detected, which wasn’t an issue with a single device (I used the same video as in the single-chip case). On the second run, the issue becomes even more severe: more frames where the object is not detected, and many frames where it is detected but with lower confidence scores.
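One way to make these run-to-run comparisons exact, for both the input check in item 1 and the per-chip output comparison in item 2, is to hash the raw bytes of each buffer and log one line per frame. The sketch below uses the 64-bit FNV-1a hash, chosen only because it is short and dependency-free. The helpers `fnv1a64` and `log_frame_hash` and the tag strings are hypothetical and not part of HailoRT.

```c
#include <inttypes.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* 64-bit FNV-1a hash over the raw bytes of a buffer. */
static uint64_t fnv1a64(const void *data, size_t size)
{
    const unsigned char *bytes = (const unsigned char *)data;
    uint64_t hash = 14695981039346656037ULL;  /* FNV offset basis */
    for (size_t i = 0; i < size; i++) {
        hash ^= bytes[i];
        hash *= 1099511628211ULL;             /* FNV prime */
    }
    return hash;
}

/*
 * Log one line per buffer, e.g. "dev1 in frame=2 hash=...".
 * Identical lines across two runs mean the buffers were bit-identical.
 */
static void log_frame_hash(FILE *fp, const char *tag, unsigned frame,
                           const void *data, size_t size)
{
    fprintf(fp, "%s frame=%u hash=%016" PRIx64 "\n",
            tag, frame, fnv1a64(data, size));
}
```

Calling `log_frame_hash()` on src_data just before hailo_vstream_write_raw_buffer() and on the output buffer just after hailo_vstream_read_raw_buffer(), once per device, would produce small text files that can be diffed between the first and second runs to see exactly which frames and which device diverge.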
* In this text, the phrase “use a Hailo-8 chip” refers to completing the setup steps such as hailo_create_device(), hailo_create_hef(), hailo_create_input_vstreams(), and hailo_create_output_vstreams(), and then running inference on the Hailo-8 chip by calling hailo_vstream_write_raw_buffer() and hailo_vstream_read_raw_buffer() on those vstreams.
* In all cases described here, only the yolov7.hef was loaded and used for all Hailo-8 configurations.
I hope this information helps pinpoint the root cause. Please let me know if you need any more data from my end.