Inference results degrade after the first execution on Hailo-8 (HailoRT 4.18.0)

Hi,

I’ve run into an issue where the inference results degrade from the second execution onward.

Currently, I’m using HailoRT 4.18.0, and I’ve developed a custom user application.

The application loads yolov7.hef onto two Hailo-8 NPUs. For each Hailo-8, it creates an input vstream and an output vstream (input_vstream and output_vstream), and then processes two videos concurrently, frame by frame, through each device’s vstreams.
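The per-device structure described above can be sketched roughly as follows. This is a minimal sketch, assuming one POSIX thread per Hailo-8 and that each thread owns all of its device’s handles and buffers. device_worker_t and process_one_frame() are hypothetical placeholders; no HailoRT calls are shown.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define MAX_DEVICES 8

/* Hypothetical per-device context. In the real application this would also
 * hold the device handle, the configured network group and the vstreams. */
typedef struct {
    int device_index;          /* position in the device list */
    const char *video_path;    /* video fed to this device */
    size_t frames_to_process;  /* e.g. 300 */
    size_t frames_done;
    int status;                /* 0 = OK */
} device_worker_t;

/* Hypothetical placeholder for "read frame, write input vstream,
 * read output vstream, draw box". Always succeeds in this sketch. */
static int process_one_frame(device_worker_t *w, size_t frame_index)
{
    (void)w;
    (void)frame_index;
    return 0;
}

/* Thread body: processes this device's frames using only this device's state. */
static void *device_worker_main(void *arg)
{
    device_worker_t *w = (device_worker_t *)arg;
    for (size_t i = 0; i < w->frames_to_process; i++) {
        w->status = process_one_frame(w, i);
        if (w->status != 0) {
            break;
        }
        w->frames_done++;
    }
    return NULL;
}

/* Starts one thread per device, joins every thread that was started,
 * and returns 0 only if all workers succeeded. */
static int run_device_workers(device_worker_t *workers, size_t count)
{
    pthread_t threads[MAX_DEVICES];
    size_t started = 0;
    int rc = 0;

    assert(count <= MAX_DEVICES);
    for (; started < count; started++) {
        if (pthread_create(&threads[started], NULL, device_worker_main,
                           &workers[started]) != 0) {
            rc = -1;
            break;
        }
    }
    for (size_t i = 0; i < started; i++) {
        pthread_join(threads[i], NULL);
        if (workers[i].status != 0) {
            rc = -1;
        }
    }
    return rc;
}
```

Keeping every handle and buffer inside its own device_worker_t means the two pipelines share no mutable state, which makes it easier to rule out a cross-thread buffer mix-up when only one device’s results look wrong.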

For each frame, the application performs inference and highlights the object with the highest score by drawing a bounding box on the original image. The process continues for a predefined number of frames (e.g., 300 frames).
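The per-frame step of picking the highest-scoring object could look like the sketch below. It assumes the output vstream delivers the by-class NMS layout that the dumps further down appear to follow: for each class, one float holding the number of boxes, then y_min, x_min, y_max, x_max and score for each box. The struct and function names are hypothetical, not HailoRT types.

```c
#include <assert.h>
#include <stddef.h>

#define NMS_BOX_FLOATS 5  /* y_min, x_min, y_max, x_max, score */

/* One detection, coordinates normalized to [0, 1]. */
typedef struct {
    float y_min, x_min, y_max, x_max, score;
} nms_box_t;

typedef struct {
    int class_id;  /* -1 when the frame contains no detection at all */
    nms_box_t box;
} best_detection_t;

/* Assumed layout (inferred from the dumps, not from HailoRT headers):
 * per class, a count float followed by count * 5 floats.
 * Returns the highest-scoring box over all classes. */
static best_detection_t find_best_detection(const float *buf, size_t buf_len,
                                            int num_classes)
{
    best_detection_t best = { -1, { 0.0f, 0.0f, 0.0f, 0.0f, 0.0f } };
    size_t pos = 0;

    assert(buf != NULL);
    for (int cls = 0; cls < num_classes && pos < buf_len; cls++) {
        size_t count = (size_t)buf[pos++];
        if (count > (buf_len - pos) / NMS_BOX_FLOATS) {
            break;  /* truncated buffer: stop instead of reading past the end */
        }
        for (size_t i = 0; i < count; i++, pos += NMS_BOX_FLOATS) {
            const float *b = &buf[pos];
            if (best.class_id < 0 || b[4] > best.box.score) {
                best.class_id = cls;
                best.box.y_min = b[0];
                best.box.x_min = b[1];
                best.box.y_max = b[2];
                best.box.x_max = b[3];
                best.box.score = b[4];
            }
        }
    }
    return best;
}
```

Passing the number of floats actually read from the output vstream as buf_len keeps the walk inside the buffer even if a dump is cut short.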

On the first run after booting, the bounding boxes and scores look reasonable: objects are detected in almost every frame, with confidence scores of around 80-90%. However, after I close the application and start it again from the shell, the inference results degrade significantly. In some frames no objects are detected at all, and when objects are detected, the scores are extremely low.
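To make “almost every frame” and “extremely low” comparable between runs, one option is to store the best score of every frame and reduce each run to two numbers: a detection rate and a mean score over the frames that have a detection. A minimal sketch; run_summary_t and summarize_run() are hypothetical names, and it assumes a score of 0 is recorded for frames with no detection.

```c
#include <assert.h>
#include <stddef.h>

typedef struct {
    size_t frames;          /* frames processed */
    size_t detected;        /* frames with at least one detection */
    double detection_rate;  /* detected / frames */
    double mean_score;      /* mean best score over detected frames only */
} run_summary_t;

/* best_scores[i] is the highest score in frame i, or 0 when nothing was
 * detected (assumption: any box that survives the post-process scores above 0). */
static run_summary_t summarize_run(const float *best_scores, size_t frames)
{
    run_summary_t s = { frames, 0, 0.0, 0.0 };
    double score_sum = 0.0;

    assert(best_scores != NULL || frames == 0);
    for (size_t i = 0; i < frames; i++) {
        if (best_scores[i] > 0.0f) {
            s.detected++;
            score_sum += best_scores[i];
        }
    }
    if (frames > 0) {
        s.detection_rate = (double)s.detected / (double)frames;
    }
    if (s.detected > 0) {
        s.mean_score = score_sum / (double)s.detected;
    }
    return s;
}
```

Printing one such summary per device and per run gives a compact way to compare the first and second executions.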

Could you help me understand why this issue occurs on the second execution? Any assistance would be greatly appreciated.

Hey @candy24910,

Could you help us out by providing some logs? There should be a hailort.log file available. If you could grab it from both the first and the second run, that would be great.

Also, would you mind printing the output from the hef in the user app for both runs and sharing it here? Having all that info will make it much easier for us to debug this issue together!

Sorry for the delayed reply and thank you for your response to my question.

I’m attaching the hailort.log for your review, and I have also included a description of the output from the hef in the user app.

hailort.log


I’ve attached the hailort.log from my application runs. Its content is identical for the first and second runs except for the timestamps. Because the application configures yolov7.hef on both Hailo-8 devices, the configuration sequence appears twice. I couldn’t find any errors or warnings in the logs that would point to abnormal behavior.

Don’t worry about the weird timestamps. That’s just a quirk of the embedded Linux we’re using.

[2023-09-19 16:58:47.524] [1187] [HailoRT] [info] [device.cpp:46] [Device] OS Version: Linux 5.15.5-rt22+g9b1463aa0ee6 #1 SMP PREEMPT_RT Mon Jun 12 12:31:27 UTC 2023 aarch64
[2023-09-19 16:58:47.539] [1187] [HailoRT] [info] [control.cpp:100] [control__parse_identify_results] firmware_version is: 4.18.0
[2023-09-19 16:58:48.193] [1187] [HailoRT] [info] [internal_buffer_manager.cpp:204] [print_execution_results] Planned internal buffer memory: CMA memory 0, user memory 4917760. memory to edge layer usage factor is 0.7497463
[2023-09-19 16:58:48.193] [1187] [HailoRT] [info] [internal_buffer_manager.cpp:212] [print_execution_results] Default Internal buffer planner executed successfully
[2023-09-19 16:58:48.419] [1187] [HailoRT] [info] [device_internal.cpp:57] [configure] Configuring HEF took 234.223892 milliseconds
[2023-09-19 16:58:48.432] [1187] [HailoRT] [info] [edge_elements.cpp:52] [create] Created (HwWriteEl0yolov7/input_layer1 | hw_frame_size: 1228800)
[2023-09-19 16:58:48.437] [1187] [HailoRT] [info] [queue_elements.cpp:255] [create] Created (PushQEl0yolov7/input_layer1 | timeout: 10s)
[2023-09-19 16:58:48.437] [1187] [HailoRT] [info] [filter_elements.cpp:101] [create] Created (PreInferEl0yolov7/input_layer1 | Reorder - src_order: NHWC, src_shape: (640, 640, 3), dst_order: NHCW, dst_shape: (640, 640, 3))
[2023-09-19 16:58:48.437] [1187] [HailoRT] [info] [vstream.cpp:754] [InputVStreamImpl] Creating yolov7/input_layer1...
[2023-09-19 16:58:48.437] [1187] [HailoRT] [info] [vstream_builder.cpp:102] [create_inputs] Input pipeline 'yolov7/input_layer1': (PreInferEl0yolov7/input_layer1 | Reorder - src_order: NHWC, src_shape: (640, 640, 3), dst_order: NHCW, dst_shape: (640, 640, 3)) >> (PushQEl0yolov7/input_layer1 | timeout: 10s) >> (HwWriteEl0yolov7/input_layer1 | hw_frame_size: 1228800) >> HW
[2023-09-19 16:58:48.439] [1187] [HailoRT] [info] [multi_io_elements.cpp:159] [create] Created (NmsPPMuxEl0YOLOv5-Post-Process | Op YOLOV5, Name: YOLOv5-Post-Process, Score threshold: 0.200, IoU threshold: 0.60, Classes: 80, Cross classes: false, Max bboxes per class: 80, Image height: 640, Image width: 640)
[2023-09-19 16:58:48.439] [1187] [HailoRT] [info] [edge_elements.cpp:287] [create] Created (HwReadEl5yolov7/conv70_121 | hw_frame_size: 1632000)
[2023-09-19 16:58:48.439] [1187] [HailoRT] [info] [queue_elements.cpp:613] [create] Created (PullQEl_nms5yolov7/conv70_121 | timeout: 10s)
[2023-09-19 16:58:48.439] [1187] [HailoRT] [info] [edge_elements.cpp:287] [create] Created (HwReadEl1yolov7/conv82_121 | hw_frame_size: 408000)
[2023-09-19 16:58:48.439] [1187] [HailoRT] [info] [queue_elements.cpp:613] [create] Created (PullQEl_nms1yolov7/conv82_121 | timeout: 10s)
[2023-09-19 16:58:48.439] [1187] [HailoRT] [info] [edge_elements.cpp:287] [create] Created (HwReadEl0yolov7/conv92_121 | hw_frame_size: 122400)
[2023-09-19 16:58:48.439] [1187] [HailoRT] [info] [queue_elements.cpp:613] [create] Created (PullQEl_nms0yolov7/conv92_121 | timeout: 10s)
[2023-09-19 16:58:48.439] [1187] [HailoRT] [info] [queue_elements.cpp:756] [create] Created (UserBufQEl_post_infer0yolov7/yolov5_nms_postprocess | timeout: 10s)
[2023-09-19 16:58:48.439] [1187] [HailoRT] [info] [vstream.cpp:1107] [OutputVStreamImpl] Creating yolov7/yolov5_nms_postprocess...
[2023-09-19 16:58:48.439] [1187] [HailoRT] [info] [vstream_builder.cpp:627] [create_output_post_process_nms] Output pipeline 'yolov7/yolov5_nms_postprocess': HW >> (HwReadEl5yolov7/conv70_121 | hw_frame_size: 1632000) >> (PullQEl_nms5yolov7/conv70_121) >> (HwReadEl1yolov7/conv82_121 | hw_frame_size: 408000) >> (PullQEl_nms1yolov7/conv82_121) >> (HwReadEl0yolov7/conv92_121 | hw_frame_size: 122400) >> (PullQEl_nms0yolov7/conv92_121) >> (NmsPPMuxEl0YOLOv5-Post-Process | Op YOLOV5, Name: YOLOv5-Post-Process, Score threshold: 0.200, IoU threshold: 0.60, Classes: 80, Cross classes: false, Max bboxes per class: 80, Image height: 640, Image width: 640) >> (UserBufQEl_post_infer0yolov7/yolov5_nms_postprocess | timeout: 10s)
[2023-09-19 16:58:48.439] [1187] [HailoRT] [info] [core_op.cpp:156] [activate] Activating yolov7 took 0.29264 milliseconds. Note that the function is asynchronous and thus the network is not fully activated yet.
[2023-09-19 16:58:48.462] [1187] [HailoRT] [info] [device.cpp:46] [Device] OS Version: Linux 5.15.5-rt22+g9b1463aa0ee6 #1 SMP PREEMPT_RT Mon Jun 12 12:31:27 UTC 2023 aarch64
[2023-09-19 16:58:48.462] [1187] [HailoRT] [info] [control.cpp:100] [control__parse_identify_results] firmware_version is: 4.18.0
[2023-09-19 16:58:48.611] [1187] [HailoRT] [info] [internal_buffer_manager.cpp:204] [print_execution_results] Planned internal buffer memory: CMA memory 0, user memory 4917760. memory to edge layer usage factor is 0.7497463
[2023-09-19 16:58:48.611] [1187] [HailoRT] [info] [internal_buffer_manager.cpp:212] [print_execution_results] Default Internal buffer planner executed successfully
[2023-09-19 16:58:48.768] [1187] [HailoRT] [info] [device_internal.cpp:57] [configure] Configuring HEF took 164.90879 milliseconds
[2023-09-19 16:58:48.768] [1187] [HailoRT] [info] [edge_elements.cpp:52] [create] Created (HwWriteEl0yolov7/input_layer1 | hw_frame_size: 1228800)
[2023-09-19 16:58:48.768] [1187] [HailoRT] [info] [queue_elements.cpp:255] [create] Created (PushQEl0yolov7/input_layer1 | timeout: 10s)
[2023-09-19 16:58:48.768] [1187] [HailoRT] [info] [filter_elements.cpp:101] [create] Created (PreInferEl0yolov7/input_layer1 | Reorder - src_order: NHWC, src_shape: (640, 640, 3), dst_order: NHCW, dst_shape: (640, 640, 3))
[2023-09-19 16:58:48.768] [1187] [HailoRT] [info] [vstream.cpp:754] [InputVStreamImpl] Creating yolov7/input_layer1...
[2023-09-19 16:58:48.768] [1187] [HailoRT] [info] [vstream_builder.cpp:102] [create_inputs] Input pipeline 'yolov7/input_layer1': (PreInferEl0yolov7/input_layer1 | Reorder - src_order: NHWC, src_shape: (640, 640, 3), dst_order: NHCW, dst_shape: (640, 640, 3)) >> (PushQEl0yolov7/input_layer1 | timeout: 10s) >> (HwWriteEl0yolov7/input_layer1 | hw_frame_size: 1228800) >> HW
[2023-09-19 16:58:48.768] [1187] [HailoRT] [info] [multi_io_elements.cpp:159] [create] Created (NmsPPMuxEl0YOLOv5-Post-Process | Op YOLOV5, Name: YOLOv5-Post-Process, Score threshold: 0.200, IoU threshold: 0.60, Classes: 80, Cross classes: false, Max bboxes per class: 80, Image height: 640, Image width: 640)
[2023-09-19 16:58:48.768] [1187] [HailoRT] [info] [edge_elements.cpp:287] [create] Created (HwReadEl5yolov7/conv70_121 | hw_frame_size: 1632000)
[2023-09-19 16:58:48.768] [1187] [HailoRT] [info] [queue_elements.cpp:613] [create] Created (PullQEl_nms5yolov7/conv70_121 | timeout: 10s)
[2023-09-19 16:58:48.768] [1187] [HailoRT] [info] [edge_elements.cpp:287] [create] Created (HwReadEl1yolov7/conv82_121 | hw_frame_size: 408000)
[2023-09-19 16:58:48.768] [1187] [HailoRT] [info] [queue_elements.cpp:613] [create] Created (PullQEl_nms1yolov7/conv82_121 | timeout: 10s)
[2023-09-19 16:58:48.768] [1187] [HailoRT] [info] [edge_elements.cpp:287] [create] Created (HwReadEl0yolov7/conv92_121 | hw_frame_size: 122400)
[2023-09-19 16:58:48.768] [1187] [HailoRT] [info] [queue_elements.cpp:613] [create] Created (PullQEl_nms0yolov7/conv92_121 | timeout: 10s)
[2023-09-19 16:58:48.769] [1187] [HailoRT] [info] [queue_elements.cpp:756] [create] Created (UserBufQEl_post_infer0yolov7/yolov5_nms_postprocess | timeout: 10s)
[2023-09-19 16:58:48.769] [1187] [HailoRT] [info] [vstream.cpp:1107] [OutputVStreamImpl] Creating yolov7/yolov5_nms_postprocess...
[2023-09-19 16:58:48.769] [1187] [HailoRT] [info] [vstream_builder.cpp:627] [create_output_post_process_nms] Output pipeline 'yolov7/yolov5_nms_postprocess': HW >> (HwReadEl5yolov7/conv70_121 | hw_frame_size: 1632000) >> (PullQEl_nms5yolov7/conv70_121) >> (HwReadEl1yolov7/conv82_121 | hw_frame_size: 408000) >> (PullQEl_nms1yolov7/conv82_121) >> (HwReadEl0yolov7/conv92_121 | hw_frame_size: 122400) >> (PullQEl_nms0yolov7/conv92_121) >> (NmsPPMuxEl0YOLOv5-Post-Process | Op YOLOV5, Name: YOLOv5-Post-Process, Score threshold: 0.200, IoU threshold: 0.60, Classes: 80, Cross classes: false, Max bboxes per class: 80, Image height: 640, Image width: 640) >> (UserBufQEl_post_infer0yolov7/yolov5_nms_postprocess | timeout: 10s)
[2023-09-19 16:58:48.769] [1187] [HailoRT] [info] [core_op.cpp:156] [activate] Activating yolov7 took 0.30256 milliseconds. Note that the function is asynchronous and thus the network is not fully activated yet.


Output from the HEF


I assume that by “output from the HEF” you meant the output tensor that yolov7.hef produces on the Hailo-8, i.e., the data returned by hailo_vstream_read_raw_buffer().

To investigate, I saved the output arrays to a text file with 6 decimal places using fprintf(). The outputs vary on each run, so I’ve included a couple of specific cases to illustrate the issue.
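A dump like the one described can be written with a small helper such as the following. This is a generic sketch (dump_frame() is a hypothetical name) that prints one frame per line with six decimal places, matching the format used in the cases below.

```c
#include <assert.h>
#include <stdio.h>
#include <stddef.h>

/* Writes one frame's output buffer as a single line of comma-separated
 * values with six decimal places. Returns 0 on success, -1 on a write error. */
static int dump_frame(FILE *out, size_t frame_index, const float *buf, size_t len)
{
    assert(out != NULL && buf != NULL);
    if (fprintf(out, "frame %zu:", frame_index) < 0) {
        return -1;
    }
    for (size_t i = 0; i < len; i++) {
        /* float is promoted to double for the variadic call anyway */
        if (fprintf(out, "%s%.6f", i == 0 ? " " : ", ", (double)buf[i]) < 0) {
            return -1;
        }
    }
    return fputc('\n', out) == EOF ? -1 : 0;
}
```

Six decimal places can hide differences smaller than 1e-6. Printing with "%.9g" (enough significant digits to round-trip a float) or comparing the raw bytes with memcmp() shows whether two runs are truly bit-identical.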

NOTE: I’ve consistently passed the same image to the Hailo-8 on both the first and second runs after a reboot.
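One lightweight way to confirm that the input really is identical across runs, without writing every frame to disk, is to log a 64-bit FNV-1a hash of each buffer right before hailo_vstream_write_raw_buffer(). fnv1a64() is a generic helper, not part of HailoRT; applying the same hash to the output buffer also shows at a glance whether two runs produced bit-identical results.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* 64-bit FNV-1a hash of an arbitrary buffer (generic helper, not a HailoRT API). */
static uint64_t fnv1a64(const void *data, size_t len)
{
    const unsigned char *p = (const unsigned char *)data;
    uint64_t h = 0xcbf29ce484222325ULL;  /* FNV-1a 64-bit offset basis */

    assert(data != NULL || len == 0);
    for (size_t i = 0; i < len; i++) {
        h ^= p[i];
        h *= 0x100000001b3ULL;           /* FNV-1a 64-bit prime */
    }
    return h;
}
```

For example, printing the frame index together with fnv1a64(src_data, frame_size), where frame_size is a hypothetical variable holding the input frame size in bytes, produces one short line per frame that can be diffed between runs.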

Case 1: Lower Score on the Second Run

In this case, the second run after a reboot shows a lower confidence score than the first run. The bounding box coordinates also show subtle differences.

  • First Run (Frame 2): 1.000000, 0.326564, 0.641075, 0.360249, 0.655543, 0.305880, 0.000000, 0.000000, 0.000000, 0.000000, 0.000000, 0.000000, 0.000000, 1.000000, 0.274725, 0.476800, 0.441451, 0.681533, 0.901959, 0.000000 …
  • Second Run (Frame 2): 1.000000, 0.325678, 0.640553, 0.360743, 0.655280, 0.341173, 0.000000, 0.000000, 0.000000, 0.000000, 0.000000, 0.000000, 0.000000, 1.000000, 0.273461, 0.469207, 0.443990, 0.685891, 0.792153, 0.000000 …

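I used a video that shows a single boat on the water, with a few very small people on the boat. YOLOv7 identifies ‘boat’ as the 9th class (index 8). Each array appears to be grouped by class: a detection count, followed by y_min, x_min, y_max, x_max and score for each detection. So the first block (count 1.000000) is class 0 (person), the seven zeros are the empty classes 1 to 7, and the next block is class 8 (boat). The boat is detected in both runs, but the results are inconsistent.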

The score for the boat detection dropped from 90.2% on the first run to 79.2% on the second run.

Case 2: Detection Failure on the Second Run

Here, the second run after a reboot fails to detect an object that was successfully detected on the first run.

  • First Run (Frame 3): 1.000000, 0.324733, 0.641238, 0.359335, 0.655380, 0.290193, 0.000000, 0.000000, 0.000000, 0.000000, 0.000000, 0.000000, 0.000000, 1.000000, 0.275083, 0.473771, 0.440701, 0.682305, 0.894115, 0.000000 …
  • Second Run (Frame 3): 0.000000, 0.000000, 0.000000, 0.000000, 0.000000, 0.000000, 0.000000, 0.000000, 0.000000, 0.000000, 0.000000, 0.000000, 0.000000, 0.000000, 0.000000, 0.000000, 0.000000, 0.000000, 0.000000, 0.000000 …
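Under the same by-class layout assumption, the second-run Frame 3 array has a count of zero for every class. That means no box in any class survived the NMS post-process, whose score threshold is 0.200 according to the log. A hypothetical helper to flag such frames automatically:

```c
#include <assert.h>
#include <stddef.h>

/* Counts all boxes in a by-class NMS buffer. Assumed layout, inferred from
 * the dumps: per class, a count float followed by count * 5 floats.
 * Returns 0 when no class contains a box. */
static size_t count_detections(const float *buf, size_t buf_len, int num_classes)
{
    size_t pos = 0;
    size_t total = 0;

    assert(buf != NULL);
    for (int cls = 0; cls < num_classes && pos < buf_len; cls++) {
        size_t count = (size_t)buf[pos++];
        if (count > (buf_len - pos) / 5) {
            break;  /* truncated buffer: stop instead of reading past the end */
        }
        total += count;
        pos += count * 5;
    }
    return total;
}
```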

Additional Details



1. I was initially concerned about the input data, so I saved the src_data array values to a file immediately before calling hailo_vstream_write_raw_buffer(). The input has been identical on every single run.


2. The problem occurs whenever the second chip detected on the PCIe bus is involved. By “the second chip,” I mean the second device in the list returned by hailo_scan_devices().

When I use only the first Hailo-8 chip on the PCIe bus (without the second chip), the output (viewed to 6 decimal places) is identical across all runs for every frame of the video. The detection rate is 100%: the expected object is detected in every frame.

However, when I use only the second chip (without the first), the expected object is still detected in every frame, but the confidence scores and bounding box coordinates vary slightly from run to run.

The problem becomes much worse when I use both Hailo-8 devices in parallel. Even on the first run after a reboot, a few frames on each device have no detection of the object, which never happened with a single device (I used the same video as in the single-chip tests). On the second run, the problem is even more severe: more frames have no detection, and many of the frames with a detection show lower confidence scores.
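To put numbers on “identical across all runs” versus “varies slightly”, a small helper can compare the same frame’s output buffer from two runs: bit-exact equality via memcmp(), the index of the first differing value, and the largest absolute difference. This is a generic sketch with hypothetical names, not a HailoRT facility. For the Case 1 arrays above, it would report the boat score (0.901959 vs 0.792153) as the largest difference.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef struct {
    int bit_identical;   /* 1 if both buffers have exactly the same bytes */
    size_t first_diff;   /* index of the first differing value, or len if none */
    float max_abs_diff;  /* largest |a[i] - b[i]| over the buffer */
} run_diff_t;

/* Compares the same frame's output from two runs (hypothetical helper). */
static run_diff_t compare_runs(const float *a, const float *b, size_t len)
{
    run_diff_t d = { 1, len, 0.0f };

    assert(a != NULL && b != NULL);
    d.bit_identical = memcmp(a, b, len * sizeof a[0]) == 0;
    for (size_t i = 0; i < len; i++) {
        float diff = a[i] - b[i];
        if (diff < 0.0f) {
            diff = -diff;
        }
        if (diff > d.max_abs_diff) {
            d.max_abs_diff = diff;
        }
        if (d.first_diff == len && a[i] != b[i]) {
            d.first_diff = i;
        }
    }
    return d;
}
```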


* In this post, the phrase “use a Hailo-8 chip” means completing setup steps such as hailo_create_device(), hailo_create_hef(), hailo_create_input_vstreams() and hailo_create_output_vstreams(), and then running inference on that chip by calling hailo_vstream_write_raw_buffer() and hailo_vstream_read_raw_buffer() on those vstreams.

* In all cases described here, only the yolov7.hef was loaded and used for all Hailo-8 configurations.
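Because “the second chip” is defined by its position in the hailo_scan_devices() list, it may help to pin each logical slot to a fixed physical device, so that per-chip results from different runs and reboots are guaranteed to refer to the same Hailo-8. I can’t confirm whether the scan order is stable across reboots. This minimal sketch assumes the device IDs are zero-padded PCIe bus/device/function strings such as "0000:01:00.0", so that plain string order matches bus order. The sorted IDs would then be passed to whatever call the application uses to open a device by ID. The helper names are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* qsort comparator for an array of C strings. */
static int compare_device_ids(const void *a, const void *b)
{
    const char *const *lhs = (const char *const *)a;
    const char *const *rhs = (const char *const *)b;
    return strcmp(*lhs, *rhs);
}

/* Sorts device ID strings in place so that index 0 and index 1 always map
 * to the same physical chips. Assumes zero-padded PCIe BDF-style IDs. */
static void sort_device_ids(const char **ids, size_t count)
{
    assert(ids != NULL || count == 0);
    if (count > 1) {
        qsort(ids, count, sizeof ids[0], compare_device_ids);
    }
}
```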



I hope this information helps pinpoint the root cause. Please let me know if you need any more data from my end.