Does the multistream detection example support multi-device?
We want to test multi-stream and multi-device operation with this example.
We guess that this example supports only a single Hailo-8 (Hailo-8 x 1) hardware configuration. Is that right?
We also know there was a multistream, multi-device example in TAPPAS, but it supports only the Hailo-8 x 2 configuration.
We tried this example with Hailo-8 x 1 and Hailo-8 x 4 hardware, but got an unknown error.
More information:
When we tested on our own PCIe card with 1 to 4 pipelines, the performance (FPS) almost doubled every time the number of NPUs increased. However, from 5 pipelines onward, there was not much difference in performance between 1 NPU and 4 NPUs.
So I wonder whether the multi-stream detection example does not support multiple devices.
One more question.
Based on your guide, we tested again.
You confirmed that the multistream pipeline uses only one Hailo-8 device.
We also monitored it following your guide (hailo_monitor)…
When we use the H4 card (Hailo-8 x 4), hailo_monitor shows 4 devices working.
This means that 4 devices are working under the multistream pipeline, not 1.
Does multistream really use only 1 Hailo-8?
While monitoring,
with device=4 and source (pipeline)=4, each Hailo-8 device runs at 100% and the FPS is very good.
But the utilization drops below 90%, and further still, as we increase the number of pipelines from 5 to 12…
I think that would be understandable if multistream used only 1 Hailo-8, so question No. 1 is important to me…
I believe your answer, but I would like your double confirmation because the test results confuse me.
Can you please confirm which example you are using? Are you using the example from Tappas or one of the examples from our GitHub Hailo Application Code Examples?
Here are our test results for your reference. (Intel 12th 13 CPU)
From 8 channels in the sheet (actually, we think it starts at 5 channels), all FPS values with Hailo-8 x 1, x 2, or x 4 are the same…
From that alone, we could assume that this example runs on 1 chip.
But as I said, from 1 to 4 channels the FPS changes dramatically with the number of NPUs.
If the example worked with only 1 chip, all FPS values should be the same even when we configure multiple NPUs.
I hope this helps you understand my question.
So, the multistream_detection in TAPPAS supports only one device but can be easily modified to support multiple devices by adding the device_count parameter to the hailonet element.
I guess that is what you did. Right?
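As a rough illustration of that modification, the Python sketch below builds the hailonet part of a GStreamer pipeline description. The helper name `hailonet_element` is hypothetical. The property is written `device-count` on the assumption that the element uses the hyphenated GStreamer spelling, so please check `gst-inspect-1.0 hailonet` on your installation. The HEF path is only an example.

```python
def hailonet_element(hef_path: str, device_count: int = 1) -> str:
    """Return a gst-launch style description of a hailonet element.

    Assumption: the property is spelled `device-count`; verify it with
    `gst-inspect-1.0 hailonet` before relying on it.
    """
    if device_count < 1:
        raise ValueError("device_count must be at least 1")
    return f"hailonet hef-path={hef_path} device-count={device_count}"


# Example: request all four Hailo-8 chips of a 4-device card.
print(hailonet_element("resources/yolov5m_wo_spp_60p.hef", device_count=4))
```

The returned string would replace the existing hailonet element in the pipeline description. The rest of the pipeline is unchanged.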
I think I see why. Your results are influenced by other factors, e.g. the video decoding capability and compute power of the main CPU, and the PCIe link. That is why the numbers degrade with more channels. Here is how you can test this further.
You can test the HEF file without the app using the HailoRT CLI. Run
hailortcli run resources/yolov5m_wo_spp_60p.hef --device-count 1
hailortcli run resources/yolov5m_wo_spp_60p.hef --device-count 2
hailortcli run resources/yolov5m_wo_spp_60p.hef --device-count 4
This will give you the maximum FPS for that network on your system, e.g. 217 FPS, 434 FPS and 865 FPS for 1, 2 and 4 devices. If your numbers are lower, then you know that you lose some performance due to the available PCIe bandwidth.
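To judge how close such results are to linear scaling, a small calculation helps. This is a minimal sketch: the function name `scaling_efficiency` is made up for illustration, and the input values are the example figures quoted in this thread, not new measurements.

```python
def scaling_efficiency(fps_by_devices: dict[int, float]) -> dict[int, float]:
    """Measured FPS divided by ideal linear scaling from the smallest device count."""
    base_count = min(fps_by_devices)
    fps_per_device = fps_by_devices[base_count] / base_count
    return {
        count: fps / (fps_per_device * count)
        for count, fps in sorted(fps_by_devices.items())
    }


# Example figures from this thread for --device-count 1, 2 and 4.
example = {1: 217.0, 2: 434.0, 4: 865.0}
for count, efficiency in scaling_efficiency(example).items():
    print(f"{count} device(s): {efficiency:.1%} of linear scaling")
```

With these figures, 4 devices reach about 99.7% of ideal scaling (865 out of 868 FPS). A clearly lower value on your system would point to the PCIe link rather than the chips.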
The rest of the performance is lost due to the app (pre- and post-processing) and the video decoding capability of your host CPU. With a higher-end CPU, e.g. i5/i7/i9, you would get increasingly higher numbers. I get numbers very similar to yours for 1 video channel, but >2x for 16 video channels and 4x Hailo-8.
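The effect can be pictured with a simple bottleneck model: the pipeline cannot run faster than its slowest stage. The sketch below is only an illustration under that assumption. `expected_app_fps` is a hypothetical helper, and the host limit of 300 FPS is an invented placeholder, not a measurement.

```python
def expected_app_fps(npu_fps_per_device: float, device_count: int,
                     host_fps_limit: float) -> float:
    """Total pipeline FPS is capped by the slower of the NPU stage and the host stage."""
    return min(npu_fps_per_device * device_count, host_fps_limit)


# Hypothetical host-side ceiling (decode + pre/post-processing); not a measurement.
HOST_LIMIT_FPS = 300.0

for devices in (1, 2, 4):
    fps = expected_app_fps(217.0, devices, HOST_LIMIT_FPS)
    print(f"{devices} device(s): {fps:.0f} FPS")
```

Once the host limit is lower than what the devices can deliver, adding more Hailo-8 chips no longer changes the total FPS, which matches the flat results seen with many channels.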
We changed the device count from 1 to 8 and the number of sources from 1 to 12.
We also compared the test results with a 12th-gen Intel Celeron CPU, an i3-12100 CPU, and a 13th-gen i7-13700 CPU on the same mainboard, with the same Hailo card and OS.
Based on the final test results, we found the following.
The multistream model works well with multiple devices.
The multistream model's resolution increases with the number of sources.
640x640 => 2560x1920 (sources 9~12)
Because of this resolution change, the frame rate drops quickly from source 5 onward.
From source 5, the height is larger (from 640 to 1280).
This affects the FPS.
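The resolution steps can be reproduced with a short calculation. This sketch assumes the sources are composed into a grid of 640x640 tiles with at most 4 columns. That layout is inferred from the reported sizes (height 1280 from source 5, 2560x1920 for sources 9~12), and the widths for 2 to 4 sources are a guess. `composite_resolution` is a hypothetical name.

```python
import math

TILE = 640        # per-source tile edge in pixels, as reported in this thread
MAX_COLUMNS = 4   # assumption inferred from the reported 2560-pixel width


def composite_resolution(sources: int) -> tuple[int, int]:
    """Return (width, height) of the composed frame for a given number of sources."""
    if sources < 1:
        raise ValueError("sources must be at least 1")
    columns = min(sources, MAX_COLUMNS)
    rows = math.ceil(sources / MAX_COLUMNS)
    return columns * TILE, rows * TILE


for n in (1, 4, 5, 9, 12):
    width, height = composite_resolution(n)
    ratio = (width * height) / (TILE * TILE)
    print(f"{n:2d} sources: {width}x{height} ({ratio:.0f}x the pixels of one tile)")
```

Under this assumption, the composed frame for 9 to 12 sources has 12 times the pixels of a single 640x640 tile, so the host-side work per output frame grows accordingly.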
Even when we changed the CPU to the i7, the FPS difference per number of NPUs was similar from source 8 onward…
(I can’t upload my test result on this community because this is our test report based on our HW setup.)
Up to source 4, the FPS increases with the number of NPUs (from 1 to 8).
Very good performance…
Finally, we now understand the multistream benchmark configuration.