My input is a full HD video stream (1080p, 50 fps). I need to track a small object (such as a football). The object can be as small as 10x10 pixels.
I’m currently using the Hailo tile cropper to send tiles of each frame for inference so that small objects can be detected. Hailo 8 is limited in input image size, and I’m trying to find the balance between:
Model/image size (640, 1280): can I feed a 1920x1080 image to Hailo?
CPU usage (for dividing the frame into tiles)
Supporting video at 30-50 fps
Tracking objects
Hailo Team → Can you share your knowledge or previous experience about the considerations and potential gaps, so that I can address this efficiently?
My bottleneck right now is high CPU utilization (I’m using RPI5).
Hi @Erez
What is the input resolution of your model? If you are using yolov8 models, the input is typically 640x640, so it depends on which tiling gives you acceptable accuracy. Without using Hailo, did you experiment with setting the input resolution to 1280x1280 (or 1280x720) and evaluate the accuracy?
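To make the resolution question concrete, here is a rough Python sketch of how small the object becomes once a full 1080p frame is resized to the model input. It assumes an aspect-preserving (letterbox) resize, which may not match your pipeline, and the function name is purely illustrative.

```python
def object_size_after_resize(obj_px, frame_w, frame_h, model_w, model_h):
    """Approximate object side length (in pixels) after a letterbox resize."""
    # Letterboxing scales both axes by the same factor: the smaller of the two ratios.
    scale = min(model_w / frame_w, model_h / frame_h)
    return obj_px * scale


for side in (640, 1280):
    size = object_size_after_resize(10, 1920, 1080, side, side)
    print(f"{side}x{side} input: a 10 px object becomes ~{size:.1f} px")
```

A tile cropped at native resolution keeps the object at its full 10 px, which is the main argument for tiling over downscaling the whole frame.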
Should I train the model with higher resolution such as 1280x1280?
Can I train my model with full HD resolution such as 1920*1080?
As far as I remember, there are memory limitations with Hailo 8, so it can’t support full HD resolution. What is the largest image size that I can provide to Hailo 8?
Hi, to train for 1080p you’ll need a 1080p dataset. This might be hard to find and might give worse accuracy. I suggest going the tiling path.
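As a rough illustration of what the tiling path costs, the sketch below computes a grid of overlapping tiles covering a 1080p frame. It is plain Python and does not use the Hailo tile cropper API; the 640 px tile size, the 32 px minimum overlap, and the function names are assumptions chosen for the example.

```python
from math import ceil


def tile_origins(frame_len, tile_len, min_overlap):
    """Start offsets of tiles covering [0, frame_len) along one axis."""
    if tile_len >= frame_len:
        return [0]
    stride = tile_len - min_overlap
    count = ceil((frame_len - tile_len) / stride) + 1
    # Spread the tiles evenly so the last one ends exactly at the frame edge.
    step = (frame_len - tile_len) / (count - 1)
    return [round(i * step) for i in range(count)]


def tile_grid(frame_w, frame_h, tile=640, min_overlap=32):
    """List of (x, y, width, height) tiles covering the whole frame."""
    xs = tile_origins(frame_w, tile, min_overlap)
    ys = tile_origins(frame_h, tile, min_overlap)
    return [(x, y, tile, tile) for y in ys for x in xs]


tiles = tile_grid(1920, 1080)
fps = 50
print(f"{len(tiles)} tiles per frame, {len(tiles) * fps} inferences per second at {fps} fps")
```

With these assumed values the grid is 4x2, so every frame turns into 8 crops and 8 inferences. That is the CPU and throughput cost to weigh against the accuracy gained from keeping the object at native resolution.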
Video operations are indeed a bottleneck on the pi if you are using the CPU. You might be able to utilize the rpi GPU but I have no experience with that. If you do manage something like this I’m sure the community will be happy to hear about it.
Note that RPi’s picamera and rpicam_app frameworks do use the raspberry pi GPU for rescaling and format conversion.
What is your video input source? Are you using rpi camera?
@Erez
I do not know the largest image size you can provide to Hailo8, but I can confirm that we were able to run yolov8n-seg models at 1280x1280 resolution. Whether you want to train at a higher resolution partly depends on your training data. The same applies to training at 1920x1080 (non-square) resolution: if all of your data has the same resolution (or at least the same aspect ratio), you can train at the higher resolution. Your batch size during training may have to decrease due to the increased memory requirements.
I have my own dataset, which consists of 1080p images.
As far as I know, the RPi 5 doesn’t have any video encoder, so all of the image processing is done by the CPU (which is a huge disappointment! RPi is not going to add video encoders to future SBCs, at least for now).
My video input source is 1080p at 50 fps from a custom camera. I’m actually converting HDMI input to MIPI and then using it as the input stream. The final camera has not been decided yet.
BTW → I’m looking for a motorized camera that supports 4K at 60 fps, optical zoom, auto focus, and HDR and WDR capabilities. The Sony 9500H is a great example.
@shashi
My entire dataset is at 1080p resolution, so that is not a problem. I will try to train the model at a higher resolution. I expect I will have a lot of questions about how to customize the ONNX conversion from 640x640 to 1920x1080.
ChatGPT indicates, as you wrote, that Hailo 8 should support full HD resolution. I will try it and report my results.
If you’re using the MIPI interface, I believe you can leverage the Raspberry Pi camera infrastructure, which benefits from hardware acceleration on their platform.
I’m not sure about your exact inference speed requirements, but if you’re planning to run 1080p at 50 FPS, the PCIe bandwidth required just to transfer the video is ~2.5 Gbps, which is already half of the available PCIe bandwidth on the Raspberry Pi 5. Additionally, if your model doesn’t fit within a single context, you’ll need extra PCIe bandwidth for weight switching, which could further limit performance.
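For reference, the ~2.5 Gbps figure can be reproduced with simple arithmetic. The sketch below assumes uncompressed frames at 3 bytes per pixel and ignores PCIe protocol overhead; the 5 Gbps link figure is only what the "half of the available bandwidth" statement implies, not a measured value.

```python
def raw_video_gbps(width, height, fps, bytes_per_pixel=3):
    """Uncompressed video bandwidth in gigabits per second."""
    return width * height * bytes_per_pixel * 8 * fps / 1e9


ASSUMED_LINK_GBPS = 5.0  # assumption: implied by "half of the available bandwidth"

full_frame = raw_video_gbps(1920, 1080, 50)
one_tile = raw_video_gbps(640, 640, 50)
share = full_frame / ASSUMED_LINK_GBPS
print(f"full 1080p50 frames: {full_frame:.2f} Gbps ({share:.0%} of the assumed link)")
print(f"one 640x640 tile at 50 fps: {one_tile:.2f} Gbps")
```

Note that a tiled pipeline sends every tile of every frame, so the per-tile figure has to be multiplied by the number of tiles per frame.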
It might be worth considering a more efficient approach instead of relying solely on brute force. For example, tracking the ball’s location and running inference only on specific tiles could help optimize performance. Of course, the best approach depends on your application.
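A minimal sketch of that idea in plain Python: once the ball has been found, infer only on one tile centred on its last known position, and fall back to scanning all tiles after several consecutive misses. The class and function names, the 640 px tile, and the miss threshold are all hypothetical and not part of any Hailo API.

```python
def roi_tile(cx, cy, frame_w=1920, frame_h=1080, tile=640):
    """Tile (x, y, w, h) centred on (cx, cy), clamped to stay inside the frame."""
    x = min(max(int(cx - tile / 2), 0), frame_w - tile)
    y = min(max(int(cy - tile / 2), 0), frame_h - tile)
    return x, y, tile, tile


class RoiScheduler:
    """Chooses which tiles to run inference on for the next frame."""

    def __init__(self, full_scan, max_missed=5):
        self.full_scan = full_scan    # tiles used when the ball position is unknown
        self.max_missed = max_missed  # misses tolerated before a full rescan
        self.last_center = None
        self.missed = 0

    def tiles_for_next_frame(self):
        if self.last_center is None:
            return self.full_scan
        return [roi_tile(*self.last_center)]

    def update(self, center):
        """Pass the detected ball centre in frame pixels, or None if not found."""
        if center is not None:
            self.last_center, self.missed = center, 0
            return
        self.missed += 1
        if self.missed > self.max_missed:
            self.last_center = None   # give up on the ROI and rescan everything


scheduler = RoiScheduler(full_scan=[(0, 0, 640, 640), (640, 0, 640, 640)])
scheduler.update((960, 540))
print(scheduler.tiles_for_next_frame())
```

While the ball is being tracked this drops the per-frame workload to a single crop and a single inference. A real tracker would probably also predict the next position from the ball's velocity rather than reusing the last centre.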