How to Run Batched Inference (Batch Size 2) with YOLOX on Hailo-8?

I’m currently working on running YOLOX-tiny on the Hailo-8 accelerator using C++, referencing the hailo-ai/Hailo-Application-Code-Examples GitHub repository.
My setup uses an i.MX 8 CPU, and the code is based on the following example: Hailo-Application-Code-Examples/runtime/cpp/object_detection/object_detection.cpp

I’m working on a task that requires running inference on two images simultaneously.
Currently, I’m running inference twice in a loop, one image at a time, roughly as in the sketch below. Each inference takes around 18 milliseconds, so processing two images takes about 36 milliseconds, which exceeds my target of 33 milliseconds.
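
For concreteness, here is a minimal sketch of that sequential approach (`infer_one_image` and the 416×416 input size are illustrative placeholders, not code from the example):

```cpp
#include <cstdint>
#include <vector>

// Placeholder for the single-image pipeline in object_detection.cpp;
// in my setup one call takes ~18 ms end to end.
struct Detections { /* boxes, scores, class IDs */ };
Detections infer_one_image(const std::vector<uint8_t> &frame)
{
    (void)frame;
    return {};
}

int main()
{
    // Two preprocessed frames (416x416 RGB, YOLOX-tiny's default input).
    std::vector<uint8_t> frame_a(416 * 416 * 3), frame_b(416 * 416 * 3);
    for (const auto *frame : {&frame_a, &frame_b}) {
        Detections dets = infer_one_image(*frame);  // sequential: 2 x ~18 ms
        (void)dets;  // per-image post-processing goes here
    }
    // Total latency for the pair: ~36 ms, above the 33 ms target.
    return 0;
}
```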

To improve this, I’m considering running inference with a batch size of 2, hoping that processing both images together will reduce the overall latency.

However, the object_detection.cpp example only shows how to process a single image.
Could anyone guide me on how to modify the code to process two images at once, i.e., how to perform batched inference with a batch size of 2?

Hey @pal-uchi,

We’re planning to add this feature to the example you’re following. For now, here are the steps to implement batching with two images (a C++ sketch follows the list):

  1. Update your input tensor from [1, C, H, W] to [2, C, H, W] to include the batch dimension.

  2. Load and preprocess two images: resize, normalize, and convert both to the format your model expects (typically CHW).

  3. Create a single batch buffer by concatenating the two processed images, then copy this data into your reshaped input tensor.

  4. Submit for async inference and make sure your post-processing can handle the outputs for each image in the batch.
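
To make these steps concrete, here is a minimal sketch using the HailoRT C++ async API (the 4.x `InferModel` interface). Treat it as an outline under stated assumptions rather than drop-in code: the HEF name `yolox_tiny.hef`, the commented-out `preprocess_into()` calls, and the uint8 buffers are placeholders, and error handling is shortened to `expect()`. One note on step 3: with this API the batch is typically transparent, so rather than one concatenated buffer, the sketch keeps one buffer per frame and queues the two frames back-to-back; `set_batch_size(2)` then lets the runtime group them into a single hardware batch.

```cpp
// Sketch: batch-2 inference with the HailoRT 4.x InferModel (async) API.
#include "hailo/hailort.hpp"

#include <array>
#include <chrono>
#include <cstdint>
#include <vector>

using namespace hailort;

int main()
{
    auto vdevice = VDevice::create().expect("failed to create vdevice");
    auto model = vdevice->create_infer_model("yolox_tiny.hef")
                     .expect("failed to load HEF");

    // Step 1: ask the runtime to batch frames in pairs on the device.
    model->set_batch_size(2);
    auto configured = model->configure().expect("failed to configure");

    const size_t in_size = model->input()->get_frame_size();

    // Steps 2-3: two preprocessed frames, one buffer each, kept alive
    // until the jobs finish. preprocess_into() is a placeholder for
    // your own resize/format-conversion code.
    std::array<std::vector<uint8_t>, 2> inputs{
        std::vector<uint8_t>(in_size), std::vector<uint8_t>(in_size)};
    // preprocess_into(image_a, inputs[0].data());
    // preprocess_into(image_b, inputs[1].data());

    std::array<std::vector<std::vector<uint8_t>>, 2> outputs;
    std::vector<ConfiguredInferModel::Bindings> bindings_list;
    std::vector<AsyncInferJob> jobs;

    for (size_t i = 0; i < 2; i++) {
        auto bindings = configured.create_bindings().expect("no bindings");
        bindings.input()->set_buffer(MemoryView(inputs[i].data(), in_size));

        // One output buffer per output tensor, per frame.
        for (const auto &out : model->outputs()) {
            outputs[i].emplace_back(out.get_frame_size());
            bindings.output(out.name())->set_buffer(
                MemoryView(outputs[i].back().data(), outputs[i].back().size()));
        }
        bindings_list.push_back(bindings);

        // Step 4: queue both frames back-to-back; with batch size 2 the
        // runtime sends them to the device together.
        configured.wait_for_async_ready(std::chrono::milliseconds(1000));
        jobs.emplace_back(
            configured.run_async(bindings).expect("run_async failed"));
    }

    for (auto &job : jobs) {
        job.wait(std::chrono::milliseconds(1000));  // both frames done
    }
    // Post-process outputs[0] and outputs[1] separately, one per image.
    return 0;
}
```

If your HailoRT version accepts a full batch per bindings (a single contiguous [2, C, H, W] buffer as in step 3), the same structure applies with one bindings object whose input buffer is `2 * get_frame_size()` bytes.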

Let us know if you need any clarification on implementing these steps!

Thanks for the response, @omria san.

I have one question regarding Step 1.

  1. Update your input tensor from [1, C, H, W] to [2, C, H, W] to include the batch dimension.

What should I change to update the tensor size: the C++ program at runtime, or the DFC settings used for HEF generation?