Hi, how can I set the batch size to 2? For example, I would like to just pass two identical images. I would be very grateful for your answer.

inference_result = model(frame)
results = inference_result.results
box1 = np.array([det['bbox'] for det in results], dtype=np.float32)
score1 = np.array([det['score'] for det in results], dtype=np.float32)

model = dg.load_model(
    model_name='yolov11n_5',
    inference_host_address='@local',
    zoo_url='/home/zoo_url/yolov11n_5'
)
Hi, @An_ti11,
You can use model.predict_batch() instead of model.predict() to effectively pipeline a sequence of frames; see the detailed description here: Running AI Model Inference | DeGirum Docs
In a few words, you provide a frame iterator as a method parameter; the method, in turn, returns an iterator over results, which you can use in a for loop:
for result in model.predict_batch(["image1.jpg", "image2.jpg"]):
Your input iterator may yield various frame types:
- strings containing image filenames
- numpy arrays with image bitmaps
- PIL image objects
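For instance, a minimal sketch that mixes a filename with a numpy frame (the file name is a placeholder, cv2 is used only to produce the array, and `model` is assumed to be already loaded as shown above):

import cv2  # used here only to produce a numpy-array frame for this example

frames = [
    "image1.jpg",               # filename string
    cv2.imread("image1.jpg"),   # numpy array with an image bitmap
]

for result in model.predict_batch(frames):
    print(result.results)  # detections for the corresponding frame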
If you want to process a camera stream, the degirum_tools package provides convenient wrappers like degirum_tools.predict_stream(model, video_source); see the example here: hailo_examples/examples/004_rtsp.ipynb at main · DeGirum/hailo_examples
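For example, a minimal sketch (the video source 0 is a placeholder for your camera index; an RTSP URL string works as well):

import degirum_tools

# 0 selects the default local camera; an RTSP URL also works
for result in degirum_tools.predict_stream(model, 0):
    print(result.results)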
Thank you for your response, but I have a problem: I’m using YOLOv11n with the following code:
for result in model.predict_batch(frames):
    boxes = [det['bbox'] for det in result.results]
    scores = [det['score'] for det in result.results]
    all_boxes_list.append(boxes)
    all_scores_list.append(scores)
I pass five images, and when checking FPS using:
hailortcli run yolov11n_5.hef --batch-size 5
I get:
Running streaming inference (yolov11n_5.hef):
Transform data: true
Type: auto
Quantized: true
Network best/best: 100% | 1235 | FPS: 246.65 | ETA: 00:00:00
> Inference result:
Network group: best
Frames count: 1235
FPS: 246.66
Send Rate: 2424.74 Mbit/s
Recv Rate: 1087.66 Mbit/s
That's about 246 FPS, as I understand it. However, in my code, when I process five images in a batch, I only get around 11–12 FPS overall, and I have no idea why this is happening.
@An_ti11 ,
To accurately measure FPS, please try a longer batch, say 500–1000 frames.
Also, keep in mind that on the first frame the model is loaded into the accelerator, which causes extra delay.
Also, degirum_tools has a function that measures the inference performance of a model. You can try it to see accurate results:
import degirum_tools

# assuming your model object is `model`
profile = degirum_tools.model_time_profile(model, iterations=500)
print(
    f"observed FPS = {profile.observed_fps:.2f}, "
    f"single frame inference = {profile.time_stats['CoreInferenceDuration_ms'].avg:.2f} ms"
)
@An_ti11
Please note that even after you try @Vlad_Klimov 's suggestions, you will not be able to match the 246 FPS number you get from profiling, because internally PySDK still uses a batch of 1.
So the problem is that I can’t use batching in PySDK, only sequential inference calls?
Hi @An_ti11
There is pipelining inside PySDK, but no native batching at the inference call level.
Thank you for your response. Perhaps you know of any libraries that implement full batching?
@An_ti11 , next release of PySDK (ETA - next week) will support Hailo batching.
Hi @An_ti11, PySDK 0.17.0 has been released and supports Hailo batching. Please let us know if you have any questions!
Hi, I’m now using 0.17.0.
The input tensor is (1, 640, 640, 3) and I want to set batch size = 2.
But when I use model.predict_batch(input_data), where the shape of the data is (2, 640, 640, 3), it says the input tensor size does not match.
Only when I convert it to (2, 1, 640, 640, 3) does it work, but it's slow.
Am I doing anything wrong?
Thank you so much!
Hi @Zoey_Goh ,
Batching in PySDK works a little differently: predict_batch always accepts single frames, but internally it accumulates them into a batch when batching is enabled.
By default, batching is enabled with a batch size of 8. To change the batch size, you assign the model.eager_batch_size property of your model object.
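For instance, a minimal sketch assuming your (2, 640, 640, 3) array is fed frame by frame (the zero array is just a stand-in for your real frames):

import numpy as np

model.eager_batch_size = 2  # PySDK groups incoming frames into batches of 2 internally

# shape (2, 640, 640, 3): iterate over it to yield single (640, 640, 3) frames
input_data = np.zeros((2, 640, 640, 3), dtype=np.uint8)

for result in model.predict_batch(frame for frame in input_data):
    print(result.results)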
Please be advised that for single-context models (smaller models that fit completely into the Hailo accelerator's internal memory) batching does not improve performance, so we internally force the batch size to Auto for such models. Batching only makes sense for multi-context models. FPS improvements due to batching depend on the model. You may try measuring FPS vs. batch size to figure out the optimal setting.
Tell me if you need help with benchmarking: degirum_tools has a nice function, model_time_profile, to do it.
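For example, a rough benchmarking sketch along those lines (the batch size values are just examples):

import degirum_tools

# measure throughput for several batch sizes
for batch_size in (1, 2, 4, 8):
    model.eager_batch_size = batch_size
    profile = degirum_tools.model_time_profile(model, iterations=500)
    print(f"batch size {batch_size}: {profile.observed_fps:.2f} FPS")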
Our model’s batch size is uncertain and depends on the number of connected cameras, which may vary. Does this mean we can only process in a sequential manner using a for-loop?
Hi @Zoey_Goh ,
You may implement it in many ways, depending on your needs.
Approach A.
Multiplex frames from multiple cameras in a frame source function and feed this multiplexed stream into a single model object. Then demultiplex the results in the for-loop body. To simplify demultiplexing, your source function (which you pass as the predict_batch() argument) may return a tuple of a frame and arbitrary frame info. That frame info is then accessible via the result.info property. You may put the camera index into the frame info.
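A minimal sketch of this approach (the camera sources and the OpenCV captures are placeholders):

import cv2

def multiplexed_frames(cameras):
    # round-robin frames from several captures; the second tuple element
    # (the camera index) becomes result.info
    while True:
        for idx, cap in enumerate(cameras):
            ok, frame = cap.read()
            if not ok:
                return
            yield frame, idx

cameras = [cv2.VideoCapture(src) for src in (0, 1)]

for result in model.predict_batch(multiplexed_frames(cameras)):
    cam_idx = result.info  # demultiplex: which camera produced this result
    print(cam_idx, result.results)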
Approach B.
Create as many model objects as you have cameras. Run each model's prediction loop in a separate thread, in parallel with the others. Since batching is done internally, this works nicely as long as you use the same model name for each model object.
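A minimal sketch of this approach (model name, zoo path, and video sources are placeholders, reusing predict_stream from above):

import threading
import degirum as dg
import degirum_tools

def camera_worker(cam_idx, video_source):
    # each thread owns its model object; using the same model name everywhere
    # lets PySDK batch requests from all threads internally
    model = dg.load_model(
        model_name='yolov11n_5',
        inference_host_address='@local',
        zoo_url='/home/zoo_url/yolov11n_5',
    )
    for result in degirum_tools.predict_stream(model, video_source):
        print(cam_idx, result.results)

threads = [
    threading.Thread(target=camera_worker, args=(i, src), daemon=True)
    for i, src in enumerate((0, 1))
]
for t in threads:
    t.start()
for t in threads:
    t.join()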
Approach C.
If you have a great many cameras, you may even use the synchronous predict method, model.predict. Yes, it is less efficient than predict_batch, but when many model objects are used in parallel, a bunch of such objects can still provide enough load to keep the accelerator from starving. This may also simplify your code, since you can avoid the for-loops over predict_batch().
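A rough sketch of the per-camera loop for this approach (each camera thread would run it with its own model object and OpenCV capture, created as in the previous sketch):

def camera_loop(model, cap):
    # synchronous path: one frame in, one result out
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        result = model.predict(frame)
        print(result.results)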
Tell me if you need any help implementing a particular solution.
Also, when I try to call the model_time_profile method, it throws:
bb = degirum_tools.model_time_profile(model, 500)
  File "/home/lib/python3.10/site-packages/degirum_tools/inference_support.py", line 475, in model_time_profile
    raise NotImplementedError
NotImplementedError
Yeah… for non-image input types it is not implemented…
What is the model input type? Can you share the model JSON?
Thank you so much! I think Approach B would work well for me.
Could you please help me with an example implementation of Approach B? I really appreciate it.
Also, for advanced use cases we have the dgstreams framework, implemented as part of the degirum_tools package. You may take a look at the docs:
Streams | DeGirum Docs
And examples:
PySDKExamples/examples/dgstreams at main · DeGirum/PySDKExamples
Basically, it is like GStreamer, but in Python, and much simpler.
You connect multiple gizmos in an execution graph, where each gizmo runs in a separate thread.
Please take a look at this example:
PySDKExamples/examples/dgstreams/multi_camera_multi_model_detection.ipynb at main · DeGirum/PySDKExamples
Most likely it is exactly what you need.