I am reading a paper in which the authors split a YOLO model so that some layers run on the Hailo-8 NPU and the remaining layers on another accelerator. One of their criteria for choosing the partition point of the graph was the memory limit: how much of the model the Hailo-8 can handle without depending on DRAM. So I want to know: what is the maximum model size that can run on the Hailo-8 NPU without needing DRAM?
Hi @Sameer_Nilkhan,
Maybe this will be helpful: Hailo-8 for attention-based object detection models
Thanks,
From the post you referred to, I understood that it all depends on the context size, but that is exactly what I want to know: how large can a single context loaded onto the Hailo-8 NPU be, so that no context switching is needed? Is there a fixed size?
Hi @Sameer_Nilkhan,
I would suggest looking at the profiler report of the compiled model to see why it needed multiple contexts and how much it would need to be reduced to fit in a single context.
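For reference, the DFC command-line flow could look roughly like this. This is a sketch, not an authoritative recipe: the exact subcommand names and flags should be checked against your DFC version's documentation, and `yolov5s.onnx`, the `.har` filenames, and `calib_images/` are placeholder names.

```shell
# Parse the ONNX model into a Hailo archive (.har); filenames are placeholders
hailo parser onnx yolov5s.onnx --hw-arch hailo8

# Quantize/optimize the parsed model (requires a calibration dataset)
hailo optimize yolov5s.har --calib-set-path calib_images/

# Compile for Hailo-8; the compilation log reports how many contexts were allocated
hailo compiler yolov5s_optimized.har --hw-arch hailo8

# Generate the profiler report showing per-context resource usage
hailo profiler yolov5s_compiled.har
```

The profiler report is the step that answers the original question: it shows whether the compiled model fit in one context and, if not, where the resources ran out.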
Thanks, this did help. So what I understood is that one way to decide which part of the model fits in one context is to compile that part with the Hailo DFC and then check the profiler report, which shows how many contexts are needed.
But as I understand it, what occupies space within one context is the model weights plus the input and intermediate activations. If we know the input size and the weight sizes, and we also know the dimensions of the feature map produced by each layer (and therefore the activation sizes), then by cumulatively adding these sizes layer by layer, can we find a threshold up to which the model fits in one context?
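The cumulative-sum idea above can be sketched in a few lines of Python. This is only an illustration of the estimation logic, not a Hailo tool: the on-chip budget constant is a hypothetical placeholder (Hailo does not publish a simple figure, and the DFC's actual allocation is not a plain linear sum), and 8-bit weights/activations are assumed.

```python
# Rough single-context fit estimate for a small conv stack.
# ASSUMPTIONS (not Hailo specifics): weights and activations are 8-bit
# quantized (1 byte each), 'same' padding, and the budget below is a
# hypothetical placeholder -- the real Hailo-8 allocation is decided by
# the DFC and is more complex than a linear sum.

ON_CHIP_BUDGET_BYTES = 16 * 1024 * 1024  # hypothetical, not a Hailo spec

def conv_weight_bytes(c_in, c_out, k, bytes_per_param=1):
    """Weight size of a k x k convolution (bias ignored)."""
    return c_in * c_out * k * k * bytes_per_param

def activation_bytes(h, w, c, bytes_per_elem=1):
    """Size of one feature map of shape (h, w, c)."""
    return h * w * c * bytes_per_elem

def estimate_layers(input_hwc, layers):
    """Cumulatively sum weights + activations layer by layer.

    layers: list of (c_out, kernel, stride) tuples.
    Returns a list of (layer_index, cumulative_bytes).
    """
    h, w, c = input_hwc
    total = activation_bytes(h, w, c)  # the input tensor counts too
    out = []
    for i, (c_out, k, s) in enumerate(layers):
        h, w = h // s, w // s
        total += conv_weight_bytes(c, c_out, k)
        total += activation_bytes(h, w, c_out)
        c = c_out
        out.append((i, total))
    return out

# Toy YOLO-like backbone stem on a 640x640x3 input
layers = [(32, 3, 2), (64, 3, 2), (128, 3, 2)]
for i, cum in estimate_layers((640, 640, 3), layers):
    fits = cum <= ON_CHIP_BUDGET_BYTES
    print(f"after layer {i}: {cum / 1e6:.2f} MB cumulative, fits={fits}")
```

A sum like this can give a first-order upper bound for where to try a cut, but the real answer still has to come from compiling the candidate sub-model and reading the profiler report, since the compiler also spends on-chip resources on things this estimate ignores (scheduling, buffering, layer overheads).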