Does it have PCIe ATS/PRI/PASID capabilities?
It appears to have 4/8 GB of local memory. Is it possible to run inference without loading the parameters into local memory, that is, with the parameters kept in host memory?
Welcome to the Hailo Community!
The Hailo-10H is accessed through the HailoRT API and executes models entirely from its local memory. The runtime loads execution contexts that define the model’s compute, memory, and control configuration, rather than streaming weights from host memory.
LLMs require high DDR bandwidth. Using host memory over PCIe would be much slower in both bandwidth and latency, leading to poor performance, and is therefore not supported.
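To make the bandwidth argument concrete, here is a rough back-of-envelope sketch. It assumes token generation is memory-bandwidth bound and that every weight is read once per generated token, so the upper bound on tokens per second is roughly bandwidth divided by model size. The function name and all numbers below are illustrative placeholders, not Hailo-10H or PCIe specifications; substitute the figures for your own setup.

```python
def decode_tokens_per_second(weight_bytes: float, bandwidth_bytes_per_s: float) -> float:
    """Upper bound on decode speed if each token requires reading all weights once."""
    return bandwidth_bytes_per_s / weight_bytes


GB = 1e9

# All values below are hypothetical placeholders chosen only for illustration.
model_weight_bytes = 2 * GB      # assumed size of the quantized model weights
local_ddr_bandwidth = 16 * GB    # assumed bandwidth of on-module DDR, bytes/s
host_link_bandwidth = 4 * GB     # assumed effective bandwidth of the host link, bytes/s

local_rate = decode_tokens_per_second(model_weight_bytes, local_ddr_bandwidth)
host_rate = decode_tokens_per_second(model_weight_bytes, host_link_bandwidth)

print(f"weights in local memory: up to {local_rate:.1f} tokens/s")
print(f"weights in host memory:  up to {host_rate:.1f} tokens/s")
```

With these placeholder numbers the host-memory case is limited to a quarter of the local-memory rate, and that is before accounting for the extra latency of each transfer over the link.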