So here’s what’s happening with your model - our parser has to unroll all 64 frames into one big static computation graph, and that’s where things get messy. Your model weights (~326M parameters) only take a few GB; the real memory hog is the activation buffers - feature maps multiplied across 64 time steps. During ONNX shape inference and graph building, those can easily blow up to tens or hundreds of GB, which is why you’re running out of memory even with 200GB available.
The harsh reality is that a full 64-frame ViT-Large model just won’t fit in the Hailo-8’s memory in one go. But here’s what you can try:
Parse it in chunks
Break down the parsing by targeting specific parts of your graph using our CLI flags:
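For example, something along these lines - shown here via the Dataflow Compiler’s Python API, whose `start_node_names`/`end_node_names` arguments mirror the CLI’s `--start-node-names`/`--end-node-names` flags (exact names can vary by DFC version, so check `hailo parser onnx --help`):

```python
from hailo_sdk_client import ClientRunner

# Parse only the subgraph between the chosen boundary nodes instead of
# the whole 64-frame graph. Paths and net names here are placeholders.
runner = ClientRunner(hw_arch="hailo8")
runner.translate_onnx_model(
    "vjepa2_vitl.onnx",    # your ONNX export
    "vjepa2_part1",        # name for this chunk
    start_node_names=["input.1"],
    end_node_names=["192"],
)
```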
This only parses from “input.1” to node “192” (you’ll need to figure out the right boundary nodes for your model), which cuts down the peak memory usage significantly.
Bottom line - parsing a 64-frame ViT-Large as one static graph is just too much for the Hailo-8. I’d start with the segmented approach or maybe rework it to process frames individually. Those tend to be the most realistic options for this hardware.
Interesting. My understanding was that V-JEPA2, in this case, turns a 64-frame 256x256 image stack into 16x16x2 tubelets, so it should ultimately put (256/16)**2 * (64/2) = 8192 tokens into the transformer stack.
So, not quite “rolling out” over frames, in the sense of repeated inferences of the same model over a sequence of inputs.
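To make the tubelet math concrete:

```python
frames, side = 64, 256      # 64 frames of 256x256
t_patch, s_patch = 2, 16    # 16x16x2 tubelets
tokens = (side // s_patch) ** 2 * (frames // t_patch)
print(tokens)               # 256 spatial patches * 32 temporal slices = 8192
```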
For example, running the stock code on the HF model page gives us an output of size torch.Size([1, 8192, 1024]).
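(That snippet is roughly the following - reconstructed from memory, so the checkpoint id and processor class should be double-checked against the actual model page:)

```python
import torch
from transformers import AutoModel, AutoVideoProcessor

ckpt = "facebook/vjepa2-vitl-fpc64-256"   # checkpoint id as I remember it
processor = AutoVideoProcessor.from_pretrained(ckpt)
model = AutoModel.from_pretrained(ckpt)

video = torch.rand(64, 3, 256, 256)       # 64 random 256x256 frames
inputs = processor(video, return_tensors="pt")
with torch.no_grad():
    out = model(**inputs)
print(out.last_hidden_state.shape)        # torch.Size([1, 8192, 1024])
```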
Still pretty big. But I’m surprised we’re getting >200GB for the graph here.
Maybe it has something to do with having all the transformer blocks materialized (24 for this model) at once?
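A quick back-of-envelope makes that plausible (assuming fp32 and ViT-L’s 24 layers / 16 heads; whether the parser actually keeps every buffer live at once is my speculation):

```python
tokens, heads, layers = 8192, 16, 24
fp32 = 4  # bytes per element

# One layer's raw attention-score tensor: heads x tokens x tokens.
attn_per_layer = heads * tokens * tokens * fp32
print(attn_per_layer / 2**30)            # 4.0 GiB per layer
print(attn_per_layer * layers / 2**30)   # 96 GiB if all 24 layers are materialized
```

With buffers duplicated during shape inference on top of that, blowing past 200GB doesn’t seem far-fetched.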
I’ll try the partitioning technique, though. (Also pretty interesting - it implies that any model could be “pipelined” into a lot of smaller models.)
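For anyone else trying this outside Hailo’s tooling, stock ONNX can do the splitting too - a minimal sketch, where the boundary tensor names are placeholders for whatever your graph actually uses (check in Netron):

```python
from onnx.utils import extract_model

# Cut the exported graph at a boundary tensor; each piece can then be
# parsed/compiled separately and chained at runtime.
extract_model("vjepa2_vitl.onnx", "vjepa2_part1.onnx",
              input_names=["input.1"], output_names=["192"])
extract_model("vjepa2_vitl.onnx", "vjepa2_part2.onnx",
              input_names=["192"], output_names=["output"])
```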
So: is there public information on what the maximum memory is? And is it the case for the Hailo-8 & family that the whole model (graph, weights, intermediate products) must all fit into the RAM on the module?