Welcome to the Hailo Community!
The Hailo Dataflow Compiler can split a network that does not fit into a single Hailo-8 into multiple contexts. During execution, the HailoRT runtime loads these contexts over PCIe. Some CNNs need only a few contexts and still deliver good performance. LLMs require one hundred or more contexts, which results in poor performance and also puts load on the host CPU.
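To give a feel for why the number of contexts matters, here is a rough back-of-the-envelope sketch. It only models the time spent moving context data over the link and ignores compute time, driver overhead and any overlap between transfer and execution. The function name and all numbers (context size, effective bandwidth) are made up for illustration and are not measured Hailo figures.

```python
def context_load_time_ms(num_contexts: int,
                         context_size_mb: float,
                         link_bandwidth_gb_s: float) -> float:
    """Estimate the time spent loading contexts for one pass through the network.

    Hypothetical model: every context is transferred once per pass,
    and the link sustains a constant effective bandwidth.
    """
    total_mb = num_contexts * context_size_mb
    # With decimal units, MB divided by GB/s gives milliseconds directly.
    return total_mb / link_bandwidth_gb_s


# Illustrative values only: 4 MB per context, 2 GB/s effective bandwidth.
small_cnn = context_load_time_ms(num_contexts=3, context_size_mb=4.0, link_bandwidth_gb_s=2.0)
large_llm = context_load_time_ms(num_contexts=100, context_size_mb=4.0, link_bandwidth_gb_s=2.0)

print(f"3 contexts:   {small_cnn:.1f} ms per pass")
print(f"100 contexts: {large_llm:.1f} ms per pass")
```

In this simplified model the transfer time grows linearly with the number of contexts, which is why a network with a handful of contexts can still run well while one with a hundred or more does not.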
The Hailo-10 has a DDR interface, which allows it to run the contexts without the host. There have also been some improvements to optimize context switching. In addition, the Hailo-10 can execute two 4-bit operations in place of one 8-bit operation. Because LLMs are so large, they will use 4-bit layers even more than CNNs do to improve efficiency.
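As a quick illustration of why 4-bit layers are attractive for large models, the sketch below compares the raw weight storage at 8-bit and 4-bit precision. The parameter count is a hypothetical example, and the calculation ignores quantization scales, activations and any layers that stay at higher precision.

```python
def weight_bytes(num_params: int, bits_per_weight: int) -> int:
    """Raw storage for the weights alone, without quantization metadata."""
    return num_params * bits_per_weight // 8


# Hypothetical model size, chosen only to make the arithmetic easy to follow.
params = 1_000_000_000

size_8bit = weight_bytes(params, 8)
size_4bit = weight_bytes(params, 4)

print(f"8-bit weights: {size_8bit / 1e9:.2f} GB")
print(f"4-bit weights: {size_4bit / 1e9:.2f} GB")
```

Halving the bits per weight halves the amount of weight data that has to be stored and moved, which matters more the larger the model is.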
Hailo-8 does support transformer-based networks. It has a very flexible architecture that allows us to add support for new layers via our Hailo Dataflow Compiler. To compare models, I recommend you have a look at the Model Explorer in the Developer Zone.
Have a look at our Model Zoo and the Model Explorer. There are plenty of models to choose from, including the latest YOLOs.