Hi, as far as I know, model conversion for the Hailo-8L requires quantizing the model from floating point to integer. Is there any current or planned option to convert a model without quantizing it to integer, i.e., to run the network in floating point?
Short answer: no.
Floating-point hardware is larger and consumes more energy per computation than integer hardware. Floating-point weights also require more memory to store.
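To illustrate what "quantization from float to int" means in practice, here is a minimal sketch of affine int8 quantization of a weight tensor. This is the generic textbook scheme, not Hailo's actual algorithm, and the tensor is random placeholder data:

```python
# Minimal sketch (not Hailo's actual algorithm): affine int8 quantization
# of an FP32 weight tensor, as used conceptually by most post-training
# quantization schemes.
import numpy as np

weights = np.random.randn(64, 64).astype(np.float32)  # placeholder FP32 weights

# Map the observed float range onto the int8 range [-128, 127].
w_min, w_max = weights.min(), weights.max()
scale = (w_max - w_min) / 255.0
zero_point = int(np.round(-128 - w_min / scale))

q = np.clip(np.round(weights / scale) + zero_point, -128, 127).astype(np.int8)

# Dequantize to check the reconstruction error introduced by quantization.
deq = (q.astype(np.float32) - zero_point) * scale
print("max abs error:", np.abs(weights - deq).max())
```

Each weight now occupies 1 byte instead of 4, and the arithmetic on-device runs on small, power-efficient integer units.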
With our architecture we can run inference more efficiently and at a much lower cost than on a GPU or CPU, at the price of converting the model into a HEF (Hailo Executable Format) file.
The future is not floating-point support; quite the opposite. Neural networks are getting larger, and quantization algorithms are getting more advanced, allowing more and more layers to be quantized to 4-bit, further improving efficiency and performance.
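To make the storage argument concrete, here is a back-of-the-envelope calculation of weight memory at different precisions. The 1-billion-parameter count is an illustrative assumption, not a Hailo figure:

```python
# Weight memory for a hypothetical 1-billion-parameter model at
# different precisions (parameter count chosen purely for illustration).
params = 1_000_000_000
for name, bits in [("FP32", 32), ("FP16", 16), ("INT8", 8), ("INT4", 4)]:
    gib = params * bits / 8 / 2**30
    print(f"{name}: {gib:.2f} GiB")
```

Going from FP32 to INT4 cuts weight storage by 8x, which also reduces the memory bandwidth needed to feed the compute units.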
Most of the hard work is done for you by the Hailo Dataflow Compiler, and we provide tools such as the profiler report and layer noise analysis that help you optimize the quantization step further when necessary.
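For reference, the typical conversion flow looks roughly like the sketch below, based on the pattern in the Dataflow Compiler's Python tutorials. Exact method signatures can differ between DFC versions, and the model name, file paths, and calibration data here are placeholders:

```python
# Rough sketch of the Dataflow Compiler flow (signatures may differ
# between DFC versions; "model.onnx" and the calibration data are
# placeholders).
import numpy as np
from hailo_sdk_client import ClientRunner

runner = ClientRunner(hw_arch="hailo8l")

# 1. Parse: translate the ONNX model into Hailo's internal representation.
runner.translate_onnx_model("model.onnx", "my_model")

# 2. Optimize: quantize the model using a calibration dataset (random
#    data here as a stand-in; use real, representative inputs in practice).
calib_data = np.random.rand(64, 224, 224, 3).astype(np.float32)
runner.optimize(calib_data)

# 3. Compile: produce the HEF binary that runs on the device.
hef = runner.compile()
with open("my_model.hef", "wb") as f:
    f.write(hef)
```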
If I am not mistaken, I came across a page on this website saying that there will be a future update adding support for floating-point models.
I do not know what they are referring to.
You can run a model in our emulator in floating point. This is used to validate a model after parsing.
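As a hedged sketch of that validation step, following the pattern in the Dataflow Compiler tutorials (enum and method names may vary between DFC versions; `runner` is the `ClientRunner` from the sketch above, and the input is placeholder data):

```python
# Floating-point emulation after parsing: SDK_NATIVE runs the parsed
# model in full floating point, so its outputs can be compared against
# the original framework's outputs before any quantization happens.
import numpy as np
from hailo_sdk_client import InferenceContext

input_data = np.random.rand(1, 224, 224, 3).astype(np.float32)  # placeholder

with runner.infer_context(InferenceContext.SDK_NATIVE) as ctx:
    native_out = runner.infer(ctx, input_data)
```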
As I wrote above, the future is even lower-bit quantization. You can google “quantizing LLMs” and find many pages from different sources.
You may have read that AI consumes significant energy globally. Reducing this cost is a priority for all AI companies. Quantizing models, while preserving accuracy, lowers power consumption and increases performance, even on large servers equipped with GPUs. This is particularly crucial at the edge.