Can I run inference on a model that was quantized to have fully 16-bit weights?

First of all, yes: Hailo supports compiling a model with all weights quantized to 16-bit. However, note that this will not work out of the box in every case. I'll elaborate:
For the HailoRT CLI tool (hailortcli run or hailortcli run2) - there's no problem running a HEF file with 16-bit weights.
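For example (model_16bit.hef is just a placeholder file name):

hailortcli run model_16bit.hef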
Python API - The HailoRT Python API supports 16-bit HEFs, but the format type of the input and output vstream params needs to be set to FormatType.UINT16, like so:

from hailo_platform import InputVStreamParams, OutputVStreamParams, FormatType

input_vstreams_params = InputVStreamParams.make(network_group, quantized=False, format_type=FormatType.UINT16)
output_vstreams_params = OutputVStreamParams.make(network_group, quantized=False, format_type=FormatType.UINT16)
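For context, here is a minimal sketch of how those params would then be used, assuming network_group was already configured from the HEF and following the standard HailoRT Python streaming-inference flow (the dummy uint16 input is only an illustration):

import numpy as np
from hailo_platform import InferVStreams

input_info = network_group.get_input_vstream_infos()[0]
# Input buffers must be uint16 to match the vstream format set above
dataset = np.zeros((1, *input_info.shape), dtype=np.uint16)

with InferVStreams(network_group, input_vstreams_params, output_vstreams_params) as infer_pipeline:
    with network_group.activate(network_group.create_params()):
        results = infer_pipeline.infer({input_info.name: dataset})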

C++ API - It will work, BUT note that if you are running inference on actual video/images, they are typically loaded with OpenCV, and OpenCV loads image data as uint8 by default, which means it WILL NOT work with 16-bit inputs as-is. You'll either get a segmentation fault when trying to load the data, or the inference results will be bad because the remaining 8 bits of each element are filled with "garbage" data. The simplest workaround in this case is to keep the entire model in 16-bit except for the input layer(s), which stay 8-bit.
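To make the dtype issue concrete in Python terms (cv2 behaves the same way as the C++ OpenCV API here; the image path is a placeholder, and whether a plain cast is numerically correct for a 16-bit input depends on the model's quantization parameters, so treat this only as a sketch):

import cv2
import numpy as np

frame = cv2.imread('input.jpg')      # OpenCV returns a uint8 HxWx3 array by default
assert frame.dtype == np.uint8
# Feeding this buffer to a uint16 input vstream mismatches the expected element size;
# a naive widening keeps the values but fixes the dtype:
frame_16 = frame.astype(np.uint16)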

In conclusion - Hailo supports models with fully 16-bit weights, but you need to take care when testing their accuracy through the HailoRT API, especially the C++ API when working with OpenCV.