If you try to quantize the whole model to 16-bit using the alls command:
'model_optimization_config(compression_params, auto_16bit_weights_ratio=1.0)'
and it fails because not all of the model's layers support 16-bit quantization, please try to quantize only the convolution layers by using:
'quantization_param({conv*}, precision_mode=a16_w16)'
in the alls file.
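
For reference, here is a minimal sketch of how the fallback script could be applied with the Hailo Dataflow Compiler Python API. The HAR path and the calibration dataset below are placeholders, so adapt them to your own workflow:

```python
# Minimal sketch, assuming the model was already translated to a HAR file with
# the Hailo Dataflow Compiler; "my_model.har" and calib_dataset are placeholders.
import numpy as np
from hailo_sdk_client import ClientRunner

runner = ClientRunner(har="my_model.har")

# Fallback: request 16-bit activations and weights only for convolution layers.
alls_script = "quantization_param({conv*}, precision_mode=a16_w16)\n"
runner.load_model_script(alls_script)

calib_dataset = np.random.rand(64, 224, 224, 3).astype(np.float32)  # placeholder calibration data
runner.optimize(calib_dataset)
runner.save_har("my_model_16bit.har")
```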
Note that if you do this, the input layer might be modified to accept 16-bit data instead of 8-bit data when the first compute layer is a convolution layer.
To verify this, you can use the 'hailortcli parse-hef' tool
to check the format of the input layer, i.e. whether it is UINT8 or UINT16.
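
If you prefer to check this programmatically instead of via the CLI, a small sketch using the HailoRT Python bindings could look like this (the HEF path is a placeholder):

```python
# Minimal sketch, assuming the HailoRT Python package (hailo_platform) is
# installed; "my_model_16bit.hef" is a placeholder path.
from hailo_platform import HEF

hef = HEF("my_model_16bit.hef")
for info in hef.get_input_vstream_infos():
    # format.type reports the expected input data type, e.g. UINT8 or UINT16
    print(info.name, info.shape, info.format.type)
```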
If it is UINT16, please refer to the post 'Can I run inference on a model that was quantized to have fully 16-bit weights?' to see how to handle this situation in your inference code.