How to get memory size and FLOPS for a quantized model with the Python API?

Hi
I use the Python API to quantize a model, and I see that I can choose between several optimization and compression levels, which produce different quantized models with different performance. It would be interesting to be able to compare the memory size required to store the parameters of the models, and the FLOPS if they change.
Is this possible using the Python API?
Thanks for your help

Hi @fabrice.auzanneau,

You can get information about a parsed/optimized/compiled model by running the Profiler. This will generate an HTML report with all the details of the model.

Thanks, but this is not in the Python API. I’m doing a parameter study over several criteria (optimization level, compression level, batch size, etc.) using a Python script, so I need a tool that provides values (memory size, FLOPS, etc.) that I can plot as curves afterwards, without having to copy them from the profiler’s graphical interface.

Is this possible?

@fabrice.auzanneau yes, it is possible. You can use the profile() API of the ClientRunner, like this:

runner.profile()

It returns a dictionary with 4 items; most of the parameters you need can be accessed with the [‘stats’] key.
See the DFC documentation for more details.
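
For instance, a minimal sketch (assuming the usual hailo_sdk_client import and a runner created from a HAR file; the exact contents of the 'stats' entry depend on your DFC version, so inspect them before building your script around specific names):

from hailo_sdk_client import ClientRunner

# har_path is a placeholder for your parsed/optimized/quantized model archive
runner = ClientRunner(har=har_path)
profile_data = runner.profile()

# Most of the useful values are under the 'stats' key (see above);
# print it to see what is available in your DFC version.
stats = profile_data['stats']
print(stats)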

Thanks a lot, I’ll try that

Hi again
I have used the profile() method, which is indeed very helpful. It provides lots of information, from which I can compute the GOPS and the memory size of the weights.
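
For example, a rough aggregation sketch over the per-layer dictionaries (like the one shown further below); here layer_infos is a placeholder for however those dictionaries are extracted from the profile() result:

def summarize(layer_infos):
    # layer_infos: iterable of per-layer dictionaries carrying the
    # 'weights' and 'ops' fields shown in the example below
    total_params = sum(layer.get('weights', 0.0) for layer in layer_infos)
    total_ops = sum(layer.get('ops', 0.0) for layer in layer_infos)
    return {
        'params': total_params,                  # parameter count from the 'weights' fields
        'weights_mb_int8': total_params / 1e6,   # ~1 byte per parameter for a8_w8 (assumed)
        'gops': total_ops / 1e9,                 # operations per inference, in GOPS
    }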

I have a few questions:

First, here is the description of a layer (from ResNet 50 network) that I obtain:

{'type': 'conv',
 'input': ['resnet50_v1_7_zoo/batch_norm19'],
 'output': ['resnet50_v1_7_zoo/batch_norm20'],
 'input_shapes': [[-1, 28, 28, 128]],
 'output_shapes': [[-1, 28, 28, 128]],
 'original_names': ['resnetv17_stage2_conv8_fwd'],
 'compilation_params': {},
 'quantization_params': {'quantization_groups': 1,
  'precision_mode': 'a8_w8',
  'bias_mode': 'single_scale_decomposition'},
 'params': {'kernel_shape': [3, 3, 128, 128],
  'strides': [1, 1, 1, 1],
  'dilations': [1, 1, 1, 1],
  'padding': 'SAME_TENSORFLOW',
  'groups': 1,
  'layer_disparity': 1,
  'input_disparity': 1,
  'batch_norm': False,
  'elementwise_add': False,
  'activation': 'linear'},
 'weights': 147584.0,
 'macs': 115705856.0,
 'ops': 231211008.0}

I don’t see a count for the biases. Is it because my network’s layers don’t use biases, or because the biases are included in the weights count?

Second, when I read a network from an ONNX file, even if the network is in full precision (FP32), I obtain (before optimizing the network):
'precision_mode': 'a8_w8',
Is this normal?

Thanks for your help.

@fabrice.auzanneau

  • The weights field in the JSON corresponds to the Model Parameters in the profiler report, so it already includes the biases
  • 'precision_mode': 'a8_w8' is part of the 'quantization_params' and it is not relevant for models that are not quantized (i.e. after parsing). The parameters are still FP32 after parsing
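
In code, the size estimate then looks roughly like this (a sketch; the bytes-per-parameter values are assumptions based on the precision modes above):

BYTES_PER_PARAM = {'fp32': 4.0, 'a8_w8': 1.0}  # assumed storage width per precision mode;
                                               # 4-bit compressed weights would be ~0.5

def weights_size_mb(total_weights, precision='a8_w8'):
    # total_weights: sum of the per-layer 'weights' fields (parameter count, biases included)
    return total_weights * BYTES_PER_PARAM[precision] / 1e6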

Thank you, so for the FP32 model I should multiply the weights count by 4 to get the size in bytes. What about the ops numbers: do they change?

The ops numbers may differ from the original ONNX/TF model, since some modifications/optimizations are applied to the model at parsing time.

Yes, but I meant: do they change between the parsed and the quantized model?

Model modification commands (e.g. normalization, resize, …) and advanced optimizations (such as 4-bit compression) may have an impact on the number of operations.
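
One way to check this from a script is to profile the model at both stages and compare the totals, along these lines (a sketch; extract_layer_infos is a placeholder, not a DFC API, for however you pull the per-layer dictionaries out of the profile() result, as discussed above):

def total_gops(runner):
    profile_data = runner.profile()
    # extract_layer_infos is a placeholder for pulling the per-layer
    # dictionaries, with their 'ops' fields, out of profile_data
    return sum(layer.get('ops', 0.0) for layer in extract_layer_infos(profile_data)) / 1e9

# parsed_runner and quantized_runner are the ClientRunner objects
# before and after optimization/quantization, respectively
print('parsed: %.2f GOPS' % total_gops(parsed_runner))
print('quantized: %.2f GOPS' % total_gops(quantized_runner))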