How to get memory size and FLOPS for a quantized model with the Python API?

Hi
I use the Python API to quantize a model, and I see that I can choose between several optimization and compression levels, which produce different quantized models with different performance. It would be interesting to be able to compare the memory size required to store the parameters of the models, and the FLOPS if they change.
Is this possible using the Python API?
Thanks for your help

Hi @fabrice.auzanneau,

You can get information about a parsed/optimized/compiled model by running the Profiler. This will generate an HTML report with all the details of the model.

Thanks, but this is not in the Python API. I’m doing a parameter study over several criteria (optimization level, compression level, batch size, etc.) using a Python script, so I need a tool that provides values (memory size, FLOPS, etc.) that I can plot as curves afterwards, without having to copy them from the profiler’s graphical interface.

Is this possible?

@fabrice.auzanneau yes, it is possible. You can use the profile() API of the ClientRunner, like this:

runner.profile()

It returns a dictionary with 4 items; most of the parameters you need can be accessed with the [‘stats’] key.
See the DFC documentation for more details.
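
For instance, a minimal sketch (assuming the usual hailo_sdk_client import and a runner created from a HAR file; the exact contents of the 'stats' entry depend on your DFC version, so inspect them before building your script around specific names):

from hailo_sdk_client import ClientRunner

# har_path is a placeholder for your parsed/optimized/quantized model archive
runner = ClientRunner(har=har_path)
profile_data = runner.profile()

# Most of the useful values are under the 'stats' key (see above);
# print it to see what is available in your DFC version.
stats = profile_data['stats']
print(stats)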

Thanks a lot, I’ll try that

Hi again
I have used the profile() method, which is indeed very helpful. It provides lots of information, from which I can compute the GOPS and the memory size of the weights.
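
For example, a rough aggregation sketch over the per-layer dictionaries (like the one shown further below); here layer_infos is a placeholder for however those dictionaries are extracted from the profile() result:

def summarize(layer_infos):
    # layer_infos: iterable of per-layer dictionaries carrying the
    # 'weights' and 'ops' fields shown in the example below
    total_params = sum(layer.get('weights', 0.0) for layer in layer_infos)
    total_ops = sum(layer.get('ops', 0.0) for layer in layer_infos)
    return {
        'params': total_params,                  # parameter count from the 'weights' fields
        'weights_mb_int8': total_params / 1e6,   # ~1 byte per parameter for a8_w8 (assumed)
        'gops': total_ops / 1e9,                 # operations per inference, in GOPS
    }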

I have a few questions:

First, here is the description of a layer (from ResNet 50 network) that I obtain:

{'type': 'conv',
 'input': ['resnet50_v1_7_zoo/batch_norm19'],
 'output': ['resnet50_v1_7_zoo/batch_norm20'],
 'input_shapes': [[-1, 28, 28, 128]],
 'output_shapes': [[-1, 28, 28, 128]],
 'original_names': ['resnetv17_stage2_conv8_fwd'],
 'compilation_params': {},
 'quantization_params': {'quantization_groups': 1,
  'precision_mode': 'a8_w8',
  'bias_mode': 'single_scale_decomposition'},
 'params': {'kernel_shape': [3, 3, 128, 128],
  'strides': [1, 1, 1, 1],
  'dilations': [1, 1, 1, 1],
  'padding': 'SAME_TENSORFLOW',
  'groups': 1,
  'layer_disparity': 1,
  'input_disparity': 1,
  'batch_norm': False,
  'elementwise_add': False,
  'activation': 'linear'},
 'weights': 147584.0,
 'macs': 115705856.0,
 'ops': 231211008.0}

I don’t see a count for the biases. Is it because my network’s layers don’t use biases, or because the biases are included in the weights count?

Second, when I read a network from an ONNX file, even if the network is in full precision (FP32), I obtain (before optimizing the network):
'precision_mode': 'a8_w8',
Is this normal?

Thanks for your help.

@fabrice.auzanneau

  • The weights field in the JSON corresponds to the Model Parameters in the profiler report, so it already includes the biases
  • 'precision_mode': 'a8_w8' is part of the 'quantization_params' and it is not relevant for models that are not quantized (i.e. after parsing). The parameters are still FP32 after parsing
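
In code, the size estimate then looks roughly like this (a sketch; the bytes-per-parameter values are assumptions based on the precision modes above):

BYTES_PER_PARAM = {'fp32': 4.0, 'a8_w8': 1.0}  # assumed storage width per precision mode;
                                               # 4-bit compressed weights would be ~0.5

def weights_size_mb(total_weights, precision='a8_w8'):
    # total_weights: sum of the per-layer 'weights' fields (parameter count, biases included)
    return total_weights * BYTES_PER_PARAM[precision] / 1e6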

Thank you, so for the FP32 model I should multiply the weights count by 4 to get the size in bytes. What about the ops numbers: do they change?

The ops numbers may differ from the original ONNX/TF model, since some modifications/optimizations are applied to the model at parsing time.

Yes, but I meant: do they change between the parsed and the quantized model?

Model modification commands (e.g. normalization, resize, …) and advanced optimizations (such as 4-bit compression) may have an impact on the number of operations.
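
One way to check this from a script is to profile the model at both stages and compare the totals, along these lines (a sketch; extract_layer_infos is a placeholder, not a DFC API, for however you pull the per-layer dictionaries out of the profile() result, as discussed above):

def total_gops(runner):
    profile_data = runner.profile()
    # extract_layer_infos is a placeholder for pulling the per-layer
    # dictionaries, with their 'ops' fields, out of profile_data
    return sum(layer.get('ops', 0.0) for layer in extract_layer_infos(profile_data)) / 1e9

# parsed_runner and quantized_runner are the ClientRunner objects
# before and after optimization/quantization, respectively
print('parsed: %.2f GOPS' % total_gops(parsed_runner))
print('quantized: %.2f GOPS' % total_gops(quantized_runner))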