.hef gives seemingly incorrect results. Quantized .har is fine

phillip · November 25, 2024, 10:16pm

Hi -

I’m trying to deploy the Falconsai/nsfw_image_detection model from Hugging Face on the Hailo 8.

This verification code works fine on the quantized model .har.

import numpy as np
from PIL import Image
from hailo_sdk_client import ClientRunner
from hailo_sdk_client.exposed_definitions import InferenceContext
import os
import argparse
import time
import psutil
import gc

def log_memory_usage(stage):
    process = psutil.Process(os.getpid())
    print(f"\nMemory usage at {stage}: {process.memory_info().rss / 1024 / 1024:.2f} MB")

def simple_verify_har(img_path, har_path):
    """Simple HAR model verification with performance monitoring"""
    start_total = time.time()
    
    # Track initialization time
    print("Initializing ClientRunner...")
    log_memory_usage("before initialization")
    t0 = time.time()
    runner = ClientRunner(har=har_path)
    print(f"Initialization took: {time.time() - t0:.2f} seconds")
    log_memory_usage("after initialization")
    
    # Track image loading and preprocessing time
    print("\nLoading and preprocessing image...")
    t0 = time.time()
    img = Image.open(img_path).convert('RGB')
    img = img.resize((224, 224))
    img_array = np.array(img).astype(np.float32) / 255.0
    img_array = np.expand_dims(img_array, 0)
    print(f"Image preprocessing took: {time.time() - t0:.2f} seconds")
    log_memory_usage("after preprocessing")
    
    # Get input layer info
    print("\nGetting model information...")
    t0 = time.time()
    input_layer = [layer for layer in runner.get_hn_dict()["layers"] 
                  if runner.get_hn_dict()["layers"][layer]["type"] == "input_layer"][0]
    print(f"Getting model info took: {time.time() - t0:.2f} seconds")
    
    # Track inference time
    print("\nRunning inference...")
    t0 = time.time()
    with runner.infer_context(InferenceContext.SDK_NATIVE) as ctx:
        print("Context created, starting inference...")
        t1 = time.time()
        outputs = runner.infer(ctx, {input_layer: img_array}, batch_size=1)
        print(f"Pure inference took: {time.time() - t1:.2f} seconds")
    print(f"Total inference context took: {time.time() - t0:.2f} seconds")
    log_memory_usage("after inference")
    
    # Process results
    probabilities = softmax(outputs[0])
    print("\nResults:")
    print(f"Safe probability: {probabilities[0]:.4f}")
    print(f"NSFW probability: {probabilities[1]:.4f}")
    
    print(f"\nTotal execution time: {time.time() - start_total:.2f} seconds")
    
def softmax(x):
    exp_x = np.exp(x - np.max(x))
    return exp_x / exp_x.sum()

def main():
    parser = argparse.ArgumentParser(description='Run inference on HAR model with diagnostics')
    parser.add_argument('--image', required=True, help='Path to input image')
    parser.add_argument('--har', required=True, help='Path to HAR file')
    
    args = parser.parse_args()
    
    # Clear any existing memory
    gc.collect()
    
    simple_verify_har(args.image, args.har)

if __name__ == "__main__":
    main()

Then I run hailo compiler nsfw_model_quantized.har

This works, no errors…

Then I run the verification script on the saved .har file AFTER the compilation is complete.

I know the script is loading the compiled model because the debug logs show the size as being much smaller.

The results are still correct…

Then I run inference on the .hef using the HailoRT library, and I get basically 50/50 results on everything. I’m not sure how to debug the issue. Can anyone help?

detector.cpp

#include "hailo/hailort.hpp"
#include <opencv2/opencv.hpp>
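#include <algorithm>
#include <chrono>
#include <cmath>
#include <cstring>
#include <iomanip>
#include <iostream>
#include <map>
#include <memory>
#include <string>
#include <utility>
#include <vector>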

using namespace hailort;

class NSFWDetector {
private:
    std::unique_ptr<VDevice> vdevice;
    std::shared_ptr<ConfiguredNetworkGroup> network_group;
    std::unique_ptr<InferVStreams> pipeline;
    std::map<std::string, std::vector<uint8_t>> input_data;
    std::map<std::string, std::vector<uint8_t>> output_data;
    std::map<std::string, MemoryView> input_views;
    std::map<std::string, MemoryView> output_views;
    const std::string HEF_FILE = "nsfw_model.hef";
    constexpr static hailo_format_type_t FORMAT_TYPE = HAILO_FORMAT_TYPE_AUTO;

    Expected<std::shared_ptr<ConfiguredNetworkGroup>> configure_network_group(VDevice &vdevice) {
        auto hef = Hef::create(HEF_FILE);
        if (!hef) return make_unexpected(hef.status());
        
        auto configure_params = vdevice.create_configure_params(hef.value());
        if (!configure_params) return make_unexpected(configure_params.status());
        
        auto network_groups = vdevice.configure(hef.value(), configure_params.value());
        if (!network_groups) return make_unexpected(network_groups.status());
        if (1 != network_groups->size()) return make_unexpected(HAILO_INTERNAL_FAILURE);
        
        return std::move(network_groups->at(0));
    }

    void print_vstream_info() {
        std::cout << "\n=== Input VStreams (" << pipeline->get_input_vstreams().size() << ") ===" << std::endl;
        for (const auto &input_vstream : pipeline->get_input_vstreams()) {
            auto info = input_vstream.get().get_info();
            std::cout << "\nVStream Name: " << input_vstream.get().name() << std::endl;
            std::cout << "Format:" << std::endl;
            std::cout << "  Type: " << info.format.type << std::endl;
            std::cout << "  Order: " << info.format.order << std::endl;
            std::cout << "Shape:" << std::endl;
            std::cout << "  Height: " << info.shape.height << std::endl;
            std::cout << "  Width: " << info.shape.width << std::endl;
            std::cout << "  Features: " << info.shape.features << std::endl;
            std::cout << "Frame Size: " << input_vstream.get().get_frame_size() << " bytes" << std::endl;
        }
    }

    std::pair<float, float> calculate_probabilities(const uint8_t* scores) {
        // Convert uint8 scores to probabilities using softmax
        float score0 = static_cast<float>(scores[0]);
        float score1 = static_cast<float>(scores[1]);
        
        // Apply softmax
        float max_score = std::max(score0, score1);
        float exp0 = std::exp(score0 - max_score);
        float exp1 = std::exp(score1 - max_score);
        float sum = exp0 + exp1;
        
        return std::make_pair(exp0/sum, exp1/sum);
    }

public:
    hailo_status init() {
        auto vdevice_exp = VDevice::create();
        if (!vdevice_exp) return vdevice_exp.status();
        vdevice = std::unique_ptr<VDevice>(vdevice_exp.release());

        auto network_group_exp = configure_network_group(*vdevice);
        if (!network_group_exp) return network_group_exp.status();
        network_group = network_group_exp.value();

        auto input_params = network_group->make_input_vstream_params({}, FORMAT_TYPE,
            HAILO_DEFAULT_VSTREAM_TIMEOUT_MS, HAILO_DEFAULT_VSTREAM_QUEUE_SIZE);
        if (!input_params) return input_params.status();

        auto output_params = network_group->make_output_vstream_params({}, FORMAT_TYPE,
            HAILO_DEFAULT_VSTREAM_TIMEOUT_MS, HAILO_DEFAULT_VSTREAM_QUEUE_SIZE);
        if (!output_params) return output_params.status();

        auto pipeline_exp = InferVStreams::create(*network_group, input_params.value(), output_params.value());
        if (!pipeline_exp) return pipeline_exp.status();
        pipeline = std::make_unique<InferVStreams>(std::move(pipeline_exp.value()));

        print_vstream_info();

        // Pre-allocate buffers
        for (const auto &input_vstream : pipeline->get_input_vstreams()) {
            size_t frame_size = input_vstream.get().get_frame_size();
            input_data.emplace(input_vstream.get().name(), std::vector<uint8_t>(frame_size));
            input_views.emplace(input_vstream.get().name(),
                MemoryView(input_data[input_vstream.get().name()].data(), 
                          input_data[input_vstream.get().name()].size()));
        }

        for (const auto &output_vstream : pipeline->get_output_vstreams()) {
            size_t frame_size = output_vstream.get().get_frame_size();
            output_data.emplace(output_vstream.get().name(), std::vector<uint8_t>(frame_size));
            output_views.emplace(output_vstream.get().name(),
                MemoryView(output_data[output_vstream.get().name()].data(),
                          output_data[output_vstream.get().name()].size()));
        }

        return HAILO_SUCCESS;
    }

    hailo_status detect(const cv::Mat& image) {
        auto start_time = std::chrono::high_resolution_clock::now();
        
        for (const auto &input_vstream : pipeline->get_input_vstreams()) {
            // 1. Get shape info
            auto shape = input_vstream.get().get_info().shape;
            
            // 2. Resize image
            cv::Mat resized;
            cv::resize(image, resized, cv::Size(shape.width, shape.height));
            
            // 3. Convert BGR to RGB since model expects RGB
            cv::Mat rgb_image;
            cv::cvtColor(resized, rgb_image, cv::COLOR_BGR2RGB);
            
            // 4. Convert to float and normalize to 0-1
            cv::Mat float_img;
            rgb_image.convertTo(float_img, CV_32F, 1.0/255.0);
            
            // 5. Convert back to uint8 for the hardware
            cv::Mat uint8_img;
            float_img.convertTo(uint8_img, CV_8U, 255.0);
            
            // Debug print first few values
            std::cout << "Input values: ";
            for(int i = 0; i < 5; i++) {
                std::cout << (int)uint8_img.data[i] << " ";
            }
            std::cout << std::endl;
            
            // Copy to input buffer
            std::memcpy(input_data[input_vstream.get().name()].data(), 
                       uint8_img.data, 
                       input_data[input_vstream.get().name()].size());
        }

        auto status = pipeline->infer(input_views, output_views, 1);
        
        if (status == HAILO_SUCCESS) {
            for (const auto &output_vstream : pipeline->get_output_vstreams()) {
                const auto& output = output_data[output_vstream.get().name()];
                const uint8_t* scores = output.data();
                
                // Print raw scores for debugging
                std::cout << "Raw scores: " << (int)scores[0] << ", " << (int)scores[1] << std::endl;
                
                // Calculate probabilities
                auto [prob_safe, prob_nsfw] = calculate_probabilities(scores);
                
                std::cout << "\nResults:" << std::endl;
                std::cout << "Safe: " << std::fixed << std::setprecision(4) 
                         << prob_safe * 100 << "%" << std::endl;
                std::cout << "NSFW: " << std::fixed << std::setprecision(4) 
                         << prob_nsfw * 100 << "%" << std::endl;
            }
        }
        
        auto end_time = std::chrono::high_resolution_clock::now();
        auto duration = std::chrono::duration_cast<std::chrono::milliseconds>(end_time - start_time);
        std::cout << "Inference time: " << duration.count() << "ms" << std::endl;
        
        return status;
    }
};

int main(int argc, char** argv) {
    if (argc != 2) {
        std::cerr << "Usage: " << argv[0] << " <image_path>" << std::endl;
        return -1;
    }

    std::string image_path = argv[1];
    cv::Mat image = cv::imread(image_path);
    if (image.empty()) {
        std::cerr << "Failed to load image: " << image_path << std::endl;
        return -1;
    }

    NSFWDetector detector;
    auto status = detector.init();
    if (HAILO_SUCCESS != status) {
        std::cerr << "Failed to initialize detector" << std::endl;
        return status;
    }

    status = detector.detect(image);
    if (HAILO_SUCCESS != status) {
        std::cerr << "Failed to run inference" << std::endl;
        return status;
    }

    return 0;
}
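
One thing that may be worth checking in the C++ above: `calculate_probabilities` applies softmax directly to the raw uint8 output values. With `HAILO_FORMAT_TYPE_AUTO` the output appears to stay in its quantized form, so the raw integers would first need to be mapped back to real values with the output's scale and zero point. The sketch below only illustrates the arithmetic in NumPy. The scale and zero point are made-up numbers, not values read from this model, and the thread does not confirm this as the cause of the 50/50 results.

```python
import numpy as np

def softmax(x):
    # Cast first so uint8 inputs cannot wrap around during subtraction.
    x = np.asarray(x, dtype=np.float32)
    e = np.exp(x - x.max())
    return e / e.sum()

def dequantize(raw, scale, zero_point):
    # Standard affine dequantization: real = (quantized - zero_point) * scale
    return (np.asarray(raw, dtype=np.float32) - zero_point) * scale

# Hypothetical raw output bytes and quantization parameters (assumptions).
raw_scores = np.array([131, 125], dtype=np.uint8)
QP_SCALE = 0.05
QP_ZP = 128.0

probs_raw = softmax(raw_scores)                                # softmax on raw integers
probs_deq = softmax(dequantize(raw_scores, QP_SCALE, QP_ZP))   # softmax on real values

print("softmax on raw uint8:   ", probs_raw)
print("softmax on dequantized: ", probs_deq)
```

The two results differ a lot even for the same bytes, so comparing .har and .hef probabilities only makes sense once both are computed from dequantized values.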

nina-vilela · November 26, 2024, 9:25am

Hi @phillip,

Welcome to the Hailo Community!

This is likely a quantization issue.

When testing the quantized har, you used InferenceContext.SDK_NATIVE.

For quantized mode, you should use InferenceContext.SDK_QUANTIZED instead.
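
A minimal sketch of how the two contexts could be compared once you have run `runner.infer` under each of them. The helper name `compare_outputs` and the sample numbers are hypothetical; only the NumPy calls are real, and the arrays stand in for the outputs you would collect yourself.

```python
import numpy as np

def compare_outputs(native_out, quantized_out):
    """Summarize how far the quantized output drifts from the native one."""
    a = np.asarray(native_out, dtype=np.float64).ravel()
    b = np.asarray(quantized_out, dtype=np.float64).ravel()
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return {
        "max_abs_diff": float(np.max(np.abs(a - b))),
        "cosine": float(a @ b / denom) if denom > 0 else float("nan"),
        "same_top1": bool(np.argmax(a) == np.argmax(b)),
    }

# Stand-in values (assumptions); replace with the real outputs from
# SDK_NATIVE and SDK_QUANTIZED for the same input image.
native = np.array([2.3, -1.9])
quantized = np.array([0.02, 0.05])

print(compare_outputs(native, quantized))
```

If the top-1 class already differs between the two contexts, the problem is in quantization rather than in the compiled .hef or the HailoRT code.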

Something I noticed is that you are performing the normalization as a pre-process before inference.

We recommend always adding the normalization to be performed on the device. This can be done by adding a normalization command to the model script.
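
As a rough illustration of what an on-device normalization layer computes: it takes raw uint8 pixels and applies `(x - mean) / std` per channel. In a model script this is typically a line of the form `normalization1 = normalization([0.0, 0.0, 0.0], [255.0, 255.0, 255.0])`. The mean and std values here are assumptions chosen to match the `/ 255.0` in the verification script above; they are not confirmed anywhere in this thread. The NumPy sketch below checks that this is equivalent to the host-side division.

```python
import numpy as np

# Assumed per-channel parameters, chosen to mirror the host-side "/ 255.0".
MEAN = np.array([0.0, 0.0, 0.0], dtype=np.float32)
STD = np.array([255.0, 255.0, 255.0], dtype=np.float32)

def on_device_normalization(raw_uint8):
    # What a normalization layer would compute from raw [0, 255] pixels.
    return (raw_uint8.astype(np.float32) - MEAN) / STD

rng = np.random.default_rng(0)
raw = rng.integers(0, 256, size=(1, 224, 224, 3), dtype=np.uint8)

host_side = raw.astype(np.float32) / 255.0     # preprocessing done in Python
device_side = on_device_normalization(raw)     # same math, moved into the model

print(np.allclose(host_side, device_side))
```

With the normalization inside the model, both the calibration set and the runtime input can stay as plain uint8 images in the [0, 255] range.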

It is also good to ensure that your calibration dataset is correctly created.

phillip · November 26, 2024, 10:44pm

Hi - You’re right it’s a quantization issue.

Here’s my calibration set code.

import os
import numpy as np
from PIL import Image

def create_calibration_set(image_dir="~/imagenet-sample-images", save_path="calib_set.npy"):
    """Create calibration set from images in the specified directory
    
    Args:
        image_dir (str): Path to directory containing images
        save_path (str): Path where to save the numpy array
    
    Returns:
        np.ndarray: Calibration dataset array
    """
    # Expand user directory if needed (handle ~)
    image_dir = os.path.expanduser(image_dir)
    
    # Get all image files from directory
    valid_extensions = {'.jpg', '.jpeg', '.png', '.bmp'}
    image_files = [
        f for f in os.listdir(image_dir)
        if os.path.splitext(f.lower())[1] in valid_extensions
    ]
    
    if not image_files:
        raise ValueError(f"No valid images found in {image_dir}")
    
    num_images = len(image_files)
    print(f"Found {num_images} images in {image_dir}")
    
    # Initialize calibration array (num_images, height, width, channels)
    calib_dataset = np.zeros((num_images, 224, 224, 3), dtype=np.uint8)

    # Process each image
    for i, image_file in enumerate(image_files):
        try:
            # Open image and convert to RGB to ensure 3 channels
            img_path = os.path.join(image_dir, image_file)
            img = Image.open(img_path).convert('RGB')
            
            # Resize image to 224x224 using LANCZOS resampling for better quality
            img_resized = img.resize((224, 224), Image.LANCZOS)
            
            # Convert to numpy array WITHOUT normalizing to keep [0,255] range
            img_array = np.array(img_resized, dtype=np.uint8)
            
            # Store in calibration dataset
            calib_dataset[i] = img_array
            
            if (i + 1) % 10 == 0:
                print(f"Processed {i + 1}/{num_images} images")
                
        except Exception as e:
            print(f"Error processing {image_file}: {str(e)}")
            # Fill with zeros if image processing fails
            calib_dataset[i] = np.zeros((224, 224, 3), dtype=np.uint8)
    
    # Save calibration set
    np.save(save_path, calib_dataset)
    print(f"Saved calibration set with {num_images} images to {save_path}")
    
    return calib_dataset

if __name__ == "__main__":
    # Create the calibration set
    calib_dataset = create_calibration_set()

The imagenet folder contains a mix of both classes of images.

Then here’s the optimization code

from hailo_sdk_client import ClientRunner
import os
import numpy as np

# Load HAR file
model_name = "nsfw_detection"
hailo_model_har_name = "nsfw_model_hailo_model.har"
assert os.path.isfile(hailo_model_har_name), "Please provide valid path for HAR file"
runner = ClientRunner(har=hailo_model_har_name)

# Load calibration dataset
calib_dataset = np.load("calib_set.npy")

# Get input layer name
hn_layers = runner.get_hn_dict()["layers"]
input_layers = [layer for layer in hn_layers if hn_layers[layer]["type"] == "input_layer"]
print("Input layers:", input_layers)

# Create calibration dataset dict with correct input layer name
calib_dataset_dict = {input_layers[0]: calib_dataset}

# Optimize
runner.optimize(calib_dataset_dict)

# Save the quantized model
quantized_model_har_path = f"{model_name}_quantized_model.har"
runner.save_har(quantized_model_har_path)

phillip · November 26, 2024, 10:46pm

Also during optimization I see

[info] Using dataset with 64 entries for calibration

Does this indicate only 64 entries are being used? Are they picked at random? Could it be that the 64 picked all belong to a single class? The calibration set is roughly a 50/50 split of sfw/nsfw.
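
The thread does not say how the 64 entries are chosen. One way to make the question moot is to hand the optimizer a pre-built, class-balanced, shuffled subset. This is only a sketch: `balanced_subset` and the `labels` array are hypothetical (the calibration script above does not track labels), and the tiny images are stand-ins for the real 224x224 data.

```python
import numpy as np

def balanced_subset(images, labels, n_total=64, seed=0):
    """Pick the same number of images per class, then shuffle the result."""
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    classes = np.unique(labels)
    per_class = n_total // len(classes)
    picked = []
    for c in classes:
        idx = np.flatnonzero(labels == c)
        if len(idx) < per_class:
            raise ValueError(f"class {c} has only {len(idx)} images")
        picked.append(rng.choice(idx, size=per_class, replace=False))
    order = rng.permutation(np.concatenate(picked))
    return images[order], labels[order]

# Stand-in data (assumption): 100 sfw (label 0) and 100 nsfw (label 1) images.
images = np.zeros((200, 8, 8, 3), dtype=np.uint8)
labels = np.array([0] * 100 + [1] * 100)

subset, subset_labels = balanced_subset(images, labels)
print(subset.shape, np.bincount(subset_labels))
```

Saving `subset` as the calibration array means every entry the optimizer can see comes from a 50/50 mix, whichever ones it ends up using.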

phillip · November 26, 2024, 10:58pm

Lastly, here’s the logs from optimization

(hailo_venv) root@shepherd:~# python optimize.py
Input layers: ['nsfw_model/input_layer1']
[info] Starting Model Optimization
[warning] Reducing optimization level to 0 (the accuracy won't be optimized and compression won't be used) because there's no available GPU
[warning] Running model optimization with zero level of optimization is not recommended for production use and might lead to suboptimal accuracy results
[info] Model received quantization params from the hn
[info] Starting Mixed Precision
[info] Mixed Precision is done (completion time is 00:00:02.40)
[info] Starting LayerNorm Decomposition
[info] Using dataset with 64 entries for calibration
Calibration: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 64/64 [03:31<00:00,  3.31s/entries]
[info] LayerNorm Decomposition is done (completion time is 00:03:59.71)
[info] Starting Statistics Collector
[info] Using dataset with 64 entries for calibration
Calibration: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 64/64 [06:03<00:00,  5.68s/entries]
[info] Statistics Collector is done (completion time is 00:06:18.14)
[info] Starting Fix zp_comp Encoding
[info] Fix zp_comp Encoding is done (completion time is 00:00:00.00)
[info] Starting Matmul Equalization
[info] Matmul Equalization is done (completion time is 00:00:06.58)
[info] No shifts available for layer nsfw_model/conv1/conv_op, using max shift instead. delta=0.1419
[info] activation fitting started for nsfw_model/reduce_sum_softmax1/act_op
[info] No shifts available for layer nsfw_model/conv3/conv_op, using max shift instead. delta=2.4421
[info] No shifts available for layer nsfw_model/conv3/conv_op, using max shift instead. delta=1.2211
[info] No shifts available for layer nsfw_model/conv5/conv_op, using max shift instead. delta=1.6960
[info] No shifts available for layer nsfw_model/conv5/conv_op, using max shift instead. delta=0.8480
[info] activation fitting started for nsfw_model/reduce_sum_softmax2/act_op
[info] No shifts available for layer nsfw_model/conv7/conv_op, using max shift instead. delta=5.0690
[info] No shifts available for layer nsfw_model/conv7/conv_op, using max shift instead. delta=2.5345
[info] No shifts available for layer nsfw_model/conv9/conv_op, using max shift instead. delta=1.7729
[info] No shifts available for layer nsfw_model/conv9/conv_op, using max shift instead. delta=0.8865
[info] activation fitting started for nsfw_model/reduce_sum_softmax3/act_op
[info] No shifts available for layer nsfw_model/conv11/conv_op, using max shift instead. delta=5.3758
[info] No shifts available for layer nsfw_model/conv11/conv_op, using max shift instead. delta=2.6879
[info] No shifts available for layer nsfw_model/conv13/conv_op, using max shift instead. delta=2.6140
[info] No shifts available for layer nsfw_model/conv13/conv_op, using max shift instead. delta=1.3070
[info] activation fitting started for nsfw_model/reduce_sum_softmax4/act_op
[info] No shifts available for layer nsfw_model/conv15/conv_op, using max shift instead. delta=4.6418
[info] No shifts available for layer nsfw_model/conv15/conv_op, using max shift instead. delta=2.3209
[info] No shifts available for layer nsfw_model/conv17/conv_op, using max shift instead. delta=3.4396
[info] No shifts available for layer nsfw_model/conv17/conv_op, using max shift instead. delta=1.7198
[info] activation fitting started for nsfw_model/reduce_sum_softmax5/act_op
[info] No shifts available for layer nsfw_model/conv19/conv_op, using max shift instead. delta=5.8287
[info] No shifts available for layer nsfw_model/conv19/conv_op, using max shift instead. delta=2.9144
[info] No shifts available for layer nsfw_model/conv21/conv_op, using max shift instead. delta=2.9919
[info] No shifts available for layer nsfw_model/conv21/conv_op, using max shift instead. delta=1.4960
[info] activation fitting started for nsfw_model/reduce_sum_softmax6/act_op
[info] No shifts available for layer nsfw_model/conv23/conv_op, using max shift instead. delta=5.5836
[info] No shifts available for layer nsfw_model/conv23/conv_op, using max shift instead. delta=2.7918
[info] No shifts available for layer nsfw_model/conv25/conv_op, using max shift instead. delta=2.5264
[info] No shifts available for layer nsfw_model/conv25/conv_op, using max shift instead. delta=1.2632
[info] activation fitting started for nsfw_model/reduce_sum_softmax7/act_op
[info] No shifts available for layer nsfw_model/conv27/conv_op, using max shift instead. delta=5.6599
[info] No shifts available for layer nsfw_model/conv27/conv_op, using max shift instead. delta=2.8299
[info] No shifts available for layer nsfw_model/conv29/conv_op, using max shift instead. delta=0.7931
[info] No shifts available for layer nsfw_model/conv29/conv_op, using max shift instead. delta=0.3966
[info] activation fitting started for nsfw_model/reduce_sum_softmax8/act_op
[info] No shifts available for layer nsfw_model/conv31/conv_op, using max shift instead. delta=5.3894
[info] No shifts available for layer nsfw_model/conv31/conv_op, using max shift instead. delta=2.6947
[info] No shifts available for layer nsfw_model/conv33/conv_op, using max shift instead. delta=3.4202
[info] No shifts available for layer nsfw_model/conv33/conv_op, using max shift instead. delta=1.7101
[info] activation fitting started for nsfw_model/reduce_sum_softmax9/act_op
[info] No shifts available for layer nsfw_model/conv35/conv_op, using max shift instead. delta=4.9457
[info] No shifts available for layer nsfw_model/conv35/conv_op, using max shift instead. delta=2.4729
[info] No shifts available for layer nsfw_model/conv37/conv_op, using max shift instead. delta=3.3804
[info] No shifts available for layer nsfw_model/conv37/conv_op, using max shift instead. delta=1.6902
[info] activation fitting started for nsfw_model/reduce_sum_softmax10/act_op
[info] No shifts available for layer nsfw_model/conv39/conv_op, using max shift instead. delta=4.8448
[info] No shifts available for layer nsfw_model/conv39/conv_op, using max shift instead. delta=2.4224
[info] No shifts available for layer nsfw_model/conv41/conv_op, using max shift instead. delta=2.7105
[info] No shifts available for layer nsfw_model/conv41/conv_op, using max shift instead. delta=1.3553
[info] activation fitting started for nsfw_model/reduce_sum_softmax11/act_op
[info] No shifts available for layer nsfw_model/conv43/conv_op, using max shift instead. delta=4.5563
[info] No shifts available for layer nsfw_model/conv43/conv_op, using max shift instead. delta=2.2782
[info] No shifts available for layer nsfw_model/conv45/conv_op, using max shift instead. delta=1.7637
[info] No shifts available for layer nsfw_model/conv45/conv_op, using max shift instead. delta=0.8819
[info] activation fitting started for nsfw_model/reduce_sum_softmax12/act_op
[info] No shifts available for layer nsfw_model/conv47/conv_op, using max shift instead. delta=2.4727
[info] No shifts available for layer nsfw_model/conv47/conv_op, using max shift instead. delta=1.2363
[info] No shifts available for layer nsfw_model/conv49/conv_op, using max shift instead. delta=1.8855
[info] No shifts available for layer nsfw_model/conv49/conv_op, using max shift instead. delta=0.9427
[info] Finetune encoding skipped
[info] Bias Correction skipped
[info] Adaround skipped
[info] Quantization-Aware Fine-Tuning skipped
[info] Layer Noise Analysis skipped
[info] Model Optimization is done

nina-vilela · November 27, 2024, 10:20am

I don’t see a model script being used. Did you try adding the normalization as suggested?

phillip · November 27, 2024, 5:01pm

When I added it, I got an error saying “normalization1 layer already exists”, which I thought implied that the existing model already handled normalization.

I can try adding it in and calling it normalization2 or something?

Also - I thought that normalization in the model script was an optimization, but not required, so long as I properly normalize the image beforehand?

phillip · November 28, 2024, 1:02am

Just an update, adding normalization to the model script fixed the issue. Ultimately I think I was passing uint8 to the model and it was expecting float32.
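
For anyone hitting the same thing, here is a small NumPy sketch of why the two paths disagreed, based on my reading of `detect()` above: steps 4 and 5 divide by 255 and then multiply by 255 again, so the device still received raw [0, 255] bytes, while the .har check fed float values in [0, 1]. The rounding step assumes that converting to `CV_8U` rounds to the nearest integer.

```python
import numpy as np

# Every possible 8-bit pixel value.
pixels = np.arange(256, dtype=np.uint8)

# Steps 4 and 5 of detect(): scale to [0, 1], then scale back to uint8.
as_float = pixels.astype(np.float32) / 255.0
back = np.rint(as_float * 255.0).astype(np.uint8)

# What the .har verification script fed to the model instead.
har_input = pixels.astype(np.float32) / 255.0

print("round trip is a no-op:", np.array_equal(back, pixels))
print("max value sent to the .hef:", int(back.max()))
print("max value sent to the .har:", float(har_input.max()))
```

So without a normalization layer inside the model, the .hef was seeing inputs 255 times larger than the ones used to verify the .har.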

Thanks for your help!
