Hi @Theo_Vioux
It is hard to say whether people can be recognized at certain angles, especially in a side view where only one eye is visible. Face embedding models rely on 5 keypoints (two eyes, nose, two lip corners), and the quality of the embeddings can degrade if only half of the face is visible. That said, the final accuracy depends on the total number of people you want to recognize. If it is a small set, many such optimizations can still work, since the probability of an embedding crossing the threshold and causing a false identification is low.
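For example, with only a handful of enrolled identities, a plain cosine-similarity check of a query embedding against the enrolled embeddings with a reasonably strict threshold is usually sufficient. A minimal sketch (the threshold value here is only illustrative):
import numpy as np

def identify(query_embedding, gallery_embeddings, names, threshold=0.4):
    # Normalize so that the dot product equals cosine similarity
    q = query_embedding / np.linalg.norm(query_embedding)
    g = gallery_embeddings / np.linalg.norm(gallery_embeddings, axis=1, keepdims=True)
    sims = g @ q
    best = int(np.argmax(sims))
    return names[best] if sims[best] >= threshold else "Unknown"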
Hi,
There should be 3 or 4 people, but ideally they should be recognizable from the back as well. If that’s not possible, I’d make sure that at least the front of the face is recognizable by a camera. With the face detection + face recognition + pose estimation pipeline, I get about 5 FPS, which is not enough. Do you see a way to optimize this?
Hi @shashi
For person detection, I use the
yolo11n_pose--640x640_quant_hailort_multidevice_1
model, but since it is a lightweight nano version, it is not accurate enough to detect body joints (for example, a hand is only detected when the head is also detected, and there are occlusion issues). Do you have a more accurate version, even if it degrades FPS a bit? I was thinking of an m version, or one compiled directly for Hailo8 rather than multidevice. If there is no compatible model, would it be better to change the output tensors, currently set to:
{
"ConfigVersion": 6,
"Checksum": "01082212afab31375c5ba2f66641681d2719e5ea18053d336c4bc2ab37a81362",
"DEVICE": [
{
"DeviceType": "HAILO8L",
"RuntimeAgent": "HAILORT",
"ThreadPackSize": 6,
"SupportedDeviceTypes": "HAILORT/HAILO8L, HAILORT/HAILO8"
}
],
"PRE_PROCESS": [
{
"InputN": 1,
"InputH": 640,
"InputW": 640,
"InputC": 3,
"InputQuantEn": true
}
],
"MODEL_PARAMETERS": [
{
"ModelPath": "yolo11n_pose--640x640_quant_hailort_multidevice_1.hef"
}
],
"POST_PROCESS": [
{
"OutputPostprocessType": "PoseDetectionYoloV8",
"OutputNumClasses": 1,
"LabelsPath": "labels_yolo11n_pose.json",
"OutputNumLandmarks": 17
}
]
}
Hi @Theo_Vioux
We added the yolo11s_pose model compiled for Hailo8 to our Hailo model zoo. Please see if that helps.
Hi @shashi
Do you have a more accurate pose detection model? Despite the occlusions, a joint is sometimes detected slightly off-center in the RGB image, placing the dot on the wall behind the person in the 2D image, which causes my infrared depth sensor to misinterpret the depth of that dot and take the z of the wall. In fact, the data is quite noisy, and I want to make sure I have the most accurate model possible, so that the noise is as low as possible and can be post-processed correctly.
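Schematically, the keypoint-to-depth lookup is of this form (a simplified sketch, not the exact project code; it assumes the depth stream is aligned to the color stream with pyrealsense2's rs.align):
import pyrealsense2 as rs

align = rs.align(rs.stream.color)  # align depth to the color stream

def keypoint_depth(frames, x, y):
    # Depth in meters at the 2D keypoint (x, y); if the keypoint falls on the
    # wall behind the person, this returns the wall's z, as described above.
    aligned = align.process(frames)
    depth_frame = aligned.get_depth_frame()
    if not depth_frame:
        return None
    return depth_frame.get_distance(int(x), int(y))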
Hi @Theo_Vioux
We compiled the yolo11m pose detection model with the keypoint outputs in 16-bit precision, since the keypoints are more sensitive to quantization. The model can be found here: DeGirum AI Hub. Please see if this works better for your use case.
Hello @shashi,
Thanks for your feedback.
So far, I don’t see much of a noticeable difference. The problem is that the detected joint points fluctuate too much in space, even when I stay still. Is there any way to stabilize this?
Hello @shashi
The COCO pose model for multi-person pose estimation can be used to detect wrists, but with mediocre accuracy: since wrists are thin, they generate a lot of false detections (often an object behind the wrist when it is moving). Is there a Hailo-compatible model for hand estimation instead, like MediaPipe provides, for example?
Hi @Theo_Vioux
Do you need just hand detection, or keypoints on the hands as well?
Just accurate hand detection (no keypoints on it). I saw that the
hand_landmark_lite--224x224_quant_hailort_hailo8_1
model was available, but is it accurate enough? I haven’t managed to get it to detect a hand with my code yet; if not, do you have any others?
Hi @Theo_Vioux
For just hand detection, you can use this model: https://hub.degirum.com/degirum/hailo/yolov8n_relu6_hand--640x640_quant_hailort_hailo8_1
The landmarks model takes a detected hand as input. Hence, if you provide a larger image with multiple hands in it, it will not show impressive results. You can use the two models in series: use the hand model to detect hands, then crop each hand and run the landmarks model on the crop, as in the sketch below.
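Roughly along these lines (just a sketch of the two-stage idea; it assumes the usual PySDK result format with a pixel-coordinate bbox per detection, and you would adjust zoo_url/host to your setup):
import degirum as dg

hand_det = dg.load_model("yolov8n_relu6_hand--640x640_quant_hailort_hailo8_1",
                         inference_host_address="@local", zoo_url="../models", token="")
hand_lmk = dg.load_model("hand_landmark_lite--224x224_quant_hailort_hailo8_1",
                         inference_host_address="@local", zoo_url="../models", token="")

def hands_with_landmarks(frame):
    results = []
    for det in hand_det.predict(frame).results:
        x1, y1, x2, y2 = map(int, det["bbox"])
        crop = frame[max(y1, 0):y2, max(x1, 0):x2]  # crop the detected hand
        if crop.size == 0:
            continue
        # run the landmarks model on the crop only
        results.append({"bbox": det["bbox"], "landmarks": hand_lmk.predict(crop).results})
    return results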
Do you have a code example? Because with:
model_names = [
"yolo11s_pose--640x640_quant_hailort_hailo8_1",
"yolov8n_relu6_hand--640x640_quant_hailort_hailo8_1"
]
I get:
Version is not supported
The version 11 of 'yolov8n_relu6_hand--640x640_quant_hailort_hailo8_1' model parameter collection is NEWER than the current version 10 of DeGirum framework software.
Hi @Theo_Vioux
Please upgrade PySDK to the latest version.
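(Typically something like pip install --upgrade degirum degirum_tools brings both packages up to date.)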
Hi @shashi
I would finally like to return to facial recognition for my project, and I have 2 questions regarding person Re-ID. First, I do not understand why, when applying this recognition code on 2 cameras, I get a video stream of at most 2 FPS for 1 person, while I would like to aim for at least 10-15 FPS if possible. How can I optimize this? Note that I get 30 FPS without the facial recognition performed by the function detect_and_recognize_faces(frame, camera_serial).
Here is the code from your tutorial, adapted to my cameras:
import degirum as dg
import degirum_tools
import pyrealsense2 as rs
import numpy as np
import cv2
import time
inference_host_address = "@local"
zoo_url = '../models'
token = ''
face_det_model_name = "yolov8n_relu6_widerface_kpts--640x640_quant_hailort_hailo8_1"
face_det_model = dg.load_model(
model_name=face_det_model_name,
inference_host_address=inference_host_address,
zoo_url=zoo_url,
token=token,
overlay_color=(0, 255, 0) # Green color for bounding boxes
)
face_rec_model_name = "arcface_mobilefacenet--112x112_quant_hailort_hailo8_1"
face_rec_model = dg.load_model(
model_name=face_rec_model_name,
inference_host_address=inference_host_address,
zoo_url=zoo_url,
token=token
)
def align_and_crop(img, landmarks, image_size=112):
_arcface_ref_kps = np.array(
[
[38.2946, 51.6963], # Left eye
[73.5318, 51.5014], # Right eye
[56.0252, 71.7366], # Nose
[41.5493, 92.3655], # Left mouth corner
[70.7299, 92.2041], # Right mouth corner
],
dtype=np.float32,
)
# Ensure the input landmarks have exactly 5 points (as expected for face alignment)
assert len(landmarks) == 5
# Validate that image_size is divisible by either 112 or 128 (common image sizes for face recognition models)
assert image_size % 112 == 0 or image_size % 128 == 0
# Adjust the scaling factor (ratio) based on the desired image size (112 or 128)
if image_size % 112 == 0:
ratio = float(image_size) / 112.0
diff_x = 0 # No horizontal shift for 112 scaling
else:
ratio = float(image_size) / 128.0
diff_x = 8.0 * ratio # Horizontal shift for 128 scaling
# Apply the scaling and shifting to the reference keypoints
dst = _arcface_ref_kps * ratio
dst[:, 0] += diff_x # Apply the horizontal shift
# Estimate the similarity transformation matrix to align the landmarks with the reference keypoints
M, inliers = cv2.estimateAffinePartial2D(np.array(landmarks), dst, ransacReprojThreshold=1000)
assert np.all(inliers == True)
# Apply the affine transformation to the input image to align the face
aligned_img = cv2.warpAffine(img, M, (image_size, image_size), borderValue=0.0)
return aligned_img, M
from lancedb.pydantic import LanceModel, Vector
import uuid
import numpy as np
from typing import List, Dict
class FaceRecognitionSchema(LanceModel):
id: str # Unique identifier for each entry
vector: Vector(512) # Face embeddings, fixed size of 512
entity_name: str # Name of the entity
@classmethod
def prepare_face_records(cls, face_embeddings: List[Dict], entity_name: str) -> List['FaceRecognitionSchema']:
if not face_embeddings:
return []
formatted_records = []
for embedding in face_embeddings:
formatted_records.append(
cls(
id=str(uuid.uuid4()), # Generate a unique ID
vector=np.array(embedding, dtype=np.float32), # Convert embedding to float32 numpy array
entity_name=entity_name
)
)
return formatted_records
from pathlib import Path
import logging
from typing import Any
# Configure logging for better output control
logging.basicConfig(level=logging.WARNING, format="%(asctime)s - %(levelname)s - %(message)s")
def populate_database_from_images(
input_path: str,
face_det_model: Any,
face_rec_model: Any,
tbl: Any # LanceDB table object
) -> None:
path = Path(input_path)
num_entities = 0 # Counter for the number of entities added to the database
# Find all image files and identities in the directory and subdirectories
image_files = [str(file) for file in path.rglob("*") if file.suffix.lower() in (".png", ".jpg", ".jpeg")]
identities = [file.stem.split("_")[0] for file in path.rglob("*") if file.suffix.lower() in (".png", ".jpg", ".jpeg")]
if not image_files:
logging.warning(f"No image files found in {input_path}.")
return
for identity, detected_faces in zip(identities, face_det_model.predict_batch(image_files)):
try:
# Count number of detected faces
num_faces = len(detected_faces.results)
# Skip images with more than one face
if num_faces > 1:
logging.warning(f"Skipped {detected_faces.info} as it contains more than one face ({num_faces} faces detected).")
continue
elif num_faces == 0:
logging.warning(f"Skipped {detected_faces.info} as no faces were detected.")
continue
# Process the single detected face
result = detected_faces.results[0]
# Generate face embedding
aligned_img, _ = align_and_crop(detected_faces.image, [landmark["landmark"] for landmark in result["landmarks"]])
face_embedding = face_rec_model(aligned_img).results[0]["data"][0]
# Prepare records for the database
records = FaceRecognitionSchema.prepare_face_records([face_embedding], identity)
# Add records to the database if valid
if records:
tbl.add(data=records)
num_entities += len(records)
else:
logging.warning(f"No valid records generated for {detected_faces.info}.")
except Exception as e:
logging.error(f"Error processing {file}: {e}", exc_info=True)
# Log summary
logging.info(f"Successfully added {num_entities} entities to the database table.")
total_entities = tbl.count_rows()
logging.info(f"The table now contains {total_entities} entities.")
import lancedb
# Database and table setup
uri = "../.temp/face_database"
table_name = "face"
# Path to the directory containing the sample dataset for indexing.
input_path = "../test_scripts/visage"
# Connect to the database
db = lancedb.connect(uri=uri)
print(db.uri)
# Initialize the table
if table_name not in db.table_names():
tbl = db.create_table(table_name, schema=FaceRecognitionSchema)
else:
tbl = db.open_table(table_name)
schema_fields = [field.name for field in tbl.schema]
if schema_fields != list(FaceRecognitionSchema.model_fields.keys()):
raise RuntimeError(f"Table {table_name} has a different schema.")
# Process images and populate the database
populate_database_from_images(
input_path=input_path,
face_det_model=face_det_model,
face_rec_model=face_rec_model,
tbl=tbl
)
from typing import Any, List, Tuple
import numpy as np
def identify_faces(
embeddings: List[np.ndarray], # List of NumPy arrays representing face embeddings
tbl: Any, # The database or table object supporting the search method
field_name: str, # Name of the vector column in the database
metric_type: str, # Metric type for distance calculation (e.g., "cosine", "euclidean")
top_k: int, # Number of top results to fetch from the database
threshold: float = 0.3 # Distance threshold for assigning labels
) -> Tuple[List[str], List[float]]:
identities = [] # List to store the assigned labels
similarity_scores = [] # List to store the similarity scores
for embedding in embeddings:
# Perform database search
search_result = (
tbl.search(
embedding,
vector_column_name=field_name
)
.metric(metric_type)
.limit(top_k)
.to_list()
)
# Check if search_result has any entries
if not search_result:
identities.append("Unknown")
continue
# Calculate the similarity score
similarity_score = round(1 - search_result[0]["_distance"], 2)
# Assign a label based on the similarity threshold
identity = search_result[0]["entity_name"] if similarity_score >= threshold else "Unknown"
# Append the label to the results list
identities.append(identity)
similarity_scores.append(similarity_score)
return identities, similarity_scores
#DATABASE
import lancedb
# database related parameters
top_k = 1
field_name = "vector"
metric_type = "cosine"
# Database and table parameters
uri = "../.temp/face_database"
table_name = "face"
# Connect to the database
db = lancedb.connect(uri=uri)
tbl = db.open_table(table_name)
# check the schema of the table to ensure it matches the expected schema
schema_fields = [field.name for field in tbl.schema]
if schema_fields != list(FaceRecognitionSchema.model_fields.keys()):
raise RuntimeError(f"Table {table_name} has a different schema.")
############################ CAMERAS ############################
import cv2
import numpy as np
import pyrealsense2 as rs
import degirum_tools
serial_numbers = ['317422075525', '335622071790'] #from cameras
class RealSenseWrapper:
def __init__(self, serial):
self.pipeline = rs.pipeline()
self.config = rs.config()
self.config.enable_device(serial)
self.config.enable_stream(rs.stream.color, 640, 480, rs.format.bgr8, 30)
self.pipeline.start(self.config)
def read(self):
frames = self.pipeline.wait_for_frames()
color_frame = frames.get_color_frame()
if not color_frame:
return False, None
color_image = np.asanyarray(color_frame.get_data())
return True, color_image
def release(self):
self.pipeline.stop()
def detect_and_recognize_faces(frame, camera_serial):
detected_faces = face_det_model.predict(frame)
recognized_faces = []
if detected_faces.results:
for face in detected_faces.results:
landmarks = [landmark["landmark"] for landmark in face["landmarks"]]
aligned_face, _ = align_and_crop(frame, landmarks)
embedding_result = face_rec_model.predict(aligned_face)
embedding = embedding_result.results[0]["data"][0]
# Identification
identities, similarity_scores = identify_faces(
[embedding], tbl, field_name, metric_type, top_k
)
label = identities[0]
score = similarity_scores[0]
bbox = face["bbox"] # [x1, y1, x2, y2]
name_display = label if score >= 0.3 else "Unknown"
recognized_faces.append({
"bbox": bbox,
"name": name_display,
"face": aligned_face,
"score": score
})
cv2.putText(frame, f"{name_display}: {score:.2f}", (int(bbox[0]), int(bbox[1]) - 10),
cv2.FONT_HERSHEY_SIMPLEX, 0.9, (0, 255, 0), 2)
cv2.rectangle(frame, (int(bbox[0]), int(bbox[1])), (int(bbox[2]), int(bbox[3])), (0, 255, 0), 2)
return recognized_faces
cameras = [RealSenseWrapper(serial) for serial in serial_numbers]
prev_times = [time.time() for _ in cameras] #FPS
# Main capture loop
def main_loop():
while True:
for i, camera in enumerate(cameras):
ret, frame = camera.read()
if not ret:
print(f"Erreur : Pas d'image pour la caméra {serial_numbers[i]}")
continue
current_time = time.time()
fps = 1.0 / (current_time - prev_times[i])
prev_times[i] = current_time
recognized = detect_and_recognize_faces(frame, serial_numbers[i])
for face in recognized:
print(f"Caméra {serial_numbers[i]}: {face['name']} ({face['score']:.2f})")
#FPS
cv2.putText(frame, f"FPS: {fps:.2f}", (10, 30),
cv2.FONT_HERSHEY_SIMPLEX, 0.8, (0, 255, 255), 2)
cv2.imshow(f"Camera {i+1} - {serial_numbers[i]}", frame)
if cv2.waitKey(1) & 0xFF == ord('q'):
break
if __name__ == "__main__":
main_loop()
    # Release the cameras after use
for camera in cameras:
camera.release()
    # Close all OpenCV windows
cv2.destroyAllWindows()
My other question is: can the Re-ID model repvgg_a0_person_reid_512.hef from hailo_model_zoo/hailo_models/reid/README.rst at master · hailo-ai/hailo_model_zoo · GitHub
be adapted to DeGirum tools, in particular regarding its post-processing and label settings in the JSON?
Hi @Theo_Vioux
The code from the guide was written to illustrate all the steps and educate users about the face recognition pipeline; it is not optimized for streaming. We are working on a face recognition package optimized for streaming and will keep you posted. Regarding the repvgg person Re-ID model: it cannot be used for face recognition. But if you want to use it for tracking, it is easy to integrate into PySDK, as this model has very minimal postprocessing (just dequantizing the output to get floating-point embeddings). There is no label file for this model.
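For illustration, once the .hef and its JSON are in your local zoo, usage could look roughly like this (a sketch only: it assumes the dequantized embedding shows up in results[0]["data"], as with the other embedding models in this thread, and the 0.6 threshold is made up):
import numpy as np
import degirum as dg

reid_model = dg.load_model("repvgg_a0_person_reid_512", inference_host_address="@local",
                           zoo_url="../models", token="")

def reid_embedding(person_crop):
    emb = np.asarray(reid_model.predict(person_crop).results[0]["data"]).flatten()
    return emb / np.linalg.norm(emb)  # L2-normalize the 512-d embedding

def same_person(crop_a, crop_b, threshold=0.6):
    # Cosine similarity between two person crops; threshold is illustrative only
    return float(reid_embedding(crop_a) @ reid_embedding(crop_b)) >= threshold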
I have put together a JSON configuration of the following form for the Re-ID .hef model:
{
    "ConfigVersion": 6,
    "Checksum": "",
    "DEVICE": [
        {
            "DeviceType": "HAILO8",
            "RuntimeAgent": "HAILORT",
            "SupportedDeviceTypes": "HAILORT/HAILO8"
        }
    ],
    "PRE_PROCESS": [
        {
            "InputN": 1,
            "InputH": 256,
            "InputW": 128,
            "InputC": 3,
            "InputQuantEn": true
        }
    ],
    "MODEL_PARAMETERS": [
        {
            "ModelPath": "repvgg_a0_person_reid_512.hef"
        }
    ],
    "POST_PROCESS": [
        {
            "OutputPostprocessType": "None"
        }
    ]
}
but I get the following error (the path is correct, because when I replace the name with a model configuration downloaded directly from the hub it works, so the problem probably comes from the JSON I created):
raise DegirumException(
degirum.exceptions.DegirumException: Model 'repvgg_a0_person_reid_512' is not found in model zoo '../models'
Hi @Theo_Vioux
Please fill in a non-empty value for Checksum and try again.
Hello @shashi
Indeed, I hadn’t noticed it was empty, thank you.