The hailo-dfc docker script fails to recognize non-VGA-compatible GPUs like H100/L40S

mineto_tsukada · October 15, 2024, 8:41am

Hi team,

I found that the hailo-dfc docker script (hailo_ai_sw_suite_docker_run.sh) won’t recognize non-VGA-compatible GPUs like the L40S/H100.

Cause

The original script detects GPU availability with this one-liner:

readonly NVIDIA_GPU_EXIST=$(lspci | grep "VGA compatible controller: NVIDIA")

This matches only VGA-compatible GPUs such as the RTX/GTX series, because lspci reports non-VGA-compatible GPUs as "3D controller" devices, like this:

18:00.0 3D controller: NVIDIA Corporation Device 2330 (rev a1)
2a:00.0 3D controller: NVIDIA Corporation Device 2330 (rev a1)
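
To make the mismatch concrete, here is a minimal sketch that runs the original grep pattern against the two sample lines above. The function names and the broader pattern are illustrative assumptions and are not part of the Hailo script.

```shell
# Sample lspci lines copied from this post (3D controller entries).
sample_lspci='18:00.0 3D controller: NVIDIA Corporation Device 2330 (rev a1)
2a:00.0 3D controller: NVIDIA Corporation Device 2330 (rev a1)'

# Pattern used by the original script: matches VGA-class devices only.
count_vga_only() {
    printf '%s\n' "$1" | grep -c "VGA compatible controller: NVIDIA"
}

# Broader pattern (an assumption for illustration): also accepts "3D controller".
count_vga_or_3d() {
    printf '%s\n' "$1" | grep -cE "(VGA compatible|3D) controller: NVIDIA"
}

echo "original pattern matches: $(count_vga_only "$sample_lspci")"
echo "broader pattern matches:  $(count_vga_or_3d "$sample_lspci")"
```

With these two sample lines the original pattern counts 0 matches and the broader one counts 2, which is why NVIDIA_GPU_EXIST ends up empty on such machines.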

Workaround

I replaced that line with the following, which resolved the problem:

readonly NVIDIA_GPU_EXIST=$(nvidia-smi &> /dev/null && echo true || echo false)

This uses the exit status of the nvidia-smi command to check whether NVIDIA GPUs are available (I think this is a more general solution than the original grep-based one). Please consider adopting this idea if you like. Thanks.

mineto_tsukada · October 15, 2024, 8:47am

NOTE: the workaround only works when nvidia-smi is installed.
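
For machines where nvidia-smi may be missing, one possible approach is to try nvidia-smi first and fall back to lspci with a pattern that accepts both device classes seen in this thread. This is only a sketch: detect_nvidia_gpu is a hypothetical helper, not something in hailo_ai_sw_suite_docker_run.sh, and it prints true/false as in the workaround above. How the rest of the script consumes NVIDIA_GPU_EXIST (non-empty string versus the literal value true) should be checked before adopting it.

```shell
# Hypothetical helper (not part of the Hailo script): prints "true" or "false".
detect_nvidia_gpu() {
    # Prefer nvidia-smi when it is on PATH and exits successfully.
    if command -v nvidia-smi > /dev/null 2>&1 && nvidia-smi > /dev/null 2>&1; then
        echo true
        return 0
    fi
    # Fall back to lspci, accepting both VGA and 3D controller entries.
    if command -v lspci > /dev/null 2>&1 &&
       lspci | grep -qE "(VGA compatible|3D) controller: NVIDIA"; then
        echo true
        return 0
    fi
    echo false
}

NVIDIA_GPU_EXIST=$(detect_nvidia_gpu)
readonly NVIDIA_GPU_EXIST
echo "NVIDIA_GPU_EXIST=${NVIDIA_GPU_EXIST}"
```

The lspci fallback only shows that the card is on the PCI bus. It does not prove that a working driver is installed, so the GPU may still be unusable inside the container.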

mineto_tsukada · October 15, 2024, 8:54am

One more note: after editing the script, you need to recreate the container by running hailo_ai_sw_suite_docker_run.sh --override. Please back up your data before recreating the container.

omria · October 16, 2024, 1:21pm

hey @mineto_tsukada

It looks like the Hailo-DFC Docker script (hailo_ai_sw_suite_docker_run.sh) is failing to detect non-VGA-compatible GPUs like the H100 and L40S. This happens because the original script relies on:

readonly NVIDIA_GPU_EXIST=$(lspci | grep "VGA compatible controller: NVIDIA")

This command only detects VGA-compatible GPUs (like the RTX/GTX series). However, high-performance GPUs such as H100 and L40S show up differently in the lspci output, like this:

18:00.0 3D controller: NVIDIA Corporation Device 2330 (rev a1)
2a:00.0 3D controller: NVIDIA Corporation Device 2330 (rev a1)

Workaround

You can resolve the issue by modifying the script to use nvidia-smi, which provides a more reliable way to detect NVIDIA GPUs. Replace the original line with:

readonly NVIDIA_GPU_EXIST=$(nvidia-smi &> /dev/null && echo true || echo false)

With this change, NVIDIA_GPU_EXIST is set to true when nvidia-smi runs successfully and to false otherwise, regardless of how lspci classifies the device.

Things to Keep in Mind

Ensure nvidia-smi is installed:
- The workaround will only work if NVIDIA drivers and nvidia-smi are installed and accessible.
Recreate the Docker Container:
- After modifying the script, you need to recreate the container by running:
```
./hailo_ai_sw_suite_docker_run.sh --override
```
- Back up your data before recreating the container to avoid data loss.

This solution provides a more general method to detect both VGA-compatible and non-VGA-compatible GPUs. Let me know if this resolves the issue or if you need further assistance!

rosslote · January 20, 2025, 1:05pm

Am I right in understanding that nvidia-smi has to be installed on the host machine before running the script? If the version I have is 535, will that cause issues?
