Symptoms
- `nvidia-smi: command not found`
- `NVIDIA-SMI has failed because it couldn't communicate with the NVIDIA driver`
- `torch.cuda.is_available()` returns `False`
- CUDA version mismatch errors
Step 1 — Check If GPU Is Present
```bash
# Check PCI devices
lspci | grep -i nvidia

# Should show something like:
# 00:04.0 3D controller: NVIDIA Corporation A100-SXM4-80GB (rev a1)
```
If no NVIDIA device appears, the GPU may not be attached. Contact support.
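
The same check can be scripted when it needs to run across several machines. The following is a minimal sketch, assuming `lspci` prints one device per line in its default format; the helper name `find_nvidia_devices` is hypothetical and not part of any tool.

```python
from __future__ import annotations

import shutil
import subprocess


def find_nvidia_devices(lspci_output: str) -> list[str]:
    """Return the lspci lines that mention NVIDIA (case-insensitive)."""
    return [
        line.strip()
        for line in lspci_output.splitlines()
        if "nvidia" in line.lower()
    ]


def main() -> None:
    # lspci is provided by the pciutils package and may be absent on minimal images.
    if shutil.which("lspci") is None:
        print("lspci not found; install pciutils to run this check")
        return
    result = subprocess.run(["lspci"], capture_output=True, text=True, check=False)
    devices = find_nvidia_devices(result.stdout)
    if devices:
        for device in devices:
            print(device)
    else:
        print("No NVIDIA device found; the GPU may not be attached")


if __name__ == "__main__":
    main()
```

Keeping the parsing in its own function means it can be exercised against saved `lspci` output without access to the affected machine.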
Step 2 — Check Driver Installation
```bash
# Check if driver module is loaded
lsmod | grep nvidia

# Check driver version
cat /proc/driver/nvidia/version

# If not loaded, try loading manually
modprobe nvidia
```
Step 3 — Reinstall NVIDIA Drivers
```bash
# Remove existing drivers (patterns are quoted so the shell does not expand them)
apt-get purge -y 'nvidia-*' 'libnvidia-*'
apt-get autoremove -y

# Install fresh
apt-get install -y nvidia-driver-535
reboot
```
After reboot:
```bash
nvidia-smi
```
Step 4 — Fix CUDA Version Mismatch
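```bash
# Check driver version
nvidia-smi | grep "Driver Version"
# Driver Version: 535.104.05

# Check CUDA version supported by driver
nvidia-smi | grep "CUDA Version"
# CUDA Version: 12.2

# Check the installed CUDA toolkit version
nvcc --version
```
The `CUDA Version` shown by `nvidia-smi` is the highest CUDA version the driver supports, not the installed toolkit. If `nvcc` reports a newer release than that, reinstall CUDA at a version the driver supports, or update the driver.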
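
The comparison above can be automated by parsing the two command outputs. This is a minimal sketch, assuming the `Driver Version: X` and `CUDA Version: X.Y` labels shown in the `nvidia-smi` output above and the `release X.Y` wording that `nvcc --version` typically prints; the function names are hypothetical. The rule "toolkit must not be newer than the driver's supported version" is a simplification that ignores NVIDIA's compatibility packages.

```python
from __future__ import annotations

import re


def parse_smi_versions(smi_output: str) -> tuple[str, tuple[int, int]] | None:
    """Extract (driver_version, (cuda_major, cuda_minor)) from nvidia-smi output."""
    driver = re.search(r"Driver Version:\s*([\d.]+)", smi_output)
    cuda = re.search(r"CUDA Version:\s*(\d+)\.(\d+)", smi_output)
    if driver is None or cuda is None:
        return None
    return driver.group(1), (int(cuda.group(1)), int(cuda.group(2)))


def parse_nvcc_release(nvcc_output: str) -> tuple[int, int] | None:
    """Extract (major, minor) from the 'release X.Y' text in nvcc --version output."""
    match = re.search(r"release\s+(\d+)\.(\d+)", nvcc_output)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def toolkit_supported(driver_cuda: tuple[int, int], toolkit: tuple[int, int]) -> bool:
    """Simplified rule: the toolkit must not be newer than what the driver supports."""
    return toolkit <= driver_cuda


if __name__ == "__main__":
    # Illustrative strings in the format shown above, not captured from a real machine.
    smi = "Driver Version: 535.104.05   CUDA Version: 12.2"
    nvcc = "Cuda compilation tools, release 12.2, V12.2.140"
    parsed = parse_smi_versions(smi)
    release = parse_nvcc_release(nvcc)
    if parsed is not None and release is not None:
        driver_version, driver_cuda = parsed
        print(driver_version, driver_cuda, release, toolkit_supported(driver_cuda, release))
```

Versions are compared as integer tuples rather than strings so that, for example, 12.10 sorts after 12.2.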
Step 5 — Check Secure Boot
Secure Boot can prevent unsigned NVIDIA modules from loading:
```bash
# Check Secure Boot status
mokutil --sb-state

# If enabled, disable in BIOS or sign the module
```
Step 6 — PyTorch CUDA Issues
```python
import torch

print(torch.version.cuda)         # CUDA version this PyTorch build targets (None on CPU-only builds)
print(torch.cuda.is_available())  # Should be True
print(torch.cuda.device_count())  # Should be >= 1
```
If CUDA is not available in PyTorch despite drivers working:
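```bash
# Reinstall PyTorch with a CUDA build the driver supports.
# PyTorch wheels are published for specific CUDA versions (cu118, cu121, ...);
# there is no cu122 index, so a driver reporting CUDA 12.2 should use cu121.
pip3 install --force-reinstall torch --index-url https://download.pytorch.org/whl/cu121
```
Common Error → Fix Table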
| Error | Fix |
|---|---|
| `nvidia-smi` not found | Install `nvidia-driver-535` |
| Driver/library version mismatch | Reboot after driver install |
| CUDA driver version insufficient | Update driver to match CUDA version |
| `torch.cuda.is_available()` = `False` | Reinstall PyTorch with correct CUDA build |
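
The table above can be encoded as a small lookup for use in a support script. This is a minimal sketch: the keyword sets are an assumption about how each error is worded in practice, and `suggest_fix` is a hypothetical helper name. The fixes are copied from the table.

```python
from __future__ import annotations

# Each entry: (keywords that must all appear in the message, fix from the table above).
# The keywords are assumptions about typical wording, not an exhaustive match.
KNOWN_ERRORS: list[tuple[tuple[str, ...], str]] = [
    (("nvidia-smi", "not found"), "Install nvidia-driver-535"),
    (("driver/library", "mismatch"), "Reboot after driver install"),
    (("driver version", "insufficient"), "Update driver to match CUDA version"),
    (("is_available", "false"), "Reinstall PyTorch with correct CUDA build"),
]


def suggest_fix(message: str) -> str | None:
    """Return the first fix whose keywords all occur in the message, else None."""
    text = message.lower()
    for keywords, fix in KNOWN_ERRORS:
        if all(keyword in text for keyword in keywords):
            return fix
    return None


if __name__ == "__main__":
    print(suggest_fix("nvidia-smi: command not found"))  # Install nvidia-driver-535
```

Matching is case-insensitive and returns `None` for anything outside the table, so unknown errors fall through to manual troubleshooting instead of receiving a wrong suggestion.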
