GPU compute inside the Windows Subsystem for Linux has matured, but it’s still hard to know which stacks truly work end to end. This guide cuts through the noise: what actually works today for GPU compute in WSL2 (CUDA and DirectML), what’s still experimental, and how to set up a reliable development workflow. You’ll learn how to verify your environment, install and test CUDA on WSL2, understand where DirectML fits (and when to run it outside WSL), wire up Docker, and avoid the common performance and stability pitfalls that derail GPU projects.
By the end, you’ll have a concrete checklist to get NVIDIA CUDA working inside WSL2, the truth about DirectML with WSL, plus practical optimizations and fixes you can apply on day one.
Overview
In plain English: WSL2 is a lightweight Linux VM with tight Windows integration. Since Windows 10/11 added GPU paravirtualization (GPU‑PV) to WSL2, Linux processes can access your GPU via a special device (/dev/dxg). However, not all GPU APIs are equal inside WSL:
- CUDA on WSL2 (NVIDIA only): This is the most mature, fully supported GPU compute path in WSL2. You install the Windows NVIDIA driver and the CUDA toolkit inside your Linux distro. Frameworks like PyTorch and TensorFlow with CUDA generally “just work.”
- DirectML with WSL2 (cross-vendor via DirectX 12): DirectML is a Windows API. The practical, supported way to use DirectML today is from Windows-native apps (e.g., onnxruntime-directml, tensorflow-directml, torch-directml), not from inside Linux in WSL2. There have been experiments and previews bridging D3D12/DirectML to Linux in WSL, but mainstream frameworks do not ship stable Linux builds that target DirectML in WSL. If you want vendor-agnostic acceleration via DirectML, run your ML code on Windows, not inside the Linux distro.
- ROCm/oneAPI in WSL2 (AMD/Intel): Support exists but is narrower, more hardware-specific, and sometimes preview-grade. For AMD, ROCm on WSL has limited, evolving support on certain GPUs/drivers; check AMD’s latest docs. For Intel, oneAPI/Level Zero and OpenCL support in WSL has improved, but you’ll need the right driver stack and toolchain versions. Both are workable for specific setups, but CUDA on NVIDIA remains the smoothest path.
Typical scenarios:
- You have an NVIDIA GPU and want to train PyTorch/TensorFlow models inside WSL2 using CUDA. This is the most reliable route.
- You want cross‑vendor acceleration using DirectML. Run the workload on Windows (PowerShell/CMD/Windows Python), not inside the WSL distro.
- You want GPU in Docker containers under WSL2 for reproducible environments. This is supported (especially with NVIDIA) and works well with Docker Desktop’s WSL 2 backend.
Quick Reference Table
| Command | Purpose | Example Output |
|---|---|---|
| wsl --status | Show WSL version/kernel/WSLg state | Default Version: 2; Kernel version: 5.15.x; WSLg: enabled |
| wsl --update | Update WSL kernel/WSLg from the Microsoft Store | Updates installed successfully |
| ls /dev/dxg | Verify GPU-PV device in WSL2 | /dev/dxg |
| nvidia-smi | Verify NVIDIA driver linkage from WSL2 | Driver Version: 555.xx, CUDA Version: 12.x, GPU list… |
| nvcc --version | Check CUDA toolkit inside WSL2 | Cuda compilation tools, release 12.x |
| python -c "import torch; print(torch.cuda.is_available())" | Check PyTorch CUDA | True |
| docker run --gpus all nvidia/cuda:12.4.0-base-ubuntu22.04 nvidia-smi | Verify GPU in Docker/WSL2 | Standard nvidia-smi table with your GPU |
| clinfo | Inspect OpenCL (OpenCLOn12) in WSL2 | Platform Name: Microsoft Corporation; OpenCL 1.2 |
| dxdiag (Windows) | Confirm DirectX 12 and driver | D3D12, driver/vendor details |
| onnxruntime-directml (Windows pip) | Test DirectML providers | ['DmlExecutionProvider', 'CPUExecutionProvider'] |
Key Concepts & Prerequisites
- Windows builds
  - Windows 11 22H2 or newer recommended (Build 22621+). Windows 10 is supported, but the Store version of WSL is strongly recommended for the latest kernel and WSLg.
  - Install WSL from the Microsoft Store for faster updates.
- WSL2 and WSLg
  - Use WSL2, not WSL1. WSLg (the GUI/graphics layer) also brings the GPU-PV device.
  - Command: wsl --status to verify WSL version and kernel.
- GPU drivers on Windows
  - NVIDIA: Install the latest Game Ready/Studio driver that includes WSL support (modern drivers do). You do NOT need the Windows CUDA toolkit for WSL use.
  - AMD: Use the latest Pro/Adrenalin drivers and consult AMD docs for ROCm on WSL status; support is limited to certain GPUs.
  - Intel: Install the latest Arc/Intel Graphics drivers; check oneAPI documentation for WSL GPU compute specifics.
- Distro and packages
  - Ubuntu 22.04+ (or Ubuntu 20.04) is the most commonly supported distro for CUDA in WSL.
  - For CUDA: you’ll install the CUDA toolkit inside the WSL distro.
  - Python, build-essential, git, and curl/gnupg may be required depending on your stack.
- Docker (optional)
  - Docker Desktop with “Use the WSL 2 based engine” enabled is the simplest way to run GPU containers in WSL2.
  - Alternatively, a native Docker Engine inside WSL can work with the NVIDIA Container Toolkit, but Docker Desktop is easier to get right.
- Permissions
  - You’ll need admin rights on Windows to install drivers and configure certain system settings.
- Hardware basics
  - A modern GPU with DirectX 12 support is required for WSLg GPU-PV.
  - For CUDA, you need an NVIDIA GPU with recent WDDM and CUDA support.
Step‑by‑Step Guide
- Verify your WSL2 and GPU‑PV baseline
- In PowerShell:
  wsl --status
  wsl --update
- In your WSL distro:
  uname -r
  ls /dev/dxg
  dmesg | grep -i dxg
  Expected: WSL version 2, kernel 5.15+; /dev/dxg present; no dxg errors. If /dev/dxg is missing, update WSL from the Store and upgrade Windows.
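- Optional: if you’d rather script this baseline, here’s a minimal Python sketch (standard library only) that mirrors those checks from inside the distro:
  import os, platform
  # Expect a 5.15+ kernel with a WSL2 suffix, e.g. "5.15.x-microsoft-standard-WSL2"
  print("Kernel:", platform.release())
  # /dev/dxg is the GPU paravirtualization device exposed to WSL2
  print("GPU-PV device present:", os.path.exists("/dev/dxg"))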
- Install/Update your Windows GPU driver
- NVIDIA: Install the latest driver from nvidia.com (Game Ready or Studio). Reboot.
- AMD/Intel: Install the latest stable drivers from the vendor. Reboot.
- Sanity check in Windows:
- dxdiag → Save All Information… and confirm D3D12.
- NVIDIA users can also run nvidia-smi in a Windows terminal to verify.
- Test NVIDIA GPU visibility from WSL2
- In WSL:
nvidia-smi
Expected: You see your GPU(s), driver version, and CUDA version. If not, your Windows driver likely isn’t WSL‑enabled or WSL needs updating.
- Install CUDA toolkit inside WSL2 (NVIDIA)
- Example for Ubuntu 22.04 (update the repo if you’re on a different release):
  sudo apt-get update
  sudo apt-get install -y gnupg curl
  sudo mkdir -p /etc/apt/keyrings
  curl -fsSL https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64/3bf863cc.pub | sudo gpg --dearmor -o /etc/apt/keyrings/cuda-archive-keyring.gpg
  echo "deb [signed-by=/etc/apt/keyrings/cuda-archive-keyring.gpg] https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64/ /" | sudo tee /etc/apt/sources.list.d/cuda-ubuntu22.04-x86_64.list
  sudo apt-get update
  sudo apt-get -y install cuda-toolkit-12-4
- Add CUDA to PATH (optional, for nvcc):
  echo 'export PATH=/usr/local/cuda-12.4/bin:$PATH' >> ~/.bashrc
  echo 'export LD_LIBRARY_PATH=/usr/local/cuda-12.4/lib64:$LD_LIBRARY_PATH' >> ~/.bashrc
  source ~/.bashrc
- Verify:
  nvcc --version
  nvidia-smi
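- Optional extra check from Python (an assumption on our part, not part of the standard install): CuPy can exercise the CUDA stack directly. A minimal sketch, assuming you’ve run pip install cupy-cuda12x for a CUDA 12.x toolkit:
  import cupy as cp
  a = cp.arange(10)        # allocated on the GPU
  print((a * 2).sum())     # prints 90 if the CUDA stack is wired up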
- Test a CUDA framework (e.g., PyTorch) in WSL2
- Install Python tooling:
  sudo apt-get install -y python3-venv python3-pip
  python3 -m venv ~/venv
  source ~/venv/bin/activate
- Install the PyTorch CUDA build (choose versions from pytorch.org “Get Started”; example):
  pip install --upgrade pip
  pip install torch torchvision --index-url https://download.pytorch.org/whl/cu124
- Test:
  python -c "import torch; print(torch.cuda.is_available()); print(torch.cuda.get_device_name(0) if torch.cuda.is_available() else 'No GPU')"
  Expected: True and your GPU name.
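- For a slightly richer check, this sketch prints what the device reports from inside WSL2:
  import torch
  if torch.cuda.is_available():
      p = torch.cuda.get_device_properties(0)
      print(p.name, f"| {p.total_memory / 1e9:.1f} GB | compute capability {p.major}.{p.minor}")
  else:
      print("CUDA not available; recheck the driver and wheel compatibility")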
- DirectML: What to do in practice
- DirectML is a Windows API built on DirectX 12. Today, the supported way to use DirectML is to run frameworks on Windows, not inside the WSL Linux environment.
- Windows example in PowerShell/CMD (a fuller inference sketch follows this list):
  py -m venv %USERPROFILE%\dmlvenv
  %USERPROFILE%\dmlvenv\Scripts\activate
  pip install onnxruntime-directml
  python -c "import onnxruntime as ort; print(ort.get_available_providers())"
  Expected: ['DmlExecutionProvider', 'CPUExecutionProvider'].
- If you need vendor-agnostic acceleration and want to stay inside WSL, consider ONNX Runtime CPU or explore OpenCL (1.2) via OpenCLOn12 for certain compute tasks (not ideal for most modern DL workloads).
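- Beyond the provider check, here’s a minimal Windows-side inference sketch; model.onnx and the input shape are placeholders you’d replace with your own model:
  import numpy as np
  import onnxruntime as ort
  # "model.onnx" is a placeholder path; point it at a real model file
  sess = ort.InferenceSession("model.onnx",
                              providers=["DmlExecutionProvider", "CPUExecutionProvider"])
  print("Active providers:", sess.get_providers())
  x = np.random.rand(1, 3, 224, 224).astype(np.float32)  # adjust to your model's input
  outputs = sess.run(None, {sess.get_inputs()[0].name: x})
  print("Output shapes:", [o.shape for o in outputs])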
- GPU in Docker on WSL2
- Docker Desktop route (recommended):
- Install Docker Desktop, enable “Use the WSL 2 based engine.”
- Integrate your WSL distro in Docker Desktop settings.
- Test:
docker run --rm --gpus all nvidia/cuda:12.4.0-base-ubuntu22.04 nvidia-smi
- Native Docker Engine route in WSL (advanced):
- Install Docker Engine in WSL.
- Install NVIDIA Container Toolkit inside the same WSL distro (follow NVIDIA docs).
- Configure:
sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker
- Test with the same docker run command above.
- Optional: Tune WSL with .wslconfig
- Create or edit %UserProfile%\.wslconfig (Windows side):
  [wsl2]
  memory=12GB
  processors=8
  swap=4GB
  localhostForwarding=true
- Then:
  wsl --shutdown
- These limits help keep WSL memory/CPU in check during long training jobs.
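- To confirm the caps took effect, a quick standard-library sketch run inside the distro:
  import os
  print("CPUs visible to WSL2:", os.cpu_count())
  with open("/proc/meminfo") as f:
      print(f.readline().strip())  # MemTotal should reflect the memory= cap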
Troubleshooting
- /dev/dxg missing or GPU not visible
  - Fix:
    - Update WSL from the Store: wsl --update
    - Upgrade Windows to a recent build.
    - Reboot after installing/updating GPU drivers.
    - Ensure you are on WSL2 (not WSL1): wsl --set-version <distro> 2
- nvidia-smi fails in WSL
  - Fix:
    - Install a current NVIDIA Windows driver (WSL support is included in modern drivers).
    - Verify in Windows Device Manager that your discrete GPU isn’t disabled.
    - Run nvidia-smi in Windows to confirm driver health; then try again in WSL.
- CUDA library not found in Python/TensorFlow/PyTorch
  - Fix:
    - Ensure the CUDA toolkit is installed in WSL and LD_LIBRARY_PATH includes /usr/local/cuda-12.x/lib64.
    - Install the correct CUDA-matched wheel (e.g., cu124 for CUDA 12.4).
    - Verify:
      ldconfig -p | grep -i cuda
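- You can run the same check from Python; a sketch that assumes the toolkit’s unversioned libcudart.so symlink exists (use a versioned name like libcudart.so.12 otherwise):
  import ctypes
  try:
      ctypes.CDLL("libcudart.so")  # resolved via ldconfig / LD_LIBRARY_PATH
      print("libcudart loaded OK")
  except OSError as err:
      print("libcudart not found:", err)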
- Docker can’t see the GPU (--gpus all fails)
  - Fix (Docker Desktop):
    - Make sure “Use the WSL 2 based engine” is enabled.
    - Integrate your WSL distro in Docker Desktop.
    - Update to the latest Docker Desktop; reboot.
  - Fix (native Docker Engine):
    - Install the NVIDIA Container Toolkit and run:
      sudo nvidia-ctk runtime configure --runtime=docker
      sudo systemctl restart docker
- DNS not working in WSL
  - Symptoms: apt/pip fail to resolve hostnames after VPN switches.
  - Fix (WSL distro):
    sudo bash -c 'printf "[network]\ngenerateResolvConf=false\n" > /etc/wsl.conf'
    sudo rm /etc/resolv.conf
    echo "nameserver 1.1.1.1" | sudo tee /etc/resolv.conf
    Then run wsl --shutdown (from Windows) and reopen WSL.
- Slow Git or file I/O
  - Cause: Working on Windows paths (/mnt/c/…) from WSL.
  - Fix: Keep repos under the Linux filesystem (e.g., ~/code). If you must use /mnt/c, mount with metadata and set Git configs:
    - Use a Linux path: clone to ~/projects
    - git config --global core.filemode false
    - git config --global core.autocrlf input
- ext4.vhdx bloating (WSL disk too big)
  - Fix (PowerShell as Admin):
    wsl --shutdown
    Optimize-VHD -Path "$env:LOCALAPPDATA\Packages\*\LocalState\ext4.vhdx" -Mode Full
  - Tip: Back up first. The path differs per distro publisher; resolve the package folder before running.
- GPU resets/driver timeouts (long kernels)
  - Fix: Increase the TDR timeout (use with caution).
    - Registry (Windows): HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\GraphicsDrivers
      - TdrDelay (DWORD) = 10
      - TdrDdiDelay (DWORD) = 20
    - Reboot. Don’t set excessively high values; diagnose kernel issues instead.
Optimization & Best Practices
- Prefer the Linux filesystem for projects
  - Keep your code, datasets, and virtual environments in WSL’s Linux filesystem (e.g., /home/you). Accessing /mnt/c adds overhead and can crush performance during heavy I/O; the sketch below shows how to measure the difference.
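  - To quantify the gap on your machine, a rough sketch comparing small-file write latency on the two filesystems (assumes a writable /mnt/c/Temp; adjust the paths):
    import os, shutil, tempfile, time

    def bench_writes(dirpath, n=200):
        start = time.perf_counter()
        for i in range(n):
            path = os.path.join(dirpath, f"tmp{i}.txt")
            with open(path, "w") as f:
                f.write("x" * 1024)
            os.remove(path)
        return time.perf_counter() - start

    for label, base in [("Linux FS", os.path.expanduser("~")), ("/mnt/c  ", "/mnt/c/Temp")]:
        d = tempfile.mkdtemp(dir=base)   # temp dir under each filesystem
        print(label, f"{bench_writes(d):.2f}s")
        shutil.rmtree(d)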
- Use a dedicated conda/venv per framework
  - Isolate PyTorch/TensorFlow environments to keep CUDA/cuDNN dependencies clean. Upgrades are less disruptive.
- Pin compatible versions
  - Match PyTorch/TensorFlow wheels to your CUDA version (e.g., cu121 vs cu124). If you update CUDA in WSL, update wheels accordingly.
- Keep WSL and drivers current
  - wsl --update and current NVIDIA/AMD/Intel drivers eliminate many edge-case issues.
- Control memory/CPU with .wslconfig
  - Right-size memory and cores for your training jobs to avoid starving Windows or thrashing swap. Values are absolute sizes/counts (WSL2 defaults to roughly half your RAM). Example:
    [wsl2]
    memory=16GB
    processors=8
- Cache pip/conda effectively
  - Use the pip cache and/or a local wheelhouse to avoid repeated downloads in CI or rebuilds.
- Docker images: build lean, pin bases
  - Use nvidia/cuda base images with the exact CUDA tag you need. Clean apt caches in Dockerfiles to slim images.
- When to use DirectML
  - If you need vendor-agnostic acceleration and your workload/framework supports DirectML, run it on Windows-native Python. Keep your dev environment split: use WSL for Linux tooling and Windows for the DML-accelerated run. VS Code Remote can make this seamless.
- OpenCLOn12 reality check
  - OpenCL 1.2 via D3D12 is usable for some compute workloads but isn’t a drop-in for modern DL frameworks. Use it with eyes open and benchmark carefully; a quick way to inspect what’s exposed follows below.
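  - To inspect what OpenCLOn12 actually exposes, a short sketch using pyopencl (pip install pyopencl; an optional dependency, not part of this guide’s setup):
    import pyopencl as cl

    # Expect a "Microsoft Corporation" platform reporting OpenCL 1.2 in WSL2
    for platform in cl.get_platforms():
        print(platform.name, "|", platform.version)
        for device in platform.get_devices():
            print("  ", device.name, "|", device.version)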
Project‑Based Example: Training PyTorch on CUDA in WSL2 and in Docker
Goal: Confirm that an NVIDIA GPU is usable both directly in WSL2 and inside a container.
A) Native WSL2 (no Docker)
- Verify the GPU:
  nvidia-smi
- Create a venv and install the PyTorch CUDA build:
  python3 -m venv ~/venv
  source ~/venv/bin/activate
  pip install --upgrade pip
  pip install torch torchvision --index-url https://download.pytorch.org/whl/cu124
- Quick training snippet (MNIST-like toy):
python - << 'PY'
import torch, torch.nn as nn, torch.optim as optim

device = 'cuda' if torch.cuda.is_available() else 'cpu'
print('Device:', device, torch.cuda.get_device_name(0) if device == 'cuda' else '')

# Synthetic batch: 256 flattened 28x28 "images" with random labels
x = torch.randn(256, 784, device=device)
y = torch.randint(0, 10, (256,), device=device)

model = nn.Sequential(nn.Linear(784, 256), nn.ReLU(), nn.Linear(256, 10)).to(device)
opt = optim.SGD(model.parameters(), lr=0.01)
loss_fn = nn.CrossEntropyLoss()

for i in range(50):
    opt.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward()
    opt.step()
print('Done. Final loss:', float(loss))
PY
Expected: Device: cuda and smooth training. If it falls back to CPU, revisit CUDA install and wheel compatibility.
B) Docker on WSL2 with GPU
- Test GPU access:
  docker run --rm --gpus all nvidia/cuda:12.4.0-base-ubuntu22.04 nvidia-smi
- Minimal Dockerfile for PyTorch:
  FROM nvidia/cuda:12.4.0-runtime-ubuntu22.04
  RUN apt-get update && apt-get install -y python3-pip && rm -rf /var/lib/apt/lists/*
  RUN pip3 install --upgrade pip && pip3 install torch --index-url https://download.pytorch.org/whl/cu124
  CMD ["python3", "-c", "import torch; print('CUDA?', torch.cuda.is_available())"]
- Build and run:
  docker build -t torch-cuda .
  docker run --rm --gpus all torch-cuda
Expected: CUDA? True. You can now add your full codebase to this image for a reproducible training environment.
Conclusion
What actually works for GPU compute in WSL2 (CUDA vs. DirectML) boils down to this:
- NVIDIA + CUDA in WSL2 is the most reliable, broadly supported path for running GPU compute inside the Linux environment. It works both natively and inside Docker on the WSL 2 backend.
- DirectML is a Windows technology. Use it from Windows-native frameworks to get vendor‑agnostic acceleration; don’t expect mainstream Linux/WSL builds to target DirectML directly today.
- AMD ROCm and Intel oneAPI on WSL2 can work for specific hardware and driver stacks, but they lag CUDA in simplicity and ecosystem coverage. Check vendor docs before committing.
- With a few best practices (.wslconfig tuning, Linux filesystem for projects, pinned versions), WSL2 can be fast, stable, and production‑friendly for GPU development.
You now have the commands, configs, and gotchas to make your WSL2 GPU setup dependable.
FAQ
Does CUDA in WSL2 perform the same as native Linux?
Performance is typically close, but not identical. Overheads are small for many workloads. I/O patterns, memory pressure, and CPU contention (tunable via .wslconfig) often matter more than the WSL hypervisor layer itself. Always benchmark on your specific workload.
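For example, a minimal PyTorch timing sketch using CUDA events (warm up first, and synchronize before reading timings):

import torch

x = torch.randn(4096, 4096, device="cuda")
start = torch.cuda.Event(enable_timing=True)
end = torch.cuda.Event(enable_timing=True)
for _ in range(3):            # warm-up iterations
    x @ x
torch.cuda.synchronize()
start.record()
for _ in range(10):
    x @ x
end.record()
torch.cuda.synchronize()      # ensure the timed work has finished
print(f"avg matmul: {start.elapsed_time(end) / 10:.2f} ms")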
Can I use DirectML from inside WSL2 Linux?
In general, no. DirectML is a Windows API. While GPU‑PV exposes D3D12 to WSL, mainstream ML frameworks do not ship stable Linux/WSL builds targeting DirectML today. If you need DirectML, run it on Windows.
What about AMD ROCm or Intel oneAPI on WSL2?
Support exists but is more limited and hardware‑specific compared to NVIDIA CUDA. For AMD, check whether your GPU and driver combo supports ROCm on WSL. For Intel, review oneAPI/Level Zero on WSL guidance. Expect more friction than CUDA on NVIDIA.
How do I pick the right PyTorch/TensorFlow build for CUDA in WSL?
Match the framework wheel to your installed CUDA toolkit version (e.g., cu121, cu124). PyTorch provides explicit index URLs; TensorFlow typically requires a specific CUDA/cuDNN combo. If you upgrade CUDA, reinstall a compatible framework build.
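You can read these versions straight from the installed package, as in this sketch:

import torch
print(torch.__version__)               # e.g. '2.4.0+cu124'; the suffix names the CUDA build
print(torch.version.cuda)              # CUDA version the wheel targets
print(torch.backends.cudnn.version())  # bundled cuDNN build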
Is GPU compute in Docker on WSL2 production‑ready?
Yes, especially with NVIDIA GPUs. Docker Desktop’s WSL 2 backend plus the NVIDIA Container Toolkit integration is stable. Use exact base image tags (e.g., nvidia/cuda:12.4.0-runtime-ubuntu22.04), keep drivers current, and test with nvidia-smi in containers.
Good luck, and enjoy a smoother, faster GPU workflow in WSL2!