Introduction
If your PC suddenly crashes with the stop code VIDEO_TDR_FAILURE and mentions files like nvlddmkm.sys (NVIDIA) or atikmpag.sys/atikmdag.sys (AMD), you’re dealing with a graphics driver timeout that Windows couldn’t recover from. This Blue Screen of Death (BSOD) often appears under GPU load—gaming, 3D rendering, video playback, or even just waking from sleep—and it’s critical to fix because repeated crashes can corrupt data and render your system unstable. This guide goes beyond generic advice with step-by-step, stable fixes that actually work, including clean driver installs, firmware updates, diagnostics, and proven configuration changes to stop VIDEO_TDR_FAILURE on Windows 10 and Windows 11.
H2: Understanding the Error
The stop code VIDEO_TDR_FAILURE (bug check 0x00000116; the related code 0x00000117 is VIDEO_TDR_TIMEOUT_DETECTED) means that Windows’ Timeout Detection and Recovery (TDR) mechanism found the GPU had not responded within the allowed time (2 seconds by default). Windows first tries to reset the graphics driver and GPU; if that reset fails, it stops the system with a BSOD.
When you see filenames like nvlddmkm.sys (NVIDIA) or atikmpag.sys/atikmdag.sys (AMD), Windows is pointing at the display driver module that timed out or crashed. On Intel iGPU, you might see igdkmd64.sys. The root cause can be software, firmware, or hardware.
Common scenarios that trigger VIDEO_TDR_FAILURE:
- Heavy GPU load: modern games, high-refresh displays, VR, or GPU compute.
- Rapid power-state changes: alt-tabbing, switching displays, waking from sleep/hibernate.
- Unstable GPU/VRAM overclock or undervolt, or aggressive CPU/RAM overclocks (XMP/EXPO).
- Old or buggy graphics/chipset drivers, or bad driver installs.
- BIOS/UEFI issues, PCIe power management conflicts, resizable BAR/Above 4G decoding misconfigurations.
- Hardware faults: overheating, inadequate PSU, failing GPU or RAM, riser/extender issues.
- Disk or OS corruption, malware, or problematic Windows updates.
H2: Common Causes
Most likely root causes of VIDEO_TDR_FAILURE (nvlddmkm.sys/atikmpag.sys):
- Graphics driver issues
- Corrupt or mismatched NVIDIA/AMD/Intel drivers
- Bad upgrade path; leftover files from previous GPU brand or older version
- Beta or unstable driver build
- System firmware and platform
- Outdated BIOS/UEFI
- PCIe settings like Resizable BAR/Above 4G Decoding, CSM, Fast Boot conflicts
- Overclocking and power
- Unstable GPU/VRAM overclock/undervolt
- Aggressive CPU or RAM (XMP/EXPO) overclocks
- Low-quality or undersized PSU
- Problematic power plans or PCIe Link State Power Management
- Thermal and physical
- Overheating GPU/VRAM/VRM/CPU
- Dust, failing fans, dried thermal paste
- Loose connections, bad PCIe risers/adapters
- Windows and storage
- Corrupted OS files, disk errors
- Problematic or partially applied Windows Updates
- Memory and disk
- Faulty RAM (intermittent errors)
- Failing SSD/HDD (SMART warnings)
- Software conflicts and malware
- Overlay tools, capture software, third-party overclock utilities
- Malware interfering with drivers
Tip: While the BSOD names the GPU driver (nvlddmkm.sys/atikmpag.sys), this is often a symptom. The real cause might be RAM instability, power, or firmware.
H2: Preliminary Checks
Before deep troubleshooting, do these quick, safe checks.
Safe Mode boot
- If you crash on boot or under load, try Safe Mode (loads minimal drivers).
- Method 1 (from Windows): Settings > Update & Security > Recovery (Windows 10) or Settings > System > Recovery (Windows 11) > Advanced startup > Restart now. Then Troubleshoot > Advanced options > Startup Settings > Restart > Press 4 (Enable Safe Mode) or 5 (Safe Mode with Networking).
- Method 2 (Shift + Restart): Hold Shift while clicking Restart from the Start menu, then navigate as above.
- Method 3 (msconfig): Win+R > type msconfig > Boot tab > check Safe boot > Network > OK > Restart. When you are done, uncheck Safe boot in msconfig, or Windows will keep starting in Safe Mode.
Back up important data
- Use File History, OneDrive, or copy critical files to an external drive/cloud before major changes.
- Create a System Restore Point when possible.
Run basic health checks
- System file and component store repair: open an elevated Command Prompt (Run as administrator) and run DISM first, then SFC, so that SFC can use the repaired component store:
DISM /Online /Cleanup-Image /RestoreHealth
sfc /scannow
- Quick disk check (online scan):
chkdsk C: /scan
- SMART health (PowerShell as admin):
Get-PhysicalDisk | Get-StorageReliabilityCounter | Format-List
Look for nonzero uncorrected read/write errors, a high Wear value, or high temperatures. Many drives leave some counters blank, so use vendor tools (e.g., Samsung Magician, WD Dashboard) for full SMART data such as reallocated sectors and media errors.
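To triage several drives at once, the counters can be exported to JSON (for example `Get-PhysicalDisk | Get-StorageReliabilityCounter | ConvertTo-Json | Out-File -Encoding utf8 reliability.json`) and screened with a short script. The Python sketch below assumes that export format and the counter names `ReadErrorsUncorrected`, `WriteErrorsUncorrected`, `Wear`, and `Temperature`; the warning thresholds are hypothetical placeholders, not vendor limits.

```python
import json
import sys

# Hypothetical thresholds for illustration only; check your drive vendor's specs.
WEAR_WARN_PERCENT = 80
TEMP_WARN_CELSIUS = 70


def flag_reliability_counters(json_text):
    """Return warning strings for disks whose counters look concerning.

    Assumes the JSON came from:
        Get-PhysicalDisk | Get-StorageReliabilityCounter | ConvertTo-Json
    which produces a single object for one disk or a list for several.
    """
    data = json.loads(json_text)
    disks = data if isinstance(data, list) else [data]
    warnings = []
    for disk in disks:
        name = disk.get("DeviceId", "?")
        # Any uncorrected error is worth investigating; None and 0 are skipped.
        for field in ("ReadErrorsUncorrected", "WriteErrorsUncorrected"):
            value = disk.get(field)
            if value:
                warnings.append(f"disk {name}: {field} = {value}")
        wear = disk.get("Wear")
        if wear is not None and wear >= WEAR_WARN_PERCENT:
            warnings.append(f"disk {name}: Wear = {wear}%")
        temp = disk.get("Temperature")
        if temp is not None and temp >= TEMP_WARN_CELSIUS:
            warnings.append(f"disk {name}: Temperature = {temp} C")
    return warnings


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python check_disks.py reliability.json
    # utf-8-sig also accepts the BOM that Out-File -Encoding utf8 may add.
    with open(sys.argv[1], encoding="utf-8-sig") as f:
        for line in flag_reliability_counters(f.read()) or ["no warnings"]:
            print(line)
```

A drive that reports blank counters simply produces no warnings, which is why the vendor tools remain the authoritative check.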
Visual check
- Ensure GPU and power cables are firmly seated.
- Clean dust; confirm fans spin; monitor temperatures with HWiNFO or a similar tool.
H2: Step-by-Step Troubleshooting
Work through these steps in order. Test after each step to see if the problem is resolved.
- Revert all overclocks and power tweaks
- Set BIOS to defaults (at least for CPU, RAM). Temporarily disable XMP/EXPO.
- In GPU tools (MSI Afterburner, Radeon Tuning), use Reset to restore stock clocks and voltages.
- Remove undervolts or custom fan curves for now.
- Why: Many VIDEO_TDR_FAILURE BSODs vanish once clocks are truly stable.
- Clean-install a known-stable graphics driver with DDU
- Download:
- For NVIDIA: latest WHQL Game Ready or Studio driver from NVIDIA. If issues started after a recent update, download the previous stable version.
- For AMD: latest Adrenalin WHQL; if issues began recently, try the prior WHQL build.
- For Intel Arc/iGPU: latest graphics driver from Intel.
- Block Windows from auto-replacing drivers temporarily:
- Win+R > sysdm.cpl > Hardware tab > Device Installation Settings > select “No (your device might not work as expected)” > Save Changes.
- Use DDU (Display Driver Uninstaller) in Safe Mode:
- Disconnect from the internet (to prevent auto driver pull).
- Boot to Safe Mode.
- Run DDU and select the GPU device type. If you are switching brands, first select the old vendor and choose Clean and do NOT restart; then select your current vendor and choose Clean and restart.
- Install your chosen driver:
- NVIDIA: choose Custom > check Perform a clean installation. Consider installing Driver only + PhysX (skip GeForce Experience) for testing.
- AMD: select Factory Reset in the installer if it is offered, or do a driver-only install (the installer's Driver Only or Minimal option, or Device Manager).
- Intel: standard install.
- Optional stability choices:
- NVIDIA: try Studio Driver for maximum stability if you’re not gaming-focused.
- AMD: prefer WHQL builds; avoid Optional/Beta when troubleshooting.
- Reconnect to the internet and reboot once more.
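When picking the previous stable NVIDIA release, it helps to know which one is installed now. Device Manager (Display adapters > Properties > Driver tab) shows a Windows-style version such as 31.0.15.5222, while NVIDIA's site lists releases like 552.22. The Python sketch below applies the widely cited conversion (join the last two fields, keep the last five digits, put a dot before the final two); it is a community convention rather than a documented NVIDIA API, so confirm the result against NVIDIA's release notes.

```python
def nvidia_release_from_windows_version(windows_version):
    """Map a Windows driver version (e.g. "31.0.15.5222") to NVIDIA's
    release number (e.g. "552.22") using the community convention."""
    parts = windows_version.strip().split(".")
    if len(parts) != 4 or not all(p.isdigit() for p in parts):
        raise ValueError(f"unexpected version format: {windows_version!r}")
    # Pad the last field in case a tool dropped leading zeros.
    digits = (parts[2] + parts[3].zfill(4))[-5:]
    return f"{int(digits[:3])}.{digits[3:]}"


if __name__ == "__main__":
    for version in ("31.0.15.5222", "30.0.14.7212"):
        print(version, "->", nvidia_release_from_windows_version(version))
```

AMD and Intel publish their own version schemes, so this mapping applies to NVIDIA drivers only.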
- Update chipset, storage, and platform drivers
- Install the latest chipset drivers (Intel/AMD) from your motherboard/laptop vendor.
- Update Intel ME/AMT or AMD PSP firmware packages if provided.
- Update SATA/NVMe drivers (vendor-provided where applicable).
- Why: GPU timeouts can originate from PCIe/interrupt/power management issues resolved by platform drivers.
- Update Windows fully
- Settings > Windows Update > Check for updates. Install all pending quality updates, .NET, and hardware updates (except GPU if you’re controlling versions).
- Reboot and retest.
- Adjust power and PCIe settings for stability
- Power plan:
- Desktops: try Balanced first; if issues persist, test High performance or AMD Ryzen High Performance.
- Advanced power settings (Control Panel > Power Options > Change plan settings > Advanced):
- PCI Express > Link State Power Management: set to Off.
- Processor power management > Minimum processor state: try 5–10%.
- NVIDIA Control Panel:
- Manage 3D settings > Power management mode: Prefer maximum performance for affected apps.
- AMD Adrenalin:
- Disable Chill, Radeon Anti-Lag, and overlays while testing.
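The two advanced power settings above can also be applied from an elevated prompt with powercfg. The Python sketch below only builds the commands (and runs them on Windows when passed `--apply`); it assumes the powercfg aliases SUB_PCIEXPRESS, ASPM, SUB_PROCESSOR, and PROCTHROTTLEMIN, which you can confirm with `powercfg /aliases`, and by default it changes only the plugged-in (AC) values.

```python
import subprocess
import sys


def build_powercfg_commands(min_processor_percent=5, include_battery=False):
    """Return powercfg argument lists for the active power plan:
    PCIe Link State Power Management off and a minimum processor state."""
    if not 0 <= min_processor_percent <= 100:
        raise ValueError("min_processor_percent must be between 0 and 100")
    modes = ["/setacvalueindex"]  # plugged-in values
    if include_battery:
        modes.append("/setdcvalueindex")  # on-battery values (laptops)
    commands = []
    for mode in modes:
        # ASPM index 0 = Off (1 = Moderate, 2 = Maximum power savings)
        commands.append(["powercfg", mode, "SCHEME_CURRENT",
                         "SUB_PCIEXPRESS", "ASPM", "0"])
        commands.append(["powercfg", mode, "SCHEME_CURRENT",
                         "SUB_PROCESSOR", "PROCTHROTTLEMIN",
                         str(min_processor_percent)])
    # Re-apply the current plan so the changed values take effect.
    commands.append(["powercfg", "/setactive", "SCHEME_CURRENT"])
    return commands


if __name__ == "__main__":
    apply = sys.platform == "win32" and "--apply" in sys.argv
    for cmd in build_powercfg_commands():
        print(" ".join(cmd))
        if apply:
            subprocess.run(cmd, check=True)
```

Printing the commands first lets you review them, or paste them into an elevated Command Prompt yourself.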
- Monitor thermals and power delivery
- Use HWiNFO to watch GPU temperature, hotspot, VRAM temperatures, and power-limit throttling under load.
- Clean dust, ensure good airflow, and verify the fan curve is adequate.
- If using PCIe riser/extender, test the GPU directly in the slot.
- Update BIOS/UEFI and review key settings
- Update to the latest BIOS/UEFI from your system/motherboard vendor.
- Review:
- Resizable BAR/Above 4G Decoding: Leave default/vendor-recommended; if instability started after enabling, test with it off.
- Fast Boot: Disable for testing.
- CSM: Prefer UEFI-only where supported.
- Memory profile (XMP/EXPO): Keep disabled until stability is confirmed, then re-enable/test.
- Test memory (RAM) stability
- Run Windows Memory Diagnostic:
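- Win+R > mdsched.exe > Restart now and check for problems (recommended). Results appear after the reboot and in Event Viewer (System log, source MemoryDiagnostics-Results).
- For deeper testing, run MemTest86 from a bootable USB drive for multiple passes. Any error means the memory configuration is unstable: lower the speed, loosen timings, raise voltage within spec, or replace the RAM.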
- Check and repair storage and system files
- Re-run the repair commands from an elevated Command Prompt, DISM first:
DISM /Online /Cleanup-Image /RestoreHealth
sfc /scannow
- Check the disk file system:
chkdsk C: /f
You’ll be prompted to schedule the check at the next reboot; accept and restart.
- Review SSD/HDD health with vendor tools. Replace failing drives.
- Analyze minidumps to identify the culprit
- Enable and verify minidumps:
- Win+R > type sysdm.cpl > Advanced tab > Startup and Recovery > Settings:
- Write debugging information: Small memory dump (256 KB)
- Small dump directory: %SystemRoot%\Minidump
- After the next crash, confirm that C:\Windows\Minidump contains a .dmp file.
- Use BlueScreenView or WhoCrashed for a quick view:
- Look for the faulting module (e.g., nvlddmkm.sys, atikmpag.sys, dxgkrnl.sys).
- Use WinDbg (formerly WinDbg Preview) for deeper insight:
- Install it from the Microsoft Store.
- Open a dump from C:\Windows\Minidump.
- Run:
!analyze -v
lmvm nvlddmkm
lmvm atikmpag
- Examine “Probably caused by” and the call stack. Repeated mention of a third-party filter, overlay, or driver tells you what to remove or update.
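If you analyze several crashes, a small script can list the newest dumps and pull the headline fields out of saved WinDbg output (capture it with `.logopen C:\temp\analyze.txt`, then `!analyze -v`, then `.logclose`). The Python sketch below assumes the labels BUGCHECK_CODE, MODULE_NAME, IMAGE_NAME, and "Probably caused by", which appear in typical `!analyze -v` output but vary between WinDbg versions; a field that is not found comes back as None.

```python
import re
import sys
from pathlib import Path


def newest_minidumps(folder=r"C:\Windows\Minidump", limit=5):
    """Return up to `limit` .dmp files from `folder`, newest first."""
    path = Path(folder)
    if not path.is_dir():
        return []
    dumps = [p for p in path.iterdir() if p.suffix.lower() == ".dmp"]
    dumps.sort(key=lambda p: p.stat().st_mtime, reverse=True)
    return dumps[:limit]


# Labels as they commonly appear in `!analyze -v` output (assumed; the
# exact layout differs between WinDbg versions).
_FIELDS = {
    "bugcheck": re.compile(r"^BUGCHECK_CODE:[ \t]*(\S+)", re.MULTILINE),
    "module": re.compile(r"^MODULE_NAME:[ \t]*(\S+)", re.MULTILINE),
    "image": re.compile(r"^IMAGE_NAME:[ \t]*(\S+)", re.MULTILINE),
    "probably_caused_by": re.compile(
        r"^Probably caused by[ \t]*:[ \t]*(\S+)", re.MULTILINE),
}


def summarize_analyze_output(text):
    """Extract headline fields from saved `!analyze -v` text."""
    summary = {}
    for key, pattern in _FIELDS.items():
        match = pattern.search(text)
        summary[key] = match.group(1) if match else None
    return summary


if __name__ == "__main__":
    for dump in newest_minidumps():
        print("dump:", dump)
    if len(sys.argv) > 1:  # path to a saved !analyze -v log
        text = Path(sys.argv[1]).read_text(encoding="utf-8", errors="replace")
        print(summarize_analyze_output(text))
```

If the same module (for example nvlddmkm.sys) heads several dumps in a row, that is a stronger signal than any single crash.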
- Remove conflicting software and overlays
- Uninstall or disable temporarily:
- Third-party overclocking tools (Afterburner, ASUS GPU Tweak, Precision X), overlays (Discord, Steam, GeForce Experience overlay, Radeon Overlay), capture/streaming tools (OBS), RGB control and monitoring utilities.
- Reboot and test. Re-add only what you need, one at a time.
- Repair or reinstall the Visual C++ runtimes and DirectX
- Reinstall Microsoft Visual C++ Redistributables (x86 and x64 for 2012–2022).
- Install the DirectX End-User Runtime (June 2010) only where legacy games need it; DirectX 12 itself is updated through Windows Update.
- Perform a System Restore or In-place repair install
- System Restore:
- rstrui.exe > choose a restore point predating the issue.
- In-place repair (Windows 10/11):
- Download the latest ISO or use the Media Creation Tool/Installation Assistant.
- Run setup from within Windows > Keep personal files and apps.
- This refreshes system files without wiping data.
- Hardware isolation tests
- Test with the integrated GPU (if available): uninstall the dGPU drivers, remove the discrete card, and connect the monitor to the motherboard's video output. If the system stays stable for a few days, the dGPU or its power path may be at fault.
- Swap components where possible:
- Try a different PSU of known quality.
- Move GPU to another PCIe slot.
- Test single RAM stick at JEDEC speeds; rotate sticks/slots.
- Try the GPU in another PC.
- Persistent artifacts, crashes under any load, or failures across systems strongly indicate GPU hardware failure—consider warranty/RMA.
H2: Advanced Diagnostics
Use these tools when the cause remains elusive.
H3: Driver Verifier (use with caution)
- Purpose: Stresses and monitors drivers to surface faulty ones; expect it to cause additional BSODs.
- How to enable:
- Run Command Prompt as admin:
verifier
- Choose Create standard settings > Select driver names from a list.
- Check third-party drivers related to graphics, overlays, capture, storage, and network. Avoid selecting Microsoft drivers initially.
- Reboot and use the PC normally. If you BSOD, note the new stop code and the driver it names. To confirm which drivers are being verified, run verifier /querysettings.
- How to disable after testing (important):
verifier /reset
Reboot. If you get stuck in a boot loop, boot into Safe Mode and run the reset command there.
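The wizard selection above has a command-line equivalent, `verifier /standard /driver <names>`, which is handy when you want to repeat the same test. The Python helper below is a hypothetical convenience that only assembles and sanity-checks that command from a list of driver file names; run the printed command from an elevated prompt, reboot, and still finish with `verifier /reset`.

```python
import sys


def build_verifier_command(driver_files):
    """Assemble `verifier /standard /driver ...` for the given .sys names,
    dropping case-insensitive duplicates."""
    names = []
    seen = set()
    for raw in driver_files:
        name = raw.strip()
        if not name.lower().endswith(".sys"):
            raise ValueError(f"expected a .sys file name, got {raw!r}")
        if name.lower() not in seen:
            seen.add(name.lower())
            names.append(name)
    if not names:
        raise ValueError("select at least one driver")
    return ["verifier", "/standard", "/driver", *names]


if __name__ == "__main__" and len(sys.argv) > 1:
    # Example (second name is a placeholder):
    #   python make_verifier_cmd.py nvlddmkm.sys someoverlay.sys
    print(" ".join(build_verifier_command(sys.argv[1:])))
```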
H3: Event Viewer
- Open Event Viewer (eventvwr.msc) > Windows Logs > System.
- Filter by source (Display, nvlddmkm, amdkmdag, atikmpag) or by Event ID 4101 (“Display driver stopped responding and has recovered”).
- Check for patterns around the crash time—power events, disk errors, or service failures.
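To see how often the driver recovered (Event ID 4101) before a BSOD finally occurred, the events can be exported with `wevtutil qe System /q:"*[System[(EventID=4101)]]" /rd:true /c:50 /f:xml /e:Events > tdr_events.xml` and counted per day. The Python sketch below assumes that export (the `/e:Events` switch wraps the events in one root element so the file parses as XML) and the standard Windows event schema namespace.

```python
import sys
import xml.etree.ElementTree as ET
from collections import Counter

# Namespace used by Windows event XML.
NS = {"e": "http://schemas.microsoft.com/win/2004/08/events/event"}


def tdr_events(xml_text, event_id="4101"):
    """Return (SystemTime, provider) tuples for events with `event_id`."""
    root = ET.fromstring(xml_text)
    results = []
    for event in root.findall("e:Event", NS):
        system = event.find("e:System", NS)
        if system is None:
            continue
        eid = (system.findtext("e:EventID", default="", namespaces=NS) or "").strip()
        if eid != event_id:
            continue
        created = system.find("e:TimeCreated", NS)
        provider = system.find("e:Provider", NS)
        results.append((
            created.get("SystemTime") if created is not None else None,
            provider.get("Name") if provider is not None else None,
        ))
    return results


def count_per_day(events):
    """Count events per calendar day (UTC date prefix of SystemTime)."""
    return Counter(ts[:10] for ts, _ in events if ts)


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], encoding="utf-8", errors="replace") as f:
        events = tdr_events(f.read())
    for day, count in sorted(count_per_day(events).items()):
        print(day, count)
```

A rising daily count of recoveries before the first BSOD often points to a problem that is getting worse, such as thermals or a degrading component.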
H3: GPU stress and thermal testing
- Tools: OCCT, 3DMark, Unigine Superposition/Heaven, or vendor diagnostics.
- Procedure:
- Run the tests at stock settings while logging sensors with HWiNFO.
- Observe temperatures, hotspot deltas, and power draw.
- If crashes occur quickly under these tests, suspect GPU cooling, power delivery, or the card itself.
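If you log sensors to CSV during a stress run (HWiNFO can write a CSV log), the peak temperatures and the largest hotspot-to-edge gap can be extracted afterward. The Python sketch below takes the column names as parameters because they differ between GPUs and HWiNFO versions; the names used in the test data are placeholders, and rows that do not parse as numbers (such as a repeated header at the end of the log) are skipped.

```python
import csv
import io
import sys


def summarize_sensor_log(csv_text, temp_col, hotspot_col):
    """Return peak GPU temperature, peak hotspot temperature, and the
    largest hotspot-minus-GPU-temperature delta found in the log."""
    peak_temp = peak_hot = peak_delta = None
    for row in csv.DictReader(io.StringIO(csv_text)):
        try:
            temp = float(row[temp_col])
            hot = float(row[hotspot_col])
        except (KeyError, TypeError, ValueError):
            continue  # non-numeric, short, or missing-column rows
        delta = hot - temp
        peak_temp = temp if peak_temp is None else max(peak_temp, temp)
        peak_hot = hot if peak_hot is None else max(peak_hot, hot)
        peak_delta = delta if peak_delta is None else max(peak_delta, delta)
    return {"max_temp": peak_temp, "max_hotspot": peak_hot, "max_delta": peak_delta}


if __name__ == "__main__" and len(sys.argv) == 4:
    # Usage: python gpu_log.py log.csv "<GPU temp column>" "<hotspot column>"
    # latin-1 decodes any byte sequence; adjust if your log is UTF-8.
    with open(sys.argv[1], encoding="latin-1", newline="") as f:
        print(summarize_sensor_log(f.read(), sys.argv[2], sys.argv[3]))
```

Copy the exact column headers from the first line of your own log, since a mismatch makes every row count as unparseable.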
H3: TDR registry tuning (diagnostic, not a permanent fix)
- Increasing TDR delay can help diagnose borderline-stable systems, but it’s not recommended as a long-term solution.
- Registry path:
- HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\GraphicsDrivers
- Values (DWORD, decimal):
- TdrDelay = 8 (seconds)
- TdrDdiDelay = 8 (seconds)
- Steps:
- Run regedit, navigate to the key, create the values if missing, set as above, reboot.
- If stability improves only with higher delays, you still have an underlying issue (driver, power, thermals, or hardware). Remove these tweaks after fixing the root cause.
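Instead of editing the registry by hand, the same values can be written to a .reg file and imported by double-clicking it, with a matching file that deletes them once the root cause is fixed (deleting the values returns Windows to its built-in TDR defaults). The Python sketch below only generates that text; the output file names are placeholders, and the files are written only when the script is run with `--write`.

```python
import sys

GRAPHICS_KEY = r"HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\GraphicsDrivers"
HEADER = "Windows Registry Editor Version 5.00"


def tdr_reg_file(delay_seconds=8, ddi_delay_seconds=8):
    """Return .reg text that sets TdrDelay and TdrDdiDelay as DWORDs (seconds)."""
    for value in (delay_seconds, ddi_delay_seconds):
        if not isinstance(value, int) or value <= 0:
            raise ValueError("delays must be positive whole seconds")
    return "\n".join([
        HEADER,
        "",
        f"[{GRAPHICS_KEY}]",
        # .reg files store DWORDs as 8 hex digits: 8 -> 00000008
        f'"TdrDelay"=dword:{delay_seconds:08x}',
        f'"TdrDdiDelay"=dword:{ddi_delay_seconds:08x}',
        "",
    ])


def tdr_revert_reg_file():
    """Return .reg text that deletes both values (a "=-" entry removes a value)."""
    return "\n".join([
        HEADER,
        "",
        f"[{GRAPHICS_KEY}]",
        '"TdrDelay"=-',
        '"TdrDdiDelay"=-',
        "",
    ])


if __name__ == "__main__" and "--write" in sys.argv:
    # Text mode on Windows turns "\n" into CRLF, the usual .reg line ending.
    with open("tdr_diagnostic.reg", "w", encoding="utf-16") as f:
        f.write(tdr_reg_file())
    with open("tdr_revert.reg", "w", encoding="utf-16") as f:
        f.write(tdr_revert_reg_file())
```

Keeping the revert file next to the diagnostic one makes it harder to forget the tweak once the real fix is in place.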
H2: When to Seek Professional Help
Consider professional service or RMA when:
- You see persistent artifacts (colored blocks, lines) even at boot/BIOS or under multiple operating systems.
- The system crashes reliably under light GPU load despite clean drivers and stock settings.
- MemTest86 or vendor diagnostics report errors you cannot resolve by reseating or replacing components.
- The GPU fails in another known-good PC, or another GPU works fine in your PC.
- Your PSU is low quality/old and measurements suggest unstable rails; replacing is advisable.
If the GPU or system is still under warranty, initiate RMA. Avoid “reflow” or oven-bake tricks; they are temporary and risky.
H2: Prevention Tips
- Practice driver hygiene
- Prefer WHQL drivers.
- Use DDU when switching GPU brands or after repeated driver-related crashes.
- Avoid auto-overclock features until stability is verified.
- Keep platform current
- Update Windows, chipset drivers, and BIOS/UEFI periodically.
- Manage thermals and power
- Clean dust quarterly, ensure proper case airflow, and renew thermal paste on older GPUs if comfortable.
- Use a quality PSU sized for your GPU/CPU with headroom.
- Consider a UPS to prevent crashes from power dips.
- Be careful with overclocks
- Increase clocks in small increments and validate with extended stress tests.
- Watch temperatures and error counters; prioritize stability over marginal performance.
- Backups and restore points
- Keep regular backups and create restore points before big driver/Windows changes.
- Limit background conflict
- Minimize overlays and monitoring tools while gaming or using GPU compute.
H2: Conclusion
The VIDEO_TDR_FAILURE BSOD with nvlddmkm.sys or atikmpag.sys is almost always fixable. Start with the basics—revert overclocks, clean-install a stable graphics driver with DDU, update chipset/BIOS, adjust power settings, and verify thermals. If issues persist, analyze minidumps, use Driver Verifier carefully, test memory and storage, and isolate hardware. With a systematic approach, you can achieve a stable, crash-free system and keep it that way with good driver hygiene and maintenance. You’ve got this.
H2: FAQ
H4: Can I ignore the VIDEO_TDR_FAILURE BSOD if it only happens sometimes?
No. Even infrequent BSODs risk data corruption and signal deeper instability. Follow the steps to stabilize your driver stack and hardware; sporadic crashes often worsen over time.
H4: Does VIDEO_TDR_FAILURE mean my GPU is failing?
Not necessarily. Many cases are caused by driver corruption, overclocks, power management, or platform drivers. Only after clean installs, stock settings, and firmware updates fail—and especially if artifacts or cross-system failures appear—should you suspect hardware.
H4: Is increasing TdrDelay a safe fix for nvlddmkm.sys/atikmpag.sys errors?
It’s a diagnostic workaround, not a real fix. Raising TdrDelay gives the GPU more time before Windows intervenes, which may reduce BSODs, but if the underlying cause is instability or a bad driver, the crashes can return, and genuine hangs will freeze the screen longer before recovery. Use it temporarily and revert after addressing the root cause.
H4: Which driver version is best to fix the nvlddmkm.sys BSOD?
Use a stable WHQL driver. If the issue started after a recent update, roll back to the last version that worked. For NVIDIA, consider Studio Driver for stability; for AMD, stick to WHQL Adrenalin releases. Always perform a DDU clean install.
H4: Will reinstalling Windows fix VIDEO_TDR_FAILURE?
An in-place repair or clean install can fix OS-level corruption, but it won’t repair failing hardware or poor firmware/BIOS settings. Try the steps in this guide first; if problems persist across a clean OS, suspect hardware or firmware.