THREAD_STUCK_IN_DEVICE_DRIVER: GPU Timeout Diagnosis Beyond the Basics


Introduction

The THREAD_STUCK_IN_DEVICE_DRIVER Blue Screen of Death (BSOD) — sometimes shown as “THREAD STUCK IN DEVICE DRIVER” with bug check code 0x000000EA — typically appears when a graphics driver thread gets stuck in an infinite loop, waiting for the GPU to respond. On modern Windows 10/11 systems, this can look very similar to a GPU timeout or TDR (Timeout Detection and Recovery) issue. Whether it’s a one-off crash or a recurring failure, this error is critical to fix because it points to potential instability in your graphics stack, firmware, or hardware.

This guide goes beyond generic advice. You’ll get a complete, step-by-step approach to diagnose and resolve THREAD_STUCK_IN_DEVICE_DRIVER (0xEA) using proven methods, including minidump analysis, Driver Verifier, firmware/BIOS strategies, and performance/power tuning that often gets overlooked. If you want the most thorough playbook for this specific stop code, you’re in the right place.

Understanding the Error

The 0xEA bug check means a kernel-mode driver (most commonly the GPU driver) submitted work to the GPU that never completed, leaving a driver thread stuck. Windows expects GPU operations to complete within a short window (the default TDR timeout is 2 seconds), and when they don’t, the failure can escalate from a TDR reset to a BSOD.

In plain language: the graphics driver (e.g., nvlddmkm.sys, igdkmd64.sys, amdkmdag.sys) waited on the GPU and never got an answer. The cause can be software (driver bug, OS corruption), firmware (GPU VBIOS, motherboard BIOS), settings (overclocks, power savings), or hardware (failing GPU, RAM, PSU).

Common scenarios that trigger THREAD_STUCK_IN_DEVICE_DRIVER:

  • Immediately after updating GPU drivers or Windows updates
  • During gaming, 3D rendering, video editing, or hardware-accelerated browsing
  • On resume from sleep/hibernate
  • After overclocking GPU/VRAM/CPU or enabling features like HAGS (Hardware-Accelerated GPU Scheduling)
  • With thermals or power issues (overheating, underpowered PSU, loose PCIe power cables)
  • With certain BIOS/UEFI settings (CSM, Above 4G Decoding/Re-Size BAR) or outdated firmware
  • When system files are corrupted or malware interferes with drivers

Common Causes

Below are the most frequent culprits behind THREAD_STUCK_IN_DEVICE_DRIVER (0xEA) and related GPU timeout symptoms:

  • Corrupted or buggy GPU drivers (NVIDIA, AMD, Intel)
  • Overclocking/undervolting of GPU, VRAM, or CPU; unstable XMP/EXPO memory profiles
  • Overheating or dust buildup; failing fans or poor airflow
  • Power delivery problems: inadequate PSU, bad power cables, unstable power plan
  • Windows updates or feature changes (e.g., HAGS, MPO, DirectX upgrades) causing conflicts
  • BIOS/UEFI misconfiguration or outdated firmware; GPU VBIOS issues (rare but possible)
  • Faulty RAM/VRAM, or storage corruption impacting driver files
  • Malware or unstable third-party overlays and capture tools (e.g., OSD, injectors)
  • External device/monitor problems (adapter/driver conflicts, MST daisy-chain)
  • Third-party kernel drivers (RGB, monitoring, USB, or anti-cheat) conflicting at low level

Skimmable view:

  • Driver issues: GPU driver corruption, bad updates, leftover files from previous GPUs
  • Hardware faults: GPU/VRAM failure, RAM errors, PSU instability, overheating
  • Firmware/BIOS: Outdated BIOS, risky settings (CSM, Re-Size BAR), buggy VBIOS
  • OS problems: System file corruption, broken Windows components, bad updates
  • Software conflicts: Overlays, screen recorders, virtualization, malware
  • Power/thermal: Aggressive power savings, Link State PM, hot case environment

Preliminary Checks

Before deep troubleshooting, complete these must-do checks.

  • Boot into Safe Mode (if normal boot is unstable)

    • Press and hold Shift while selecting Restart -> Troubleshoot -> Advanced options -> Startup Settings -> Restart -> Press 4 or F4 for Safe Mode (or 5/F5 for Safe Mode with Networking).
    • Alternatively, from a working desktop: Run -> msconfig -> Boot tab -> check Safe boot -> Minimal -> OK -> Restart.
  • Back up important data

    • Copy critical files to external media or cloud.
    • Create a System Restore Point: Run -> SystemPropertiesProtection -> Create.
  • Run basic health checks

    • Open an elevated Command Prompt (Run as administrator) and run:

      sfc /scannow

      Then:

      DISM /online /cleanup-image /restorehealth

      For a quick disk check (no reboot):

      chkdsk /scan

      For a full fix (will schedule at reboot):

      chkdsk C: /f
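
The three non-destructive checks above can be chained from a script. This is a minimal Python sketch rather than an official tool: the helper name run_health_checks and its injectable runner parameter are assumptions made for illustration, while the commands themselves are the ones listed above.

```python
import subprocess

# Commands taken from the checklist above, in the order they should run.
HEALTH_CHECKS = [
    ["sfc", "/scannow"],
    ["DISM", "/online", "/cleanup-image", "/restorehealth"],
    ["chkdsk", "/scan"],
]


def run_health_checks(runner=subprocess.run):
    """Run each check in order and return (command, exit code) pairs.

    `runner` is injectable (an assumption for this sketch) so the
    sequence can be dry-run without touching the system.
    """
    results = []
    for cmd in HEALTH_CHECKS:
        completed = runner(cmd)
        results.append((" ".join(cmd), completed.returncode))
    return results
```

Call run_health_checks() from an elevated (administrator) Python session on Windows and review the exit codes; a non-zero code is a cue to read that tool's own output, not a diagnosis in itself.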


Step-by-Step Troubleshooting

Follow these steps in order—from simplest to most advanced. Test stability after each step.

  1. Cleanly reinstall the GPU driver (DDU method)
  • Uninstall any overlay or tuning utilities (GeForce Experience, Radeon Software, MSI Afterburner, RTSS) temporarily.
  • Boot into Safe Mode.
  • Use Display Driver Uninstaller (DDU) to remove current GPU drivers and residual files.
  • Reboot normally and install the latest WHQL driver from your GPU vendor:
    • NVIDIA: Game Ready or Studio driver
    • AMD: Adrenalin WHQL
    • Intel: Latest Arc/Iris Xe driver
  • Choose a Clean install if offered. Avoid beta drivers initially.
  2. Reset overclocks and power settings
  • Return GPU/VRAM/CPU to stock clocks/voltages. Disable undervolting for now.
  • In Windows, set Power Plan to High performance or AMD Ryzen Balanced as appropriate.
  • Disable PCI Express Link State Power Management:
    • Control Panel -> Power Options -> Change plan settings -> Change advanced power settings -> PCI Express -> Link State Power Management -> Off.
  • If you enabled HAGS (Hardware-Accelerated GPU Scheduling), toggle it:
    • Settings -> System -> Display -> Graphics -> Default graphics settings -> Hardware-accelerated GPU scheduling. Turn Off (or On) and test.
  3. Check temperatures and physical connections
  • Monitor GPU temperatures with HWiNFO, GPU-Z, or vendor tools. Under load, stay below the card’s thermal limit (below roughly 85°C on many cards).
  • Inspect and reseat: PCIe card firmly seated, PCIe power connectors firmly attached at both PSU and GPU ends.
  • Clean dust from heatsinks, fans, and filters. Ensure case airflow is adequate.
  4. Roll back or update Windows and firmware
  • If crashes started after a Windows Update, try rolling back that update:
    • Settings -> Windows Update -> Update history -> Uninstall updates.
  • Update motherboard BIOS/UEFI to the latest stable version from the vendor.
  • Update chipset drivers (Intel/AMD) and Intel ME/AMD PSP firmware if applicable.
  • Check for GPU VBIOS updates only from official sources and apply cautiously.
  5. Toggle advanced graphics features that commonly trigger timeouts
  • Disable/Enable MPO (Multiplane Overlay) — known to cause flicker/timeouts on some systems:
    • Registry (Run regedit):
      • Create DWORD 32-bit under:
        HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows\Dwm
        Name: OverlayTestMode
        Value: 5 to disable MPO; delete value to re-enable.
  • Set Hardware-accelerated GPU scheduling Off (or On) and retest.
  • For Intel Xe/Arc, update to latest driver, and test with Resizable BAR On/Off via BIOS.
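
If you prefer an importable file over editing the registry by hand, the MPO toggle above can be written as a .reg file. The sketch below only generates the file text; the function name mpo_reg_file is a hypothetical helper, and the key, value name, and value 5 are the ones given above.

```python
DWM_KEY = r"HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows\Dwm"


def mpo_reg_file(disable):
    """Return .reg file text that disables or re-enables MPO.

    disable=True  sets OverlayTestMode to 5 (the value given above).
    disable=False deletes the value, which re-enables MPO.
    """
    value = "dword:00000005" if disable else "-"
    lines = [
        "Windows Registry Editor Version 5.00",
        "",
        f"[{DWM_KEY}]",
        f'"OverlayTestMode"={value}',
        "",
    ]
    # .reg files conventionally use Windows (CRLF) line endings.
    return "\r\n".join(lines)
```

Write the returned text to a file such as disable_mpo.reg (open it with newline="" so the line endings are preserved), import it, and reboot before retesting.
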
  6. Memory and storage integrity tests
  • Run Windows Memory Diagnostic:
    • Run -> mdsched.exe -> Restart now and check for problems.
  • For more rigorous testing, run MemTest86 from USB for multiple passes (ideally overnight).
  • Check drive health:
    • Use vendor tools (Samsung Magician, WD Dashboard, Crucial Storage Executive) to verify SMART and update SSD firmware.
    • Ensure system drive has at least 15–20% free space.
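
The free-space guideline above is easy to check programmatically. This is a small sketch using Python's standard shutil.disk_usage; the function names and the 15% default threshold (the lower bound of the range above) are illustrative choices.

```python
import shutil


def free_space_percent(path):
    """Return free space on the volume containing `path` as a percentage."""
    usage = shutil.disk_usage(path)
    if usage.total == 0:
        return 0.0
    return 100.0 * usage.free / usage.total


def has_enough_free_space(path, minimum_percent=15.0):
    """True if the volume meets the minimum free-space percentage."""
    return free_space_percent(path) >= minimum_percent
```

For the system drive, call has_enough_free_space("C:\\").
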
  7. Configure and read minidumps
  • Ensure minidumps are enabled:

    • Run -> sysdm.cpl -> Advanced -> Startup and Recovery (Settings) -> Write debugging information: Small memory dump (256 KB); Minidump directory: C:\Windows\Minidump
    • Uncheck “Automatically restart” temporarily to read the BSOD.
  • After the next BSOD, examine minidumps:

    • Use BlueScreenView or WhoCrashed for quick analysis.

    • Or use WinDbg (Preview) from the Microsoft Store:

      • Open dump file from C:\Windows\Minidump

      • Run:

        !analyze -v

        Look for the driver named in the MODULE_NAME and IMAGE_NAME fields (older debugger versions print a “Probably caused by” line), such as:

        • nvlddmkm.sys (NVIDIA)
        • amdkmdag.sys / atikmpag.sys (AMD)
        • igdkmd64.sys (Intel)
      • Get module info:

        lmvm nvlddmkm

        Check the driver’s timestamp and version. If the flagged driver is third-party (e.g., an RGB or capture driver), investigate that first.

  8. Driver Verifier (advanced; use with caution)
  • Driver Verifier stresses drivers to expose faults; it can cause additional BSODs during testing.

  • Start Verifier:

    • Run -> verifier
    • Choose Create standard settings -> Automatically select unsigned drivers and/or Automatically select drivers built for older versions of Windows. Avoid verifying Microsoft drivers initially.
    • Alternatively, select Select driver names from a list and choose suspicious non-Microsoft drivers (GPU driver, capture tools, storage filter drivers).
  • Reboot and use your PC normally to trigger the fault. If BSODs occur, check the minidumps again for the driver at fault.

  • Turn off Verifier after testing:

    • Run:

      verifier /reset

    • Reboot.
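
Driver Verifier can also be driven from an elevated command line instead of the GUI. The sketch below only builds the argument lists, using the /standard, /driver, /querysettings, and /reset switches of verifier.exe; the function name is a hypothetical helper, and the driver names in the usage note are examples taken from this guide, not a recommendation.

```python
def verifier_enable_command(drivers):
    """Build the command that enables standard checks for the named drivers."""
    if not drivers:
        raise ValueError("name at least one driver to verify")
    return ["verifier", "/standard", "/driver", *drivers]


# Show which drivers and checks are currently active.
VERIFIER_QUERY = ["verifier", "/querysettings"]

# Turn Driver Verifier off again once testing is finished.
VERIFIER_RESET = ["verifier", "/reset"]
```

For example, verifier_enable_command(["nvlddmkm.sys", "RTCore64.sys"]) produces the arguments for "verifier /standard /driver nvlddmkm.sys RTCore64.sys". Pass the list to subprocess.run from an administrator session, then reboot for the change to take effect.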

  9. Event Viewer and Reliability Monitor
  • Open Event Viewer (eventvwr.msc):
    • Windows Logs -> System
    • Filter for Critical and Error entries around the time of the crash: look for “Display driver nvlddmkm stopped responding and has recovered” (TDR), Kernel-Power 41, BugCheck, or device errors.
  • Use Reliability Monitor (perfmon /rel) to see a timeline of crashes and recently installed drivers/updates.
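
If you export events to work with them outside Event Viewer, the "around the time of the crash" filter above amounts to a simple time-window test. The sketch below assumes each event has already been reduced to a (timestamp, level, message) tuple; that record shape, the function name, and the 10-minute default window are assumptions for illustration, not an Event Viewer format.

```python
from datetime import timedelta

SEVERE_LEVELS = {"Critical", "Error"}


def events_near_crash(events, crash_time, window_minutes=10):
    """Return Critical/Error events within the window around crash_time.

    `events` is an iterable of (timestamp, level, message) tuples,
    where timestamp is a datetime. Results are sorted by timestamp.
    """
    window = timedelta(minutes=window_minutes)
    hits = [
        event
        for event in events
        if event[1] in SEVERE_LEVELS and abs(event[0] - crash_time) <= window
    ]
    return sorted(hits, key=lambda event: event[0])
```

Anything that survives the filter is worth comparing against the driver named in the minidump for the same crash.
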
  10. BIOS/UEFI GPU-related settings
  • Test with the following toggles (change one at a time, retest):
    • CSM (Compatibility Support Module): disable for pure UEFI, unless older GPU requires CSM.
    • Above 4G Decoding / Re-Size BAR: toggle On/Off depending on GPU support; test both ways.
    • Primary Display: set to PEG (discrete GPU) or IGD (integrated) for testing. If an iGPU is available, try moving the monitor cable from the discrete GPU to the motherboard video output and booting on the iGPU.
  • Disable any CPU/RAM overclock, and turn off XMP/EXPO (or leave memory settings on Auto) during troubleshooting.
  11. Adjust TDR timeout (temporary diagnostic)
  • You can extend the GPU timeout to see if operations complete:
    • Registry (regedit):
      • Path: HKEY_LOCAL_MACHINE\System\CurrentControlSet\Control\GraphicsDrivers
      • Create DWORD (32-bit) named TdrDelay and set value to 8 (seconds). The default is 2 seconds.
      • Optionally create TdrDdiDelay (DWORD) and set to 10. The default is 5 seconds.
    • Reboot and test. If this reduces crashes, a performance bottleneck or driver issue may be involved. Treat as a workaround, not a final fix.
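
The same TDR values can be set with the built-in reg command from an elevated prompt. This sketch builds "reg add" and "reg delete" argument lists for the values described above; the function names are hypothetical, and deleting the values afterwards (to return to Windows defaults) is this sketch's suggestion for undoing the diagnostic.

```python
GRAPHICS_DRIVERS_KEY = r"HKLM\System\CurrentControlSet\Control\GraphicsDrivers"


def tdr_reg_commands(tdr_delay=8, tdr_ddi_delay=None):
    """Build `reg add` commands that set the TDR timeouts (in seconds)."""
    values = {"TdrDelay": tdr_delay}
    if tdr_ddi_delay is not None:
        values["TdrDdiDelay"] = tdr_ddi_delay
    commands = []
    for name, seconds in values.items():
        if not isinstance(seconds, int) or seconds <= 0:
            raise ValueError(f"{name} must be a positive whole number of seconds")
        commands.append(
            ["reg", "add", GRAPHICS_DRIVERS_KEY, "/v", name,
             "/t", "REG_DWORD", "/d", str(seconds), "/f"]
        )
    return commands


def tdr_undo_commands():
    """Build `reg delete` commands that remove both values again."""
    return [
        ["reg", "delete", GRAPHICS_DRIVERS_KEY, "/v", name, "/f"]
        for name in ("TdrDelay", "TdrDdiDelay")
    ]
```

Run each list with subprocess.run as administrator and reboot. Keeping the undo commands next to the change makes it harder to forget that this is a diagnostic setting.
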
  12. System Restore or In-place repair (no data loss)
  • If problems began recently:
    • Use System Restore to revert to a point before the crashes.
  • If system files are deeply corrupted:
    • Perform an in-place upgrade repair:
      • Download the Windows 10/11 Media Creation Tool.
      • Run setup.exe from within Windows.
      • Choose Keep personal files and apps.
    • This refreshes Windows components without wiping data.
  13. Last-resort clean installation
  • Back up all data, export licenses, and prepare installers.
  • Perform a clean install of Windows, then install the latest chipset, GPU, and device drivers fresh. Test before adding extra software.

Advanced Diagnostics

Use these techniques to pinpoint root causes beyond the basics.

  • Driver Verifier best practices

    • Start with Standard settings and target non-Microsoft drivers.

    • If you need deeper analysis, enable additional checks (IRP Logging, Force IRQL Checking, I/O Verification) but expect more BSODs during testing.

    • Always remember to run:

      verifier /reset

      when finished.

  • WinDbg deep-dive

    • In WinDbg, after !analyze -v:
      • Identify suspect routine names in the stack (e.g., dxgkrnl.sys calls into vendor driver).
      • Check multiple dumps for recurring module names/timestamps.
      • If call stacks point to an overlay/monitoring driver (e.g., RTCore64.sys for MSI Afterburner), remove it and test.
  • Cross-check hardware stability

    • GPU stress tests: Use vendor-provided tests or tools like 3DMark Time Spy/Port Royal; watch for artifacts or driver resets.
    • CPU/RAM stress: Run Prime95 (Blend) or AIDA64 stability test for at least 1–2 hours.
    • If GPU-only tests crash but CPU/RAM do not, focus on GPU/PSU/driver path.
  • Power supply validation

    • A borderline PSU can cause GPU timeouts under load spikes.
    • Compare GPU TDP to PSU rating; ensure quality PSU with adequate 12V rail amperage.
    • Test with a known-good PSU if available.
  • External peripherals and monitors

    • Test different DisplayPort/HDMI cables.
    • Remove adapters/docks; connect directly to the GPU.
    • Turn off G-Sync/FreeSync temporarily; test different refresh rates.
  • Malware and integrity

    • Run Microsoft Defender Offline scan.
    • Uninstall questionable drivers (e.g., old controller drivers, USB filter drivers, VPN adapters).
  • Vendor-specific known issues

    • NVIDIA: nvlddmkm.sys timeouts with MPO/HAGS; try toggles, different driver branches (Studio vs Game Ready).
    • AMD: Some Adrenalin features (Enhanced Sync/Anti-Lag/RTSS interaction) can trigger TDRs; disable overlays/recording.
    • Intel: Ensure latest Arc/iGPU driver; test with Re-Size BAR on/off.
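
For the power supply validation item above, a rough 12 V budget can be worked out with simple arithmetic (current = power / voltage). The sketch below is a back-of-the-envelope estimate under loud assumptions: it treats the whole load as drawn from the 12 V rail, and the 50 W allowance for other components is a placeholder, not a measured figure. Read the real 12 V amperage from the PSU label.

```python
def twelve_volt_amps(watts):
    """Convert a power draw in watts to current on the 12 V rail."""
    return watts / 12.0


def psu_headroom(psu_12v_amps, gpu_watts, cpu_watts, other_watts=50.0):
    """Return spare 12 V amps after the estimated load.

    A negative result means the estimate exceeds the rail's rating.
    `other_watts` is an assumed allowance for fans, drives, and so on.
    """
    load_amps = twelve_volt_amps(gpu_watts + cpu_watts + other_watts)
    return psu_12v_amps - load_amps
```

A small or negative headroom does not prove the PSU is at fault, but it makes the known-good PSU swap a higher priority, since sustained figures like these do not capture brief load spikes.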

Minidump Analysis: How-To

  • Enable small memory dumps:

    • Run -> sysdm.cpl -> Advanced -> Startup and Recovery -> Settings
    • Under “Write debugging information,” select Small memory dump (256 KB).
    • Ensure C:\Windows\Minidump exists and “Automatically restart” is unchecked while testing.
  • Locate minidumps:

    • C:\Windows\Minidump\*.dmp
  • Analyze with BlueScreenView:

    • It lists the drivers involved and highlights likely culprits. Look for repeated patterns (nvlddmkm.sys, amdkmdag.sys, igdkmd64.sys).
  • Analyze with WinDbg (Preview):

    • Open dump -> run:

      !analyze -v

      Then:

      lmvm nvlddmkm
      lmvm igdkmd64
      lmvm amdkmdag

    • If a third-party filter/overlay driver repeatedly appears, remove/replace it and retest.

  • Interpret results carefully:

    • “Probably caused by” is a clue, not proof; corroborate with Event Viewer and your change history.
    • Distinguish 0xEA from 0x116 (VIDEO_TDR_FAILURE) and 0x117 (VIDEO_TDR_TIMEOUT_DETECTED). All three involve GPU timeouts, so the fix paths overlap.
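
When several crashes have accumulated, it helps to open the newest dump first. This is a minimal sketch that lists .dmp files newest first using Python's pathlib; the function name is illustrative, and the default folder is the minidump directory named above.

```python
from pathlib import Path


def list_minidumps(folder=r"C:\Windows\Minidump"):
    """Return .dmp files in `folder`, newest first (empty list if missing)."""
    directory = Path(folder)
    if not directory.is_dir():
        return []
    dumps = [p for p in directory.iterdir() if p.suffix.lower() == ".dmp"]
    return sorted(dumps, key=lambda p: p.stat().st_mtime, reverse=True)
```

Reading that folder may require an elevated session. The first entry of the returned list is the dump from the most recent crash.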

When to Seek Professional Help

Consider professional diagnosis or hardware replacement when:

  • You still get THREAD_STUCK_IN_DEVICE_DRIVER (0xEA) after a clean OS install and minimal driver set.
  • MemTest86 shows memory errors, or GPU stress tests consistently artifact/crash.
  • The PSU is underpowered or unstable, and swapping PSUs isn’t feasible.
  • The GPU shows visible physical damage, fan failure, or recurring high temps despite proper cooling.
  • The system only stabilizes with TdrDelay increased significantly, indicating underlying hardware slowness/failure.

A repair shop can perform component-level tests, swap parts quickly, and validate with known-good hardware.

Prevention Tips

Keep your system stable and avoid future GPU timeout BSODs:

  • Practice driver hygiene:
    • Update GPU drivers to stable branches; avoid unnecessary betas.
    • Use DDU when switching GPU brands or after severe driver corruption.
  • Maintain firmware:
    • Keep BIOS/UEFI, chipset drivers, and SSD firmware up to date.
  • Avoid aggressive overclocks without long stress testing; document changes.
  • Manage thermals:
    • Clean dust quarterly, ensure airflow, monitor temps.
  • Stable power:
    • Use a quality PSU sized for your GPU and CPU; avoid daisy-chaining PCIe cables if separate cables are recommended.
  • Sensible Windows settings:
    • If HAGS or MPO causes issues, keep them disabled.
    • Keep Windows Update on but defer major feature updates for a few weeks.
  • Regular backups and restore points so you can roll back quickly if issues arise.

Conclusion

The THREAD_STUCK_IN_DEVICE_DRIVER (0xEA) BSOD is a classic sign of a GPU driver thread stuck waiting on hardware, often overlapping with modern GPU timeout/TDR behavior. The fastest path to resolution is methodical: start with a clean GPU driver install, reset overclocks, check thermals/power, and then move to firmware, Windows integrity, and minidump analysis. Use Driver Verifier and Event Viewer to pinpoint bad actors, and don’t hesitate to adjust BIOS/UEFI settings or try a System Restore or in-place repair if needed.

Most systems can be stabilized without replacing hardware. With the right approach, you can eliminate THREAD STUCK IN DEVICE DRIVER crashes and restore a reliable, high-performance Windows experience.

FAQ

Can I ignore the THREAD_STUCK_IN_DEVICE_DRIVER BSOD if it only happened once?

A single crash may be a fluke, but it’s often an early warning. At minimum, ensure drivers and Windows are current, check temps, and scan with SFC/DISM. If it recurs, follow the full troubleshooting steps.

Does this BSOD mean my GPU is failing?

Not necessarily. Many 0xEA/timeout cases are resolved by clean driver installs, firmware updates, or disabling problem features (HAGS/MPO). Persistent crashes after a clean OS and known-good PSU can indicate hardware issues.

Should I increase the TdrDelay to fix GPU timeouts?

Increasing TdrDelay can reduce crashes by giving the GPU more time, but it’s a workaround. If extending the timeout helps, continue diagnosing drivers, thermals, power, and firmware to address the root cause.

What’s the difference between THREAD_STUCK_IN_DEVICE_DRIVER (0xEA) and VIDEO_TDR_FAILURE (0x116)?

Both relate to GPU operations not completing. 0xEA flags a driver thread stuck in an infinite loop; 0x116 indicates that Windows tried to reset the display driver after a timeout and the recovery failed. Causes and fixes overlap: drivers, firmware, settings, and hardware stability.

Will reinstalling Windows always fix this?

A clean install removes software variables, so it often helps. However, if hardware or firmware is the root cause (e.g., failing GPU/PSU, bad RAM, outdated BIOS), the issue may persist until those are addressed.

About the author

Jonathan Dudamel

I'm Jonathan Dudamel, an experienced IT specialist and network engineer passionate about all things Windows. I have deep expertise in Microsoft project management, virtualization (VMware ESXi and Hyper-V), and Microsoft’s hybrid platform. I'm also skilled with Microsoft O365, Azure ADDS, and Windows Server environments from 2003 through 2022.

My strengths include Microsoft network infrastructure, VMware platforms, CMMS, ERP systems, and server administration (2016/2022).