WHEA_UNCORRECTABLE_ERROR: Hardware Error Records (WHEA) For Non-Experts

Introduction

The Windows stop code WHEA_UNCORRECTABLE_ERROR (also known as BSOD stop code 0x124) is a critical Blue Screen error tied to the Windows Hardware Error Architecture (WHEA). It typically appears under load (gaming, rendering), after sleep/hibernation, during driver or firmware changes, or seemingly at random. Because it signals a fatal hardware-level error that Windows could not correct, it’s essential to address it methodically.

This guide goes beyond generic advice. You’ll get a complete, step-by-step approach—starting with simple checks and moving to advanced diagnostics—to identify whether the root cause is driver-level, firmware/BIOS, hardware instability, or Windows corruption, and how to fix it.

We’ll cover: what the error means, common triggers, safe-mode access, essential health checks (SFC/DISM/CHKDSK), minidump analysis with WinDbg or BlueScreenView, Driver Verifier, Event Viewer WHEA logs, and practical hardware steps (RAM testing, storage diagnostics, firmware updates, BIOS options, reseating, thermals, PSU).

Understanding the Error

A WHEA_UNCORRECTABLE_ERROR means that Windows received a report of a non-recoverable hardware error. In plain terms, something at a low level—CPU, memory, motherboard, PCIe device, or firmware—reported an error that could not be corrected on the fly. As a safety measure, Windows stops (BSOD) to prevent data corruption.

WHEA is the framework Windows uses to collect and log Hardware Error Records from sources like the Machine Check Architecture (MCA) in modern CPUs, PCIe Advanced Error Reporting (AER), and other error sources. When you see this BSOD, Windows usually creates a minidump and logs a WHEA-Logger event in Event Viewer.

Typical scenarios that trigger WHEA_UNCORRECTABLE_ERROR:

Heavy CPU/GPU load: gaming, video encoding, stress tests
Power/thermal issues: overheating, inadequate power supply, transient power dips
Driver or firmware changes: new graphics/SSD drivers, BIOS/UEFI updates, Windows updates
Overclocking or aggressive BIOS settings: CPU multiplier, GPU OC, XMP/EXPO memory profiles, PBO (AMD), undervolting
Faulty/unstable hardware: RAM errors, faulty CPU core, failing SSD/NVMe, flaky PCIe device or riser cable
Storage or file system corruption
Rarely, malware causing instability indirectly (e.g., driver tampering)

Common Causes

Skim these likely causes first:

Overclocking or unstable BIOS settings
- CPU/GPU overclocks, XMP/EXPO memory profiles, PBO/Curve Optimizer (AMD), undervolting, unusual C-state or power settings
Driver problems
- GPU, storage (NVMe/RST), chipset, network, audio; outdated or buggy third-party drivers
Firmware/BIOS issues
- Outdated motherboard BIOS/UEFI, SSD firmware bugs, pending Intel/AMD microcode fixes
Memory (RAM) faults or misconfiguration
- Bad DIMM, mismatched kits, unstable memory timings/voltage
Storage problems
- Bad sectors, failing HDD/SSD/NVMe, corrupt file system
Thermals and power delivery
- Overheating CPU/GPU/VRM, dust buildup, inadequate or failing PSU
PCIe device errors
- Faulty GPU, Wi-Fi/BT card, capture card, riser cable or loose seating
Windows corruption
- Damaged system files, incomplete updates
Rare/Indirect causes
- Malware or security software drivers; virtualization/Hyper-V conflicts

Preliminary Checks

Before deep dives, make sure you can boot and protect your data.

Boot into Safe Mode

Use Safe Mode to load minimal drivers and see if the issue persists.

Windows 10/11:
1. Hold Shift and click Restart from the Start menu.
2. Choose Troubleshoot > Advanced options > Startup Settings > Restart.
3. Press 4 (or F4) for Enable Safe Mode, or 5 (or F5) for Enable Safe Mode with Networking.

If the PC is stuck in a BSOD loop, interrupt the boot three times in a row (power off during the Windows logo) to trigger Automatic Repair, select Advanced options, and navigate to the same Startup Settings menu.

Back Up Important Data

If the system is unstable, back up immediately:

Copy documents/pictures to an external drive or cloud.
Use File History or a third-party backup tool in Safe Mode if necessary.

Run Basic Health Checks

Open an elevated Command Prompt (Run as administrator):

System File Checker:

sfc /scannow

Deployment Image Servicing and Management (DISM), to repair the component store:

dism /online /cleanup-image /restorehealth

Check Disk (schedule if C: is in use):

chkdsk C: /f /r

If prompted to schedule at next restart, press Y and reboot. These steps can repair file system and Windows image issues that might exacerbate WHEA errors.

Step-by-Step Troubleshooting

Work through these steps from easiest to most advanced. Test system stability between steps (e.g., normal use, or a light stress test such as running a game for 15–30 minutes).

Revert Overclocks and Aggressive BIOS Settings

In BIOS/UEFI, choose Load Optimized Defaults (or similar).
Disable CPU/GPU overclocks and set XMP/EXPO to Disabled (test at JEDEC default memory settings first).
Disable PBO/Curve Optimizer (AMD) or any undervolt.
Save and reboot. Many WHEA errors vanish when the system is back at stock settings.

Update or Roll Back Problem Drivers

Prioritize GPU, chipset, storage (NVMe/RST), network, audio.
Download directly from OEM:
- GPU: NVIDIA/AMD/Intel
- Chipset: AMD or Intel chipset driver (from your motherboard or OEM laptop page)
- Storage: Samsung NVMe driver or Intel RST if applicable
If the error started after a driver update, use Device Manager > Properties > Driver > Roll Back Driver (if available).
Avoid “driver updater” apps. Stick to vendor sites, and use Windows Update’s optional drivers cautiously.

Install BIOS/UEFI and SSD Firmware Updates

Check your motherboard or laptop OEM’s support page for BIOS/UEFI updates. Read the changelog; microcode and stability fixes often address WHEA or machine check (MCE) errors.
Update SSD/NVMe firmware via vendor tools (Samsung Magician, Crucial Storage Executive, Intel MAS).
Important: ensure stable power and do not interrupt firmware updates.

Windows Update and Known Issues

Install pending Windows Updates (Settings > Windows Update).
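If WHEA errors began after a recent update, consider uninstalling it (Update history > Uninstall updates, or Troubleshoot > Advanced options > Uninstall Updates from the recovery environment) or using System Restore to return to an earlier restore point. Test stability afterward.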

Check Thermals and Power

Monitor temperatures with HWiNFO64 or HWMonitor.
- CPU temps ideally under ~85–90°C under sustained load.
- GPU within its vendor’s thermal limits; watch for VRM/hotspot spikes.
Clean dust, ensure case airflow, verify fans/pumps, re-seat heatsinks if recently changed.
For desktop PCs: test a different power outlet/strip; if the PSU is marginal or old, consider testing with a known-good PSU.
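
If you log sensor readings to a CSV file (HWiNFO64 can do this), a short script can summarize a stress run instead of eyeballing the graph. A minimal Python sketch, assuming the log is comma-separated with a header row; the column name in the example is a hypothetical placeholder, so copy the exact header text from your own log, and read the file with the encoding your logger uses.

```python
import csv
import io

# Rough CPU guidance from this guide: ideally under about 85-90 °C under sustained load.
CPU_LIMIT_C = 85.0

def summarize_temperature(csv_text, column, limit=CPU_LIMIT_C):
    """Return (peak, samples_over_limit, numeric_samples) for one sensor column."""
    peak, over, total = None, 0, 0
    for row in csv.DictReader(io.StringIO(csv_text)):
        try:
            value = float((row.get(column) or "").strip())
        except ValueError:
            continue  # skip blanks, repeated headers, or other non-numeric cells
        total += 1
        peak = value if peak is None else max(peak, value)
        if value > limit:
            over += 1
    return peak, over, total

# Hypothetical column name; use the exact header from your own log.
sample = "Time,CPU Package [°C]\n10:00:01,71.0\n10:00:02,88.5\n10:00:03,90.2\n"
print(summarize_temperature(sample, "CPU Package [°C]"))  # (90.2, 2, 3)
```

Many samples above the limit during a run that ended in a WHEA crash make thermals a stronger suspect than a single brief spike.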

Memory Diagnostics

Quick test: Windows Memory Diagnostic:
- Press Win+R, run mdsched.exe, and choose Restart now and check for problems.
Thorough test: MemTest86 (USB boot), minimum 4 passes.
- If errors appear, test one DIMM at a time in each slot. Replace a module that fails in every slot; a slot that fails with every module points to the motherboard or CPU instead.
- If errors only appear with XMP/EXPO, keep it off or tune memory voltage/timings conservatively.
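
The one-DIMM-at-a-time logic can be written down explicitly: a module that fails in every slot is the suspect, while a slot that fails with every module points away from the RAM. A small Python sketch, assuming you record the outcome of each MemTest86 run by hand; the DIMM and slot labels are hypothetical names you choose.

```python
def isolate_memory_fault(results):
    """Suggest suspect DIMMs and slots from single-DIMM test runs.

    `results` maps (dimm, slot) -> True if that MemTest86 run reported errors.
    """
    dimms = {d for d, _ in results}
    slots = {s for _, s in results}
    # A module that fails in every slot it was tried in is the likely culprit.
    bad_dimms = sorted(
        d for d in dimms if all(results[(d, s)] for s in slots if (d, s) in results)
    )
    # A slot where every tested module fails points to the motherboard/CPU side.
    bad_slots = sorted(
        s for s in slots if all(results[(d, s)] for d in dimms if (d, s) in results)
    )
    return bad_dimms, bad_slots

# Hypothetical labels: two sticks ("A", "B") tested in slots "A2" and "B2".
runs = {("A", "A2"): True, ("A", "B2"): True, ("B", "A2"): False, ("B", "B2"): False}
print(isolate_memory_fault(runs))  # (['A'], [])
```

The more combinations you test, the more meaningful the result. If nearly everything fails, revisit memory settings such as XMP/EXPO before blaming individual parts.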

Storage Health and File System Integrity

Review SMART data using CrystalDiskInfo or vendor tools. Look for reallocated sectors, pending sectors, media errors, high error counts.
Manufacturer diagnostics: run extended tests (e.g., SeaTools for HDDs, vendor NVMe diagnostic/SMART self-tests).
If storage is failing or shows growing bad sectors, back up and replace.
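
Because a growing count is the real warning sign, it helps to compare two readings taken a few days apart. A minimal Python sketch, assuming you copy the raw values from CrystalDiskInfo or a vendor tool into dictionaries; the key names are hypothetical labels, not fields exported by either tool.

```python
# Counters this guide flags; key names are hypothetical labels for raw
# values you read off CrystalDiskInfo or a vendor tool.
WATCHED = ("reallocated_sectors", "pending_sectors", "media_errors")

def smart_warnings(before, after):
    """Compare two SMART snapshots (dicts of raw counts) taken some days apart."""
    warnings = []
    for key in WATCHED:
        old, new = before.get(key, 0), after.get(key, 0)
        if new > old:
            warnings.append(f"{key} grew from {old} to {new}")
        elif new > 0:
            warnings.append(f"{key} is nonzero ({new}) but stable")
    return warnings

print(smart_warnings({"reallocated_sectors": 0}, {"reallocated_sectors": 8}))
```

A counter that keeps climbing between snapshots is the "growing bad sectors" case: back up and plan the replacement.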

Reseat and Inspect Hardware (Desktop)

Power down, unplug the PC, and hold the power button for 10 seconds to discharge residual power.
Reseat RAM, GPU, M.2 SSD, and all power/data cables.
Remove PCIe risers/adapters temporarily. Test GPU in a different slot if possible.
Inspect for bent pins (LGA socket), debris in slots, or damaged cables.

Minidump Analysis (Identify Faulty Drivers/Modules)

Ensure minidumps are enabled:
- Control Panel > System > Advanced system settings > Advanced > Startup and Recovery > Settings:
  - Write debugging information: Small memory dump (256 KB)
  - Dump file: %SystemRoot%\Minidump
- Ensure paging file is enabled on the system drive (Automatic recommended).
Find dumps: C:\Windows\Minidump
- Copy the latest .dmp to the Desktop to work with it.
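
Copying the newest dump can be scripted. A minimal Python sketch using only the standard library; it assumes the default dump folder above and that the script runs from an elevated prompt, since the Minidump folder is normally readable only by administrators.

```python
import shutil
from pathlib import Path

def copy_latest_dump(dump_dir, dest_dir):
    """Copy the most recently modified .dmp file from dump_dir into dest_dir.

    Returns the destination path, or None if the folder holds no dumps.
    """
    dumps = list(Path(dump_dir).glob("*.dmp"))
    if not dumps:
        return None
    latest = max(dumps, key=lambda p: p.stat().st_mtime)  # newest by modification time
    dest = Path(dest_dir) / latest.name
    shutil.copy2(latest, dest)  # copy2 also preserves the file's timestamps
    return dest
```

For example, copy_latest_dump(r"C:\Windows\Minidump", Path.home() / "Desktop") copies the newest dump to your Desktop and returns its new path.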

Option A: Use BlueScreenView (easier)

Download NirSoft BlueScreenView.
Open the dump and look at the Caused By Driver column and the drivers highlighted in the lower pane (those found in the crash stack).
If it points to a third-party driver (e.g., nvlddmkm.sys, amdkmdag.sys, iaStorAC.sys), update or roll it back.
If it shows hal.dll or ntoskrnl.exe, that’s a symptom, not the root cause—continue with hardware/firmware checks.

Option B: Use WinDbg (Preview) from Microsoft Store (advanced)

Open the dump in WinDbg (Run as admin).
Run:

!analyze -v

Note any “Probably caused by” module and WHEA error record references.
If the analysis lists a WHEA error record address (usually Arg2), you can dump it with !errrec <address>, substituting the address from your own dump. Look for indications like Machine Check Exception, Cache Hierarchy Error, Bus/Interconnect Error, or PCIe AER hints.
Repeated MCEs under CPU load often indicate CPU/motherboard/VRM/BIOS issues; PCIe-related records hint at GPU/slot/cable issues.
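
The Arguments block near the top of the !analyze -v output is the quickest clue: for bug check 0x124, Arg1 identifies the error source and Arg2 is the address of the WHEA error record. A hedged Python sketch that extracts Arg1 to Arg4 from pasted output and names the source; the table mirrors the WHEA_ERROR_SOURCE_TYPE values in the Windows headers as we understand them (only the first nine are listed), and the regular expression assumes lines of the form "Arg1: 0000000000000000, ...".

```python
import re

# Arg1 of bug check 0x124 = WHEA error source type (WHEA_ERROR_SOURCE_TYPE;
# only the first nine values are listed here).
ERROR_SOURCES = {
    0x0: "Machine Check Exception (MCE)",
    0x1: "Corrected Machine Check (CMC)",
    0x2: "Corrected Platform Error (CPE)",
    0x3: "Non-Maskable Interrupt (NMI)",
    0x4: "PCI Express error (AER)",
    0x5: "Generic hardware error",
    0x6: "INIT error",
    0x7: "BOOT error",
    0x8: "SCI-based generic error",
}

# Assumed line format: "Arg1: 0000000000000000, Type of error source ..."
ARG_LINE = re.compile(r"^\s*Arg([1-4]):\s*([0-9a-fA-F`]+)", re.MULTILINE)

def parse_whea_args(analyze_text):
    """Return ({arg_number: value}, error_source_name) from !analyze -v text."""
    args = {}
    for match in ARG_LINE.finditer(analyze_text):
        # Strip WinDbg's optional backtick separator in 64-bit values.
        args[int(match.group(1))] = int(match.group(2).replace("`", ""), 16)
    source = ERROR_SOURCES.get(args.get(1), "unknown or unlisted source")
    return args, source
```

An MCE source (Arg1 = 0) points toward the CPU/MCA path described above, while a PCI Express source (Arg1 = 4) points toward the GPU, NVMe drive, slot, or riser.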

Driver Verifier (Pinpoint Misbehaving Drivers)

Warning: Driver Verifier can cause additional BSODs while testing. Use it on non-critical systems or when you can tolerate more crashes to isolate the driver.
Enable (elevated Command Prompt):

verifier /standard /all

Reboot and use the system; when a BSOD occurs, analyze the new minidump. It may now implicate a specific third-party driver.
To disable:

verifier /reset

If you cannot boot, enter Safe Mode and run the reset command there.

Event Viewer: WHEA-Logger Details

Open Event Viewer > Windows Logs > System.
Filter Current Log for source: WHEA-Logger.
Look for Event ID 18 (fatal errors) and Event IDs 17 and 19 (corrected errors). The details may include the CPU core, memory bank, or PCIe device involved.
If the errors consistently mention a specific PCIe Bus/Device/Function, focus on that device (often the GPU or NVMe drive).
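
To see which component repeats most often, you can export the WHEA-Logger events as text, for example with wevtutil qe System /q:"*[System[Provider[@Name='Microsoft-Windows-WHEA-Logger']]]" /f:text > whea.txt from an elevated prompt, and tally them. A Python sketch under stated assumptions: the patterns for "Event ID:", "Primary Bus:Device:Function:", and "Error Type:" are based on typical WHEA-Logger text and may not match every Windows version, so check them against your own export.

```python
import re
from collections import Counter

# Patterns assumed from typical WHEA-Logger text; verify against your own export.
EVENT_ID = re.compile(r"^[ \t]*Event ID:[ \t]*(\d+)", re.MULTILINE)
PCIE_BDF = re.compile(
    r"^[ \t]*(?:Primary )?Bus:Device:Function:[ \t]*"
    r"(0x[0-9A-Fa-f]+:0x[0-9A-Fa-f]+:0x[0-9A-Fa-f]+)",
    re.MULTILINE,
)
ERROR_TYPE = re.compile(r"^[ \t]*Error Type:[ \t]*(\S[^\r\n]*?)[ \t]*\r?$", re.MULTILINE)

def tally_whea(export_text):
    """Count WHEA-Logger events by ID, by primary PCIe location, and by error type."""
    ids = Counter(int(x) for x in EVENT_ID.findall(export_text))
    locations = Counter(PCIE_BDF.findall(export_text))  # "Secondary ..." lines are ignored
    types = Counter(ERROR_TYPE.findall(export_text))
    return ids, locations, types
```

For example, read whea.txt with open("whea.txt", encoding="utf-8", errors="replace").read(), pass the text to tally_whea, and print locations.most_common(3); one Bus:Device:Function dominating the list is the "consistently mentioned device" to focus on.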

System Restore or In-Place Repair

If issues started recently, try System Restore to a point before the problem.
If corruption persists, perform an In-Place Repair (Windows 10/11):
- Download the latest ISO from Microsoft, mount it, run setup.exe, choose Keep personal files and apps.
- This refreshes Windows system files without wiping apps/data.

Clean Install (Last Resort)

Back up everything.
Create installation media (Media Creation Tool), wipe the OS drive, install fresh, then install drivers slowly and test between installs to catch the trigger early.

Advanced Diagnostics

Use Driver Verifier Strategically

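Prefer targeting only non-Microsoft drivers: in Driver Verifier Manager (run verifier with no arguments), choose Create standard settings > Select driver names from a list, then select only drivers whose provider is not Microsoft.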
Monitor for crashes and minidumps pointing to a particular .sys file.
Keep it enabled for a few hours of normal use; disable if your system becomes unbootable (Safe Mode > verifier /reset).

Deep-Dive WinDbg on WHEA Records

In WinDbg:
- Start with:

!analyze -v

If you see a WHEA error record address in the analysis output, note the textual summary. It often hints whether the error came from CPU (MCA), memory controller, or PCIe/AER.
Repeated “cache hierarchy” or “internal parity” CPU errors under load: suspect CPU cooling, motherboard VRM/power delivery, BIOS microcode, or a marginal CPU.
Repeated PCIe Bus Errors: suspect GPU/NVMe/riser, slot integrity, or PSU rails under GPU load.
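
For a machine check (Arg1 = 0), Arg3 and Arg4 of bug check 0x124 hold the high and low 32 bits of the CPU's MCi_STATUS register, which encodes the same categories named above. A rough Python decoder, assuming Intel's architectural MCA encoding (Intel SDM Volume 3, Machine-Check Architecture chapter); AMD processors use a broadly similar layout with vendor-specific extensions, so treat the category as a hint rather than a verdict.

```python
# Architectural MCi_STATUS flag bits (Intel SDM Vol. 3, Machine-Check Architecture).
FLAGS = {
    63: "VAL (valid error logged)",
    62: "OVER (earlier error overwritten)",
    61: "UC (uncorrected error)",
    60: "EN (reporting enabled)",
    59: "MISCV (MCi_MISC valid)",
    58: "ADDRV (MCi_ADDR valid)",
    57: "PCC (processor context corrupt)",
}

def classify_mca_code(code):
    """Map the 16-bit MCA error code (bits 15:0) to a rough category.

    Based on Intel's simple/compound error-code patterns; AMD's encoding is
    similar but not identical (assumption: close enough for a hint).
    """
    c = code & 0xEFFF  # clear bit 12, the correction-report filtering (F) bit
    if c == 0x0005:
        return "internal parity error"
    if (c & 0xF800) == 0x0800:
        return "bus/interconnect error"
    if (c & 0xFF00) == 0x0100:
        return "cache hierarchy error"
    if (c & 0xFF80) == 0x0080:
        return "memory controller error"
    if (c & 0xFFF0) == 0x0010:
        return "TLB error"
    if (c & 0xFFFC) == 0x000C:
        return "generic cache hierarchy error"
    return "other/unclassified"

def decode_mci_status(high32, low32):
    """Combine Arg3 (high half) and Arg4 (low half), then decode flags and category."""
    status = ((high32 & 0xFFFFFFFF) << 32) | (low32 & 0xFFFFFFFF)
    flags = [name for bit, name in FLAGS.items() if (status >> bit) & 1]
    return flags, classify_mca_code(status & 0xFFFF)
```

For instance, decode_mci_status(0xB2000000, 0x135) reports VAL, UC, EN, and PCC with a cache hierarchy error: an uncorrected error that corrupted processor state, which is why Windows had to stop.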

Hardware Stress Tests (With Caution)

CPU: OCCT, Prime95 (Small FFTs)—watch temps closely; stop if temps exceed safe limits.
GPU: Unigine Superposition, 3DMark, or OCCT GPU; observe for artifacting or instant BSOD.
Memory: MemTest86 extended run overnight if intermittent errors suspected.
Storage: Vendor long tests; consider copying large files to stress controllers.
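
The large-file copy test is more useful if every copy is verified with a checksum, so silent corruption is caught rather than just slow transfers. A minimal Python sketch; the file names and default sizes are arbitrary assumptions, and both folders need enough free space.

```python
import hashlib
import os
import shutil
from pathlib import Path

def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in 1 MiB chunks so large files don't need to fit in RAM."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for block in iter(lambda: fh.read(chunk_size), b""):
            digest.update(block)
    return digest.hexdigest()

def copy_and_verify(src_dir, work_dir, size_mb=1024, rounds=3):
    """Write a random file, copy it `rounds` times, and count checksum mismatches.

    A mismatch means data changed in transit or at rest, which points to
    storage, controller, cable, or RAM instability.
    """
    src = Path(src_dir) / "stress_source.bin"  # hypothetical file name
    with open(src, "wb") as fh:
        for _ in range(size_mb):
            fh.write(os.urandom(1 << 20))  # 1 MiB of random data per iteration
    expected = sha256_of(src)
    mismatches = 0
    for i in range(rounds):
        dest = Path(work_dir) / f"stress_copy_{i}.bin"
        shutil.copyfile(src, dest)
        if sha256_of(dest) != expected:
            mismatches += 1
        dest.unlink()  # free the space before the next round
    src.unlink()
    return mismatches
```

Windows may serve the verification read from its file cache rather than the drive, so this is a coarse check; a file larger than installed RAM, or copying between two different drives, makes it more likely the data actually passes through the storage controller.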

BIOS/UEFI Power and Compatibility Settings

If PCIe errors persist, try:
- Set PCIe Link Speed to Gen3 (temporary diagnostic for marginal links/devices).
- Disable features such as Resizable BAR for testing only (re-enable them afterward).
Don’t disable essential protections (e.g., Secure Boot) permanently; if you must turn them off for troubleshooting, re-enable them afterward.
If you previously enabled Memory Context Restore or aggressive power savings, test with defaults.

Physical Inspection Beyond Reseating

Check for bulging or leaking capacitors on older motherboards/PSUs.
Inspect PCIe connectors, 8-pin/6-pin GPU power, and EPS CPU power connectors for heat damage or looseness.
For laptops: ensure vents are clear; consider repasting only if the laptop is out of warranty and you are experienced with disassembly.

When to Seek Professional Help

Consider pro-level diagnostics or hardware replacement when:

MemTest86 shows errors that persist with multiple DIMMs/slots.
WHEA-Logger consistently flags the same CPU core/bank despite reverting to stock and updating BIOS—possible CPU or motherboard fault.
Storage SMART reports escalating errors or vendor tests fail—replace the drive.
BSODs persist after a clean install with only essential drivers.
You suspect an inadequate/failing PSU but don’t have a spare to test.
Physical damage, liquid exposure, or recurring thermal throttling/overheating is observed.

A qualified technician can run bench tests with known-good PSU, RAM, and CPU/motherboard to isolate the culprit faster.

Prevention Tips

Keep BIOS/UEFI and critical firmware (SSD/NVMe) up to date.
Update chipset and GPU drivers from vendor sources. Avoid generic driver updaters.
Practice driver hygiene: change one thing at a time, test stability, and keep known good versions on hand.
Avoid aggressive overclocking/undervolting on daily-use systems. If you must, stress-test thoroughly and keep headroom.
Maintain good thermals: clean dust, ensure proper airflow, monitor temps under load.
Use a quality PSU sized appropriately for your GPU/CPU.
Perform regular backups so recovery is painless if you must reinstall.
Keep Windows updated, but be ready to roll back a problematic update if issues appear.

Conclusion

The WHEA_UNCORRECTABLE_ERROR (stop code 0x124) indicates a hardware-level failure that Windows can’t self-correct. While intimidating, most cases can be solved with a calm, structured approach:

Start with Safe Mode, backups, and SFC/DISM/CHKDSK.
Revert overclocks, install driver/firmware/BIOS updates, and verify thermals/power.
Run memory and storage diagnostics; reseat hardware if needed.
Use minidump analysis (BlueScreenView/WinDbg), Driver Verifier, and Event Viewer to pinpoint culprits.
If necessary, perform System Restore, In-Place Repair, or a clean install.
Seek professional help for persistent hardware faults.

With patience and methodical testing, most WHEA BSODs are fixable. You’ve got this.

FAQ

Can I ignore the WHEA_UNCORRECTABLE_ERROR if it only happens occasionally?

No. Even infrequent WHEA BSODs indicate an underlying stability issue (hardware, firmware, or drivers). Ignoring it risks data corruption and more frequent crashes over time. Investigate before it worsens.

Does this error always mean my hardware is failing?

Not always. While it’s a hardware-reported error, many cases stem from unstable BIOS settings (overclocking/XMP), buggy drivers, outdated firmware, or thermal/power issues—all fixable without replacing parts. True hardware faults do occur and can be confirmed via diagnostics.

What does it mean if WinDbg shows hal.dll or ntoskrnl.exe?

It’s common to see hal.dll or ntoskrnl.exe in the stack for WHEA BSODs. They’re part of Windows internals and not the root cause. Focus on the WHEA error record, Event Viewer (WHEA-Logger) entries, and any third-party driver implicated.

How do I get Windows to create minidump files?

Enable Small memory dump (256 KB) in Startup and Recovery settings and ensure the page file is enabled on the system drive. Dumps will appear under C:\Windows\Minidump after the next crash. Use BlueScreenView or WinDbg to analyze them.

Is malware a likely cause of WHEA_UNCORRECTABLE_ERROR?

It’s uncommon. Malware rarely triggers genuine hardware machine check errors. However, malicious or poorly written kernel drivers can destabilize the system. It’s wise to run a reputable malware scan, but prioritize driver/firmware/thermal/hardware checks.
