drivers/ras/amd

AMD Zen Server RAS: Address Translation Library and FRU Memory Poison Manager

Reliability, availability and serviceability (RAS) support for AMD's Zen-based server platforms. It includes the Address Translation Library (ATL), which translates the memory-controller (normalized) addresses in memory-error reports into system physical addresses, and the FRU Memory Poison Manager (FMPM), which tracks bad memory pages across reboots. Used on EPYC servers and Instinct MI300 accelerators to handle ECC events in datacentre and HPC systems.

keep conf=0.91 deploy=low replacement=none subsystem=ras category=infrastructure

recommendation

Keep. The code is new (first commits in January 2024), is actively maintained with bug fixes landing through 2025 and 2026, and directly supports hardware AMD still sells today, including EPYC 9004 server CPUs and Instinct MI300 AI accelerators. There is no sign of any deprecation or removal discussion upstream.

repository signals

13 files
5,695 source lines
35 commits, 5y
+5,939 / −198 lines added / removed, 5y
12 authors, 5y
monthly commits · 2021-04-21 → 2026-04-21 · 35 total · active in 13/61 months
Months with commits (every other month from 2021-04 to 2026-04: 0 commits · +0 −0):
2024-01: 4 commits · +3,795 −8
2024-02: 3 commits · +866 −3
2024-03: 5 commits · +243 −21
2024-05: 1 commit · +1 −1
2024-06: 9 commits · +876 −122
2024-07: 2 commits · +79 −1
2024-10: 1 commit · +6 −2
2024-12: 1 commit · +2 −0
2025-02: 1 commit · +11 −3
2025-04: 3 commits · +20 −3
2025-10: 2 commits · +33 −16
2025-11: 1 commit · +5 −16
2026-02: 2 commits · +2 −2
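
The headline figures (35 commits, +5,939 / −198 lines, active in 13/61 months) can be cross-checked against the monthly series. The following is a minimal sketch, assuming the non-zero months are transcribed by hand into a list; the names ACTIVE_MONTHS, months_inclusive and summarize are illustrative and not part of whatever tool produced this report.

```python
# Months with at least one commit, transcribed from the monthly series:
# (month, commits, lines added, lines removed). Hand-copied, so treat as an assumption.
ACTIVE_MONTHS = [
    ("2024-01", 4, 3795, 8),
    ("2024-02", 3, 866, 3),
    ("2024-03", 5, 243, 21),
    ("2024-05", 1, 1, 1),
    ("2024-06", 9, 876, 122),
    ("2024-07", 2, 79, 1),
    ("2024-10", 1, 6, 2),
    ("2024-12", 1, 2, 0),
    ("2025-02", 1, 11, 3),
    ("2025-04", 3, 20, 3),
    ("2025-10", 2, 33, 16),
    ("2025-11", 1, 5, 16),
    ("2026-02", 2, 2, 2),
]


def months_inclusive(start, end):
    """Count calendar months from start to end (both 'YYYY-MM'), inclusive."""
    start_year, start_month = map(int, start.split("-"))
    end_year, end_month = map(int, end.split("-"))
    return (end_year - start_year) * 12 + (end_month - start_month) + 1


def summarize(rows, start="2021-04", end="2026-04"):
    """Aggregate per-month rows into the headline repository signals."""
    return {
        "commits": sum(row[1] for row in rows),
        "added": sum(row[2] for row in rows),
        "removed": sum(row[3] for row in rows),
        "active_months": sum(1 for row in rows if row[1] > 0),
        "window_months": months_inclusive(start, end),
    }


summary = summarize(ACTIVE_MONTHS)
print(summary)
```

The sums agree with the figures listed under repository signals: 35 commits, 5,939 lines added, 198 removed, and 13 active months in a 61-month window.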

sources

  1. kernel.org

    Official kernel docs describe the AMD Address Translation Library (ATL) as the address-translation component of memory-error handling on Zen-based systems.

  2. spinics.net

    ATL received an upstream maintenance patch on 2026-03-07 ('Only load ATL when needed'), indicating current maintenance rather than removal.

  3. spinics.net

    A 2025 patch updated both ATL and FMPM for MI300 row-address masking, showing substantive post-merge fixes in this directory.

  4. amd.com

    AMD still markets the Instinct MI300 series accelerators, which align with the MI300-specific code paths present in this directory.

  5. amd.com

    AMD still markets EPYC 9004 server CPUs, matching the Zen 4 server platform scope called out by ATL documentation and Kconfig help.

codex reasoning notes (technical)

Real driver directory: local shell inspection found loadable modules in drivers/ras/amd/fmpm.c and drivers/ras/amd/atl/core.c, plus Kconfig entries and MAINTAINERS coverage. URLs were obtained via web search: kernel.org RAS docs for scope, spinics msg24046 for 2026 ATL maintenance activity, spinics msg5628570 for 2025 ATL/FMPM bug-fix traffic, and AMD product pages for current EPYC 9004 and Instinct MI300 market availability. No removal or deprecation thread was found. The code is new (2024+), maintained, and tied to currently sold server/HPC platforms, so neither removal nor deprecation is indicated. Deployment is low because the code applies only to ECC/RAS-capable AMD server and accelerator systems, not to broad commodity hardware.