12–14 Sept 2022
Europe/Dublin timezone

Linux plumbing of CXL error reporting

14 Sept 2022, 17:35
25m
"Herbert" (Clayton Hotel on Burlington Road)

262
Compute Express Link MC

Speakers

Mr Robert Richter (Advanced Micro Devices)
Mr Yazen Ghannam (Advanced Micro Devices)

Description

With the introduction of CXL Type 3 memory devices, a system may
contain several different kinds of memory controller that provide
volatile memory. Supporting all of them involves generic and
architecture-specific implementations across several subsystems
(CXL, PCI, ACPI, MCA, EDAC, etc.). CXL introduces the following
classes of errors:

  • CXL link and protocol errors, and

  • CXL Type 3 memory device errors.

As a result, a much broader variety of error types and sources must
be handled than is typical for memory controllers or PCI devices.

The CXL kernel interface also provides an ioctl user interface to
retrieve error events, and there are a couple of new tools for
controlling it.

All this raises new questions about how to share, rework and plumb
existing subsystems to make CXL work. And if new APIs and tools for
collecting errors are to be introduced, which would be most
suitable?

Topics of the discussion include:

  • Should reporting of CXL memory errors have the same look and feel
    as for DRAM memory controllers?

  • Should errors be handled in userspace? How can access be
    serialized, and how can multiple users be supported? Which tools
    should be the focus?

  • How can we reuse the PCIe AER and RCEC implementations of the PCI
    stack for CXL? Should we join PCI and CXL, in particular by
    maintaining a struct pci_dev for each CXL device?

  • What are the challenges in supporting multiple memory controllers
    in the system?

  • Is there sufficient need to integrate CXL error reporting into
    the EDAC subsystem?

A very brief overview of CXL error reporting and the Linux
subsystems involved will be presented as a basis for discussing and
answering the questions above.

Primary authors

Mr Robert Richter (Advanced Micro Devices)
Mr Yazen Ghannam (Advanced Micro Devices)
