Description
Compute Express Link is a cache-coherent fabric that has been gaining momentum in the industry. Whilst the ecosystem is still catching up with CXL 3.0 and earlier features, CXL 3.1 launched just after the 2023 CXL uconf, bringing yet more challenges for the community (temporal sharing, advanced RAS features). There has also been controversy and confusion in the Linux kernel community about the state and future of CXL, regarding its usage and its integration into, for example, the core memory management subsystem. Many of these concerns have been put to rest through proper clarification and setting of expectations.
The Compute Express Link microconference focuses on how to evolve the Linux CXL kernel driver and userspace components to support the CXL specifications. The microconference provides a place to open the discussion, incorporate more perspectives, and grow the CXL community, with the goal that the CXL Linux plumbing serves the needs of the CXL ecosystem while balancing the needs of the Linux project. Specifically, this microconference welcomes submissions detailing industry and academia use cases in order to develop usage model scenarios. Finally, it will be a good opportunity to have existing upstream CXL developers available in a forum to discuss current CXL support and to communicate areas that need additional involvement.
Earlier editions of the microconference resolved a number of open questions (CXL 1.1 RAS support is now upstream) and introduced new topics we expect to revisit this year (e.g., dynamic capacity / shared memory and error handling).
Suggested topics:
Ecosystem & Architectural review
Dynamic Capacity Devices - Status and next steps
Inter-host shared capacity
Fabric Management - What should Linux enable (blast radius concerns)? Open source solutions?
Error handling and RAS (including OCP RAS API)
Testing and emulation
Security (e.g., IDE/SPDM)
Managing vendor specificity
Virtualization of dynamic capacity
Type 2 accelerator support - CXL 3.0+ approaches
Coherence management of Type 2/3 memory (back-invalidation)
Peer-to-peer (e.g., Unordered IO)
Reliability, availability, and serviceability (e.g., Advanced Error Reporting, Isolation, Maintenance)
Hotplug (QoS throttling, policies, daxctl)
Hot remove
Documentation
Memory tiering topics that relate to CXL (out of scope of the MM/performance MCs)
Industry and academia use cases
A brief hello from the CXL uconf organizers.
The usual collection of small administrative elements.
CXL - Dynamic Capacity Devices (DCD)
The CXL 3.0 and 3.1 specifications introduced support for Dynamic Capacity Devices (DCD). The feature promises a lightweight form of memory hotplug designed to optimize memory usage within data centers. The details of DCD use cases are still playing out. Generally, the use case is to reduce the cost of unused memory by...
Compute Express Link (CXL) is a low-latency, high-bandwidth, heterogeneous, cache-coherent interconnect between a CPU (or another device) and accelerator or memory devices. With CXL Type 3 devices, the memory is located on the device but can be used as system memory, the same as standard memory. This provides a flexible way to assign and manage system memory using memory devices.
As various...
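As a minimal illustration of the point above, assume the Type 3 region has already been created and onlined as a NUMA node (node 1 here is purely an assumption for the example); an application can then place allocations on that CXL-backed memory with the standard libnuma interfaces, with no CXL-specific changes. This is only a sketch, not part of the talk:

/* Minimal sketch: allocating from a CXL-backed NUMA node with libnuma.
 * Assumption: the CXL Type 3 region has already been onlined and shows up
 * as NUMA node 1 (the node number is hypothetical). Build with: gcc -lnuma.
 */
#include <numa.h>
#include <stdio.h>
#include <string.h>

#define CXL_NODE 1              /* assumed NUMA node backing the CXL region */
#define ALLOC_SIZE (64UL << 20) /* 64 MiB */

int main(void)
{
        if (numa_available() < 0) {
                fprintf(stderr, "NUMA is not available on this system\n");
                return 1;
        }

        if (CXL_NODE > numa_max_node()) {
                fprintf(stderr, "node %d does not exist\n", CXL_NODE);
                return 1;
        }

        /* Allocate pages physically placed on the CXL-backed node. */
        void *buf = numa_alloc_onnode(ALLOC_SIZE, CXL_NODE);
        if (!buf) {
                fprintf(stderr, "allocation on node %d failed\n", CXL_NODE);
                return 1;
        }

        /* Touch the memory so pages are actually faulted in on that node. */
        memset(buf, 0, ALLOC_SIZE);
        printf("64 MiB allocated and faulted on node %d\n", CXL_NODE);

        numa_free(buf, ALLOC_SIZE);
        return 0;
}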
Beyond simple error reporting, the CXL specification defines many features related to RAS; examples include Memory Patrol Scrub and ECS control, as well as features such as PPR (Post Package Repair) aimed at runtime repair of memory. Whilst part of our motivation for looking at this area was to support the CXL features, moves such as the OCP RAS API suggest there will be future opportunities for reuse.
There is...
This talk will present 'libcxlmi', a CXL Management Interface utility library. It provides type definitions for CXL specification structures and enumerations, as well as helper functions to construct, send, and decode CCI commands and payloads over both in-band (Linux) and out-of-band (OoB) links, typically MCTP-based CCIs over I2C or VDM.
The objective of this presentation is both to cover the design...
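To give a flavor of the library, here is a rough usage sketch based on the examples in the libcxlmi documentation: open an out-of-band MCTP endpoint by network ID / endpoint ID and issue an Identify CCI command. The function and structure names (cxlmi_new_ctx, cxlmi_open_mctp, cxlmi_cmd_identify, cxlmi_close, cxlmi_free_ctx) are quoted from memory of that documentation and should be checked against the current headers; the nid/eid values are placeholders.

/* Rough sketch of out-of-band device interrogation with libcxlmi.
 * API names follow the libcxlmi README; exact signatures and struct fields
 * may differ in the current release, so treat this as illustrative only.
 */
#include <stdio.h>
#include <syslog.h>     /* LOG_WARNING used as a syslog-style log level */
#include <libcxlmi.h>

int main(void)
{
        struct cxlmi_ctx *ctx;
        struct cxlmi_endpoint *ep;
        struct cxlmi_cmd_identify id = { 0 };
        int rc;

        /* Library context; log messages go to stdout. */
        ctx = cxlmi_new_ctx(stdout, LOG_WARNING);
        if (!ctx)
                return 1;

        /* Open an MCTP-based CCI endpoint: network id 1, endpoint id 8
         * (both values are placeholders for whatever the platform exposes). */
        ep = cxlmi_open_mctp(ctx, 1, 8);
        if (!ep) {
                cxlmi_free_ctx(ctx);
                return 1;
        }

        /* Issue the Identify CCI command (no tunneling). */
        rc = cxlmi_cmd_identify(ep, NULL, &id);
        if (rc == 0)
                printf("vendor 0x%04x, device 0x%04x\n",
                       id.vendor_id, id.device_id);

        cxlmi_close(ep);
        cxlmi_free_ctx(ctx);
        return rc;
}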
Benchmarking and efficiency estimation of CXL infrastructure is a crucial task for the whole CXL ecosystem. Which tool(s) can be used, and how should such benchmarking be executed? Potentially, a benchmarking tool could simulate the target use case (for example, a huge relational database, an in-memory database, a huge social network, ML model training, a virtual machine use case, an HPC use case, and so on)...