Kernel Memory Management MC
Conveners
- Matthew Wilcox (Oracle)
- David Hildenbrand (Red Hat)
Description
Memory management remains exciting. With a lot of activity across many different projects, here are some of the more controversial subjects that might be worth discussing this year:
- Making Transparent Huge Pages more ... transparent (toggles, policies, khugepaged, ...)
- Making (m)THP/large folios a first-class citizen in MM
- What other improvements might we see from mTHP?
- Where to use eBPF in MM, and where not
- Ongoing challenges with memdescs (e.g., allocation/freeing/walking)
- How might we make allocations guaranteed to not fail?
- Which CXL use cases do we want to support, and how far should we go?
- Challenges with hypervisor live-update, and the integration into other subsystems (MM, drivers, etc)
- guest_memfd and the interaction with other MM subsystems (hugetlb, GUP, ...)
- Making hugetlb less weird
Sourav Panda (Google), Suren Baghdasaryan, 11/12/2025, 15:00
Memory allocation profiling infrastructure provides a low-overhead mechanism to make all kernel allocations in the system visible. This allows for monitoring memory usage, tracking hotspots, detecting leaks, and identifying regressions.
Over the past year there were a number of suggested new features from its users, including:
- NUMA awareness
- MEMCG awareness
- Context capture...
Gregory Price (Meta), 11/12/2025, 15:15
The default global mempolicy includes all NUMA nodes, with fallback allocation behavior typically defined by NUMA distances. Tasks and cgroups are then expected to opt in to more restrictive policies via the set_mempolicy and cpuset interfaces.
This is the opposite of a typical isolation mechanism, and leads to global resources (such as unmapped pagecache) having poor...
Juan Yescas (Google), Kalesh Singh (Google), 11/12/2025, 15:30
When device drivers reserve large blocks of MIGRATE_CMA pages, the underutilized MIGRATE_CMA pages can be used for MIGRATE_MOVABLE requests, and these pages can be short-term pinned for DMA. If we then require MIGRATE_CMA pages, the allocations might fail.
This topic has been discussed...
Ackerley Tng, 11/12/2025, 15:45
There is active development on adding huge page support to guest_memfd to improve performance of CoCo VMs, specifically around obtaining huge pages from HugeTLB and from the normal buddy allocator in the form of Transparent Huge Pages. Huge page support relies heavily on the ability to restructure pages, to be able to track page users on a per-page basis, using struct page refcounts.
The...
Joshua Hahn (Meta), 11/12/2025, 16:00
zone_reclaim_mode was introduced in 2005 to prevent the kernel from facing the high remote access latency associated with NUMA systems of the time. With it, when the local node is full, future allocation attempts on the local node trigger local direct reclaim instead of remote fallback allocations, even when remote nodes are free. This system-wide policy is the preferred way to consume...
Chris Li (Google), YoungJun Park (LG Electronics), 11/12/2025, 16:15
ABSTRACT
Enabling cgroup-level control over swap devices
PROPOSAL
In certain restricted environments, there is a technical requirement to use otherwise idle devices as extended swap memory - including remote storage systems accessible over the network. A motivating scenario is to configure background processes to use these slower network-backed swap devices, while foreground...
Kees Cook (Google), 11/12/2025, 17:00
Right now the generic interface to the slab allocator is strictly size-based, but most of the allocations done via slab are actually instantiating specific objects, and their type information is much more useful to expose to the allocator than their size. (Though size is still important, given dynamically sized objects via flexible arrays.)
Type information is needed to make better choices...
Alistair Popple, 11/12/2025, 17:15
Device private memory is used by device drivers to interact with the core mm to migrate data to memory that is inaccessible or unaddressable from the CPU. Currently that interaction uses struct pages and sometimes folios.
It has been pointed out[1] that if everything is converted to folios maybe we don't need these special struct pages anymore. I would like to explore whether removing...
Pankaj Raghav (Samsung), 11/12/2025, 17:30
Large folios were initially implemented with dependencies on Transparent Huge Pages (THP) infrastructure. As large folio adoption expands across the kernel, CONFIG_TRANSPARENT_HUGEPAGE has become an overloaded configuration option, sometimes used as a proxy for large folio support [1][2].
While this coupling was discussed during the THP cabal, the specific dependencies remain unclear. This...
Liam Howlett (Oracle), 11/12/2025, 17:45
There have been several recent cases where the mm_struct is used without being fully initialized, is in an unstable state, or has taken longer than expected to exit. These issues are most often caused by external complications (zswap, OOM, PTE lock contention, and perf, for example), which require mitigation one at a time.
I'd like to discuss what can be done to avoid having to fix each area...
Harry Yoo (Oracle), Kamalesh Babulal (Oracle), 11/12/2025, 18:00
The "zombie memory cgroup" problem is a long-standing issue in the Linux kernel. It occurs when a memory cgroup is destroyed by users, but its kernel metadata cannot be freed because its Least Recently Used (LRU) pages, particularly shared file pages, remain charged to it. These pages can outlive the cgroup that originally owned them, acting as a permanent pin. In environments where cgroups are...
Lorenzo Stoakes (Oracle)
The anonymous memory reverse mapping is complicated, confusing and entails overhead both in terms of locking and kernel metadata.
This talk explores how it functions in practice, how it interacts with other aspects of mm as well as real-world impact of the current implementation.
Importantly it will examine how anon_vma locking functions and how this impacts workloads.
The talk will...