Speakers
Description
Proposal
Live Update is a specialized reboot process where selected devices are kept operational and kernel state is preserved and recreated across a kexec. For devices, DMA and interrupts may continue during the reboot.
The primary use case of Live Update is to enable hypervisor updates in cloud environments with minimal disruption to running virtual machines. During a Live Update, a VM is paused and its state is saved in memory while the hypervisor reboots. PCIe devices attached to those VMs (such as GPUs, NICs, and SSDs) are kept running during the Live Update. After the reboot, VMs are recreated and restored from memory, reattached to their devices, and resumed. The disruption is limited to the time it takes to complete this entire process.
With Live Update infrastructure in place, other use cases may emerge, such as preserving the state of GPUs running LLM workloads, freezing running containers with CRIU, and preserving large in-memory databases.
Live Update and state persistence functionality touches many different parts of the kernel, and this microconference aims to bring together people from the affected subsystems. Upstream support for Live Update is still in its infancy, and many unsolved aspects will benefit from direct communication.
Key problems that will be discussed:
- Support for memfd/guest_memfd/hugetlb/tmpfs
- Preserving the state of VFIO, IOMMUFD, and IOMMU drivers
- Preserving vCPUs and Orphaned Virtual Machines
- LUO systemd integration
- Integration of Live Update with PCI and Device Model
- Leveraging suspend/resume functionality for device state preservation
- Optimizing kernel shutdown and boot times
Last year's achievements:
Following the “Memory persistence over kexec” BoF at LPC 2024, we landed kernel support for KHO, LUO, and memfd preservation.
Expanding the BoF to a full-blown MC helped define the key data structures required for Live Update stability and isolate them into a dedicated kho/abi/ directory under include/linux.
During the 2025 edition of the Live Update MC, we finalized the objectives and design for making KHO stateless, and it has now transitioned to a radix tree for memory preservation.
Key attendees:
- Alex Graf
- Alex Williamson
- Ben Herrenschmidt
- Bjorn Helgaas
- David Matlack
- David Rientjes
- David Woodhouse
- Evangelos Petrongonas
- Jason Gunthorpe
- Josh Hilke
- Luca Boccassi
- Michał Cłapiński
- Mike Rapoport
- Pasha Tatashin
- Pratyush Yadav
- Samiullah Khawaja
- Vipin Sharma