Containers and checkpoint/restore MC
Conveners
- Christian Brauner
- Stéphane Graber (Zabbly)
- Mike Rapoport
- Adrian Reber (Red Hat)
Description
The Containers and Checkpoint/Restore micro-conference focuses on both userspace- and kernel-related work.
The micro-conference targets the wider container ecosystem, ideally with participants from all major container runtimes as well as init system developers.
The micro-conference will discuss recent advancements in container technologies, with some of the usual candidates being:
- VFS API improvements (new system calls, idmap, …)
- cgroup v2 feature parity with cgroup v1 and the migration path
- Dealing with the eBPF-ification of the world
- Mediating and intercepting complex system calls
- Making user namespaces more accessible
- Verifying the integrity of containers
- Improving the set of resource limits available
On the checkpoint/restore front, some of the potential topics include:
- Making CRIU work with modern Linux distributions
- Handling GPUs
- Restoring FUSE daemons
- Dealing with restartable sequences
- Use of eBPF
- Support of new kernel features
  - Supporting shadow stack (x86, arm64)
  - Support for madvise(MADV_GUARD_INSTALL)
  - Support for mseal()
  - Support for pidfd C/R, including process exit information
Quite likely, a variety of other container and checkpoint/restore topics will be added as things evolve between now and the event.
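For readers unfamiliar with the pidfd item in the list above, the following is a minimal sketch of what a pidfd is and what exit information can be read through one. It is not CRIU code. The helper name `wait_via_pidfd` is hypothetical, and the sketch assumes Linux 5.4 or later with Python 3.9 or later. It uses the long-standing `waitid(P_PIDFD, ...)` path, which may differ from the interface the topic has in mind.

```python
import os
import select


def wait_via_pidfd(exit_code=7):
    """Fork a child, wait for it through a pidfd, return its exit status.

    Returns None where pidfds are unavailable (non-Linux, old kernel,
    or Python older than 3.9).
    """
    if not hasattr(os, "pidfd_open") or not hasattr(os, "P_PIDFD"):
        return None
    pid = os.fork()
    if pid == 0:
        os._exit(exit_code)  # child: exit at once with a known status
    try:
        pidfd = os.pidfd_open(pid)
    except OSError:
        os.waitpid(pid, 0)  # fall back to reaping by PID
        return None
    try:
        # A pidfd becomes readable once the process it refers to has exited.
        select.select([pidfd], [], [])
        info = os.waitid(os.P_PIDFD, pidfd, os.WEXITED)
        return info.si_status
    except OSError:
        os.waitpid(pid, 0)  # kernel lacks P_PIDFD support: reap by PID
        return None
    finally:
        os.close(pidfd)


if __name__ == "__main__":
    print("child exit status:", wait_via_pidfd())
```

The descriptor stays bound to one specific process even if its PID is later reused, which is why open pidfds are extra state that a checkpoint/restore tool has to account for.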
Past editions of this micro-conference have been the source of many developments in the Linux kernel, including:
- PIDfds
- VFS idmap (and adding it to a slew of filesystems)
- FUSE in user namespaces
- Unprivileged overlayfs
- Time namespace
- A variety of CRIU features and checkpoint/restore kernel interfaces, with the latest among them being:
  - Unprivileged checkpoint/restore
  - Support of rseq(2) checkpointing
- IMA/TPM attestation work
Memory pages typically represent the largest component of a checkpoint, and handling this data efficiently is crucial for reducing the performance overhead of CRIU. Checkpoint compression is often used to minimize the storage requirements for container snapshots and to accelerate live migration by minimizing the amount of data that must be transferred over the network. However, existing...
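As a rough illustration of why page contents matter for checkpoint compression, here is a minimal sketch that compresses synthetic page data with Python's standard `zlib` module. It is not how CRIU stores or compresses images. The 4 KiB page size, the choice of zlib, the helper name `compress_pages`, and the synthetic data are all assumptions made for the example.

```python
import os
import zlib

PAGE_SIZE = 4096  # assumed page size; real systems may use other sizes


def compress_pages(pages, level=6):
    """Compress each page independently; return (raw_bytes, compressed_bytes)."""
    raw = sum(len(p) for p in pages)
    packed = sum(len(zlib.compress(p, level)) for p in pages)
    return raw, packed


# Synthetic "memory": zero-filled pages compress very well,
# while random pages are essentially incompressible.
zero_pages = [bytes(PAGE_SIZE) for _ in range(8)]
random_pages = [os.urandom(PAGE_SIZE) for _ in range(8)]

for name, pages in (("zero", zero_pages), ("random", random_pages)):
    raw, packed = compress_pages(pages)
    print(f"{name}: {raw} -> {packed} bytes")
```

The gap between the two cases suggests why the benefit of compression depends heavily on the workload: the CPU time spent compressing only pays off when the pages being dumped are actually compressible.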
Shadow stacks are a key security feature to guard against ROP attacks. Mike Rapoport has worked on enabling checkpoint/restore support for CET-based shadow stacks.
This talk extends that work to arm64, specifically the Guarded Control Stack (GCS) Arm extension. I'll present the process of adding GCS support to CRIU, including how process state is detected, dumped and...
[EROFS][1] is a modern, high-performance, block-based Linux image filesystem with an advanced on-disk format (e.g., separated layouts for (un)compressed data, (optional) external data blobs, (optional) data compression supporting multiple algorithms within a single filesystem, fine-grained data deduplication and (optional) metadata compression) and a highly optimized runtime implementation...
Currently, seccomp listeners (created via SECCOMP_FILTER_FLAG_NEW_LISTENER [1]) are limited to a single listener per process [2]. This becomes problematic in nested container scenarios -- for example, when an outer LXC runtime intercepts the mknod syscall while an inner container runtime needs to hook sysinfo. Today, container runtimes often work around this by disabling seccomp listeners...