18–20 Sept 2024
Europe/Vienna timezone

Session

Containers and checkpoint/restore MC

19 Sept 2024, 15:00

Description

The Containers and Checkpoint/Restore micro-conference focuses on both user-space and kernel-related work. It targets the wider container ecosystem, ideally with participants from all major container runtimes as well as init system developers.

The micro-conference will discuss recent advancements in container technologies, with some of the usual candidates being:

  • VFS API improvements (new system calls, idmap, ...)
  • CGroupV2 feature parity with CGroupV1 and migration path
  • Dealing with the eBPF-ification of the world
  • Mediating and intercepting complex system calls
  • Making user namespaces more accessible
  • Verifying the integrity of containers

On the checkpoint/restore front, some of the potential topics include:

  • Making CRIU work with modern Linux distributions
  • Handling GPUs
  • Restoring FUSE daemons
  • Dealing with restartable sequences

A variety of other container and checkpoint/restore topics will quite likely come up as things evolve between now and the event.

Past editions of this micro-conference have been the source of many developments in the Linux kernel, including:

  • PIDfds
  • VFS idmap (and adding it to a slew of filesystems)
  • FUSE in user namespaces
  • Unprivileged overlayfs
  • Time namespace
  • A variety of CRIU features and checkpoint/restore kernel interfaces, with the latest among them being:
      • Unprivileged checkpoint/restore
      • Support of rseq(2) checkpointing
  • IMA/TPM attestation work

Presentation materials

  1. Pavel Tikhomirov (Virtuozzo)
    19/09/2024, 15:00

    Unsolved CRIU problems.

    1) Restoring complex process trees.

    Processes cannot enter a pre-existing process session (sid); sessions can
    only be inherited. (The same applies to process groups (pgid) in nested pid namespaces.)

    Probable solution 1 - CABA:
    The idea was to save as much of the...
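
The session restriction in problem 1 can be observed from user space. POSIX offers setsid(2) to create a new session but no call to join an existing one, so a restored process can only obtain a given sid by being forked from a process that already has it. The following is a minimal sketch, not CRIU code; the helper name child_session_after_setsid is hypothetical.

```python
import os


def child_session_after_setsid():
    """Fork a child that calls setsid(); return (parent_sid, child_pid, child_sid)."""
    parent_sid = os.getsid(0)
    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child: it starts out in the session inherited from the parent.
        try:
            os.close(read_fd)
            os.setsid()  # creates a NEW session; there is no "join session" call
            os.write(write_fd, str(os.getsid(0)).encode())
        finally:
            os._exit(0)  # never fall back into the parent's code path
    os.close(write_fd)
    with os.fdopen(read_fd) as reader:
        child_sid = int(reader.read())
    os.waitpid(pid, 0)
    return parent_sid, pid, child_sid


if __name__ == "__main__":
    parent_sid, child_pid, child_sid = child_session_after_setsid()
    print(f"parent sid={parent_sid} child pid={child_pid} child sid={child_sid}")
```

The child's new session ID equals its own PID, which is why a restorer that needs a specific sid has to arrange the fork order of the process tree around it.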

  2. Radostin Stoyanov (Red Hat)
    19/09/2024, 15:20

    Container checkpointing has recently been enabled in orchestration platforms like Kubernetes, where the smallest deployable unit is a Pod (a group of containers). However, these platforms are often used to deploy distributed applications running across multiple nodes, which presents a new challenge: How to create consistent global checkpoints of distributed applications running in multiple...

  3. Stéphane Graber (Zabbly)
    19/09/2024, 15:35

    Containers are a user space fiction, there is no single container concept within the Linux kernel and what set of components constitutes a container isn't something we expect everyone to agree on any time soon (if ever).

    That said, we've seen many ask for ways to easily figure out whether a process belongs to a container, if so, which one, who/what's responsible for it, ...

    Some of the...

  4. Aleksandr Mikhalitsyn (Canonical)
    19/09/2024, 15:50

    This talk is about the problem of integrating the concept of an "isolated" ([1], [2], [3], [4]) user namespace with the cgroup-v2 delegation model.

    The biggest challenge here is that cgroup delegation is based on cgroupfs inode ownership, and the cgroupfs superblock is shared between all containers, which makes it impossible to deal with cgroupfs as with any other containerized filesystem like...

  5. Aleksa Sarai (SUSE LLC)
    19/09/2024, 16:10

    With the introduction of extensible-struct syscalls such as openat2 and clone3, the inability to usefully filter syscalls with pointer arguments makes it harder for various programs to make use of newer kernel features because of both default container and self-hardening seccomp profiles. The inability for systemd and other system utilities to use RESOLVE_IN_ROOT and related openat2...
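
To make the pointer-argument problem concrete, here is a minimal sketch of what openat2(2) receives as raw syscall arguments. It only builds the argument values and does not issue the syscall. The struct layout and the RESOLVE_IN_ROOT value follow the Linux UAPI header <linux/openat2.h>; the helper name openat2_raw_args and the example path are hypothetical.

```python
import ctypes
import os

# Constant from the Linux UAPI header <linux/openat2.h>.
RESOLVE_IN_ROOT = 0x10
AT_FDCWD = -100  # from <fcntl.h>: resolve relative to the current directory


class OpenHow(ctypes.Structure):
    """Mirror of struct open_how; the size argument lets kernels extend it."""
    _fields_ = [
        ("flags", ctypes.c_uint64),
        ("mode", ctypes.c_uint64),
        ("resolve", ctypes.c_uint64),
    ]


def openat2_raw_args(dirfd, path_buf, how):
    """Return the four raw values passed to openat2(2).

    A classic seccomp filter can compare only these values; it cannot read
    the memory behind the two addresses, so `how.resolve` stays invisible.
    """
    return (
        dirfd,
        ctypes.addressof(path_buf),
        ctypes.addressof(how),
        ctypes.sizeof(how),
    )


if __name__ == "__main__":
    how = OpenHow(flags=os.O_RDONLY, resolve=RESOLVE_IN_ROOT)
    path_buf = ctypes.create_string_buffer(b"etc/hostname")
    dirfd, path_addr, how_addr, size = openat2_raw_args(AT_FDCWD, path_buf, how)
    print(f"dirfd={dirfd} path=0x{path_addr:x} how=0x{how_addr:x} size={size}")
```

The flags a policy would want to check, such as RESOLVE_IN_ROOT, live inside the struct, which is why profiles tend to allow or deny the whole syscall rather than particular uses of it.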

  6. Ariel Miculas
    19/09/2024, 17:00

    PuzzleFS is a container filesystem designed to address the limitations of the existing OCI format. The main goals of the project are reduced duplication, reproducible image builds, direct mounting support and memory safety guarantees, some inspired by the OCIv2 brainstorm document.

    Reduced...

  7. Tycho Andersen (Netflix)
    19/09/2024, 17:15

    One question applications running in containers often ask is: how many CPUs do I have access to? They want to know, e.g., how many threads they can run in parallel for their threadpool size, or the number of thread-local memory arenas.

    The kernel offers many endpoints to query this information. There is /proc/cpuinfo, /proc/stat, sched_getaffinity(), sysinfo(), the cpuset cgroup hierarchy's...
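
As a rough illustration of how these endpoints can disagree, the sketch below collects three views of "how many CPUs" from one process. It assumes Linux and, for the third view, a cgroup v2 hierarchy mounted at /sys/fs/cgroup with a cpu.max file visible at its root (typical inside a container with its own cgroup namespace, but an assumption here). The function names are hypothetical.

```python
import os
from typing import Optional


def parse_cpu_max(text: str) -> Optional[float]:
    """Parse a cgroup v2 cpu.max value ("<quota> <period>" or "max <period>").

    Returns the bandwidth limit expressed in CPUs, or None when unlimited.
    """
    quota, period = text.split()
    if quota == "max":
        return None
    return int(quota) / int(period)


def cpu_views(cpu_max_path: str = "/sys/fs/cgroup/cpu.max") -> dict:
    """Collect several answers to "how many CPUs do I have?"."""
    views = {"os.cpu_count": os.cpu_count()}  # CPUs in the system
    if hasattr(os, "sched_getaffinity"):
        # CPUs this process may be scheduled on (affinity mask, cpusets).
        views["sched_getaffinity"] = len(os.sched_getaffinity(0))
    try:
        with open(cpu_max_path) as f:
            views["cgroup cpu.max"] = parse_cpu_max(f.read())
    except (OSError, ValueError):
        # File missing or malformed; reported the same as "no limit" here.
        views["cgroup cpu.max"] = None
    return views


if __name__ == "__main__":
    for name, value in cpu_views().items():
        print(f"{name}: {value}")
```

A container limited by bandwidth quota rather than by cpuset can see a large affinity mask alongside a fractional cpu.max limit, so a thread pool sized from the first number may be heavily oversubscribed.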

  8. Michal Koutný (SUSE)
    19/09/2024, 17:35

    Some users of systems with many cgroups may notice that things don't work as swiftly as with fewer cgroups. In part this is caused simply by the greater amount of data that must be processed at higher hierarchy levels; in part it is because more cgroups mean more frequent operations that affect the running system.

    In this talk, I sum up changes made over roughly the past two years to better cope with...

  9. Kamalesh Babulal
    19/09/2024, 17:55

    Enterprise users are likely one of the last holdovers still running cgroup
    v1. As they continue to transition to cgroup v2, we would like to discuss
    the deprecation (and potentially deletion) of cgroup v1.

    In 2022 [1], systemd proposed the removal of cgroup v1 support from systemd,
    but the community wasn't (yet) ready.

    Work has already begun in the kernel to isolate cgroup v1 [2] in...

  10. Mathieu Desnoyers (EfficiOS Inc.)
    19/09/2024, 18:15
    • New machines with 512+ hardware threads (and thus logical CPUs) bring
      interesting challenges for user-space per-CPU data structures due to
      their large memory use.
    • The RSEQ per-memory-map concurrency IDs (upstreamed in Linux v6.3)
      allow indexing user-space memory based on indexes derived from the
      number of concurrently running threads.
    • I plan to apply the same concept to...