18–20 Sept 2024
Europe/Vienna timezone

Checkpoint Coordination for Distributed Containerized Applications

19 Sept 2024, 15:20
15m
"Room 1.15 - 1.16" (Austria Center)

106
Containers and checkpoint/restore MC

Speaker

Radostin Stoyanov (Red Hat)

Description

Container checkpointing has recently been enabled in orchestration platforms like Kubernetes, where the smallest deployable unit is a Pod (a group of containers). However, these platforms are often used to deploy distributed applications that run across multiple nodes, which presents a new challenge: how can we create consistent global checkpoints of distributed applications running in multiple containers on different cluster nodes?

To address this challenge, we developed criu-coordinator, a tool that synchronizes checkpoint and restore operations among multiple CRIU instances to enable coordinated checkpointing for distributed applications. This talk will cover the design and architecture of criu-coordinator and discuss its integration with existing container platforms.
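
As a rough illustration of what coordinated checkpointing means, the sketch below simulates several nodes that each pause their workload and then wait at a barrier, so that no node takes its checkpoint until every node has paused. It is only a toy model built on Python threads: the function `coordinated_checkpoint`, the node names, and the `paused`/`checkpointed` phases are hypothetical and do not reflect the actual criu-coordinator protocol or any CRIU API.

```python
import threading


def coordinated_checkpoint(node_names):
    """Simulate nodes that pause, wait for all peers, then checkpoint.

    Hypothetical model only: real nodes would be CRIU instances on
    different machines, not threads in one process.
    """
    barrier = threading.Barrier(len(node_names))
    events = []              # ordered log of (phase, node) tuples
    lock = threading.Lock()  # protects the shared event log

    def node(name):
        with lock:
            events.append(("paused", name))
        # No node proceeds past this point until every node has paused.
        barrier.wait()
        with lock:
            events.append(("checkpointed", name))

    threads = [threading.Thread(target=node, args=(n,)) for n in node_names]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return events


if __name__ == "__main__":
    for phase, name in coordinated_checkpoint(["node-a", "node-b", "node-c"]):
        print(phase, name)
```

The order of nodes within each phase may vary between runs, but every `paused` entry is logged before any `checkpointed` entry, which is the property a consistent global checkpoint relies on in this simplified model.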

Primary author

Co-authors

Viktória Spišaková (Masaryk University)
Adrian Reber (Red Hat)
