Description
Container checkpointing has recently been enabled in orchestration platforms such as Kubernetes, where the smallest deployable unit is a Pod (a group of containers). However, these platforms are often used to deploy distributed applications that span multiple nodes, which raises a new challenge: how can we create consistent global checkpoints of a distributed application whose containers run on different cluster nodes?
To address this challenge, we developed criu-coordinator, a tool that synchronizes checkpoint and restore operations across multiple CRIU instances to enable coordinated checkpointing of distributed applications. This talk will cover the design and architecture of criu-coordinator and discuss its integration with existing container platforms.