Sep 12 – 14, 2022
Europe/Dublin timezone

Kernel Live Patching at Scale

Sep 13, 2022, 3:00 PM
45m
"Lansdowne" (Clayton Hotel on Burlington Road)

"Lansdowne"

Clayton Hotel on Burlington Road

LPC Refereed Track LPC Refereed Track

Speakers

Song Liu (Meta) Rik van Riel (Meta) David Vernet (Meta)

Description

Kernel live patching (KLP) makes it possible to apply quick fixes to a live Linux kernel, without having to shut down the workload to reboot a server. The kpatch tool chain and the livepatch infrastructure generally work well. However, using them on a closely monitored fleet with several million servers uncovers many corner cases. During the deployment of KLP at Meta, we ran into issues, including performance regressions, conflicts with tracing & monitoring tools, and KLP transitions sporadically failing depending on the behavior of the kernel at the time the patch is applied. In this presentation, we will share our experiences working with KLP at scale, describe the top issues we are facing, and discuss some ideas for future improvements.

First, we would like to briefly introduce how we build, deploy, and monitor KLPs at scale. We will then present some recent work to improve KLP infrastructure, including: eliminating performance hit when applying KLPs; making sure KLP works well with various tracing mechanisms; and fixing various corner cases with kpatch-build tool chain and livepatch infrastructure. Finally, we would like to discuss the remaining issues with KLP at scale, and how to address them. Specifically, we will present different reasons for KLP transition errors, and a few ideas/WIPs to address these errors.

I agree to abide by the anti-harassment policy Yes

Primary authors

Song Liu (Meta) Rik van Riel (Meta) David Vernet (Meta)

Presentation materials

Diamond Sponsor

Platinum Sponsors





Gold Sponsors




Silver Sponsors





Speaker Gift Sponsor

Catchbox Sponsor

Video Recording Sponsor

Livestream Sponsor

T-Shirt Sponsor

Conference Services Provided by