Linux Plumbers Conference 2025

Name: Linux Plumbers Conference 2025
Start: 2025-12-11T09:00:00+09:00
End: 2025-12-13T23:30:00+09:00
11–13 Dec 2025

Asia/Tokyo timezone

contact@linuxplumbersconf.org

Lessons from scaling BPF to detect RDMA Device Drivers Bugs in real time

12 Dec 2025, 13:00

30m

"Main Hall A" (Toranomon Hills Mori Tower)

290

eBPF Track

Prankur Gupta (Meta)

Training large models requires significant resources, and the failure of any GPU or host can significantly prolong training times. At Meta, we observed that 17% of our jobs fail due to RDMA-related syscall errors, which arise from bugs in the RDMA driver code. Unlike other parts of the kernel, RDMA-related syscalls are opaque, and the errors create a mismatch between the application's and the kernel's view of hardware resources. As a result of this opacity and mismatch, existing observability tools provided limited visibility, and DevOps found these failures challenging to triage. We required a new scalable framework to analyze kernel state and identify the cause of this mismatch.

Direct approaches, such as tracing the kernel calls and capturing the metadata involved, turned out to be prohibitively expensive. In this talk, we will describe the set of optimizations used to scale the tracking of kernel state and the map-based systems designed to efficiently export relevant state without impacting production workloads.

Maxim Samoylov, Prankur Gupta (Meta), Theophilus Benson (Carnegie Mellon University)

BPF_RDMATracer_LPC_2025.pdf

Video
