Description
For the fourth year in a row, the eBPF & Networking Track will bring together developers, maintainers, and other contributors from around the globe to discuss improvements to the Linux kernel’s networking stack and BPF subsystem, as well as their surrounding user space ecosystems, such as libraries, loaders, compiler backends, and other related system tooling.
The gathering is designed to foster collaboration and face-to-face discussion of ongoing development topics, as well as to encourage bringing new ideas into the development community for the advancement of both subsystems.
The track will be composed of talks, 30 minutes in length (including Q&A discussion). Topics will be advanced Linux networking and/or BPF related.
eBPF & Networking Track's technical committee: David S. Miller, Jakub Kicinski, Paolo Abeni, Eric Dumazet, Alexei Starovoitov, Daniel Borkmann (chair), Andrii Nakryiko and Martin Lau.
There has been recent work on adding the notion of exceptions to the BPF runtime in the Linux kernel. In this presentation, we will explore the necessary changes made to the BPF subsystem to fulfill this. We will also explore various implementation choices, reasons for making the feature as generic as possible, and the possibility of integrating similar features found in other languages (C++,...
In the rapidly evolving landscape of BPF as kernel extensions, the need for early termination is becoming increasingly critical, whether it's due to kernel stalling or the need to enforce execution time restrictions on critical hook points:
- The recently added bpf_loop helper can be used to attach a very long running BPF program into the kernel which has been demonstrated to stall the...
This talk will present our automated tool, Agni, to check the correctness of range analysis in the Linux kernel’s eBPF verifier. Agni automatically extracts the semantics of the verifier's range analysis in logic (SMT) from the kernel's C source code. We use abstract interpretation theory to provide a formal specification of the soundness of range analysis. Our tool checks the verifier's range...
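To make the soundness property concrete, here is a minimal sketch, assuming a toy 4-bit unsigned interval domain. It is not how Agni works (the abstract describes SMT-based checking of semantics extracted from the kernel's C source); it only illustrates what "sound range analysis" means by exhaustively comparing an abstract operator with the concrete addition it models. The names `abs_add_sound`, `abs_add_unsound`, and `find_counterexample` are hypothetical.

```python
from itertools import product

WIDTH = 4                      # toy bit width (assumption, keeps the search tiny)
MAXV = (1 << WIDTH) - 1        # largest unsigned value


def abs_add_sound(a, b):
    """Abstract addition on ranges (lo, hi); falls back to the full range on possible wraparound."""
    lo, hi = a[0] + b[0], a[1] + b[1]
    if hi > MAXV:
        return (0, MAXV)       # some concrete sum may wrap, so give up precision
    return (lo, hi)


def abs_add_unsound(a, b):
    """Deliberately buggy variant: wraps the bounds instead of widening the range."""
    return ((a[0] + b[0]) & MAXV, (a[1] + b[1]) & MAXV)


def find_counterexample(abs_op):
    """Return (a, b, x, y, z) where concrete z = x + y escapes the abstract result, else None."""
    ranges = [(lo, hi) for lo in range(MAXV + 1) for hi in range(lo, MAXV + 1)]
    for a, b in product(ranges, repeat=2):
        out_lo, out_hi = abs_op(a, b)
        for x in range(a[0], a[1] + 1):
            for y in range(b[0], b[1] + 1):
                z = (x + y) & MAXV          # concrete wrapping addition
                if not (out_lo <= z <= out_hi):
                    return (a, b, x, y, z)
    return None


if __name__ == "__main__":
    print("sound operator:", find_counterexample(abs_add_sound))
    print("unsound operator:", find_counterexample(abs_add_unsound))
```

An operator is sound when no counterexample exists. Exhaustive search only works at toy bit widths, which is why a solver-based approach is needed for 64-bit values.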
What has happened with the BPF Memory Model, two years after the presentation [1] at the Networking and BPF Summit held at the 2021 Linux Plumbers Conference?
Until recently, not much!
But that has changed, so much so that this presentation will cover a more detailed proposal for a BPF memory model.
[1] https://lpc.events/event/11/contributions/941/
eBPF is accelerating waves of innovation allowing applications to enhance the kernel’s capabilities at runtime, while guaranteeing stability and security. Such guaranteed safety is made possible by the verifier engine which statically verifies BPF code. However, the verifier implicitly makes assumptions about the runtime execution environment, which must hold for safety to be upheld. One...
Overview
Binary authorization is a common security requirement for modern systems. Fundamentally, only securely authorized binaries are allowed to perform certain risky operations. For example, only an authenticated sshd binary is allowed to bind port 22, or only limited authorized binaries should write to raw block devices with critical data. Many proposals have sought to solve...
Sysarmor is a security daemon used to detect possible threats, and enforce security rules at Meta. Sysarmor is deployed to higher threat environments, such as: collocated hosts, Meta Network Appliances, development servers, Meta cloud gaming, and public cloud (AWS/GCP). Sysarmor has over 40 BPF based detections, including areas such as: networking, privilege escalation, hardware attacks,...
The Linux kernel uses non-repudiable logging to attest to system integrity. Non-repudiation ensures that the validity of the log cannot be disputed, even in the presence of an untrusted actor. We present an extensible interface for user-defined programs to leverage TPM-based non-repudiable logging of any kernel data accessible to eBPF programs. With the large variety of eBPF hook locations, our...
Bonus/fun evening session:
eBPF is gaining widespread adoption, and it is now relatively easy for people in tech to find tutorials or blog posts to get started with eBPF and to understand how it works. Other people hear about eBPF, but are less familiar with the related concepts, and they struggle more to understand what it is about, and how it changes system...
We are looking for a new register-set data structure to use instead of pt_regs for function entry/exit trace events. This is because pt_regs is expected to hold all registers, including some control registers that are usually saved only when an exception or interrupt happens. However, with ftrace, it cannot be used on some architectures. Moreover, for most RISC architectures, saving all...
BPF for networking has seen a number of infrastructure improvements since last year, such as the introduction of tcx as the new tc BPF fast path with BPF link support. The next bigger step in this area is the introduction of a BPF programmable netdevice called "netkit" where the BPF logic is part of the driver's xmit routine. This talk elaborates on why it is needed, provides a detailed...
Application security and observability systems provide useful insight into L7 application networking. These systems promise nice-looking service maps showing all your gRPC connections and how all the network services interact. They snoop DNS traffic, providing the key insights of IP to DNS name mappings in a world where IPs are increasingly dynamic and meaningless from an identity perspective....
In this talk, we share some transport tunings built using eBPF to improve network performance and reliability. We will discuss examples of problems observed along with their solutions at different scopes: intra-datacenter (small RTT) and inter-region (long RTT) networks. Next, we talk about how we used one BPF attach-point (struct_ops) to try a TCP congestion control change aimed at improving...
In the Linux kernel the Static Keys feature allows the inclusion of seldom used features in the fast-path code via the 'asm goto' compiler feature and code live-patching techniques. When disabled, a static key incurs zero overhead.
While looking into ways to extend functionality of the pwru [1] utility to trace networking events it became clear that a similar Static Keys feature would be a...
Datadog has been using eBPF in production for observability, security and networking for several years now. While we managed to leverage eBPF to build new features, which would have been impossible otherwise, we also learned a lot the hard way. In this talk, we aim to get into the details of some gotchas, pitfalls and bugs uncovered over the years. You'll learn about eBPF hook points coverage...
At LPC 2022, we talked about experimenting with eBPF to extend the existing stack unwinding facility in the Linux kernel for interpreted languages, such as Ruby and Python, as well as runtimes emitting JITed code, like NodeJS.
While we have successfully implemented these features in parca-agent across both Arm64 and x86 architectures, there is...
Towards a standardized eBPF ISA - Conformance testing
The BPF Conformance Suite, consisting of a test runner and a suite of test cases, is a tool that addresses the challenge of ensuring cross-runtime compatibility for BPF programs.
This presentation will delve into the core aspects of the BPF Conformance Suite, including its purpose,...
Memory bandwidth is a bottleneck in many distributed services running at scale as I/O bandwidth has not kept up with CPU or NIC speeds over time. One limitation of kernel socket-based networking is that data is first copied into kernel memory via DMA, and then again into user memory, which adds pressure to overall memory bandwidth and comes with a CPU cost. The classic way of addressing this...
Homa, a unique transport protocol created specifically for hyperscale datacenters, provides optimized round-trip performance for request/reply messages. An in-depth evaluation of the Homa Linux module in contrast to TCP showed a considerable decrease in latency with RPC application benchmarks. Furthermore, our analysis of gRPC operating over Homa versus gRPC over TCP revealed significant...
Disabling bottom halves is essentially a per-CPU Big Kernel Lock. While some data structures have explicit locking, others rely on disabling BH. Depending on the load, networking has to wait until timer callbacks have finished. Even if preempted by a task with higher priority, it cannot send a packet until all receiving is done.
This talk intends to discuss with the networking community,...
In large deployments, significant CPU cycles are used on encryption for transport security (QUIC, TLS, etc). CPU crypto instructions and ‘look-a-side’ accelerators can have significant performance penalties (memory copies, cache pollution, etc).
NIC or Inline offload solves many of these problems and it leverages the natural memory copy into the NIC to implement crypto-offload. Other...
AF_XDP is a relatively novel packet family which builds on top of XDP and supports directly accessing low-level networking queues from userspace. It exposes raw packet headers and payload and bypasses most of the kernel stack. Recently, it gained the support of NIC receive-side offloads and I'm actively working on...
What happens when your application opens upwards of 50k connections to a single destination? Short answer: the connect() syscall becomes slow. Cloudflare found out the hard way.
Through this talk we would like to share our story of what we have learned about the connect() implementation for TCP in Linux, both its strong and weak sides. How connect() latency changes under pressure, and how to...
In the container-centric ecosystem, achieving efficient network isolation without compromising on performance has become paramount. Not all containers require the stringent network isolation akin to VMs. Many can benefit from a more flexible approach, like using eBPF hooks, to mark and manage network traffic with QoS. This presentation delves into the application of cgroup-bpf based hooks...
The industry relies extensively on direct server return (DSR), and Meta has a long history of employing this technology for L4 load balancing. At the same time, our fleet evolved from an isolated subset of machines per team to a more efficient model with a single shared pool that provides multi-tenant capacity. Moving services into network namespaces becomes necessary to...
SYN Cookie is a technique used to protect servers from malicious connection requests. Under a SYN flood, the Linux TCP stack encodes the client information into the initial sequence number (ISN) of the SYN+ACK, which is called a SYN Cookie, and decodes it from the ACK of the three-way handshake (3WHS), so that the kernel can release the resources for the connection and stay stateless during the 3WHS.
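The encode/decode round trip described above can be sketched as follows. This is a simplified illustration under stated assumptions, not the Linux kernel's actual cookie layout or hash: the MSS table, the 2-bit index, the HMAC-SHA256 construction, and the function names are all hypothetical choices made for this example.

```python
import hashlib
import hmac

MSS_TABLE = [536, 1300, 1440, 1460]   # hypothetical MSS values the server can encode
MSS_BITS = 2                          # bits needed to index MSS_TABLE
MSS_MASK = (1 << MSS_BITS) - 1
U32 = 0xFFFFFFFF


def _cookie_for(secret, saddr, sport, daddr, dport, client_isn, counter, mss_idx):
    """Build a 32-bit cookie: keyed-hash bits in the upper part, MSS index in the low bits."""
    msg = f"{saddr}:{sport}>{daddr}:{dport}|{client_isn}|{counter}|{mss_idx}".encode()
    mac = int.from_bytes(hmac.new(secret, msg, hashlib.sha256).digest()[:4], "big")
    return ((mac << MSS_BITS) | mss_idx) & U32


def make_cookie(secret, saddr, sport, daddr, dport, client_isn, counter, mss):
    """Server ISN for the SYN+ACK; no per-connection state is stored."""
    # Largest table entry not exceeding the client's MSS (index 0 if none fits).
    mss_idx = max((i for i, v in enumerate(MSS_TABLE) if v <= mss), default=0)
    return _cookie_for(secret, saddr, sport, daddr, dport, client_isn, counter, mss_idx)


def check_cookie(secret, saddr, sport, daddr, dport, client_isn, counter, ack_seq):
    """Validate the final ACK of the handshake; return the encoded MSS or None."""
    cookie = (ack_seq - 1) & U32          # the client acknowledges server ISN + 1
    mss_idx = cookie & MSS_MASK
    for c in (counter, counter - 1):      # accept the current and previous time slot
        if _cookie_for(secret, saddr, sport, daddr, dport, client_isn, c, mss_idx) == cookie:
            return MSS_TABLE[mss_idx]
    return None


if __name__ == "__main__":
    key = b"example-secret"
    isn = make_cookie(key, "192.0.2.1", 40000, "192.0.2.2", 80, 1000, 5, 1460)
    print(check_cookie(key, "192.0.2.1", 40000, "192.0.2.2", 80, 1000, 5, (isn + 1) & U32))
```

Because everything needed to rebuild the connection is recomputed from the ACK and a secret, the server keeps no entry for the half-open connection. The trade-off is that only the few bits packed into the cookie survive the handshake.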
For security reasons, SYN Cookie...