13–15 Nov 2023
America/New_York timezone

connect() - why you so slow?!

15 Nov 2023, 14:30
30m
"James River Salon C" (Omni Richmond Hotel)

225
eBPF & Networking Track

Speaker

Frederick Lawler (Cloudflare)

Description

What happens when your application opens upwards of 50k connections to a single
destination? Short answer: the connect() syscall becomes slow. Cloudflare found out
the hard way.

In this talk we would like to share what we have learned about the connect()
implementation for TCP in Linux, both its strengths and its weaknesses: how
connect() latency changes under pressure, and how to open connections so that the
syscall latency is deterministic and time-bound.
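
As a rough illustration of what "latency under pressure" means, the sketch below times each blocking connect() call while the number of open connections to one local listener grows. It is an assumption-laden toy, not the benchmark used for the talk: the function name measure_connect_latency is hypothetical, the destination is a loopback listener, and the measured time includes the loopback handshake as well as the kernel's source port selection.

```python
import socket
import time


def measure_connect_latency(n_connections, host="127.0.0.1"):
    """Open n_connections TCP connections to a single local listener.

    Returns one latency sample (in nanoseconds) per connect() call.
    All sockets stay open until the end, so every new connection competes
    for source ports with the ones opened before it.
    Note: each connection uses two file descriptors, so large values of
    n_connections require a raised RLIMIT_NOFILE.
    """
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    clients, accepted, latencies_ns = [], [], []
    try:
        listener.bind((host, 0))  # port 0: let the kernel pick a free port
        listener.listen(128)
        dest = listener.getsockname()

        for _ in range(n_connections):
            client = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
            clients.append(client)

            start = time.perf_counter_ns()
            client.connect(dest)  # blocking: returns after the handshake
            latencies_ns.append(time.perf_counter_ns() - start)

            # Drain the accept queue so the backlog never fills up.
            accepted.append(listener.accept()[0])
    finally:
        for sock in clients + accepted + [listener]:
            sock.close()
    return latencies_ns


if __name__ == "__main__":
    samples = measure_connect_latency(200)
    print("first:", samples[0], "ns", "last:", samples[-1], "ns")
```

With a few hundred connections the samples will mostly show noise; the effect described in the abstract is about tens of thousands of connections to one destination, which also needs a wide enough local port range.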

In this talk we would like to cover:

  • Why Cloudflare services sometimes experience pressure, where we need to open
    lots of connections to just one destination.
  • How we have been avoiding the connect() latency pitfall so far, and why it is
    no longer a viable option.
  • Our efforts to benchmark the connect() syscall and characterize its latency as
    the number of open connections increases.
  • Existing difficulties in tracing and monitoring connect() performance at scale
    in a production environment.
  • A look at how connect() is implemented in Linux for TCP, its evolution, and
    previous attempts to deal with high latency under pressure.
  • How to control how long connect() takes with existing Linux APIs: recipes for
    how to open TCP connections with predictable syscall latency.
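
The abstract does not say which recipes the talk presents. One plausible shape for such a recipe, sketched here purely as an assumption, is to pick the source port in user space and bind() to it before calling connect(), trying only a fixed number of candidates so that the work per attempt is bounded. The function name connect_bounded and its parameters are hypothetical; the bound applies to the number of ports tried, not to the network round trip of the handshake itself.

```python
import errno
import socket


def connect_bounded(dest, src_ip, candidate_ports, max_attempts=8):
    """Connect to dest from one of a few caller-chosen source ports.

    At most max_attempts ports are tried; if none is usable the function
    gives up with an OSError instead of searching further.
    """
    retryable = (errno.EADDRINUSE, errno.EADDRNOTAVAIL)
    for port in list(candidate_ports)[:max_attempts]:
        sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        try:
            # SO_REUSEADDR lets several outgoing sockets share a local
            # port as long as their destinations differ.
            sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
            sock.bind((src_ip, port))  # the local port is fixed here
            sock.connect(dest)
            return sock
        except OSError as exc:
            sock.close()
            if exc.errno not in retryable:
                raise  # unrelated failure, e.g. connection refused
    raise OSError(errno.EADDRNOTAVAIL, "no candidate source port was usable")
```

A caller would typically draw candidate_ports at random from a range it manages itself, and treat the final OSError as a signal to back off rather than to keep searching.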

Primary author

Frederick Lawler (Cloudflare)
