18–20 Sept 2024
Europe/Vienna timezone

Title: Multi-sized THP performance benchmarks and analysis on ARM64

20 Sept 2024, 15:45
45m
"Hall L2/L3" (Austria Center)

LPC Refereed Track

Speakers

Olivier Singla (Ampere Computing), Yang Shi (Ampere Computing)

Description

The Linux kernel has supported multi-sized THP since v6.8, allowing the use of intermediate-sized huge pages smaller than 2M. ARM64 supports contiguous PTEs, where multiple PTEs can be coalesced into one TLB entry. This increases the amount of memory covered by each TLB entry and reduces the number of page table walks needed to fill the TLB.
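To make the coverage argument concrete, the following is a small arithmetic sketch, not data from the talk. It assumes a 4K base page size, where the ARM64 contiguous hint spans 16 PTEs (64K), and it assumes the hardware actually coalesces each block into a single TLB entry. The TLB entry count is a made-up figure chosen only for illustration; it is not an Ampere Altra parameter.

```python
KIB = 1024
MIB = 1024 * KIB


def contig_span(base_page_bytes: int, ptes_per_block: int) -> int:
    """Bytes mapped by one block of contiguous PTEs."""
    return base_page_bytes * ptes_per_block


def tlb_reach(entries: int, bytes_per_entry: int) -> int:
    """Memory covered when every TLB entry maps bytes_per_entry bytes."""
    return entries * bytes_per_entry


# Hypothetical TLB size, for illustration only.
HYPOTHETICAL_TLB_ENTRIES = 1024

mappings = {
    "4K base page": 4 * KIB,
    # Assumption: 4K granule, where the contiguous hint covers 16 PTEs.
    "64K contiguous block (16 x 4K PTEs)": contig_span(4 * KIB, 16),
    "2M PMD-level huge page": 2 * MIB,
}

for label, size in mappings.items():
    reach_mib = tlb_reach(HYPOTHETICAL_TLB_ENTRIES, size) / MIB
    print(f"{label:<38} reach = {reach_mib:g} MiB")
```

Under these assumptions the reach scales linearly with the size mapped per entry, which is why intermediate sizes can cut TLB misses without requiring full 2M pages.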

We ran a series of benchmarks on Ampere Altra using popular cloud workloads, such as in-memory databases and kernel compilation, with different huge page sizes: 2M, 128K, 64K and others.
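The abstract does not say how the page sizes were selected for each run. As a hedged sketch of one way to inspect the configuration on a test machine, the snippet below reads the per-size multi-sized THP controls that kernels from v6.8 onward expose under /sys/kernel/mm/transparent_hugepage/hugepages-<size>kB/enabled. The helper name mthp_settings is hypothetical, and the set of sizes present depends on the kernel and its base page size.

```python
import re
from pathlib import Path

# Standard sysfs location of the THP controls on Linux.
THP_ROOT = Path("/sys/kernel/mm/transparent_hugepage")


def mthp_settings(root: Path = THP_ROOT) -> dict[int, str]:
    """Map each per-size THP directory (size in kB) to its selected mode.

    Each 'enabled' file lists the available modes and marks the active
    one in square brackets, e.g. "always inherit madvise [never]".
    """
    settings = {}
    for entry in root.glob("hugepages-*kB"):
        match = re.fullmatch(r"hugepages-(\d+)kB", entry.name)
        enabled = entry / "enabled"
        if not match or not enabled.is_file():
            continue
        text = enabled.read_text()
        selected = re.search(r"\[(\w+)\]", text)
        # Fall back to the raw text if no bracketed mode is found.
        settings[int(match.group(1))] = selected.group(1) if selected else text.strip()
    return dict(sorted(settings.items()))


if __name__ == "__main__":
    # Prints nothing on kernels without per-size THP directories.
    for size_kb, mode in mthp_settings().items():
        print(f"{size_kb:>6} kB: {mode}")
```

Recording this output alongside each benchmark result makes it easier to confirm which sizes were active during a given run.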

This presentation will also cover how multi-sized THP works and include hardware details on the operation of contiguous PTEs and variable page sizes on ARM64.

We conclude that multi-sized THP may not boost all kinds of workloads. The overhead of page table walks is a significant contributing factor for some workloads. The reduced number of page faults helps performance for some workloads. We recommend a kernel with a 16K page size as an optimal solution that delivers most of the performance gains without significantly increasing the memory footprint.

Primary author

Yang Shi (Ampere Computing)

Co-author

Olivier Singla (Ampere Computing)
