13–15 Nov 2018
America/Vancouver timezone

Session

Performance and Scalability MC

14 Nov 2018, 09:00

Description

Core counts keep rising, which means that the Linux kernel continues to encounter interesting performance and scalability issues. This is not a bad thing, since it has been fifteen years since the "free lunch" of exponential CPU-clock frequency increases came to an abrupt end. During that time, the number of hardware threads per socket has risen sharply, approaching 100 for some high-end implementations. In addition, there is much more to scaling than simply larger numbers of CPUs.

Proposed topics for this microconference include optimizations for mmap_sem range locking; clearly defining what mmap_sem protects; scalability of page allocation, zone->lock, and lru_lock; swap scalability; variable hotpatching (self-modifying code!); multithreading kernel work; improved workqueue interaction with CPU hotplug events; proper (and optimized) cgroup accounting for workqueue threads; and automatically scaling the threshold values for per-CPU counters.

We are also accepting additional topics. In particular, we are curious to hear about real-world bottlenecks that people are running into, as well as scalability work-in-progress that needs face-to-face discussion.

Presentation materials

  1. Tim Chen
    14/11/2018, 09:00

    Cgroup accounting has significant overhead due to the need to constantly loop over all CPUs to update CPU usage statistics and blocked averages. We have seen that on a 4-socket Haswell system, database benchmarks like TPCC had an 8% performance regression when run under cgroup, back in the days of Haswell and the 4.4 kernel. On the recent Cannon Lake platform using the latest PCIe SSDs and the 4.18 kernel, the...

  2. Pavel Tatashin
    14/11/2018, 09:15

    Discuss two possible approaches to live-updating Linux running as a hypervisor without a noticeable effect on running virtual machines (VMs). One method is to use a cooperative multi-OSing paradigm to share the same machine between two kernels while the new kernel is booting and the old kernel is still serving the running VM instances. Allow the new kernel to live migrate the drivers from the...

  3. Steven Sistare (Oracle)
    14/11/2018, 09:30

    Summary:
    In this talk I discuss scalability of load balancing algorithms in the task scheduler, and present my work on tracking overloaded CPUs with a bitmap, and using the bitmap to steal tasks when CPUs become idle.

    Abstract:
    The scheduler balances load across a system by pushing waking tasks to idle CPUs, and by pulling tasks from busy CPUs when a CPU becomes idle. Efficient scaling is a...

  4. Subhra Mazumdar
    14/11/2018, 10:00

    1) Scalability of scheduler idle CPU and core search on systems with a large number of CPUs

    Currently, select_idle_sibling first tries to find a fully idle core using select_idle_core, which can potentially search all cores; if that fails, it finds any idle CPU using select_idle_cpu. select_idle_cpu can potentially search all CPUs in the LLC domain. These don't scale for large LLC domains and will...

  5. Christopher Lameter (Jump Trading LLC), Mike Kravetz
    14/11/2018, 11:00

    Huge pages are essential to addressing performance bottlenecks
    since the base page sizes are not changing while the amount of memory is
    ever increasing. Huge pages can address not only TLB misses but also memory
    overhead in the Linux kernel that arises through page faults and other
    compute intensive processing of small pages. Huge pages are required
    with contemporary high speed NVMe SSDs to reach...

  6. Boqun Feng
    14/11/2018, 11:30

    Flexible workqueue: Currently we have two pool setups for workqueues: 1) the per-CPU workqueue pool and 2) the unbound workqueue pool. The former requires users of workqueues to have some knowledge of CPU online state, as shown in:

    https://lore.kernel.org/lkml/20180625224332.10596-2-paulmck@linux.vnet.ibm.com/T/#u

    The latter (unbound workqueue) has only one pool per NUMA node, and that may...

  7. Daniel Jordan
    14/11/2018, 11:45

    Certain CPU-intensive tasks in the kernel can benefit from multithreading, such as zeroing large ranges of memory, initializing massive state (struct page) at boot, VFIO page pinning, XFS quotacheck, and freeing memory on munmap/exit. There is currently no interface that provides this service. ktask is a framework built on workqueues that splits up the work, chooses the number of threads to...

  8. Yang Shi (Alibaba Group)
    14/11/2018, 12:00

    The mmap_sem has long been a contention point in the memory management
    subsystem. In this session some mmap_sem related topics will be
    discussed. Some optimizations have been merged upstream to avoid
    holding mmap_sem for write for excessive periods of time in the
    munmap path by downgrading the write mmap_sem to read. Other
    optimizations are under discussion on the mailing list, i.e....

  9. Daniel Jordan, Pavel Tatashin, Ying Huang
    14/11/2018, 12:15
  10. Daniel Jordan, Huang Ying (ying.huang@intel.com), Pavel Tatashin

    Welcome to scalability microconference

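
The summary of contribution 3 mentions tracking overloaded CPUs with a bitmap and using that bitmap to steal tasks when a CPU becomes idle. The following is only a minimal userspace sketch of that general idea, not the code presented in the talk. The names `NR_CPUS`, `overloaded_mask`, `nr_running`, `enqueue_task`, `dequeue_task`, and `steal_task` are hypothetical, "overloaded" is assumed to mean more than one runnable task, and the sketch ignores locking, cache topology, and task affinity, all of which a real scheduler would have to handle.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: at most 64 CPUs, so a single 64-bit word can hold the bitmap. */
#define NR_CPUS 64

/* Bit n is set while CPU n has more than one runnable task (hypothetical rule). */
uint64_t overloaded_mask;

/* Number of runnable tasks per CPU; stands in for real per-CPU run queues. */
int nr_running[NR_CPUS];

/* Add one runnable task to a CPU and mark the CPU overloaded if needed. */
void enqueue_task(int cpu)
{
    assert(cpu >= 0 && cpu < NR_CPUS);
    if (++nr_running[cpu] > 1)
        overloaded_mask |= UINT64_C(1) << cpu;
}

/* Remove one runnable task from a CPU and clear its bit once it is no longer overloaded. */
void dequeue_task(int cpu)
{
    assert(cpu >= 0 && cpu < NR_CPUS && nr_running[cpu] > 0);
    if (--nr_running[cpu] <= 1)
        overloaded_mask &= ~(UINT64_C(1) << cpu);
}

/*
 * Called when idle_cpu runs out of work. Instead of scanning every CPU's
 * run queue, look only at the bitmap and pull one task from the
 * lowest-numbered overloaded CPU. Returns the victim CPU, or -1 if no
 * CPU is overloaded.
 */
int steal_task(int idle_cpu)
{
    assert(idle_cpu >= 0 && idle_cpu < NR_CPUS);
    uint64_t candidates = overloaded_mask & ~(UINT64_C(1) << idle_cpu);
    if (candidates == 0)
        return -1;
    /* __builtin_ctzll is a GCC/Clang builtin: index of the lowest set bit. */
    int victim = __builtin_ctzll(candidates);
    dequeue_task(victim);
    enqueue_task(idle_cpu);
    return victim;
}
```

The point of the sketch is that the idle path costs one mask test and one bit scan when nothing is overloaded, rather than a walk over all run queues. Picking the lowest-numbered candidate is an arbitrary choice made here for brevity.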