Over the past decade, many parts of PREEMPT_RT have been included in the official Linux codebase. Examples include real-time mutexes, high-resolution timers, lockdep, ftrace, RCU_PREEMPT, threaded interrupt handlers, and more. The number of patches still awaiting integration has been significantly reduced, and the remaining ones are mature enough to make their way into mainline Linux.
The scheduler is at the core of Linux performance. Across different topologies and workloads, from low latency to high throughput and from small power-constrained devices to HPC, where CPU isolation is critical, giving the user the best possible experience is challenging.
The DL server is a mechanism that uses a SCHED_DEADLINE entity to schedule an entire scheduler. This mechanism can be used for multiple purposes. The base case is to schedule the CFS scheduler, avoiding its starvation by SCHED_FIFO tasks. The basis of the server was presented by peterz some years ago, but it raised some open points, for example, the priority inversion of CFS and...
Surprising benchmark results showed that adding a global raw spinlock to the idle loop significantly improves the performance of the scheduler-heavy hackbench benchmark on a 192-core AMD EPYC system. A month-long investigation followed to understand the root cause of this behavior.
This presentation is meant to walk the audience through the findings and the resulting solution, opening...
How to better reflect, in the scheduler, the pressure that can be applied to the CPUs' compute capacity, in order to improve task placement decisions and load balancing.
This is a follow-up to the talk at OSPM; the patchset will be published before LPC.
At LPC 2022 we hosted an Energy Quality of Service (EQOS) API discussion. The proposed API enables user-space to inform the kernel about something it is expert in: itself. Callers do not require any knowledge of the hardware, unrelated tasks, or the internal workings of the scheduler. The session sparked a lot of follow-on discussions, with the main take-away being “okay, so prototype it and...
Work continues on the proxy execution patch series to stabilize it and get it ready for validation for use in products.
However, its complexity is high.
I want to discuss ideas on how we might break it up into more fine-grained patches to iteratively get upstream, without making it an epic effort (hello, PREEMPT_RT!), or overwhelming reviewers ("[PATCH 1/628]...
Implementing efficient spinlocks in userspace is not yet possible in Linux, even after years of different approaches and proposed solutions. The main gap is the lack of an ABI that provides an easy, low-overhead way to check whether the current lock holder is running.
In this session, we will present the problem and propose a solution for it using the restartable...
Here's a tour of what has been done on the CPU isolation front this year and what still needs to be achieved. Topics will include:
- Memcg cache drain
- Disable per-CPU buffer_head cache
- IPI deferrals
- cpusets v2 improvements
- Osnoise tracer
- Need for a nohz_full cpuset interface?
- Sysidle (energy optimization)
What do we want?
- Better CPU isolation, in order to run time-sensitive tasks without interruption
What is (one of the things) preventing this?
While working on those, an interesting parallel programming strategy was noticed:
- Use per-CPU structures protected by local_lock; when an action must be performed on a remote CPU's structure, use queue_work_on(target_cpu).
Thomas will answer questions about PREEMPT_RT and other topics.
Before a CPU becomes idle, it kicks off idle load balancing to pull tasks from other runqueues, keeping the CPU utilized and preventing it from idling. However, this search has a potential scalability problem as the number of CPUs and sched groups in the sched domain grows.
Idle load balance potentially traverses all sched domains and calculates the statistics one by one. The time cost on idle...