The DL server is a mechanism that allows a SCHED_DEADLINE entity to schedule an entire scheduler. This mechanism can be used for multiple purposes. The base case is, for example, scheduling the CFS scheduler to avoid starvation caused by SCHED_FIFO tasks. The basis of the server was presented by peterz some years ago, but it raised some concerns, for example, the priority inversion between CFS and...
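A toy, tick-based model can make the bandwidth guarantee concrete. The sketch below is not kernel code. It assumes one CPU with an always-runnable SCHED_FIFO hog and an always-runnable CFS task, and the function name and the 50-ticks-per-1000 budget are illustrative. It only shows that a deadline entity, which outranks SCHED_FIFO, can reserve a fixed share of each period for CFS. When exactly within the period the server runs is a policy detail it ignores.

```python
def simulate_dl_server(runtime, period, ticks):
    """Toy model: how many ticks an always-runnable CFS task receives on a
    CPU also occupied by an always-runnable SCHED_FIFO hog, when a deadline
    server with `runtime` ticks of budget per `period` serves CFS.
    Hypothetical model, not the kernel's dl_server implementation."""
    budget = 0
    cfs_ticks = 0
    for t in range(ticks):
        if t % period == 0:
            budget = runtime      # replenish the budget at each period start
        if budget > 0:
            budget -= 1           # the DL server runs CFS, consuming budget
            cfs_ticks += 1
        # otherwise the SCHED_FIFO hog keeps the CPU for this tick
    return cfs_ticks


print(simulate_dl_server(runtime=50, period=1000, ticks=10_000))
```

With a zero budget the CFS task receives no CPU time at all, which is the starvation case the server is meant to prevent.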
Surprising benchmark results showed that adding a global raw spinlock in the idle loop significantly improves the performance of the scheduler-heavy hackbench benchmark on a 192-core AMD EPYC system. A month-long investigation followed to understand the root cause of this behavior.
This presentation is meant to walk the audience through the findings and the resulting solution, opening...
How to better reflect, in the scheduler, the pressure that can be applied to a CPU's compute capacity, in order to improve task placement decisions and load balancing.
This is a follow-up to the talk at OSPM; the patchset will be published before LPC.
At LPC 2022 we hosted an Energy Quality of Service (EQOS) API discussion. The proposed API enables user space to inform the kernel about something it is the expert on: itself. Callers do not require any knowledge of the hardware, unrelated tasks, or the internal workings of the scheduler. The session sparked a lot of follow-on discussions, with the main take-away being “okay, so prototype it and...
The Supervisor Software Events (SSE) extension provides a
mechanism to inject software events from an SBI implementation
to supervisor software such that the events preempt all other traps and
interrupts. This poses interesting challenges for the SBI implementation (OpenSBI, KVM RISC-V, etc.) and for supervisor software (Linux).
Implementing efficient spinlocks in userspace is not yet possible in Linux, even after years of different approaches and proposed solutions. The main gap is the lack of an ABI that provides an easy, low-overhead way to check whether the current lock holder is running.
In this session, we are going to present the problem and propose a solution for it using the restartable...
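The spin-or-yield policy that such an ABI would enable can be sketched in Python. The sketch assumes a hypothetical `owner_is_running(owner)` callback standing in for the missing kernel interface; it does not show how that information would be obtained. Waiters keep spinning while the holder is on a CPU and yield otherwise, since spinning on a preempted holder only burns cycles.

```python
import threading
import time


class AdaptiveLock:
    """Toy userspace lock: spin while the holder is running, yield otherwise.

    owner_is_running(owner_id) is a hypothetical stand-in for the missing
    ABI that would report whether the lock holder is currently on a CPU.
    """

    def __init__(self, owner_is_running):
        self._lock = threading.Lock()   # used only as a try-lock primitive
        self._owner = None
        self._owner_is_running = owner_is_running

    def acquire(self):
        while not self._lock.acquire(blocking=False):
            owner = self._owner
            if owner is not None and self._owner_is_running(owner):
                continue        # holder is on a CPU: keep spinning
            time.sleep(0)       # holder preempted (or unknown): yield the CPU
        self._owner = threading.get_ident()

    def release(self):
        self._owner = None
        self._lock.release()
```

A real implementation would spin on an atomic compare-and-swap over a shared word rather than a Python lock; the point here is only the decision made on each failed attempt.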
Regressions that cause a device to no longer be probed by a driver can have a
big impact on the platform's functionality, and despite being relatively common
there isn't currently any generic way to detect them.
By enabling the community to catch device probe regressions in a way that
doesn't require additional work for every new platform, and that can catch
issues from config changes...
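One simple way to detect such regressions, assumed here for illustration rather than taken from the proposal, is to record which devices have a driver bound on a known-good kernel and compare that set against a new run. On Linux, a device directory under `/sys/bus/<bus>/devices/` contains a `driver` symlink once a driver has bound to it. The function names below are hypothetical.

```python
import os


def bound_devices(bus_devices_dir):
    """Return the names of devices in a sysfs-style directory
    (e.g. /sys/bus/platform/devices) that currently have a driver bound."""
    bound = set()
    for name in os.listdir(bus_devices_dir):
        # sysfs creates a `driver` symlink in the device directory once
        # probing succeeds; lexists() checks the link without following it.
        if os.path.lexists(os.path.join(bus_devices_dir, name, "driver")):
            bound.add(name)
    return bound


def probe_regressions(baseline, current):
    """Devices bound in the known-good baseline but not in the current run."""
    return sorted(baseline - current)
```

Usage would look like `probe_regressions(saved_set, bound_devices("/sys/bus/platform/devices"))`. Unlike the goal stated above, this sketch still needs a stored baseline per platform; avoiding that per-platform work is exactly what a generic solution has to solve.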
Here's a tour of what has been done on the CPU isolation front
this year and what still needs to be achieved. Topics will include:
- Memcg cache drain
- Vmstat
- Disable per-CPU buffer_head cache
- IPI deferrals
- cpusets v2 improvements
- Osnoise tracer
- Need for a nohz_full cpuset interface?
- Sysidle (energy optimization)
The current CI systems for the kernel offer basic, low-level
regression detection and handling capabilities based on test results, but each does so in its own way. We wonder whether we can find more common ways of tackling the problem by post-processing the data provided by the different CI systems. We could then extract additional "hidden" information, look at failure trends,...
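As a small illustration of trend analysis over post-processed results, the hypothetical heuristic below classifies one test's pass/fail history across CI runs. The labels and the three-run window are assumptions, not any existing CI system's logic: a test that used to pass and has failed in every recent run looks like a regression, while one that alternates looks flaky.

```python
def classify(history, window=3):
    """Classify a test's results, oldest first (True = pass).

    Hypothetical heuristic: only the last `window` runs decide the state;
    earlier runs distinguish a new regression from a long-standing failure.
    """
    recent, earlier = history[-window:], history[:-window]
    if all(recent):
        return "pass"
    if not any(recent):
        return "regression" if any(earlier) else "failing"
    return "flaky"
```

A real pipeline would also need to key results by kernel revision and test environment before looking at trends; this only shows the classification step.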
What do we want?
- Better CPU isolation, in order to run time-sensitive tasks without interruption
What is (one of the things) preventing this?
- queue_work_on(isolated_cpu)
While working on those, an interesting parallel programming strategy was noticed:
- Use per-CPU structures protected by local_lock; when an action must be performed on a remote CPU, use queue_work_on(target_cpu).
- Works...
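The pattern can be modeled in a few lines of Python. This is a single-threaded toy, not kernel code: each `CPU` object owns its cache, and other CPUs never touch it directly but append a callback to that CPU's queue (standing in for `queue_work_on(target_cpu)`), which the owning CPU runs later. Names such as `drain_all` are hypothetical. The model makes the isolation cost visible: an isolated CPU is interrupted whenever it holds cached data that someone wants drained.

```python
from collections import deque


class CPU:
    """Toy per-CPU state. `cache` is only modified by code running on this
    CPU, which is why a local lock (no cross-CPU lock) is enough to protect it."""

    def __init__(self, cpu_id):
        self.id = cpu_id
        self.cache = []
        self.work = deque()        # stand-in for work queued to this CPU
        self.interruptions = 0     # work items that preempted this CPU

    def queue_work(self, fn):
        # Analogue of queue_work_on(self.id, ...): defer fn to this CPU.
        self.work.append(fn)

    def run_pending(self):
        # Runs "on" this CPU, so each fn may touch self.cache directly.
        while self.work:
            fn = self.work.popleft()
            fn(self)
            self.interruptions += 1


def drain(cpu):
    cpu.cache.clear()


def drain_all(cpus):
    # The requesting CPU never touches another CPU's cache; it only queues
    # work, skipping CPUs with nothing cached (an assumption of this sketch).
    for cpu in cpus:
        if cpu.cache:
            cpu.queue_work(drain)
```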
Thomas will take questions about PREEMPT_RT and other topics.
KVM and VFIO provide an architecture-neutral irqbypass framework, but
its enablement requires an implementation of an architecture-specific
function, kvm_arch_irq_bypass_add_producer(). The RISC-V AIA and IOMMU
specifications provide novel support for guest interrupt delivery (most
notably MRIFs), which must be considered for RISC-V KVM's irqbypass
implementation. We have an initial...
Modern PCI devices can expose a whole slew of hardware behind a single PCI "device". While the PCI device itself is discoverable, everything behind it (via BARs) is not. Neither the set of downstream devices these devices expose nor their configuration is fixed. There's already a solution for discovering devices and their configuration: Devicetree. There's also already a mechanism to...
IOMMU overhead memory, which is primarily page table memory, is allocated directly from the buddy allocator and is not charged or accounted for. Also, there is no easy way to debug IOMMU translations, as there are no user interfaces that allow walking through IOMMU page tables. Below are proposals to solve these problems.
**Add observability for IOMMU page table memory into...
Open discussion on iommufd topics that have not been settled on the mailing list prior to the conference:
- IOMMU based dirty tracking
- IOMMU nested translation
- IOMMU userspace command queue
- Unique driver features
- iommufd support of SVA/PRI/PASID
- ARM interrupt handling in VMs
- Driver enablement for iommufd features
The Android Micro Conference brings the upstream community and Android systems developers together to discuss issues and changes to the Android platform and their dependencies and interactions with the Linux kernel, allowing for collaboration on solutions for upstream.
Since last year's conference, there has been quite...
Rust is a systems programming language that is making great strides toward becoming the next big one in that domain.
Rust for Linux is the project adding support for the Rust language to the Linux kernel. Rust has a key property that makes it very interesting as the second language in the kernel: it guarantees no undefined behavior takes place (as long as unsafe...