The kernel's load tracking scales the observed load by the frequency the CPU is running at; this scaled value is used to determine how loaded a CPU truly is and how its frequency should change. Currently, on x86, the four-core turbo level is used as the maximum ratio for every CPU. However, Intel client hybrid platforms have P-cores and E-cores, and Intel server platforms with...
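As a rough illustration of the scaling described above, the sketch below applies a frequency ratio to a raw load value and shows how the choice of maximum frequency changes the result. It is a simplified model under stated assumptions, not the kernel's implementation: the function names, the 1024 fixed-point scale, and all frequencies are hypothetical values chosen for the example.

```python
# Simplified model of frequency-invariant load scaling.
# Assumption: a fixed-point scale of 1024 represents "running at max frequency".
CAPACITY_SCALE = 1024


def freq_scale(cur_khz: int, max_khz: int) -> int:
    """Return the current frequency as a fraction of max, in 1/1024 units."""
    return min(CAPACITY_SCALE, cur_khz * CAPACITY_SCALE // max_khz)


def scaled_load(raw_load: int, cur_khz: int, max_khz: int) -> int:
    """Scale an observed load by the frequency the CPU was running at."""
    return raw_load * freq_scale(cur_khz, max_khz) // CAPACITY_SCALE


if __name__ == "__main__":
    raw = 800            # hypothetical observed load
    cur = 2_000_000      # hypothetical current frequency in kHz
    # Hypothetical case: one platform-wide maximum versus a lower
    # per-core maximum for a slower core type.
    for max_khz in (4_000_000, 3_000_000):
        print(max_khz, scaled_load(raw, cur, max_khz))
```

The same raw load yields a different scaled value depending on which maximum is assumed, which is why a single maximum ratio for every CPU can misjudge how loaded a core really is.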
To register a thermal zone device, the number of parameters required has increased from 4, when the interface was first introduced, to 8, and people still want to add more. This is hard to maintain because every time a new parameter is needed, either a new wrapper is added, or all the current thermal zone drivers need to be updated. Plus, there is already a structure, aka “struct...
The DTPM framework and the thermal control framework use the same algorithm and mechanism when power numbers are involved. That results in duplicated code. The DTPM framework interacts with user space, but nothing prevents providing an in-kernel API that power-based cooling devices can act on directly. That will result in simpler code and very explicit power value usage....
Energy-aware scheduling (EAS) introduced a simple yet, at that time, effective energy model to help guide task scheduling decisions and DVFS policies. As CPU core micro-architecture has evolved, the error bars on the energy model have grown, potentially leading to sub-optimal task placement. Are we getting to the point where we need to enhance the energy model, or look at new ways to bias task...
The energy model is described through implicit values in the device tree, and the power values are deduced by the energy model in the kernel from the formula P = C × F × V². Unfortunately, the description is a bit fuzzy if the device uses Adaptive Voltage Scaling or is not performance-based, such as a battery or a backlight. On the other hand, complex energy models exist in out-of-tree kernels...
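To make the formula above concrete, the sketch below evaluates P = C × F × V² over a small table of operating points. It is only an illustration under stated assumptions: the capacitance coefficient, frequencies, and voltages are hypothetical numbers, and the function name is invented for the example rather than taken from the kernel.

```python
# Dynamic power from the formula P = C * F * V^2.
# All numeric values below are hypothetical, chosen only for illustration.

def dynamic_power(capacitance_f: float, freq_hz: float, volt_v: float) -> float:
    """Return dynamic power in watts for one operating point."""
    return capacitance_f * freq_hz * volt_v ** 2


if __name__ == "__main__":
    capacitance = 1e-9  # hypothetical effective capacitance, in farads
    # Hypothetical operating points: (frequency in Hz, voltage in volts)
    operating_points = [
        (500e6, 0.7),
        (1000e6, 0.8),
        (1500e6, 1.0),
    ]
    for freq, volt in operating_points:
        power = dynamic_power(capacitance, freq, volt)
        print(f"{freq / 1e6:.0f} MHz @ {volt:.2f} V -> {power:.3f} W")
```

Because the voltage term is squared, a device whose voltage is adjusted at run time no longer follows a power curve that can be computed from a fixed frequency/voltage table, which is one way the description becomes fuzzy.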
Running a workload in a VM results in very different CPUfreq/sched behavior compared to running the same workload on the host. This difference in CPUfreq/sched behavior can cause a significant power/performance regression (on top of virtualization overhead) for a workload when it is run in a VM instead of on the host.
This talk will highlight some of the CPUfreq and scheduler load tracking...
Per-core/CPU idle injection is very effective in controlling thermal conditions without resorting to CPU offlining, which has its own drawbacks. Since CPU temperature ramps up and down very quickly, idle injection provides a fast entry and exit path.
Linux has had support for per-core idle injection for a while (https://www.kernel.org/doc/html/latest/driver-api/thermal/cpu-idle-cooling.html). But...
We introduced the AMD P-State kernel CPUFreq driver [1] early this year; it uses ACPI CPPC-based fine-grained frequency control instead of legacy ACPI P-States and was merged into kernel 5.17 [2]. AMD P-State will be used on most Zen2/Zen3 and future AMD processors.
There are two types of hardware implementations: “full MSR solution” and “shared memory solution”. “full MSR...
When a device is broken and returns a failure during suspend, the whole system is blocked from entering system low-power states. Users thus lose the most important power-saving feature on their systems because of device failures that are non-fatal for their usage. In this case, making system suspend tolerate device failures is a gain. This may be achieved by a) disabling the device on behalf of...