Description
The Performance Monitor Control Unit (PMCU) is a device on Kunpeng SoCs that offloads PMU accesses from the CPUs: it handles the configuration, event switching, and counter reading of core PMUs. This enables fine-grained, multi-event CPU profiling while reducing both the software overhead of accessing PMUs and its impact on target workloads. In the current PMU counting scheme, the target CPUs have to handle events locally, which affects their own workload execution; the PMCU instead accesses PMUs through external memory-mapped interfaces, providing non-intrusive CPU monitoring. The PMCU software stack is currently implemented with the 'perf_event' auxtrace framework. The patchset contains the documentation, the driver, and user-space perf tool support.
Implementation-wise, one open question is how to synchronize the PMCU with CPU-internal PMU accesses. PMUs can be accessed from the CPU and the PMCU simultaneously, and the current ARM PMU standard does not appear to have a mechanism that synchronizes internal and external accesses. Hence, running arm_pmu and PMCU events at the same time may disrupt PMU operation and deliver incorrect data for both, e.g. unexpected events or sample periods. We probably need a software solution for this case, where two drivers access the same hardware.
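To make the question concrete, here is a minimal user-space sketch of one possible software approach: a per-CPU ownership flag that either driver must claim before programming that CPU's PMU. It is only an illustration under stated assumptions and is not taken from the PMCU patchset or from arm_pmu. The names `NR_CPUS_SKETCH`, `pmu_owner`, `pmu_claim`, and `pmu_release` are hypothetical, and a real kernel implementation would also need reference counting for multiple events from the same driver.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical CPU count, chosen only for this sketch. */
#define NR_CPUS_SKETCH 4

/* Who currently controls a given CPU's PMU (hypothetical states). */
enum pmu_owner_id {
    PMU_FREE = 0,      /* nobody is using the PMU */
    PMU_OWNER_CPU,     /* internal accesses, e.g. the arm_pmu driver */
    PMU_OWNER_PMCU     /* external memory-mapped accesses by the PMCU */
};

/* One ownership word per CPU; static storage starts as PMU_FREE (0). */
static _Atomic int pmu_owner[NR_CPUS_SKETCH];

/* Claim the PMU of 'cpu' for 'who'. Succeeds only if it is free. */
static bool pmu_claim(int cpu, enum pmu_owner_id who)
{
    int expected = PMU_FREE;
    return atomic_compare_exchange_strong(&pmu_owner[cpu], &expected, who);
}

/* Release the PMU of 'cpu'. Succeeds only if 'who' is the owner. */
static bool pmu_release(int cpu, enum pmu_owner_id who)
{
    int expected = who;
    return atomic_compare_exchange_strong(&pmu_owner[cpu], &expected,
                                          PMU_FREE);
}
```

With this scheme, an event from one driver would fail to initialize (for example with a "busy" error) while the other driver holds the PMU of the same CPU, so the two never program the counters concurrently. The cost is that internal and external monitoring of one CPU become mutually exclusive, which may or may not be acceptable for the intended use cases.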
Beyond the above problem, we look forward to general feedback on PMCU from the kernel community in terms of use cases, interfaces, implementation, etc.
Reference: https://lwn.net/Articles/922351/