#### Valentin Schneider <vschneid@redhat.com>

LPC 2024




### Context

- CPU Isolation, NOHZ\_FULL, RCU\_NOCB...
  - Single userspace task on **isolated** CPU
  - No (voluntary) kernel entry
- Some **IPIs** still end up hitting the **isolated** CPU
  - text\_poke\_sync() (static keys & friends)
  - vunmap()'s flush\_tlb\_kernel\_range() (freeing / unmapping)
- Deferral concept: IPI **doesn't concern userspace**?
  - Don't send it
  - Execute related callback ASAP upon kernel entry
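The deferral idea can be sketched in userspace C with C11 atomics. This is only a minimal model, not the kernel implementation: `struct cpu_ctx`, `try_defer_work()`, `kernel_enter()`, `kernel_exit()` and the `WORK_*` / `CT_IN_KERNEL` bits are all hypothetical names chosen for illustration. The sender atomically queues work only if the target CPU is in userspace; otherwise it has to fall back to a real IPI.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical per-CPU state word: bit 0 = "in kernel", other bits = pending work. */
#define CT_IN_KERNEL   (1u << 0)
#define WORK_SYNC_CORE (1u << 1)  /* stands in for the text_poke_sync() callback */
#define WORK_TLB_FLUSH (1u << 2)  /* stands in for a kernel-range TLB flush */
#define WORK_MASK      (WORK_SYNC_CORE | WORK_TLB_FLUSH)

struct cpu_ctx {
    _Atomic unsigned int state;   /* 0 means "in userspace, nothing pending" */
    unsigned int sync_core_runs;  /* counters standing in for the real callbacks */
    unsigned int tlb_flush_runs;
};

/*
 * Sender side: returns true if the work was deferred (no IPI needed),
 * false if the target is in the kernel and the IPI must be sent.
 */
static bool try_defer_work(struct cpu_ctx *cpu, unsigned int work)
{
    unsigned int old = atomic_load(&cpu->state);

    assert((work & ~WORK_MASK) == 0);
    do {
        if (old & CT_IN_KERNEL)
            return false;
        /* On failure, 'old' is reloaded and the in-kernel check is redone. */
    } while (!atomic_compare_exchange_weak(&cpu->state, &old, old | work));

    return true;
}

/* Target side: on kernel entry, mark the CPU in-kernel and run pending work. */
static void kernel_enter(struct cpu_ctx *cpu)
{
    unsigned int old = atomic_exchange(&cpu->state, CT_IN_KERNEL);

    if (old & WORK_SYNC_CORE)
        cpu->sync_core_runs++;
    if (old & WORK_TLB_FLUSH)
        cpu->tlb_flush_runs++;
}

/* Target side: on return to userspace, clear the in-kernel bit. */
static void kernel_exit(struct cpu_ctx *cpu)
{
    atomic_store(&cpu->state, 0);
}
```

Packing the context bit and the pending-work bits into one word lets the sender decide "defer or IPI" with a single compare-and-exchange, so a CPU entering the kernel concurrently either sees the queued work or forces the sender down the IPI path.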



### Progress so far

- Tracepoints for IPIs & remote callbacks (v6.4)
  - trace\_ipi\_send\_{cpu,cpumask}
  - trace\_csd\_queue\_cpu
  - trace\_csd\_function\_{entry, exit}
- Free extra: use ftrace synthetic events + histograms to compute CSD delivery time [1]
- Ftrace tweaks to filter by cpumask (v6.6)

trace-cmd record -e 'sched\_switch' -f "CPU & CPUS{\$ISOLATED\_CPUS}" \
-e 'sched\_wakeup' -f "target\_cpu & CPUS{\$ISOLATED\_CPUS}" \
-e 'ipi\_send\_cpu' -f "cpu & CPUS{\$ISOLATED\_CPUS}" \
-e 'ipi\_send\_cpumask' -f "cpumask & CPUS{\$ISOLATED\_CPUS}" \
hackbench
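As a rough model of what the `CPUS{...}` filters above match, the sketch below parses a cpulist and applies the two comparisons: a scalar field (such as `cpu` or `target_cpu`) is treated as a CPU number and tested for membership, while a cpumask field matches if it intersects the given mask. The helper names are hypothetical and the mask is limited to 64 CPUs for brevity; this is not the ftrace code.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* Parse a cpulist such as "2-5,8" into a 64-bit mask; false on malformed input. */
static bool parse_cpulist(const char *s, uint64_t *mask)
{
    *mask = 0;
    while (*s) {
        char *end;
        unsigned long lo = strtoul(s, &end, 10);
        unsigned long hi = lo;

        if (end == s)
            return false;
        if (*end == '-') {          /* range "lo-hi" */
            s = end + 1;
            hi = strtoul(s, &end, 10);
            if (end == s)
                return false;
        }
        if (lo > hi || hi > 63)     /* this sketch only handles CPUs 0..63 */
            return false;
        for (unsigned long cpu = lo; cpu <= hi; cpu++)
            *mask |= UINT64_C(1) << cpu;

        if (*end == ',')
            end++;
        else if (*end != '\0')
            return false;
        s = end;
    }
    return true;
}

/* Scalar field ("cpu & CPUS{...}"): is this CPU number in the mask? */
static bool scalar_matches(unsigned int cpu, uint64_t mask)
{
    return cpu < 64 && ((mask >> cpu) & 1);
}

/* Cpumask field ("cpumask & CPUS{...}"): do the two masks intersect? */
static bool cpumask_matches(uint64_t field, uint64_t mask)
{
    return (field & mask) != 0;
}
```

With an isolated set of `2-5,8`, an `ipi_send_cpu` event targeting CPU 3 would be kept, while an `ipi_send_cpumask` event covering only CPUs 0 and 1 would be filtered out.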




### Deferral vs early entry code

/!\ A deferred operation is not executed immediately upon kernel entry: some early entry code runs first /!\




### Instruction patching vs early entry code

Danger zone => static key in early entry text

- Early kernel entry ≈ noinstr
- Leverage objtool, warn about static keys used in .noinstr regions

- Some are non-issues: ro\_after\_init keys cannot be flipped after boot
- Two problematic keys stand out:
  - **mds\_idle\_clear**; x86 mitigation, flipped at SMT hotplug
  - \_\_sched\_clock\_stable; flipped by mark\_tsc\_unstable(), called by a lot of \_\_init functions but also runtime ones (e.g. loading KVM module)
- Can we just let the IPI through and blame the user?
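The classification above can be written down as a small check. This is only a sketch of the rule, not objtool's code: `struct key_use` and both helpers are hypothetical, and the caller is assumed to already know each key's section and `ro_after_init` status. A static key is flagged when it is used in `.noinstr` text and can still be flipped at runtime.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical record of one static key use found while scanning the kernel. */
struct key_use {
    const char *key;     /* static key name */
    bool in_noinstr;     /* use site lives in a .noinstr region */
    bool ro_after_init;  /* key cannot be flipped once boot is done */
};

/* A use is dangerous if early entry code may run it before deferred patching work. */
static bool key_use_is_dangerous(const struct key_use *u)
{
    return u->in_noinstr && !u->ro_after_init;
}

/* Print a warning per dangerous use and return how many were found. */
static size_t report_dangerous(const struct key_use *uses, size_t n)
{
    size_t count = 0;

    for (size_t i = 0; i < n; i++) {
        if (key_use_is_dangerous(&uses[i])) {
            fprintf(stderr, "warning: static key %s used in .noinstr\n",
                    uses[i].key);
            count++;
        }
    }
    return count;
}
```

Fed the two keys named above plus a boot-only key, the check would flag only `mds_idle_clear` and `__sched_clock_stable`.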



### TLB flush vs early entry code

Danger zone => accessing vmap'd addresses in early entry code

CONFIG\_VMAP\_STACK

- No in-flight stack changes between fork() and exit()/put\_task\_struct()
- Not a problem? (cue for one of you to disagree)

Something to track vmap'd addresses in .noinstr regions?




### TLB flush vs x86 being annoying

- Paging-structure caches, Intel SDM Volume 3, Section 4.10.3
  - CPU can cache "any part of the paging hierarchy"
  - Cached entries can be accessed speculatively



# Thank you!
