

# Linux Plumbers Conference

Richmond, Virginia | November 13-15, 2023

## Update on RISC-V CoVE


#### Atish Patra, Ravi Sahita | November 14th, 2023

- Platforms supporting multiple tenants (apps, VMs) rely on HW isolation primitives managed by a single host/privileged software, so there is no separation of TCB
- <u>RISC-V Supervisor Domains</u> is a priv ISA extension for isolation between multiple privileged (S/H mode) software execution contexts, enabling differentiated trust models. It entails:
  - SDID & memory isolation - Smsdid, Smmtt
  - Assigning interrupts to supervisor domains - Smsdia
  - IO-MTT - associating IOMMU & MTT to supervisor domains
  - Metadata attached to address translation, e.g. shared memory - Svpams
  - Secure debug and performance monitoring controls



#### **RISC-V** Supervisor Domains (ISA TG)



- Rich-OS TEEs require *dynamic access-control* of physical memory
  - Region granularity (4KB and above, multiples of page sizes)
  - Regions may be donated by a hosting domain to one or more supervisor domains (access-controlled by Smmtt)
  - Supervisor domain manager can use existing priv ISA (S-stage, G-stage page tables, SPMP) to isolate between its workloads
- Supervisor domain memory accesses may require metadata to be specified with the access, e.g. to identify encryption contexts - addressed by Svpams

#### Smmtt







| MTTL2 Entry Type | Description, INFO and TYPE field encoding |
|------------------|-------------------------------------------|
| 1G_allow         | The 1G range of address is allowed for the domain<br>0. When configuring 1G ranges, RDSM ensures th<br>corresponding to 64M of address space, have iden |
| 1G_disallow      | The 1G range of address is not allowed for the dom<br>be 0. When configuring 1G ranges, RDSM ensure<br>each corresponding to 64M of address space, h<br>values. |
| MTT_L1_DIR       | The INFO field provides the PPN of the MTTL1 parage hold a 2-bit PERM field to indicate the action domain (described in the MTTL1 entry Figure 10). |
| 2M_PAGES         | The 64M range of address space is partitioned in<br>page has access allowed/not. The INFO field bits<br>address range to indicate access disallowed(0b) o<br>bits 43:32 are reserved (must be zero). |

https://github.com/riscv/riscv-smmtt/releases/download/v1.0.3/smmtt-spec.pdf
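The two-level lookup implied by the table can be sketched as a toy model. This is an illustration only, not the Smmtt encoding: the `Mttl2Entry` class, the `mtt_allows` helper, the bit ordering of the INFO field (bit i covering the i-th 2M page), the set-based stand-in for the MTTL1 page, and treating a missing entry as disallowed are all assumptions made here. Only the sizes (one MTTL2 entry per 64M, 2M pages, 4K pages under an MTTL1 directory) come from the table.

```python
from dataclasses import dataclass, field

SZ_4K = 1 << 12
SZ_2M = 1 << 21
SZ_64M = 1 << 26


@dataclass
class Mttl2Entry:
    # One of "1G_allow", "1G_disallow", "2M_PAGES", "MTT_L1_DIR".
    kind: str
    # 2M_PAGES only: assumed one allow bit per 2M page, bit i = i-th page.
    info: int = 0
    # MTT_L1_DIR only: hypothetical stand-in for the MTTL1 page, holding
    # the indices of the 4K pages (within the 64M range) that are allowed.
    l1_allowed: set = field(default_factory=set)


def mtt_allows(mttl2: dict, pa: int) -> bool:
    """Return True if the modelled table allows access to physical address pa."""
    entry = mttl2.get(pa // SZ_64M)  # one MTTL2 entry per 64M of address space
    if entry is None or entry.kind == "1G_disallow":
        return False  # assumption: an unpopulated entry means no access
    if entry.kind == "1G_allow":
        return True
    offset = pa % SZ_64M
    if entry.kind == "2M_PAGES":
        return bool((entry.info >> (offset // SZ_2M)) & 1)
    if entry.kind == "MTT_L1_DIR":
        return (offset // SZ_4K) in entry.l1_allowed
    raise ValueError(f"unknown MTTL2 entry type: {entry.kind}")


# Example table: each key is a 64M slot index.
example_mttl2 = {
    0: Mttl2Entry("1G_allow"),
    1: Mttl2Entry("2M_PAGES", info=0b101),        # 2M pages 0 and 2 allowed
    2: Mttl2Entry("MTT_L1_DIR", l1_allowed={3}),  # only 4K page 3 allowed
}
```

In the real format a 1G range spans several consecutive MTTL2 entries that the RDSM keeps identical; the model above represents just one such 64M slot.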

## Smmtt



- 1xb - reserved (access causes access violation).

Figure 10: MTTL1 entry (PERM field)



#### Datacenter Confidential Computing



#### Current POC for CoVE ABI

#### **First POC**

#### Client/Embedded TEE

Figure: first POC software stack (diagram labels)

- TEE/non-TEE isolation provided by TSM via G-stage PT (for POC, HGAT emulates confidential memory isolation; MTT emulated via G-stage PT)
- VU: guests
- VS: Host OS/VMM (vHart sched, IO assignment, Mem. Mgmt) and TEE VMs
- Interfaces: COVH-ABI, COVG-ABI, TEEI
- HS: TEE Security Manager (TSM), with the TSM-driver subsumed into the TSM for the POC
- M: M-mode firmware
- Hardware: RISC-V CPU(s) via QEMU with H-extension; memory isolation, confidentiality; IOMMU; Root of Trust

Common CoVE ABI to support both deployment models with incremental ISA requirements


#### Overall design and current status


- All memory is non-confidential by default
- Explicit memory region conversion marking the confidential guest memory region
- Explicit host physical page conversion via *sbi\_covh\_convert\_pages*
  - Page table management
  - TVM state management
  - On-demand zero pages
  - Boot-time measured pages (guest code & data payload)
- TLB management via sbi\_covh\_[global/local]\_fence after conversion
- Once converted, the TSM can repurpose the pages for other TVMs until reclaim
- kvmtool support for now
  - crosvm and QEMU-KVM will be supported in a future version
  - guest\_memfd implementation once QEMU-KVM support is available
- Only one additional ABI added in the first version, as required by the spec
  - KVM\_RISCV\_COVE\_MEASURE\_REGION
  - Any unification plans in this area?
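The convert, fence, reclaim lifecycle described above can be sketched as a small state model. This is a minimal illustration under stated assumptions: the `PageTracker` class, its state names, and the `reclaim_pages` method name are hypothetical, and the real ABI calls take arguments and return codes that are not modelled here.

```python
from enum import Enum


class PageState(Enum):
    NON_CONFIDENTIAL = "non-confidential"   # default for all memory
    FENCE_PENDING = "converted, fence pending"
    CONFIDENTIAL = "confidential"           # usable by the TSM for TVMs


class PageTracker:
    """Toy model of host page conversion; not the TSM's actual bookkeeping."""

    def __init__(self, num_pages: int):
        self.state = [PageState.NON_CONFIDENTIAL] * num_pages

    def convert_pages(self, first: int, count: int) -> None:
        """Models sbi_covh_convert_pages: the host donates a page range."""
        for page in range(first, first + count):
            if self.state[page] is not PageState.NON_CONFIDENTIAL:
                raise ValueError(f"page {page} is already converted")
        for page in range(first, first + count):
            self.state[page] = PageState.FENCE_PENDING

    def fence(self) -> None:
        """Models sbi_covh_[global/local]_fence completing the conversion."""
        self.state = [
            PageState.CONFIDENTIAL if s is PageState.FENCE_PENDING else s
            for s in self.state
        ]

    def reclaim_pages(self, first: int, count: int) -> None:
        """Hypothetical reclaim step returning pages to the host."""
        for page in range(first, first + count):
            if self.state[page] is not PageState.CONFIDENTIAL:
                raise ValueError(f"page {page} is not confidential")
        for page in range(first, first + count):
            self.state[page] = PageState.NON_CONFIDENTIAL
```

The intermediate `FENCE_PENDING` state is a modelling choice that makes the ordering visible: a page is not treated as confidential until the fence has run.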


- Explicit MMIO region registration from the TVM via sbi\_covg\_[add|remove]\_mmio\_region at runtime
- TSM is notified about the MMIO region at runtime
- TSM forwards the fault to the host if the request falls within the guest-defined region
- Host emulates the MMIO load/store for the TVM
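The forwarding decision in the steps above amounts to an interval check against the regions the guest registered. The sketch below is a toy model: the `MmioRegions` class and `handle_guest_fault` function are hypothetical names, and the `"reject"` outcome for faults outside any region is a placeholder assumption, since the slides do not say what the TSM does in that case.

```python
class MmioRegions:
    """Toy registry of guest-declared MMIO regions kept by the TSM."""

    def __init__(self):
        self._regions = []  # list of (start, length) tuples

    def add(self, start: int, length: int) -> None:
        """Models sbi_covg_add_mmio_region."""
        self._regions.append((start, length))

    def remove(self, start: int, length: int) -> None:
        """Models sbi_covg_remove_mmio_region; raises if never registered."""
        self._regions.remove((start, length))

    def contains(self, addr: int) -> bool:
        return any(start <= addr < start + length
                   for start, length in self._regions)


def handle_guest_fault(regions: MmioRegions, addr: int) -> str:
    """Decide what the modelled TSM does with a guest fault at addr."""
    if regions.contains(addr):
        return "forward-to-host"  # host emulates the MMIO load/store
    return "reject"               # placeholder; actual behaviour not in the slides
```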

#### Opens

- Explicit ABI call (current spec) vs. PTE bits to indicate I/O pages (suggested on lore)
- Device filtering
  - All io-remapped memory converted to MMIO region (implemented in 1st version)
  - Authorized devices via device filtering approach (based on last TDX patches)

## MMIO






- Only paravirt I/O devices supported in the first version
- Guest initiates swiotlb bounce buffer sharing via sbi\_covg\_[share/unshare]\_memory\_region
- Arch hooks under mem\_encrypt/decrypt
- On-demand mapping via *sbi\_covh\_add\_tvm\_shared\_pages* at fault time
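The share, fault, map sequence above can be sketched as page-granular bookkeeping. This is an illustration only: the `SharedPages` class and its method names are hypothetical, 4K pages and page-aligned regions are assumed, and the real calls carry arguments and error codes not modelled here.

```python
PAGE = 4096  # assumed page size for this sketch


class SharedPages:
    """Toy model of guest-shared pages and the host's on-demand mappings."""

    def __init__(self):
        self.shared = set()  # guest page numbers the guest has shared
        self.mapped = set()  # shared pages the host has mapped so far

    @staticmethod
    def _pages(gpa: int, length: int) -> range:
        if gpa % PAGE or length % PAGE:
            raise ValueError("region must be page aligned")
        return range(gpa // PAGE, (gpa + length) // PAGE)

    def share(self, gpa: int, length: int) -> None:
        """Models sbi_covg_share_memory_region (e.g. swiotlb bounce buffers)."""
        self.shared.update(self._pages(gpa, length))

    def unshare(self, gpa: int, length: int) -> None:
        """Models sbi_covg_unshare_memory_region; drops any host mappings."""
        pages = set(self._pages(gpa, length))
        self.shared -= pages
        self.mapped -= pages

    def host_fault(self, gpa: int) -> bool:
        """Models fault-time mapping via sbi_covh_add_tvm_shared_pages."""
        page = gpa // PAGE
        if page not in self.shared:
            return False  # not shared: the host may not map it
        self.mapped.add(page)
        return True
```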

#### Opens

- Explicit ABI call (current spec) vs. PTE bits to indicate private/shared (suggested on lore)
  - See current proposed Svpams extension in Smmtt spec.
  - May require additional accept ABI for guest?

#### **Device IO**



## Shared memory between host and TSM


- Running the host in VS mode results in traps when it accesses hypervisor-level CSRs
- Leverages **RISC-V Nested Acceleration (NACL) SBI extension** with some additional security rules
  - Shared memory between the L0 (e.g. TSM) & L1 (KVM) hypervisors, per host CPU
- CoVE ABI defines the CSR & GPR state & access available in NACL shared memory
  - Trap-related CSRs (htinst, htval)
  - Guest time management CSRs (htimedelta & vstimecmp)
  - Guest interrupt enable state (vsie)
  - GPRs to manage TVM exits and MMIO loads
- TSM ignores any updates from the (untrusted) host to the CSRs and non-writable GPRs



NACL Shared Memory Layout

| Offset         | Description                                          |
|----------------|------------------------------------------------------|
| 0x0000-0x0FFF  | Scratch space (4 KB) defined by CoVE to contain GPRs |
| 0x1000 onwards | H extension CSR space (1024 x (XLEN / 8) Bytes)      |
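As a worked example of the layout, the sketch below computes where a CSR would sit in the shared memory. The region offsets and the 1024-entry size come from the table; the CSR-number-to-index mapping (folding a 12-bit CSR number into 10 bits) is written from memory of the SBI NACL definition and should be treated as an assumption to check against that specification. The CSR numbers are the standard ones from the privileged specification; the function names are made up for this example.

```python
NACL_SHMEM_SCRATCH_OFFSET = 0x0000  # 4 KB scratch space holding GPRs
NACL_SHMEM_CSR_OFFSET = 0x1000      # start of the H extension CSR space
NACL_SHMEM_CSR_COUNT = 1024         # number of XLEN-sized CSR slots

# Standard CSR numbers for some of the CSRs named on the slide.
CSR_VSIE = 0x204
CSR_VSTIMECMP = 0x24D
CSR_HTIMEDELTA = 0x605
CSR_HTVAL = 0x643
CSR_HTINST = 0x64A


def csr_index(csr_num: int) -> int:
    """Assumed NACL mapping: CSR bits [11:10] become index bits [9:8]."""
    return ((csr_num & 0xC00) >> 2) | (csr_num & 0xFF)


def csr_offset(csr_num: int, xlen: int = 64) -> int:
    """Byte offset of a CSR slot from the start of the shared memory."""
    return NACL_SHMEM_CSR_OFFSET + csr_index(csr_num) * (xlen // 8)


def shmem_size(xlen: int = 64) -> int:
    """Total bytes covered by the scratch space plus the CSR space."""
    return NACL_SHMEM_CSR_OFFSET + NACL_SHMEM_CSR_COUNT * (xlen // 8)
```

Under these assumptions, `csr_offset(CSR_HTIMEDELTA)` is `0x1828` on a 64-bit hart and the whole area spans `0x3000` bytes.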

### Interrupt assignment to confidential domain


#### **Timer Interrupt**

- Relies on the Sstc extension, which allows direct timer interrupt injection to the guest
- vstimecmp (guest time compare CSR): read access for the host, but updates are ignored
- TSM's responsibility to restore it while switching back to the TVM
- htimedelta is updated once at TVM boot time by the TSM; the host reads it via NACL shared memory

#### **IPI/External Interrupt**

- Only MSI-based interrupts via the RISC-V AIA specification
- Direct VS-level interrupts possible by VS interrupt file update
- Host domain can request the confidential domain to inject allowed interrupts to the TVM via sbi\_covi\_inject\_tvm\_cpu
- Host is responsible for convert/reclaim of the VS interrupt file
- Each vCPU migration causes interrupt files to be bound/unbound/rebound
- This may result in coordinated TLB invalidation as well
- (See next slide on supervisor domain S-mode interrupt file assignment)






- Supervisor domains need not be aware of other domains
- Each domain is associated with a dedicated IMSIC S-interrupt file or an APLIC
  - Each domain interacts with its interrupt controller instance as defined - no change to interface/behavior
- MTT or PMP is used to restrict a supervisor domain (and its devices) to its memory-mapped registers
- Wires are connected to APLICs statically or configurably

#### ISA: Smsdia: Interrupt file assignment to supervisor domains



- All SD external interrupts are reflected in a new M-mode CSR msdeia (expected to be a 64-bit CSR, allowing for 64 supervisor domains that can have external interrupts assigned)
- SDID selects the external interrupt for the hart from msdeia
- M-mode can be interrupted using a new local interrupt msdeip that indicates whether any of the SD external interrupts are pending; it can be masked using the new M-mode CSR msdeie
  - M-mode can use this interrupt for scheduling decisions
- IMSIC CSRs (siselect, sireg, stopei) operate on the S-mode register file selected per SDID
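One way to read the msdeia/msdeie/msdeip relationship is as a masked 64-bit bitmap. The sketch below assumes that bit i of msdeia corresponds to SDID i and that msdeie is a per-domain enable mask of the same width; neither detail is stated on the slide, and the helper names are hypothetical.

```python
NUM_DOMAINS = 64  # msdeia is expected to be a 64-bit CSR


def msdeip(msdeia: int, msdeie: int) -> bool:
    """Modelled local interrupt: any enabled SD external interrupt pending."""
    return (msdeia & msdeie) != 0


def pending_domains(msdeia: int, msdeie: int) -> list:
    """SDIDs with a pending, enabled external interrupt (assumed bit i = SDID i)."""
    pending = msdeia & msdeie
    return [sdid for sdid in range(NUM_DOMAINS) if (pending >> sdid) & 1]
```

An M-mode scheduler could use such a list to pick which supervisor domain to run next, which matches the slide's note about scheduling decisions.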






## Q/A & Thanks






Smmtt TG is discussing an IO-MTT interface to enable device requests to map to a supervisor domain (and hence an IOMMU and an MTT)

- Uses ratified RISC-V IOMMU with MTT checker
  - PA resolved by IOMMU are checked by MTT checker


RISC-V CoVE-IO TG defining ABI for TEE-IO:

- Device attestation
- Device assignment to CoVE TVM

#### Non-ISA: IO-MTT: IO assignment to supervisor domains

