# Device Trees in AVF

This document aims to provide a centralized overview of the way the Android
Virtualization Framework (AVF) composes and validates the device tree (DT)
received by protected guest kernels, such as [Microdroid].

[Microdroid]: ../guest/microdroid/README.md

## Context

As of Android 15, AVF only supports protected virtual machines (pVMs) on
AArch64. On this architecture, the Linux kernel and many other embedded projects
have adopted the [device tree format][dtspec] as the way to describe the
platform to the software. This includes so-called "[platform devices]" (which are
non-discoverable MMIO-based devices), CPUs (number, characteristics, ...),
memory (address and size), and more.

With virtualization, it is common for the virtual machine manager (VMM, e.g.
crosvm or QEMU), typically a host userspace process, to generate the DT as it
configures the virtual platform. In the case of AVF, the threat model prevents
the guest from trusting the host and therefore the DT must be validated by a
trusted entity. To avoid adding extra logic in the highly-privileged hypervisor,
AVF relies on [pvmfw], a small piece of code that runs in the context of the
guest (but before the guest kernel), loaded by the hypervisor, which validates
the untrusted device tree. If any anomaly is detected, pvmfw aborts the boot of
the guest. As a result, the guest kernel can trust the DT it receives.

The DT sanitized by pvmfw is received by guests following the [Linux boot
protocol][booting.txt] and includes both virtual and physical devices, which are
hardly distinguishable from the guest's perspective (although the context could
provide information helping to identify the nature of the device e.g. a
virtio-blk device is likely to be virtual while a platform accelerator would be
physical).
The guest is not expected to treat physical devices differently from
virtual devices and this distinction is therefore not relevant.

```
┌────────┐               ┌───────┐ valid              ┌───────┐
│ crosvm ├──{input DT}──►│ pvmfw ├───────{guest DT}──►│ guest │
└────────┘               └───┬───┘                    └───────┘
                             │ invalid
                             └───────────► SYSTEM RESET
```

[dtspec]: https://www.devicetree.org/specifications
[platform devices]: https://docs.kernel.org/driver-api/driver-model/platform.html
[pvmfw]: ../guest/pvmfw/README.md
[booting.txt]: https://www.kernel.org/doc/Documentation/arm64/booting.txt

## Device Tree Generation (Host-side)

crosvm describes the virtual platform to the guest by generating a DT
enumerating the memory region, virtual CPUs, virtual devices, and other
properties (e.g. ramdisk, cmdline, ...). For physical devices (assigned using
VFIO), it generates simple nodes describing the fundamental properties it
configures for the devices i.e. `<reg>`, `<interrupts>`, `<iommus>`
(respectively referring to IPA ranges, vIRQs, and pvIOMMUs).

It is possible for the caller of crosvm to pass more DT properties or nodes to
the guest by providing device tree overlays (DTBO) to crosvm. These overlays get
applied after the DT describing the configured platform has been generated, the
final result getting passed to the guest.

For physical devices, crosvm supports applying a "filtered" subset of the DTBO
received, where subnodes are only kept if they have a label corresponding to an
assigned VFIO device. This allows the caller to always pass the same overlay,
irrespective of which physical devices are being assigned, greatly simplifying
the logic of the caller. This makes it possible for crosvm to support complex
nodes for physical devices without including device-specific logic as any extra
property (e.g. `<compatible>`) will be passed through the overlay and added to
the final DT in a generic way.
This _VM DTBO_ is read from an AVB-verified
partition (see `ro.boot.hypervisor.vm_dtbo_idx`).

Otherwise, if the `filter` option is not used, crosvm applies the overlay fully.
This can be used to supplement the guest DT with nodes and properties which are
not tied to particular assigned physical devices or emulated virtual devices. In
particular, `virtualizationservice` currently makes use of it to pass
AVF-specific properties.

```
            ┌─►{DTBO,filter}─┐
┌─────────┐ │                │  ┌────────┐
│ virtmgr ├─┼────►{DTBO}─────┼─►│ crosvm ├───►{guest DT}───► ...
└─────────┘ │                │  └────────┘
            └─►{VFIO sysfs}──┘
```

## Device Tree Sanitization

pvmfw intercepts the boot sequence of the guest and locates the DT generated by
the VMM through the VMM-guest ABI. A design goal of pvmfw is to have as few
side effects as possible on the guest so that the VMM can keep the illusion that
it configured and booted the guest directly and the guest does not need to rely
on or expect pvmfw to have performed any noticeable work (a noteworthy exception
being the memory region describing the [DICE chain]). As a result, both VMM and
guest can mostly use the same logic between protected and non-protected VMs
(where pvmfw does not run) and keep the simpler VMM-guest execution model they
are used to. In the context of pvmfw and DT validation, the final DT passed by
crosvm to the guest is typically referred to as the _input DT_.

```
┌────────┐                  ┌───────┐                  ┌───────┐
│ crosvm ├───►{input DT}───►│ pvmfw │───►{guest DT}───►│ guest │
└────────┘                  └───────┘                  └───────┘
                              ▲   ▲
   ┌─────┐  ┌─►{VM DTBO}──────┘   │
   │ ABL ├──┤                     │
   └─────┘  └─►{ref. DT}──────────┘
```

[DICE chain]: ../guest/pvmfw/README.md#virtual-platform-dice-chain-handover

### Virtual Platform

The DT sanitization policy in pvmfw matches the virtual platform defined by
crosvm and its implementation is therefore tightly coupled with it (this is one
reason why AVF expects pvmfw and the VMM to be updated in sync). It covers
fundamental properties of the platform (e.g. location of main memory,
properties of CPUs, layout of the interrupt controller, ...) and the properties
of (sometimes optional) virtual devices supported by crosvm and used by AVF
guests.

### Physical Devices

To support device assignment, pvmfw needs to be able to validate physical
platform-specific device properties. To achieve this in a platform-agnostic way,
pvmfw receives a DT overlay (called the _VM DTBO_) from the Android Bootloader
(ABL), containing a description of all the assignable devices. By detecting
which devices have been assigned using platform-specific reserved DT labels, it
can validate the properties of the physical devices through [generic logic].
pvmfw also verifies with the hypervisor that the guest addresses from the DT
have been properly mapped to the expected physical addresses of the devices; see
[_Getting started with device assignment_][da.md].

Note that, as pvmfw runs within the context of an individual pVM, it cannot
detect abuses by the host of device assignment across guests (e.g.
simultaneously assigning the same device to multiple guests), and it is the
responsibility of the hypervisor to enforce this isolation. AVF also relies on
the hypervisor to clear the state of the device on donation and (most
importantly) on return to the host so that pvmfw does not need to access the
assigned devices.
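
The per-guest checks described in this section can be modelled in a few lines. The Python sketch below is only an illustration and is not pvmfw's implementation (which lives in `device_assignment.rs`). It assumes devices are keyed by their DT label, that a `<reg>` entry can be reduced to a `(base, size)` pair, and that a hypothetical `ipa_to_pa` callback stands in for querying the hypervisor. All names and addresses in it are made up.

```python
from typing import Callable, Dict, List, Optional, Tuple

# (base address, size) of one MMIO region; a simplification of a <reg> entry.
Region = Tuple[int, int]


class InvalidDeviceTree(Exception):
    """Validation failure; pvmfw would abort the boot of the guest instead."""


def validate_assigned_devices(
    assigned: Dict[str, List[Region]],
    assignable: Dict[str, List[Region]],
    ipa_to_pa: Callable[[int], Optional[int]],
) -> None:
    """Check devices found in the input DT against the VM DTBO description.

    assigned:   label -> guest (IPA) regions, as found in the input DT
    assignable: label -> physical regions, as described by the VM DTBO
    ipa_to_pa:  hypothetical stand-in for asking the hypervisor where a guest
                address is actually mapped (None if unmapped)
    """
    for label, guest_regions in assigned.items():
        if label not in assignable:
            raise InvalidDeviceTree(f"{label}: not an assignable device")
        physical_regions = assignable[label]
        if len(guest_regions) != len(physical_regions):
            raise InvalidDeviceTree(f"{label}: unexpected number of regions")
        for (ipa, size), (pa, expected_size) in zip(guest_regions, physical_regions):
            if size != expected_size:
                raise InvalidDeviceTree(f"{label}: unexpected region size")
            if ipa_to_pa(ipa) != pa:
                raise InvalidDeviceTree(f"{label}: region not mapped to the device")


# Made-up example: one assignable device, mapped by the host at some IPA.
ASSIGNABLE = {"accel0": [(0x1000_0000, 0x1000)]}
STAGE2 = {0x8000_0000: 0x1000_0000}  # IPA -> PA, as the hypervisor would report
validate_assigned_devices({"accel0": [(0x8000_0000, 0x1000)]}, ASSIGNABLE, STAGE2.get)
```

In this model, any mismatch raises an exception, mirroring the "abort on any anomaly" policy. Checks that need a view across guests (such as double assignment) are deliberately absent, since they are the hypervisor's responsibility.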

[generic logic]: ../guest/pvmfw/src/device_assignment.rs
[da.md]: ../docs/device_assignment.md

### Extra Properties (Security-Sensitive)

Some AVF use-cases require passing platform-specific inputs to protected guests.
If these are security-sensitive, they must also be validated before being used
by the guest. In most cases, the DT property is platform-agnostic (and supported
by the generic guest) but its value is platform-specific. The _reference DT_ is
an [input of pvmfw][pvmfw-config] (received from the loader) and used to
validate DT entries which are:

- security-sensitive: the host should not be able to tamper with these values
- not confidential: the property is visible to the host (as it generates it)
- same across VMs: the property (if present) must be the same across all
  instances
- possibly optional: pvmfw does not abort the boot if the entry is missing

[pvmfw-config]: ../guest/pvmfw/README.md#configuration-data-format

### Extra Properties (Host-Generated)

Finally, to allow the host to generate values that vary between guests (and
which therefore can't be described using one of the previous mechanisms), pvmfw
treats the subtree of the input DT at path `/avf/untrusted` differently: it only
performs minimal sanitization on it, allowing the host to pass arbitrary,
unsanitized DT entries. Therefore, this subtree must be used with extra
validation by guests e.g. only accessed by path (where the name, "`untrusted`",
acts as a reminder), with no assumptions about the presence or correctness of
nodes or properties, without expecting properties to be well-formed, ...
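
As an illustration of the guest-side precautions listed above, here is a minimal Python sketch of a defensive property read. It assumes the DT has already been parsed into nested dictionaries (nodes as `dict`, property values as `bytes`). That representation, the helper name `read_untrusted_bytes` and the property name `example-id` are all hypothetical and only serve the example.

```python
from typing import Any, Optional


def read_untrusted_bytes(dt: Any, name: str, expected_len: int) -> Optional[bytes]:
    """Return /avf/untrusted/<name> if present and well-formed, else None.

    `dt` models a parsed DT as nested dicts (an assumption of this sketch).
    """
    node = dt
    # Access strictly by path; never search the tree by property or phandle.
    for component in ("avf", "untrusted"):
        if not isinstance(node, dict):
            return None
        node = node.get(component)
    if not isinstance(node, dict):
        return None  # the subtree may be absent or malformed
    value = node.get(name)
    # The host controls this value: check its type and length before use.
    if not isinstance(value, bytes) or len(value) != expected_len:
        return None
    return value


# Made-up input: a 16-byte property under /avf/untrusted.
example_dt = {"avf": {"untrusted": {"example-id": bytes(range(16))}}}
example_id = read_untrusted_bytes(example_dt, "example-id", 16)
```

Returning `None` for every unexpected shape forces the caller to handle a missing or malformed entry explicitly, rather than assuming the host provided a valid one.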

As part of its minimal sanitization of `/avf/untrusted`, pvmfw prevents other
nodes from linking to this subtree (`<phandle>` is rejected) and limits the risk
of guests unexpectedly parsing it other than by path (`<compatible>` is also
rejected). However, guests must not support non-standard ways of binding against
nodes by property, as they would then be vulnerable to attacks from a malicious
host.

### Implementation details

DT sanitization is currently implemented in pvmfw by parsing the input DT into
temporary data structures and pruning a built-in device tree (called the
_platform DT_; see [platform.dts]) accordingly. For device assignment, it prunes
the received VM DTBO to only keep the devices that have actually been assigned
(as the overlay contains all assignable devices of the platform).

[platform.dts]: ../guest/pvmfw/platform.dts

## DT for guests

### AVF-specific properties and nodes

For Microdroid and other AVF guests, some special DT entries are defined:

- the `/chosen/avf,new-instance` flag, set when pvmfw triggered the generation
  of a new set of CDIs (see DICE) _i.e._ the pVM instance was booted for the
  first time. This should be used by the next stages to synchronise the
  generation of new CDIs and detect a malicious host attempting to force only
  one stage to do so. This property becomes obsolete (and might not be set) when
  [deferred rollback protection] is used by the guest kernel;

- the `/chosen/avf,strict-boot` flag, which is always set for protected VMs and
  can be used by guests to enable extra validation;

- the `/avf/untrusted/defer-rollback-protection` flag, which controls [deferred
  rollback protection] on devices and for guests which support it;

- the host-allocated `/avf/untrusted/instance-id`, which assigns a unique
  identifier to the VM instance and is used for differentiating VM secrets as
  well as by the guest OS to index external storage such as Secretkeeper.
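
The entries above can be gathered by a guest in a single pass. The Python sketch below is illustrative only: it models the parsed DT as nested dictionaries with `bytes` property values, and it assumes that a flag counts as set whenever the property is present (the usual DT convention for boolean properties, not something this document specifies). The `AvfProperties` type and helper names are hypothetical.

```python
from dataclasses import dataclass
from typing import Any, Optional


def _lookup(dt: Any, path: str) -> Optional[Any]:
    """Walk a '/'-separated path through nested dicts; None if absent."""
    node = dt
    for component in path.strip("/").split("/"):
        if not isinstance(node, dict) or component not in node:
            return None
        node = node[component]
    return node


@dataclass(frozen=True)
class AvfProperties:
    new_instance: bool
    strict_boot: bool
    defer_rollback_protection: bool
    instance_id: Optional[bytes]  # host-provided, hence untrusted


def read_avf_properties(dt: Any) -> AvfProperties:
    def flag(path: str) -> bool:
        # Assumption: presence of the property means the flag is set.
        return _lookup(dt, path) is not None

    instance_id = _lookup(dt, "/avf/untrusted/instance-id")
    return AvfProperties(
        new_instance=flag("/chosen/avf,new-instance"),
        strict_boot=flag("/chosen/avf,strict-boot"),
        defer_rollback_protection=flag("/avf/untrusted/defer-rollback-protection"),
        # Only accept the identifier if it has the expected type.
        instance_id=instance_id if isinstance(instance_id, bytes) else None,
    )


# Made-up guest DT for a protected VM booting an existing instance.
guest_dt = {
    "chosen": {"avf,strict-boot": b""},
    "avf": {"untrusted": {"instance-id": b"\x2a" * 8}},
}
props = read_avf_properties(guest_dt)
```

Keeping the values read from `/avf/untrusted` in the same structure as the `/chosen` flags is a convenience of this sketch. A real guest would still treat the former as host-controlled input.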

[deferred rollback protection]: ../docs/updatable_vm.md#deferring-rollback-protection