# Device Trees in AVF

This document aims to provide a centralized overview of the way the Android
Virtualization Framework (AVF) composes and validates the device tree (DT)
received by protected guest kernels, such as [Microdroid].

[Microdroid]: ../guest/microdroid/README.md

## Context

As of Android 15, AVF only supports protected virtual machines (pVMs) on
AArch64. On this architecture, the Linux kernel and many other embedded projects
have adopted the [device tree format][dtspec] as the way to describe the
platform to the software. This includes so-called "[platform devices]" (which are
non-discoverable MMIO-based devices), CPUs (number, characteristics, ...),
memory (address and size), and more.

With virtualization, it is common for the virtual machine manager (VMM, e.g.
crosvm or QEMU), typically a host userspace process, to generate the DT as it
configures the virtual platform. In the case of AVF, the threat model prevents
the guest from trusting the host, so the DT must be validated by a trusted
entity. To avoid adding extra logic to the highly-privileged hypervisor, AVF
relies on [pvmfw], a small piece of code loaded by the hypervisor that runs in
the context of the guest (but before the guest kernel) and validates the
untrusted device tree. If any anomaly is detected, pvmfw aborts the boot of the
guest. As a result, the guest kernel can trust the DT it receives.

The DT sanitized by pvmfw is received by guests following the [Linux boot
protocol][booting.txt] and includes both virtual and physical devices, which are
hard to distinguish from the guest's perspective (although context may help
identify the nature of a device, e.g. a virtio-blk device is likely to be
virtual while a platform accelerator would be physical). The guest is not
expected to treat physical devices differently from virtual devices, so this
distinction is not relevant.

```
┌────────┐               ┌───────┐ valid              ┌───────┐
│ crosvm ├──{input DT}──►│ pvmfw ├───────{guest DT}──►│ guest │
└────────┘               └───┬───┘                    └───────┘
                             │   invalid
                             └───────────► SYSTEM RESET
```

[dtspec]: https://www.devicetree.org/specifications
[platform devices]: https://docs.kernel.org/driver-api/driver-model/platform.html
[pvmfw]: ../guest/pvmfw/README.md
[booting.txt]: https://www.kernel.org/doc/Documentation/arm64/booting.txt

## Device Tree Generation (Host-side)

crosvm describes the virtual platform to the guest by generating a DT
enumerating the memory region, virtual CPUs, virtual devices, and other
properties (e.g. ramdisk, cmdline, ...). For physical devices (assigned using
VFIO), it generates simple nodes describing the fundamental properties it
configures for the devices, i.e. `<reg>`, `<interrupts>`, and `<iommus>`
(respectively referring to IPA ranges, vIRQs, and pvIOMMUs).

The caller of crosvm can pass additional DT properties or nodes to the guest by
providing device tree overlays (DTBOs) to crosvm. These overlays are applied
after the DT describing the configured platform has been generated, and the
final result is passed to the guest.

For physical devices, crosvm supports applying a "filtered" subset of the
received DTBO, where subnodes are only kept if they have a label corresponding
to an assigned VFIO device. This allows the caller to always pass the same
overlay, irrespective of which physical devices are being assigned, greatly
simplifying the caller's logic. It also lets crosvm support complex nodes for
physical devices without device-specific logic, as any extra property (e.g.
`<compatible>`) is passed through the overlay and added to the final DT in a
generic way. This _VM DTBO_ is read from an AVB-verified partition (see
`ro.boot.hypervisor.vm_dtbo_idx`).

Otherwise, if the `filter` option is not used, crosvm applies the overlay fully.
This can be used to supplement the guest DT with nodes and properties which are
not tied to particular assigned physical devices or emulated virtual devices. In
particular, `virtualizationservice` currently makes use of it to pass
AVF-specific properties.
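
The two overlay modes described above can be sketched as a selection over overlay subnodes. This is a minimal model, not crosvm's code: it assumes the overlay has already been parsed into a flat list of labeled subnodes (real DTBOs are binary FDT blobs whose labels must first be resolved), and `OverlayNode`, `apply_overlay`, and the example labels are hypothetical names.

```rust
use std::collections::BTreeSet;

/// A simplified, already-parsed overlay subnode (hypothetical model).
#[derive(Clone, Debug, PartialEq)]
struct OverlayNode {
    /// DT label of the subnode, if any (e.g. the label of an assignable device).
    label: Option<String>,
    /// Node name, e.g. "dev-a@0".
    name: String,
    /// Example of an extra property passed through the overlay generically.
    compatible: Option<String>,
}

/// Returns the overlay subnodes that would be added to the guest DT.
///
/// With `assigned_labels == None`, the overlay is applied fully. With
/// `Some(labels)` ("filter" mode), only subnodes whose label matches an
/// assigned VFIO device are kept; unlabeled subnodes are dropped.
fn apply_overlay(
    overlay: &[OverlayNode],
    assigned_labels: Option<&BTreeSet<String>>,
) -> Vec<OverlayNode> {
    overlay
        .iter()
        .filter(|node| match assigned_labels {
            None => true,
            Some(labels) => node
                .label
                .as_ref()
                .map_or(false, |label| labels.contains(label)),
        })
        .cloned()
        .collect()
}
```

Because the filter only looks at labels, the caller can keep passing the same overlay regardless of which devices are assigned, and device-specific properties such as `compatible` travel through untouched.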

```
            ┌─►{DTBO,filter}─┐
┌─────────┐ │                │  ┌────────┐
│ virtmgr ├─┼────►{DTBO}─────┼─►│ crosvm ├───►{guest DT}───► ...
└─────────┘ │                │  └────────┘
            └─►{VFIO sysfs}──┘
```

## Device Tree Sanitization

pvmfw intercepts the boot sequence of the guest and locates the DT generated by
the VMM through the VMM-guest ABI. A design goal of pvmfw is to have as few side
effects as possible on the guest, so that the VMM can keep the illusion that it
configured and booted the guest directly and the guest does not need to rely on
or expect pvmfw to have performed any noticeable work (a noteworthy exception
being the memory region describing the [DICE chain]). As a result, both VMM and
guest can mostly use the same logic between protected and non-protected VMs
(where pvmfw does not run) and keep the simpler VMM-guest execution model they
are used to. In the context of pvmfw and DT validation, the final DT passed by
crosvm to the guest is typically referred to as the _input DT_.

```
┌────────┐                  ┌───────┐                  ┌───────┐
│ crosvm ├───►{input DT}───►│ pvmfw │───►{guest DT}───►│ guest │
└────────┘                  └───────┘                  └───────┘
                              ▲   ▲
   ┌─────┐  ┌─►{VM DTBO}──────┘   │
   │ ABL ├──┤                     │
   └─────┘  └─►{ref. DT}──────────┘
```

[DICE chain]: ../guest/pvmfw/README.md#virtual-platform-dice-chain-handover
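
Before any of the policies below can be applied, the input DT blob itself must be structurally sound. As an illustration of that first step, the sketch below checks the fixed 40-byte FDT header defined by the Devicetree Specification (big-endian `magic`, `totalsize`, `off_dt_struct`, ..., `size_dt_struct`) so that later parsing never follows an offset outside the blob. It is a simplified sketch, not pvmfw's code: it ignores the strings block, the memory reservation map, and the version fields, and in practice such checks are usually delegated to a library such as libfdt (`fdt_check_header()`).

```rust
/// Magic value at offset 0 of every flattened device tree (big-endian).
const FDT_MAGIC: u32 = 0xd00d_feed;
/// Size of the version 17 FDT header: ten big-endian u32 fields.
const FDT_HEADER_SIZE: usize = 40;

#[derive(Debug, PartialEq)]
enum FdtHeaderError {
    TooSmall,
    BadMagic,
    BadTotalSize,
    StructOutOfBounds,
}

/// Reads the big-endian u32 at `offset` (caller guarantees it is in bounds).
fn be32(blob: &[u8], offset: usize) -> u32 {
    u32::from_be_bytes([blob[offset], blob[offset + 1], blob[offset + 2], blob[offset + 3]])
}

/// Returns the DT's total size if the header is consistent with `blob`.
fn check_fdt_header(blob: &[u8]) -> Result<usize, FdtHeaderError> {
    if blob.len() < FDT_HEADER_SIZE {
        return Err(FdtHeaderError::TooSmall);
    }
    if be32(blob, 0) != FDT_MAGIC {
        return Err(FdtHeaderError::BadMagic);
    }
    // `totalsize` must cover the header and fit in the provided buffer.
    let total = be32(blob, 4) as usize;
    if total < FDT_HEADER_SIZE || total > blob.len() {
        return Err(FdtHeaderError::BadTotalSize);
    }
    // The structure block (`off_dt_struct`, `size_dt_struct`) must lie after
    // the header and within `totalsize`, without integer overflow.
    let off_struct = be32(blob, 8) as usize;
    let size_struct = be32(blob, 36) as usize;
    match off_struct.checked_add(size_struct) {
        Some(end) if off_struct >= FDT_HEADER_SIZE && end <= total => Ok(total),
        _ => Err(FdtHeaderError::StructOutOfBounds),
    }
}
```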

### Virtual Platform

The DT sanitization policy in pvmfw matches the virtual platform defined by
crosvm and its implementation is therefore tightly coupled with it (this is one
reason why AVF expects pvmfw and the VMM to be updated in sync). It covers
fundamental properties of the platform (e.g. location of main memory,
properties of CPUs, layout of the interrupt controller, ...) and the properties
of (sometimes optional) virtual devices supported by crosvm and used by AVF
guests.
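
A check of such fundamental platform properties might look like the following. This is only an illustration under stated assumptions: `PlatformInfo` stands for values already parsed from the input DT, and the expected memory base, page size, and CPU limit are hypothetical constants chosen for the example, not pvmfw's actual policy, which mirrors the platform crosvm really defines.

```rust
/// Values assumed to have been parsed from the input DT (hypothetical model).
struct PlatformInfo {
    mem_base: u64,
    mem_size: u64,
    num_cpus: usize,
}

// Hypothetical expectations of the virtual platform, for illustration only.
const EXPECTED_MEM_BASE: u64 = 0x8000_0000;
const PAGE_SIZE: u64 = 4096;
const MAX_CPUS: usize = 16;

#[derive(Debug, PartialEq)]
enum PlatformError {
    UnexpectedMemoryBase(u64),
    InvalidMemorySize(u64),
    InvalidCpuCount(usize),
}

/// Rejects any input DT whose fundamental properties deviate from the
/// expected virtual platform; the caller would abort the boot on `Err`.
fn validate_platform(info: &PlatformInfo) -> Result<(), PlatformError> {
    if info.mem_base != EXPECTED_MEM_BASE {
        return Err(PlatformError::UnexpectedMemoryBase(info.mem_base));
    }
    // Main memory must be non-empty, page-aligned, and must not wrap around.
    let size_ok = info.mem_size != 0
        && info.mem_size % PAGE_SIZE == 0
        && info.mem_base.checked_add(info.mem_size).is_some();
    if !size_ok {
        return Err(PlatformError::InvalidMemorySize(info.mem_size));
    }
    if info.num_cpus == 0 || info.num_cpus > MAX_CPUS {
        return Err(PlatformError::InvalidCpuCount(info.num_cpus));
    }
    Ok(())
}
```

Keeping these expectations as constants compiled into the validator is one way to express the tight coupling with the VMM mentioned above: changing the virtual platform means updating both sides together.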

### Physical Devices

To support device assignment, pvmfw needs to be able to validate physical
platform-specific device properties. To achieve this in a platform-agnostic way,
pvmfw receives a DT overlay (called the _VM DTBO_) from the Android Bootloader
(ABL), containing a description of all the assignable devices. By detecting
which devices have been assigned using platform-specific reserved DT labels, it
can validate the properties of the physical devices through [generic logic].
pvmfw also verifies with the hypervisor that the guest addresses from the DT
have been properly mapped to the expected physical addresses of the devices; see
[_Getting started with device assignment_][da.md].

Note that, as pvmfw runs within the context of an individual pVM, it cannot
detect abuses by the host of device assignment across guests (e.g.
simultaneously assigning the same device to multiple guests), and it is the
responsibility of the hypervisor to enforce this isolation. AVF also relies on
the hypervisor to clear the state of the device on donation and (most
importantly) on return to the host so that pvmfw does not need to access the
assigned devices.

[generic logic]: ../guest/pvmfw/src/device_assignment.rs
[da.md]: ../docs/device_assignment.md
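
The two checks described above (input DT against VM DTBO, then guest addresses against the hypervisor's mappings) can be sketched for a single device and a single `<reg>` range. Everything here is a hypothetical model rather than pvmfw's API: `AssignableDevice` stands for what the VM DTBO describes, and the `translate` closure stands in for a query to the hypervisor. For brevity, only the base address is translated, whereas a real check would need to cover every page of the range.

```rust
/// A guest (IPA) or physical address range.
#[derive(Clone, Copy, Debug, PartialEq)]
struct Range {
    base: u64,
    size: u64,
}

/// One assignable device, as the VM DTBO could describe it (hypothetical).
struct AssignableDevice {
    label: &'static str,
    /// Where the device must appear in the guest address space.
    ipa: Range,
    /// Physical MMIO range the guest range must be mapped to.
    phys: Range,
}

#[derive(Debug, PartialEq)]
enum AssignmentError {
    UnknownDevice(String),
    RegMismatch(String),
    BadMapping(String),
}

/// Validates one assigned device found in the input DT.
///
/// `translate` is a stand-in for asking the hypervisor which physical address
/// currently backs a guest address (hypothetical interface).
fn validate_assigned_device<F>(
    label: &str,
    input_reg: Range,
    vm_dtbo: &[AssignableDevice],
    translate: F,
) -> Result<(), AssignmentError>
where
    F: Fn(u64) -> Option<u64>,
{
    let device = vm_dtbo
        .iter()
        .find(|d| d.label == label)
        .ok_or_else(|| AssignmentError::UnknownDevice(label.to_string()))?;
    // The `<reg>` generated by the VMM must match the expected guest range.
    if input_reg != device.ipa {
        return Err(AssignmentError::RegMismatch(label.to_string()));
    }
    // The hypervisor must have mapped that range to the device's MMIO region.
    match translate(input_reg.base) {
        Some(pa) if pa == device.phys.base && input_reg.size == device.phys.size => Ok(()),
        _ => Err(AssignmentError::BadMapping(label.to_string())),
    }
}
```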

### Extra Properties (Security-Sensitive)

Some AVF use-cases require passing platform-specific inputs to protected guests.
If these are security-sensitive, they must also be validated before being used
by the guest. In most cases, the DT property is platform-agnostic (and supported
by the generic guest) but its value is platform-specific. The _reference DT_ is
an [input of pvmfw][pvmfw-config] (received from the loader) and is used to
validate DT entries which are:

- security-sensitive: the host should not be able to tamper with these values
- not confidential: the property is visible to the host (as it generates it)
- same across VMs: the property (if present) must be the same across all instances
- possibly optional: pvmfw does not abort the boot if the entry is missing

[pvmfw-config]: ../guest/pvmfw/README.md#configuration-data-format
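
The four criteria above suggest a simple comparison rule, sketched below under assumptions: properties are modelled as a flat name-to-bytes map, a missing input entry is accepted (optional), an identical value is passed on, and any differing value aborts validation. Input properties absent from the reference DT are simply not copied in this sketch. `Props`, `validate_against_reference`, and the `vendor,example` property name are hypothetical, and pvmfw's actual handling may differ.

```rust
use std::collections::BTreeMap;

/// Simplified property set: property name -> raw value (hypothetical model).
type Props = BTreeMap<String, Vec<u8>>;

#[derive(Debug, PartialEq)]
enum ReferenceDtError {
    /// The host supplied a value differing from the reference DT.
    Mismatch(String),
}

/// Validates the input DT's properties against the reference DT and returns
/// the ones to expose to the guest.
fn validate_against_reference(input: &Props, reference: &Props) -> Result<Props, ReferenceDtError> {
    let mut validated = Props::new();
    for (name, expected) in reference {
        match input.get(name) {
            // Optional entry: a missing property does not abort the boot.
            None => {}
            // Same value as the trusted reference: safe to pass on.
            Some(value) if value == expected => {
                validated.insert(name.clone(), value.clone());
            }
            // Any other value may have been tampered with by the host.
            Some(_) => return Err(ReferenceDtError::Mismatch(name.clone())),
        }
    }
    Ok(validated)
}
```

Since the value must be the same across all instances, a single reference blob from the loader is enough to check every VM, and the non-confidential nature of the value means the host may generate it without any secret being exposed.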

### Extra Properties (Host-Generated)

Finally, to allow the host to generate values that vary between guests (and
which therefore can't be described using one of the previous mechanisms), pvmfw
treats the subtree of the input DT at path `/avf/untrusted` differently: it only
performs minimal sanitization on it, allowing the host to pass arbitrary,
unsanitized DT entries. Guests must therefore apply extra validation to this
subtree, e.g. only access it by path (where the name, "`untrusted`", acts as a
reminder), make no assumptions about the presence or correctness of nodes or
properties, not expect properties to be well-formed, ...

In particular, pvmfw prevents other nodes from linking to this subtree
(`<phandle>` is rejected) and limits the risk of guests unexpectedly parsing it
other than by path (`<compatible>` is also rejected), but guests must not support
non-standard ways of binding against nodes by property as they would then be
vulnerable to attacks from a malicious host.
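
The two rejections above can be sketched as a recursive walk over the untrusted subtree. This assumes a hypothetical in-memory `Node` type rather than a real FDT parser, and it only models the `phandle` and `compatible` checks named in the text, not the rest of pvmfw's minimal sanitization.

```rust
/// A simplified, already-parsed DT node (hypothetical model).
struct Node {
    name: String,
    properties: Vec<(String, Vec<u8>)>,
    children: Vec<Node>,
}

/// Properties that must not appear anywhere under `/avf/untrusted`.
const FORBIDDEN_PROPERTIES: [&str; 2] = ["phandle", "compatible"];

/// Recursively rejects `phandle` (other nodes linking into the subtree) and
/// `compatible` (drivers binding by property) in the subtree rooted at `node`.
fn check_untrusted_subtree(node: &Node, path: &str) -> Result<(), String> {
    for (property, _value) in &node.properties {
        if FORBIDDEN_PROPERTIES.contains(&property.as_str()) {
            return Err(format!("{}: property `{}` is not allowed", path, property));
        }
    }
    for child in &node.children {
        check_untrusted_subtree(child, &format!("{}/{}", path, child.name))?;
    }
    Ok(())
}
```

Passing this check does not make the subtree's values trustworthy; it only closes two ways the subtree could be reached other than by path.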

### Implementation details

DT sanitization is currently implemented in pvmfw by parsing the input DT into
temporary data structures and pruning a built-in device tree (called the
_platform DT_; see [platform.dts]) accordingly. For device assignment, it prunes
the received VM DTBO to only keep the devices that have actually been assigned
(as the overlay contains all assignable devices of the platform).

[platform.dts]: ../guest/pvmfw/platform.dts
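
The pruning approach can be illustrated with node paths alone. In this hypothetical sketch (`prune_template` and the example paths are made up for illustration), the guest DT is derived from a built-in template: template nodes that the parsed input also describes are kept, template nodes absent from the input are pruned, and input nodes unknown to the template never reach the guest because nothing is copied from outside the template. Copying validated values into the kept nodes is left out.

```rust
use std::collections::BTreeSet;

/// Returns the template nodes to keep in the guest DT, in template order.
///
/// Only template nodes also present in the parsed input survive; input nodes
/// missing from the template are ignored rather than copied.
fn prune_template(template: &[&str], input_nodes: &BTreeSet<String>) -> Vec<String> {
    template
        .iter()
        .filter(|path| input_nodes.contains(**path))
        .map(|path| path.to_string())
        .collect()
}
```

The same shape applies to pruning the VM DTBO, with the set of assigned device labels taking the place of the input nodes.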

## DT for guests

### AVF-specific properties and nodes

For Microdroid and other AVF guests, some special DT entries are defined:

- the `/chosen/avf,new-instance` flag, set when pvmfw has triggered the
  generation of a new set of CDIs (see DICE), _i.e._ the pVM instance was booted
  for the first time. This should be used by the next stages to synchronise the
  generation of new CDIs and detect a malicious host attempting to force only
  one stage to do so. This property becomes obsolete (and might not be set) when
  [deferred rollback protection] is used by the guest kernel;

- the `/chosen/avf,strict-boot` flag, always set for protected VMs, which can be
  used by guests to enable extra validation;

- the `/avf/untrusted/defer-rollback-protection` flag, which controls [deferred
  rollback protection] on devices and for guests which support it;

- the host-allocated `/avf/untrusted/instance-id`, which is used to assign a
  unique identifier to the VM instance and to differentiate VM secrets, as well
  as by the guest OS to index external storage such as Secretkeeper.

[deferred rollback protection]: ../docs/updatable_vm.md#deferring-rollback-protection
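
A guest might collect these entries as sketched below. The sketch assumes a flattened DT modelled as a map from full property path to raw bytes (on Linux, the same tree is exposed as files under `/sys/firmware/devicetree/base`), assumes the flags are empty properties whose presence alone means "set", as is conventional for boolean DT properties, and leaves the expected instance-ID length to the caller. `FlatDt`, `AvfBootInfo`, and `read_avf_boot_info` are hypothetical names.

```rust
use std::collections::BTreeMap;

/// Simplified flattened DT: full property path -> raw value (hypothetical).
type FlatDt = BTreeMap<String, Vec<u8>>;

/// AVF-specific entries as a guest might collect them.
#[derive(Debug, PartialEq)]
struct AvfBootInfo {
    new_instance: bool,
    strict_boot: bool,
    defer_rollback_protection: bool,
    /// `None` if the host-provided ID is missing or has an unexpected length.
    instance_id: Option<Vec<u8>>,
}

/// Reads the AVF entries; boolean flags are detected by presence only.
fn read_avf_boot_info(dt: &FlatDt, expected_id_len: usize) -> AvfBootInfo {
    let has = |path: &str| dt.contains_key(path);
    AvfBootInfo {
        new_instance: has("/chosen/avf,new-instance"),
        strict_boot: has("/chosen/avf,strict-boot"),
        // Host-controlled (under /avf/untrusted), so not a trusted fact.
        defer_rollback_protection: has("/avf/untrusted/defer-rollback-protection"),
        // Host-controlled and unsanitized: accessed by path and length-checked.
        instance_id: dt
            .get("/avf/untrusted/instance-id")
            .filter(|id| id.len() == expected_id_len)
            .cloned(),
    }
}
```

Any length used with this function (such as 64 bytes) is only an example; a guest should take the expected size from its own definition of the instance ID.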
212