# virtio-queue

The `virtio-queue` crate provides the device-side implementation of a virtio
queue, a virtio descriptor and a chain of such descriptors.
Two formats of virtio queues are defined in the specification: split virtqueues
and packed virtqueues. The `virtio-queue` crate supports only the
[split virtqueues](https://docs.oasis-open.org/virtio/virtio/v1.1/csprd01/virtio-v1.1-csprd01.html#x1-240006)
format.
The virtio-queue API is meant to be consumed by virtio device
implementations (such as the block device or the vsock device).
The main abstraction is the `Queue`. The crate also defines a state object
for the queue, i.e. `QueueState`.

## Usage

Let’s take a concrete example of how a device would work with a queue, using
the MMIO bus.

First, it is important to mention that the mandatory parts of the virtio
interface are the following:

- the device status field → provides an indication of
  [the completed steps](https://docs.oasis-open.org/virtio/virtio/v1.1/csprd01/virtio-v1.1-csprd01.html#x1-100001)
  of the device initialization routine,
- the feature bits →
  [the features](https://docs.oasis-open.org/virtio/virtio/v1.1/csprd01/virtio-v1.1-csprd01.html#x1-100001)
  the driver/device understand(s),
- [notifications](https://docs.oasis-open.org/virtio/virtio/v1.1/csprd01/virtio-v1.1-csprd01.html#x1-170003),
- one or more
  [virtqueues](https://docs.oasis-open.org/virtio/virtio/v1.1/csprd01/virtio-v1.1-csprd01.html#x1-230005)
  → the mechanism for data transport between the driver and device.

Each virtqueue consists of three parts:

- Descriptor Table,
- Available Ring,
- Used Ring.
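As a worked example of the three parts, the following sketch computes how many bytes each part occupies for a given queue size. It uses the sizes and alignments that the virtio 1.1 specification lists for split virtqueues: 16 bytes per descriptor, 6 + 2 × size bytes for the available ring, and 6 + 8 × size bytes for the used ring. `RingLayout`, `ring_layout` and `is_aligned` are hypothetical names for illustration only; they are not part of `virtio-queue`.

```rust
/// Size in bytes of one split-virtqueue descriptor
/// (`addr: u64`, `len: u32`, `flags: u16`, `next: u16`).
const DESC_SIZE: u64 = 16;

/// Alignment requirements (in bytes) for the three parts of a split virtqueue.
const DESC_TABLE_ALIGN: u64 = 16;
const AVAIL_RING_ALIGN: u64 = 2;
const USED_RING_ALIGN: u64 = 4;

/// Hypothetical helper type: byte sizes of the three parts of one queue.
#[derive(Debug, PartialEq)]
struct RingLayout {
    desc_table_size: u64,
    avail_ring_size: u64,
    used_ring_size: u64,
}

fn ring_layout(queue_size: u16) -> RingLayout {
    let n = u64::from(queue_size);
    RingLayout {
        // One 16-byte descriptor per queue entry.
        desc_table_size: DESC_SIZE * n,
        // flags (u16) + idx (u16) + ring[n] (u16 each) + used_event (u16).
        avail_ring_size: 6 + 2 * n,
        // flags (u16) + idx (u16) + ring[n] (8-byte used elements) + avail_event (u16).
        used_ring_size: 6 + 8 * n,
    }
}

/// Returns true if `addr` satisfies the given alignment.
fn is_aligned(addr: u64, align: u64) -> bool {
    addr % align == 0
}
```

For a 256-entry queue this gives 4096, 518 and 2054 bytes respectively.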

Before booting the virtual machine (VM), the VMM does the following setup:

1. initialize an array of Queues using the Queue constructor.
2. register the device on the MMIO bus, so that the driver can later send
   read/write requests from/to the MMIO space; some of those requests also set
   up the queues’ state.
3. other pre-boot configurations, such as registering an fd for the interrupt
   assigned to the device; the device later uses this fd to inform the driver
   that it has information to communicate.

After the VM boots, the driver starts sending read/write requests to
configure things like:

* the supported features;
* queue parameters. The following setters are used for the queue setup:
  * `set_size` → sets the size of the queue.
  * `set_ready` → puts the queue in the `ready for processing` state.
  * `set_desc_table_address`, `set_avail_ring_address`,
    `set_used_ring_address` → configure the guest addresses of the constituent
    parts of the queue.
  * `set_event_idx` → called as part of the feature negotiation in
    the `virtio-device` crate; enables or disables the
    VIRTIO_F_RING_EVENT_IDX feature.
* the device activation. As part of this activation, the device can also create
  a queue handler, which can later be used to process the queue.

Once the queues are ready, the device can be used.
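The setters above record values that the driver writes one register at a time, so a natural place to validate them is once the queue is marked ready. The following is a minimal sketch of that idea under a simplified model: `QueueConfig` and `merge_halves` are hypothetical, the address setters take optional low/high 32-bit halves to mimic MMIO registers, and `is_valid` checks only the ready bit, the size (a power of two no larger than the maximum) and the split-virtqueue alignments, not whether the regions are backed by guest memory.

```rust
/// Largest queue size the virtio 1.1 specification allows for split virtqueues.
const MAX_QUEUE_SIZE: u16 = 32768;

/// Hypothetical, simplified model of the per-queue configuration the driver
/// programs through MMIO. This is not the `virtio-queue` `Queue` type.
#[derive(Debug)]
struct QueueConfig {
    max_size: u16,
    size: u16,
    ready: bool,
    desc_table: u64,
    avail_ring: u64,
    used_ring: u64,
    event_idx_enabled: bool,
}

/// Replaces the low and/or high 32-bit half of a 64-bit guest address, since
/// MMIO registers expose each address as two 32-bit halves.
fn merge_halves(addr: u64, low: Option<u32>, high: Option<u32>) -> u64 {
    let low = low.map_or(addr & 0xffff_ffff, u64::from);
    let high = high.map_or(addr >> 32, u64::from);
    (high << 32) | low
}

impl QueueConfig {
    /// Returns None if `max_size` is not itself a valid split-queue size.
    fn new(max_size: u16) -> Option<Self> {
        if max_size == 0 || max_size > MAX_QUEUE_SIZE || !max_size.is_power_of_two() {
            return None;
        }
        Some(QueueConfig {
            max_size,
            size: max_size,
            ready: false,
            desc_table: 0,
            avail_ring: 0,
            used_ring: 0,
            event_idx_enabled: false,
        })
    }

    fn set_size(&mut self, size: u16) {
        self.size = size;
    }

    fn set_ready(&mut self, ready: bool) {
        self.ready = ready;
    }

    fn set_event_idx(&mut self, enabled: bool) {
        self.event_idx_enabled = enabled;
    }

    fn set_desc_table_address(&mut self, low: Option<u32>, high: Option<u32>) {
        self.desc_table = merge_halves(self.desc_table, low, high);
    }

    fn set_avail_ring_address(&mut self, low: Option<u32>, high: Option<u32>) {
        self.avail_ring = merge_halves(self.avail_ring, low, high);
    }

    fn set_used_ring_address(&mut self, low: Option<u32>, high: Option<u32>) {
        self.used_ring = merge_halves(self.used_ring, low, high);
    }

    /// Checks size, readiness and alignment (16, 2 and 4 bytes) before use.
    fn is_valid(&self) -> bool {
        self.ready
            && self.size.is_power_of_two()
            && self.size <= self.max_size
            && self.desc_table % 16 == 0
            && self.avail_ring % 2 == 0
            && self.used_ring % 4 == 0
    }
}
```

This sketch defers every check to `is_valid`; a real implementation may also reject bad values directly in the setters.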

The steady-state operation of a virtio device follows a model where the driver
produces descriptor chains which are consumed by the device, and both parties
need to be notified when new elements have been placed on the associated ring to
avoid busy polling. The precise notification mechanism is left up to the VMM
that incorporates the devices and queues (it usually involves things like MMIO
VM exits and interrupt injection into the guest). The queue implementation is
agnostic to the notification mechanism in use; it exposes methods and
functionality (such as iterators) that are called from the outside in response
to a notification event.

### Data transmission using virtqueues

The basic principle of how the queues are used by the device/driver is the
following, as also shown in the diagram below:

1. when the guest driver has a new request (buffer), it allocates free
   descriptor(s) for the buffer in the descriptor table, chaining them as
   necessary.
2. the driver adds a new entry, holding the head index of the descriptor chain
   describing the request, to the available ring.
3. the driver increments the available ring `idx` by the number of new entries;
   the diagram shows the simple case of a single new entry.
4. the driver sends an available buffer notification to the device if such
   notifications are not suppressed.
5. the device will at some point consume that request, by first reading the
   `idx` field from the available ring. This can be done directly with
   `Queue::avail_idx`, but we do not recommend that consumers of the crate
   call it, because it is already called behind the scenes by the iterator
   over all available descriptor chain heads.
6. the device gets the index of the descriptor chain(s) corresponding to the
   read `idx` value.
7. the device reads the corresponding descriptor(s) from the descriptor table.
8. the device adds a new entry in the used ring by using `Queue::add_used`; the
   entry is defined in the spec as `virtq_used_elem`, and in `virtio-queue` as
   `VirtqUsedElem`. This structure holds both the index of the descriptor
   chain and the number of bytes that were written to memory as part of
   serving the request.
9. the device increments the `idx` of the used ring; this is done as part of
   the `Queue::add_used` call mentioned above.
10. the device sends a used buffer notification to the driver if such
    notifications are not suppressed.
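The index bookkeeping in steps 2-3 and 5-9 can be sketched with a toy model of the two rings. This is an illustration under stated assumptions, not the crate's implementation: `SplitRings`, `Device` and `UsedElem` are hypothetical, the rings are plain `Vec`s instead of guest memory, reading descriptors (step 7) is replaced by a closure, and no memory barriers are modelled. The `idx` values are free-running `u16` counters that wrap around, and the slot used in the ring is `idx % size`.

```rust
/// Mirrors the spec's `virtq_used_elem`: chain head index and bytes written.
#[derive(Clone, Copy, Debug, Default, PartialEq)]
struct UsedElem {
    id: u32,
    len: u32,
}

/// Hypothetical in-memory model of a split virtqueue's available and used
/// rings. Real rings live in guest memory and need memory barriers.
struct SplitRings {
    size: u16,
    avail_idx: u16,           // available ring `idx`, written by the driver
    avail_ring: Vec<u16>,     // heads of available descriptor chains
    used_idx: u16,            // used ring `idx`, written by the device
    used_ring: Vec<UsedElem>, // used elements
}

impl SplitRings {
    fn new(size: u16) -> Self {
        // A power-of-two size keeps `idx % size` consistent when the
        // free-running u16 indices wrap around.
        assert!(size.is_power_of_two());
        SplitRings {
            size,
            avail_idx: 0,
            avail_ring: vec![0; usize::from(size)],
            used_idx: 0,
            used_ring: vec![UsedElem::default(); usize::from(size)],
        }
    }

    /// Driver side, steps 2-3: publish a chain head, then bump `avail.idx`.
    fn driver_make_available(&mut self, head: u16) {
        let slot = usize::from(self.avail_idx % self.size);
        self.avail_ring[slot] = head;
        self.avail_idx = self.avail_idx.wrapping_add(1);
    }

    /// Device side, steps 8-9: write a used element, then bump `used.idx`.
    fn device_add_used(&mut self, head: u16, len: u32) {
        let slot = usize::from(self.used_idx % self.size);
        self.used_ring[slot] = UsedElem { id: u32::from(head), len };
        self.used_idx = self.used_idx.wrapping_add(1);
    }
}

/// Device-side consumer state: the next available ring position to read.
struct Device {
    next_avail: u16,
}

impl Device {
    /// Steps 5-9: read `avail.idx` once, pop every new chain head, handle it
    /// (the closure returns the number of bytes written) and add a used entry.
    fn process<F: FnMut(u16) -> u32>(&mut self, rings: &mut SplitRings, mut handle: F) -> usize {
        let avail_idx = rings.avail_idx;
        let mut processed = 0;
        while self.next_avail != avail_idx {
            let head = rings.avail_ring[usize::from(self.next_avail % rings.size)];
            let written = handle(head);
            rings.device_add_used(head, written);
            self.next_avail = self.next_avail.wrapping_add(1);
            processed += 1;
        }
        processed
    }
}
```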

A descriptor stores four fields. The first two, `addr` and `len`, point to the
data in memory to which the descriptor refers, as shown in the diagram below.
The `flags` field indicates, for example, whether the buffer is device-readable
or device-writable, or whether another descriptor is chained after this one
(VIRTQ_DESC_F_NEXT flag set). The `next` field stores the index of the next
descriptor if VIRTQ_DESC_F_NEXT is set.
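The descriptor fields and the `next` linkage can be modelled as follows. `Desc` and `walk_chain` are hypothetical (the crate's own types are `Descriptor` and `DescriptorChain`), and the loop check is one possible defensive measure against a malformed chain, not a description of how `virtio-queue` handles it. The flag values are the ones defined by the virtio 1.1 specification.

```rust
/// Descriptor flags from the virtio 1.1 specification (split virtqueues).
const VIRTQ_DESC_F_NEXT: u16 = 1;
const VIRTQ_DESC_F_WRITE: u16 = 2;

/// Mirrors the spec's 16-byte `virtq_desc` layout.
#[derive(Clone, Copy, Debug, Default)]
struct Desc {
    addr: u64,  // guest physical address of the buffer
    len: u32,   // buffer length in bytes
    flags: u16, // VIRTQ_DESC_F_* bits
    next: u16,  // index of the next descriptor, valid if VIRTQ_DESC_F_NEXT is set
}

impl Desc {
    fn has_next(&self) -> bool {
        self.flags & VIRTQ_DESC_F_NEXT != 0
    }

    fn is_write_only(&self) -> bool {
        self.flags & VIRTQ_DESC_F_WRITE != 0
    }
}

/// Hypothetical helper: collects the descriptors of the chain starting at
/// `head`. Returns None for an out-of-range index, or when the chain is longer
/// than the table, which can only happen if the `next` links form a loop.
fn walk_chain(table: &[Desc], head: u16) -> Option<Vec<Desc>> {
    let mut chain = Vec::new();
    let mut idx = head;
    loop {
        let desc = *table.get(usize::from(idx))?;
        chain.push(desc);
        if chain.len() > table.len() {
            return None;
        }
        if !desc.has_next() {
            return Some(chain);
        }
        idx = desc.next;
    }
}
```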

**Requirements for device implementation**

* Abstractions from virtio-queue such as `DescriptorChain` can be used to parse
  descriptors provided by the driver, which represent input or output memory
  areas for device I/O. A descriptor is essentially an (address, length) pair,
  which is subsequently used by the device model operation. We do not check the
  validity of the descriptors, and instead expect any validation to happen
  when the device implementation attempts to access the corresponding
  areas. Early checks can add non-negligible costs, and relying exclusively
  upon them may lead to time-of-check-to-time-of-use race conditions.
* Before reading from or writing to a buffer, the device should validate that
  the buffer is device-readable or device-writable, respectively.
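The sketch below shows one way to honour both requirements at the point of access rather than up front: before writing, it checks that the descriptor is device-writable, that the data fits in `len`, and that the target range lies inside guest memory. `GuestMem`, `Desc`, `AccessError` and `write_to_desc` are hypothetical; guest memory is modelled as a flat byte vector starting at address 0, whereas a real VMM would go through its own guest memory abstraction.

```rust
/// Descriptor flag from the virtio 1.1 specification: buffer is device write-only.
const VIRTQ_DESC_F_WRITE: u16 = 2;

#[derive(Clone, Copy, Debug)]
struct Desc {
    addr: u64,
    len: u32,
    flags: u16,
}

#[derive(Debug, PartialEq)]
enum AccessError {
    NotWritable,
    TooLarge,
    OutOfBounds,
}

/// Hypothetical guest memory: a flat byte buffer starting at guest address 0.
struct GuestMem(Vec<u8>);

impl GuestMem {
    /// Writes `data` into the buffer described by `desc`, validating at the
    /// moment of access rather than when the descriptor was parsed.
    fn write_to_desc(&mut self, desc: &Desc, data: &[u8]) -> Result<(), AccessError> {
        // The device may only write to device-writable buffers.
        if desc.flags & VIRTQ_DESC_F_WRITE == 0 {
            return Err(AccessError::NotWritable);
        }
        // Never write more than the driver said the buffer can hold.
        if data.len() as u64 > u64::from(desc.len) {
            return Err(AccessError::TooLarge);
        }
        // The target range must lie inside guest memory; checked arithmetic
        // guards against `addr + len` overflowing.
        let start = usize::try_from(desc.addr).map_err(|_| AccessError::OutOfBounds)?;
        let end = start.checked_add(data.len()).ok_or(AccessError::OutOfBounds)?;
        let dst = self.0.get_mut(start..end).ok_or(AccessError::OutOfBounds)?;
        dst.copy_from_slice(data);
        Ok(())
    }
}
```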

## Design

`QueueT` is a trait that allows different implementations of a `Queue`
object for single-threaded and multi-threaded contexts. The
implementations provided in `virtio-queue` are:

1. `Queue` → used in single-threaded contexts.
2. `QueueSync` → used in multi-threaded contexts; it is simply
   a wrapper over an `Arc<Mutex<Queue>>`.
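The split between one trait and two implementations can be illustrated with a deliberately tiny trait. `QueueOps`, `SimpleQueue`, `SyncQueue` and `complete_requests` are hypothetical names with only two methods; the real `QueueT` trait has a much larger surface. The point is the shape: device logic written against the trait, a lock-free single-threaded type, and a cloneable `Arc<Mutex<...>>` wrapper for sharing across threads.

```rust
use std::sync::{Arc, Mutex};

/// Hypothetical, cut-down trait in the spirit of `QueueT`.
trait QueueOps {
    fn add_used(&mut self, head: u16, len: u32);
    fn used_count(&self) -> usize;
}

/// Single-threaded variant: plain owned state, no locking.
#[derive(Default)]
struct SimpleQueue {
    used: Vec<(u16, u32)>,
}

impl QueueOps for SimpleQueue {
    fn add_used(&mut self, head: u16, len: u32) {
        self.used.push((head, len));
    }

    fn used_count(&self) -> usize {
        self.used.len()
    }
}

/// Multi-threaded variant: a cloneable handle around `Arc<Mutex<SimpleQueue>>`,
/// mirroring the shape described for `QueueSync`.
#[derive(Clone, Default)]
struct SyncQueue {
    inner: Arc<Mutex<SimpleQueue>>,
}

impl QueueOps for SyncQueue {
    fn add_used(&mut self, head: u16, len: u32) {
        // Each call takes the lock for the duration of one operation.
        self.inner.lock().unwrap().add_used(head, len);
    }

    fn used_count(&self) -> usize {
        self.inner.lock().unwrap().used_count()
    }
}

/// Generic device logic that works with either implementation.
fn complete_requests<Q: QueueOps>(queue: &mut Q, heads: &[u16]) {
    for &head in heads {
        queue.add_used(head, 0);
    }
}
```

Because cloning `SyncQueue` only clones the `Arc`, each worker thread can hold its own handle to the same queue, at the cost of taking the lock on every call.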

Besides the above abstractions, the `virtio-queue` crate also provides the
following ones:

* `Descriptor` → mostly offers accessors for the members of the
  `Descriptor`.
* `DescriptorChain` → provides accessors for the `DescriptorChain`’s members
  and an `Iterator` implementation for iterating over the `DescriptorChain`;
  there is also an abstraction for iterating over just the device-readable or
  just the device-writable descriptors (`DescriptorChainRwIter`).
* `AvailIter` → a consuming iterator over all available descriptor chain
  heads in the queue.
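The idea behind `DescriptorChainRwIter`, iterating over only one direction of a chain, can be sketched with two iterator filters. `Desc`, `readable` and `writable` are hypothetical and are not the crate's API. The virtio specification requires drivers to place all device-readable descriptors before any device-writable ones, so in a well-formed chain each filter yields a contiguous run.

```rust
/// Descriptor flag from the virtio 1.1 specification: buffer is device write-only.
const VIRTQ_DESC_F_WRITE: u16 = 2;

/// Hypothetical descriptor type (only the fields this example needs).
#[derive(Clone, Copy, Debug)]
struct Desc {
    len: u32,
    flags: u16,
}

impl Desc {
    fn is_write_only(&self) -> bool {
        self.flags & VIRTQ_DESC_F_WRITE != 0
    }
}

/// Iterates over the device-readable descriptors of a chain only.
fn readable(chain: &[Desc]) -> impl Iterator<Item = &Desc> + '_ {
    chain.iter().filter(|d| !d.is_write_only())
}

/// Iterates over the device-writable descriptors of a chain only.
fn writable(chain: &[Desc]) -> impl Iterator<Item = &Desc> + '_ {
    chain.iter().filter(|d| d.is_write_only())
}
```

For a request made of a readable header followed by a writable data buffer and a writable status byte, `readable` yields only the header and `writable` yields the other two.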

## Save/Restore Queue

The `Queue` allows saving its state through the `state` function, which returns
a `QueueState`. `Queue` objects can be created from a previously saved state by
using `QueueState::try_from`. The VMM should check for errors when restoring
a `Queue` from a previously saved state.
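The advice to check for errors on restore can be illustrated with Rust's `TryFrom` trait: restoring validates the snapshot instead of trusting it, since saved data may be corrupted or come from an untrusted source. `Snapshot`, `RestoredQueue` and `RestoreError` are hypothetical and their fields are illustrative; they do not reproduce the fields or checks of the crate's `QueueState`.

```rust
use std::convert::TryFrom;

/// Hypothetical snapshot of a queue's configuration and progress counters.
#[derive(Clone, Debug, PartialEq)]
struct Snapshot {
    max_size: u16,
    size: u16,
    ready: bool,
    desc_table: u64,
    next_avail: u16,
    next_used: u16,
}

#[derive(Debug)]
struct RestoredQueue {
    state: Snapshot,
}

#[derive(Debug, PartialEq)]
enum RestoreError {
    InvalidMaxSize,
    InvalidSize,
    MisalignedDescTable,
}

impl TryFrom<Snapshot> for RestoredQueue {
    type Error = RestoreError;

    // Every field that later code relies on is re-checked here.
    fn try_from(s: Snapshot) -> Result<Self, Self::Error> {
        if s.max_size == 0 || s.max_size > 32768 || !s.max_size.is_power_of_two() {
            return Err(RestoreError::InvalidMaxSize);
        }
        if s.size == 0 || s.size > s.max_size || !s.size.is_power_of_two() {
            return Err(RestoreError::InvalidSize);
        }
        if s.desc_table % 16 != 0 {
            return Err(RestoreError::MisalignedDescTable);
        }
        Ok(RestoredQueue { state: s })
    }
}

impl RestoredQueue {
    /// Saving is a copy of the current state.
    fn state(&self) -> Snapshot {
        self.state.clone()
    }
}
```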

## Notification suppression

A big part of the `virtio-queue` crate consists of the notification suppression
support. As already mentioned, the driver can send an available buffer
notification to the device when there are new entries in the available ring,
and the device can send a used buffer notification to the driver when there are
new entries in the used ring. Sending a notification every time this happens is
not always efficient; for example, while the driver is processing the used
ring, it does not need to receive another used buffer notification. The
mechanism for suppressing notifications is detailed in the following sections
of the specification:

- [Used Buffer Notification Suppression](https://docs.oasis-open.org/virtio/virtio/v1.1/csprd01/virtio-v1.1-csprd01.html#x1-400007),
- [Available Buffer Notification Suppression](https://docs.oasis-open.org/virtio/virtio/v1.1/csprd01/virtio-v1.1-csprd01.html#x1-4800010).

The `Queue` abstraction proposes the following sequence of steps for
processing new available ring entries:

1. the device first disables notifications, by using
   `Queue::disable_notification`, to make the driver aware that it is
   processing the available ring and does not want to be interrupted.
   Notifications are disabled by the device either by setting
   VIRTQ_USED_F_NO_NOTIFY in the `flags` field of the used ring, if
   VIRTIO_F_EVENT_IDX is not negotiated, or, if VIRTIO_F_EVENT_IDX is
   negotiated, by not updating the `avail_event` value, i.e. leaving it set to
   the latest `idx` value of the available ring that was already notified by
   the driver.
2. the device processes the new entries by using the `AvailIter` iterator.
3. the device can now enable notifications, by using
   `Queue::enable_notification`. Notifications are enabled by the device either
   by setting the `flags` field of the used ring to 0, if VIRTIO_F_EVENT_IDX is
   not negotiated, or, if VIRTIO_F_EVENT_IDX is negotiated, by setting the
   `avail_event` value to the smallest `idx` value of the available ring that
   was not already notified by the driver. This way the device makes sure that
   it won’t miss any notification.

The above steps should be done in a loop to also handle the less likely case
where the driver added new entries just before we re-enabled notifications.
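A minimal sketch of the disable → process → enable loop, assuming VIRTIO_F_EVENT_IDX is not negotiated, so only the used ring `flags` field is involved. `Rings` and `Device` are hypothetical stand-ins, not the crate's `Queue::disable_notification` and `Queue::enable_notification`; `driver_step` simulates the driver adding entries while notifications are disabled, which is exactly the case the loop is there to catch.

```rust
/// Used ring flag from the virtio 1.1 specification: "please do not notify me".
const VIRTQ_USED_F_NO_NOTIFY: u16 = 1;

/// Hypothetical shared ring state, reduced to the fields this loop touches.
struct Rings {
    avail_idx: u16,       // available ring `idx`, written by the driver
    avail_ring: Vec<u16>, // available ring entries (descriptor chain heads)
    used_flags: u16,      // used ring `flags`, written by the device
}

struct Device {
    next_avail: u16,
    processed: Vec<u16>,
}

impl Device {
    fn disable_notification(&self, rings: &mut Rings) {
        rings.used_flags |= VIRTQ_USED_F_NO_NOTIFY;
    }

    /// Re-enables notifications and reports whether entries arrived that have
    /// not been consumed yet. A real implementation needs a memory barrier
    /// between the flag store and the `avail_idx` load.
    fn enable_notification(&self, rings: &mut Rings) -> bool {
        rings.used_flags &= !VIRTQ_USED_F_NO_NOTIFY;
        rings.avail_idx != self.next_avail
    }

    /// Consumes every available entry published so far.
    fn drain(&mut self, rings: &Rings) {
        let size = rings.avail_ring.len() as u16;
        while self.next_avail != rings.avail_idx {
            self.processed.push(rings.avail_ring[usize::from(self.next_avail % size)]);
            self.next_avail = self.next_avail.wrapping_add(1);
        }
    }

    /// disable -> process -> enable, repeated until no entry slipped in.
    fn handle_event<F: FnMut(&mut Rings)>(&mut self, rings: &mut Rings, mut driver_step: F) {
        loop {
            self.disable_notification(rings);
            self.drain(rings);
            // The driver may publish entries at any time, without notifying.
            driver_step(&mut *rings);
            if !self.enable_notification(rings) {
                break;
            }
        }
    }
}
```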

For used buffer notifications (sent from the device to the driver), the `Queue`
provides the `needs_notification` method, which should be called each time the
device adds new entries to the used ring. Depending on the `used_event` value
and on the used index recorded at the previous check (`signalled_used`),
`needs_notification` returns true to let the device know it should send a
notification to the guest.
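With VIRTIO_F_EVENT_IDX negotiated, the `used_event` check is commonly written as the comparison in the Linux `vring_need_event` helper. The sketch below ports that comparison to Rust and wraps it in a hypothetical `NotifyState` that remembers the previous used index in the way described above for `signalled_used`. It is not the crate's `needs_notification`: the structure, the parameters, and the choice to always notify on the first check are assumptions.

```rust
/// Available ring flag from the virtio 1.1 specification, used when
/// VIRTIO_F_EVENT_IDX is not negotiated.
const VIRTQ_AVAIL_F_NO_INTERRUPT: u16 = 1;

/// Rust port of the Linux `vring_need_event` comparison: true if `event_idx`
/// lies in the wrapping window [old_idx, new_idx), i.e. one of the used
/// entries added since the last check sits at the position the driver asked about.
fn vring_need_event(event_idx: u16, new_idx: u16, old_idx: u16) -> bool {
    new_idx.wrapping_sub(event_idx).wrapping_sub(1) < new_idx.wrapping_sub(old_idx)
}

/// Hypothetical device-side notification state.
struct NotifyState {
    event_idx_enabled: bool,
    /// Used index recorded at the previous check (None before the first one).
    signalled_used: Option<u16>,
}

impl NotifyState {
    /// Called after adding entries to the used ring; `used_idx` is the new used
    /// ring `idx`, while `used_event` and `avail_flags` come from the available ring.
    fn needs_notification(&mut self, used_idx: u16, used_event: u16, avail_flags: u16) -> bool {
        if !self.event_idx_enabled {
            return avail_flags & VIRTQ_AVAIL_F_NO_INTERRUPT == 0;
        }
        match self.signalled_used.replace(used_idx) {
            // No earlier reference point: notify to be safe.
            None => true,
            Some(old) => vring_need_event(used_event, used_idx, old),
        }
    }
}
```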

## Assumptions

We assume the users of the `Queue` implementation won’t attempt to use the
queue before checking that the `ready` bit is set. This can be verified by
calling `Queue::is_valid`, which also checks that the three
queue parts are valid memory regions.
We assume consumers will use `AvailIter::go_to_previous_position` only in
single-threaded contexts.
We assume the users will consume the entries from the available ring in the
way recommended by the documentation, i.e. when the device starts processing
the available ring entries, it disables notifications, processes the entries,
and then re-enables notifications.

## License

This project is licensed under either of

- [Apache License](http://www.apache.org/licenses/LICENSE-2.0), Version 2.0
- [BSD-3-Clause License](https://opensource.org/licenses/BSD-3-Clause)