.. SPDX-License-Identifier: GPL-2.0

==================
AF_XDP TX Metadata
==================

This document describes how to enable offloads when transmitting packets
via :doc:`af_xdp`. Refer to :doc:`xdp-rx-metadata` on how to access similar
metadata on the receive side.

General Design
==============

The headroom for the metadata is reserved via the ``tx_metadata_len`` field
and the ``XDP_UMEM_TX_METADATA_LEN`` flag in ``struct xdp_umem_reg``. The
metadata length is therefore the same for every socket that shares the same
umem. The metadata layout is a fixed UAPI, refer to
``struct xsk_tx_metadata`` in ``include/uapi/linux/if_xdp.h``. Thus,
generally, the ``tx_metadata_len`` field above should contain
``sizeof(struct xsk_tx_metadata)``.

Note that in the original implementation the ``XDP_UMEM_TX_METADATA_LEN``
flag was not required. Applications can first try to create the umem with
the flag set and, if that fails, retry without it.

The headroom and the metadata itself should be located right before
``xdp_desc->addr`` in the umem frame. Within a frame, the metadata
layout is as follows::

           tx_metadata_len
     /                         \
    +-----------------+---------+----------------------------+
    | xsk_tx_metadata | padding |          payload           |
    +-----------------+---------+----------------------------+
                                ^
                                |
                          xdp_desc->addr

An AF_XDP application can request headroom larger than ``sizeof(struct
xsk_tx_metadata)``. The kernel will ignore the padding (and will still
use ``xdp_desc->addr - tx_metadata_len`` to locate
the ``xsk_tx_metadata``). For the frames that shouldn't carry
any metadata (i.e., the ones that don't have the ``XDP_TX_METADATA`` option
set), the metadata area is ignored by the kernel as well.

The flags field enables the particular offload:

- ``XDP_TXMD_FLAGS_TIMESTAMP``: requests the device to put the transmission
  timestamp into the ``tx_timestamp`` field of ``struct xsk_tx_metadata``.
- ``XDP_TXMD_FLAGS_CHECKSUM``: requests the device to calculate the L4
  checksum. ``csum_start`` specifies the byte offset, from
  ``xdp_desc->addr``, where the checksumming should start and
  ``csum_offset`` specifies the byte offset, from ``csum_start``, where the
  device should store the computed checksum.
- ``XDP_TXMD_FLAGS_LAUNCH_TIME``: requests the device to schedule the
  packet for transmission at a pre-determined time called launch time. The
  value of launch time is indicated by the ``launch_time`` field of
  ``struct xsk_tx_metadata``.

Besides the flags above, in order to trigger the offloads, the first
packet's ``struct xdp_desc`` descriptor should set the ``XDP_TX_METADATA``
bit in the ``options`` field. Also note that in a multi-buffer packet
only the first chunk should carry the metadata.

Software TX Checksum
====================

For development and testing purposes it is possible to pass the
``XDP_UMEM_TX_SW_CSUM`` flag to the ``XDP_UMEM_REG`` UMEM registration call.
In this case, when running in ``XDP_COPY`` mode, the TX checksum
is calculated on the CPU. Do not enable this option in production because
it will negatively affect performance.

Launch Time
===========

The value of the requested launch time should be based on the device's PTP
Hardware Clock (PHC) to ensure accuracy. AF_XDP takes a different data path
compared to the ETF queuing discipline, which organizes packets and delays
their transmission. Instead, AF_XDP immediately hands off the packets to
the device driver without rearranging their order or holding them prior to
transmission. Since the driver maintains FIFO behavior and does not perform
packet reordering, a packet with a launch time request will block other
packets in the same Tx Queue until it is sent. Therefore, it is recommended
to allocate a separate queue for scheduling traffic that is intended for
future transmission.
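
As a rough illustration of basing the launch time on the PHC, the sketch
below adds a lead time to a PHC reading. The helper names, the fixed sample
reading and the 500 us lead are hypothetical, and the sketch assumes the
launch time is expressed in nanoseconds of the PHC. Reading the PHC itself
(for example with ``clock_gettime()`` on a clock ID derived from the PTP
character device) is left out so that the sketch stays self-contained.

.. code-block:: c

    #include <stdint.h>
    #include <stdio.h>
    #include <time.h>

    #define NS_PER_SEC 1000000000ULL

    /* Convert a timespec, e.g. one read from the PHC, to nanoseconds. */
    static uint64_t timespec_to_ns(const struct timespec *ts)
    {
        return (uint64_t)ts->tv_sec * NS_PER_SEC + (uint64_t)ts->tv_nsec;
    }

    /* Hypothetical helper: launch time = PHC "now" plus a lead time. */
    static uint64_t launch_time_ns(const struct timespec *phc_now,
                                   uint64_t lead_ns)
    {
        return timespec_to_ns(phc_now) + lead_ns;
    }

    int main(void)
    {
        /* Fixed sample reading; real code would read the device's PHC. */
        struct timespec phc_now = { .tv_sec = 100, .tv_nsec = 250000000 };
        uint64_t launch = launch_time_ns(&phc_now, 500000); /* 500 us */

        printf("launch_time = %llu ns\n", (unsigned long long)launch);
        return 0;
    }

The resulting value is what an application would store in ``launch_time``
before setting ``XDP_TXMD_FLAGS_LAUNCH_TIME``. Because the queue is FIFO, a
large lead time delays every packet queued behind this one.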

In scenarios where the launch time offload feature is disabled, the device
driver is expected to disregard the launch time request. For correct
interpretation and meaningful operation, the launch time should never be
set to a value larger than the farthest programmable time in the future
(the horizon). Different devices have different hardware limitations on the
launch time offload feature.

stmmac driver
-------------

For stmmac, TSO and launch time (TBS) features are mutually exclusive for
each individual Tx Queue. By default, the driver configures Tx Queue 0 to
support TSO and the rest of the Tx Queues to support TBS. The launch time
hardware offload feature can be enabled or disabled by using the tc-etf
command to call the driver's ndo_setup_tc() callback.

The value of the launch time that is programmed in the Enhanced Normal
Transmit Descriptors is a 32-bit value, where the most significant 8 bits
represent the time in seconds and the remaining 24 bits represent the time
in 256 ns increments. The programmed launch time is compared against the
PTP time (bits[39:8]) and rolls over after 256 seconds. Therefore, the
horizon of the launch time for dwmac4 and dwxlgmac2 is 128 seconds in the
future.

igc driver
----------

For igc, all four Tx Queues support the launch time feature. The launch
time hardware offload feature can be enabled or disabled by using the
tc-etf command to call the driver's ndo_setup_tc() callback. When entering
TSN mode, the igc driver will reset the device and create a default Qbv
schedule with a 1-second cycle time, with all Tx Queues open at all times.

The value of the launch time that is programmed in the Advanced Transmit
Context Descriptor is a relative offset to the starting time of the Qbv
transmission window of the queue. The Frst flag of the descriptor can be
set to schedule the packet for the next Qbv cycle.
Therefore, the horizon
of the launch time for i225 and i226 is the ending time of the next cycle
of the Qbv transmission window of the queue. For example, when the Qbv
cycle time is set to 1 second, the horizon of the launch time ranges
from 1 second to 2 seconds, depending on where the Qbv cycle is currently
running.

Querying Device Capabilities
============================

Every device exports its offload capabilities via the netlink netdev family.
Refer to the ``xsk-flags`` features bitmask in
``Documentation/netlink/specs/netdev.yaml``.

- ``tx-timestamp``: device supports ``XDP_TXMD_FLAGS_TIMESTAMP``
- ``tx-checksum``: device supports ``XDP_TXMD_FLAGS_CHECKSUM``
- ``tx-launch-time-fifo``: device supports ``XDP_TXMD_FLAGS_LAUNCH_TIME``

See ``tools/net/ynl/samples/netdev.c`` on how to query this information.

Example
=======

See ``tools/testing/selftests/bpf/xdp_hw_metadata.c`` for an example
program that handles TX metadata. Also see https://github.com/fomichev/xskgen
for a more bare-bones example.
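
For orientation only, the following sketch walks through the steps from the
General Design section for a single frame: locate the metadata at
``xdp_desc->addr - tx_metadata_len``, set the request flags and checksum
offsets, and mark the descriptor. Everything prefixed with ``sketch_`` or
``SKETCH_`` is a simplified, hypothetical stand-in so that the code builds
without kernel headers. The bit values and the flat structure layout are
assumptions, and real applications should use the definitions from
``include/uapi/linux/if_xdp.h``.

.. code-block:: c

    #include <stddef.h>
    #include <stdint.h>
    #include <stdio.h>

    /* Illustrative stand-ins; not the real UAPI names, values or layout. */
    #define SKETCH_TXMD_FLAGS_TIMESTAMP (1ULL << 0)
    #define SKETCH_TXMD_FLAGS_CHECKSUM  (1ULL << 1)
    #define SKETCH_TX_METADATA          (1U << 1)

    struct sketch_tx_metadata {
        uint64_t flags;
        uint16_t csum_start;
        uint16_t csum_offset;
        uint64_t tx_timestamp;
    };

    struct sketch_desc {
        uint64_t addr;
        uint32_t len;
        uint32_t options;
    };

    /* The metadata sits tx_metadata_len bytes before the descriptor address. */
    static struct sketch_tx_metadata *frame_metadata(void *umem_area,
                                                     uint64_t desc_addr,
                                                     uint32_t tx_metadata_len)
    {
        return (struct sketch_tx_metadata *)
            ((uint8_t *)umem_area + desc_addr - tx_metadata_len);
    }

    /* Request a TX timestamp and L4 checksum offload for one frame. */
    static void request_offloads(void *umem_area, struct sketch_desc *desc,
                                 uint32_t tx_metadata_len,
                                 uint16_t csum_start, uint16_t csum_offset)
    {
        struct sketch_tx_metadata *meta =
            frame_metadata(umem_area, desc->addr, tx_metadata_len);

        meta->flags = SKETCH_TXMD_FLAGS_TIMESTAMP | SKETCH_TXMD_FLAGS_CHECKSUM;
        meta->csum_start = csum_start;
        meta->csum_offset = csum_offset;
        /* Without this bit the metadata area is ignored. */
        desc->options |= SKETCH_TX_METADATA;
    }

    int main(void)
    {
        static uint64_t umem[512]; /* 4 KiB, 8-byte aligned stand-in umem */
        struct sketch_desc desc = { .addr = 256, .len = 64, .options = 0 };
        uint32_t md_len = sizeof(struct sketch_tx_metadata);

        /*
         * UDP over IPv4 without IP options: checksumming starts after the
         * Ethernet (14) and IPv4 (20) headers. The checksum field lies
         * 6 bytes into the UDP header (offset assumed relative to
         * csum_start).
         */
        request_offloads(umem, &desc, md_len, 14 + 20, 6);

        printf("metadata at umem offset %llu, options 0x%x\n",
               (unsigned long long)(desc.addr - md_len), desc.options);
        return 0;
    }

The descriptor address is kept 8-byte aligned here so that the metadata
pointer is suitably aligned for the 64-bit fields. The selftest referenced
above remains the authoritative usage example.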