<?xml version='1.0' encoding='utf-8' ?>
<!DOCTYPE chapter PUBLIC "-//OASIS//DTD DocBook XML V4.5//EN" "http://www.oasis-open.org/docbook/xml/4.5/docbookx.dtd" [
<!ENTITY % BOOK_ENTITIES SYSTEM "Wayland.ent">
%BOOK_ENTITIES;
]>
<chapter id="chap-Protocol">
  <title>Wayland Protocol and Model of Operation</title>
  <section id="sect-Protocol-Basic-Principles">
    <title>Basic Principles</title>
    <para>
      The Wayland protocol is an asynchronous, object-oriented protocol.  All
      requests are method invocations on some object.  Each request includes
      an object ID that uniquely identifies an object on the server.  Each
      object implements an interface, and each request includes an opcode that
      identifies which method in the interface to invoke.
    </para>
    <para>
      The protocol is message-based.  A message sent by a client to the server
      is called a request.  A message from the server to a client is called an
      event.  A message has a number of arguments, each of which has a certain
      type (see <xref linkend="sect-Protocol-Wire-Format"/> for a list of
      argument types).
    </para>
    <para>
      Additionally, the protocol can specify <type>enum</type>s, which associate
      names with specific numeric enumeration values.  These are primarily
      descriptive: at the wire format level, enums are just integers.
      However, they also serve a secondary purpose: enhancing type safety or
      otherwise adding context for use in language bindings or similar code.
      This latter usage is only supported as long as code written before these
      attributes were introduced still works afterwards.  In other words, adding
      an enum should not break the API, since that would put backwards
      compatibility at risk.
    </para>
    <para>
      <type>enum</type>s can be defined either as a plain set of integers or as
      bitfields.  This is specified via the <type>bitfield</type> boolean
      attribute in the <type>enum</type> definition.  If this attribute is true,
      the enum is intended to be accessed primarily using bitwise operations,
      for example when arbitrarily many choices of the enum can be ORed
      together.  If it is false, or the attribute is omitted, then the enum
      arguments are just a sequence of numerical values.
    </para>
    <para>
      The <type>enum</type> attribute can be used on either <type>uint</type>
      or <type>int</type> arguments.  However, if the <type>enum</type> is
      defined as a <type>bitfield</type>, it can only be used on
      <type>uint</type> arguments.
    </para>
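    <para>
      The following minimal C fragment sketches the difference.  It mirrors
      the <type>bitfield</type> enum wl_seat.capability from
      <filename>protocol/wayland.xml</filename> (pointer = 1, keyboard = 2,
      touch = 4).  Only the numeric values come from the protocol.  The enum
      and helper names are hypothetical, and generated headers use their own
      names.
    </para>
    <programlisting language="C"><![CDATA[
```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical local mirror of the wl_seat.capability bitfield enum;
 * only the numeric values are taken from wayland.xml. */
enum seat_capability {
	SEAT_CAPABILITY_POINTER = 1,
	SEAT_CAPABILITY_KEYBOARD = 2,
	SEAT_CAPABILITY_TOUCH = 4,
};

/* A bitfield argument is a plain uint on the wire; single entries are
 * tested with a bitwise AND. */
static int seat_has_capability(uint32_t caps, enum seat_capability cap)
{
	/* each entry of a bitfield enum is expected to be exactly one bit */
	assert(cap != 0 && ((uint32_t) cap & ((uint32_t) cap - 1)) == 0);
	return (caps & (uint32_t) cap) != 0;
}
```
]]></programlisting>
    <para>
      A seat offering a pointer and a keyboard would therefore send the value
      1 | 2 = 3.  A non-bitfield enum carries exactly one value, which is
      compared with == rather than masked.
    </para>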
    <para>
      The server sends events back to the client, and each event is emitted
      from an object.  Events can be error conditions.  The event includes the
      object ID and the event opcode, from which the client can determine
      the type of event.  Events are generated either in response to requests
      (in which case the request and the event constitute a round trip) or
      spontaneously when the server state changes.
    </para>
    <para>
      <itemizedlist>
	<listitem>
	  <para>
	    State is broadcast on connect, and events are sent
	    out when the state changes.  Clients must listen for
	    these changes and cache the state.
	    There is no need (or mechanism) to query server state.
	  </para>
	</listitem>
	<listitem>
	  <para>
	    The server will broadcast the presence of a number of global objects,
	    which in turn will broadcast their current state.
	  </para>
	</listitem>
      </itemizedlist>
    </para>
  </section>
  <section id="sect-Protocol-Code-Generation">
    <title>Code Generation</title>
    <para>
      The interfaces, requests and events are defined in
      <filename>protocol/wayland.xml</filename>.
      This XML file is used to generate the function prototypes that can be
      used by clients and compositors.
    </para>
    <para>
      The protocol entry points are generated as inline functions that just
      wrap the <function>wl_proxy_*</function> functions.  The inline functions
      aren't part of the library ABI, so language bindings should generate their
      own stubs for the protocol entry points from the XML.
    </para>
  </section>
  <section id="sect-Protocol-Wire-Format">
    <title>Wire Format</title>
    <para>
      The protocol is sent over a UNIX domain stream socket, where the endpoint
      usually is named <systemitem class="service">wayland-0</systemitem>
      (although it can be changed via <emphasis>WAYLAND_DISPLAY</emphasis>
      in the environment).  Beginning in Wayland 1.15, implementations can
      optionally support server socket endpoints located at arbitrary
      locations in the filesystem by setting <emphasis>WAYLAND_DISPLAY</emphasis>
      to the absolute path at which the server endpoint listens.
    </para>
    <para>
      Every message is structured as 32-bit words, and values are represented
      in the host's byte order.  The message header consists of 2 words:
      <itemizedlist>
	<listitem>
	  <para>
	    The first word is the sender's object ID (32-bit).
	  </para>
	</listitem>
	<listitem>
	  <para>
	    The second word has two 16-bit parts.  The upper 16 bits are the
	    message size in bytes, counted from the start of the header (i.e. it
	    has a minimum value of 8).  The lower 16 bits are the request/event
	    opcode.
	  </para>
	</listitem>
      </itemizedlist>
      The payload describes the request/event arguments.  Every argument is always
      aligned to 32 bits.  Where padding is required, the value of padding bytes is
      undefined.  There is no prefix that describes the type.  Instead, the type is
      inferred implicitly from the XML specification.
    </para>
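    <para>
      The header layout can be illustrated with a small C sketch.  It packs
      and unpacks the second header word and rounds argument sizes up to the
      32-bit boundary.  The helper names are hypothetical and are not part of
      libwayland.  Values are in host byte order, so the sketch works on plain
      uint32_t values without any byte swapping.
    </para>
    <programlisting language="C"><![CDATA[
```c
#include <assert.h>
#include <stdint.h>

/* Second header word: message size (bytes, header included) in the upper
 * 16 bits, opcode in the lower 16 bits. */
static uint32_t header_word2(uint16_t size, uint16_t opcode)
{
	/* the 8-byte header is the minimum; aligned arguments keep sizes a
	 * multiple of 4 */
	assert(size >= 8 && size % 4 == 0);
	return ((uint32_t) size << 16) | opcode;
}

static uint16_t header_size(uint32_t word2)
{
	return (uint16_t) (word2 >> 16);
}

static uint16_t header_opcode(uint32_t word2)
{
	return (uint16_t) (word2 & 0xffffu);
}

/* Round a byte count up to the next multiple of 4 (32-bit alignment). */
static uint32_t pad_to_32bit(uint32_t n)
{
	return (n + 3u) & ~3u;
}
```
]]></programlisting>
    <para>
      Consider a wl_display.get_registry request (object ID 1, opcode 1, one
      new_id argument).  It is 12 bytes long, so its second header word is
      0x000C0001.
    </para>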
    <para>
      The representation of argument types is as follows:
      <variablelist>
	<varlistentry>
	  <term>int</term>
	  <term>uint</term>
	  <listitem>
	    <para>
	      The value is the 32-bit value of the signed/unsigned
	      int.
	    </para>
	  </listitem>
	</varlistentry>
	<varlistentry>
	  <term>fixed</term>
	  <listitem>
	    <para>
	      Signed 24.8 fixed-point numbers.  This is a signed fixed-point type
	      that offers a sign bit, 23 bits of integer precision and 8 bits of
	      fractional precision.  It is exposed as an opaque type with
	      conversion helpers to and from double and int on the C API side.
	    </para>
	  </listitem>
	</varlistentry>
	<varlistentry>
	  <term>string</term>
	  <listitem>
	    <para>
	      Starts with an unsigned 32-bit length (including the null terminator),
	      followed by the string contents, including the terminating null byte,
	      then padding to a 32-bit boundary.  A null value is represented
	      with a length of 0.
	    </para>
	  </listitem>
	</varlistentry>
	<varlistentry>
	  <term>object</term>
	  <listitem>
	    <para>
	      32-bit object ID.  A null value is represented with an ID of 0.
	    </para>
	  </listitem>
	</varlistentry>
	<varlistentry>
	  <term>new_id</term>
	  <listitem>
	    <para>
	      The 32-bit object ID.  Generally, the interface used for the new
	      object is inferred from the XML.  When it is not specified, the
	      new_id is preceded by a <code>string</code> specifying the
	      interface name and a <code>uint</code> specifying the version.
	    </para>
	  </listitem>
	</varlistentry>
	<varlistentry>
	  <term>array</term>
	  <listitem>
	    <para>
	      Starts with a 32-bit array size in bytes, followed by the array
	      contents verbatim, and finally padding to a 32-bit boundary.
	    </para>
	  </listitem>
	</varlistentry>
	<varlistentry>
	  <term>fd</term>
	  <listitem>
	    <para>
	      The file descriptor is not stored in the message buffer, but in
	      the ancillary data of the UNIX domain socket message (msg_control).
	    </para>
	  </listitem>
	</varlistentry>
      </variablelist>
    </para>
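    <para>
      The following minimal sketch covers two of these encodings.  The first
      is the 24.8 <type>fixed</type> type, a value multiplied by 256 and stored
      in an int32_t.  The second is the byte layout of a <type>string</type>
      argument.  All names are hypothetical.  libwayland ships its own
      helpers, such as wl_fixed_from_double() and wl_fixed_to_double(), and
      their rounding details may differ from this sketch.
    </para>
    <programlisting language="C"><![CDATA[
```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical name for a 24.8 value; libwayland's own type is wl_fixed_t. */
typedef int32_t fixed24_8;

static fixed24_8 fixed_from_int(int32_t i)
{
	/* sign plus 23 integer bits: larger values would overflow */
	assert(i >= -8388608 && i <= 8388607);
	return (fixed24_8) (i * 256);
}

static int32_t fixed_to_int(fixed24_8 f)
{
	return f / 256; /* truncates toward zero */
}

static double fixed_to_double(fixed24_8 f)
{
	return f / 256.0;
}

static fixed24_8 fixed_from_double(double d)
{
	/* round to the nearest 1/256; no range check in this sketch */
	return (fixed24_8) (d >= 0 ? d * 256.0 + 0.5 : d * 256.0 - 0.5);
}

/* Write a string argument into buf: a 32-bit length including the NUL
 * (0 for a null string), the bytes and the NUL, then zero padding up to a
 * 4-byte boundary. Returns the number of bytes written; buf must be large
 * enough. */
static size_t encode_string_arg(unsigned char *buf, const char *s)
{
	uint32_t len = s ? (uint32_t) strlen(s) + 1 : 0;
	size_t padded = ((size_t) len + 3) & ~(size_t) 3;

	memcpy(buf, &len, sizeof len);          /* host byte order */
	if (len > 0)
		memcpy(buf + 4, s, len);
	memset(buf + 4 + len, 0, padded - len); /* padding value is undefined; zero it */
	return 4 + padded;
}
```
]]></programlisting>
    <para>
      With this layout, the string "wl_seat" (7 characters plus the null byte)
      occupies 4 + 8 = 12 bytes on the wire, and a null string occupies only
      its 4-byte length word.
    </para>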
    <para>
      The protocol does not specify the exact position of the ancillary data
      in the stream, except that the order of file descriptors is the same as
      the order of messages and <code>fd</code> arguments within messages on
      the wire.
    </para>
    <para>
      In particular, it means that any byte of the stream, even the message
      header, may carry the ancillary data with file descriptors.
    </para>
    <para>
      Clients and compositors should queue incoming data until they have
      whole messages to process, as file descriptors may arrive earlier
      or later than the corresponding data bytes.
    </para>
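    <para>
      The following POSIX sketch shows how an <type>fd</type> travels in
      msg_control alongside the message bytes, using SCM_RIGHTS on a UNIX
      domain socket.  It sends a single 32-bit word with one descriptor.  A
      real implementation batches many messages and descriptors, and as noted
      above it must queue them until whole messages are available.  The
      function names are hypothetical.
    </para>
    <programlisting language="C"><![CDATA[
```c
#define _GNU_SOURCE /* glibc: expose all POSIX/BSD socket declarations in strict -std modes */
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>
#include <unistd.h>

/* Control buffer big enough for one fd, aligned for struct cmsghdr. */
union fd_control {
	char buf[CMSG_SPACE(sizeof(int))];
	struct cmsghdr align;
};

/* Send one 32-bit word of message data with one fd in the ancillary data.
 * Returns 0 on success, -1 on failure. */
static int send_word_with_fd(int sock, uint32_t word, int fd)
{
	union fd_control ctrl;
	struct iovec iov = { .iov_base = &word, .iov_len = sizeof word };
	struct msghdr msg;
	struct cmsghdr *cmsg;

	assert(fd >= 0);
	memset(&msg, 0, sizeof msg);
	memset(&ctrl, 0, sizeof ctrl);
	msg.msg_iov = &iov;
	msg.msg_iovlen = 1;
	msg.msg_control = ctrl.buf;
	msg.msg_controllen = sizeof ctrl.buf;

	cmsg = CMSG_FIRSTHDR(&msg);
	cmsg->cmsg_level = SOL_SOCKET;
	cmsg->cmsg_type = SCM_RIGHTS;
	cmsg->cmsg_len = CMSG_LEN(sizeof(int));
	memcpy(CMSG_DATA(cmsg), &fd, sizeof(int));

	return sendmsg(sock, &msg, 0) == (ssize_t) sizeof word ? 0 : -1;
}

/* Receive one 32-bit word; returns the fd that arrived with it, or -1. */
static int recv_word_with_fd(int sock, uint32_t *word)
{
	union fd_control ctrl;
	struct iovec iov = { .iov_base = word, .iov_len = sizeof *word };
	struct msghdr msg;
	struct cmsghdr *cmsg;
	int fd = -1;

	memset(&msg, 0, sizeof msg);
	msg.msg_iov = &iov;
	msg.msg_iovlen = 1;
	msg.msg_control = ctrl.buf;
	msg.msg_controllen = sizeof ctrl.buf;

	if (recvmsg(sock, &msg, 0) != (ssize_t) sizeof *word)
		return -1;
	for (cmsg = CMSG_FIRSTHDR(&msg); cmsg != NULL; cmsg = CMSG_NXTHDR(&msg, cmsg))
		if (cmsg->cmsg_level == SOL_SOCKET && cmsg->cmsg_type == SCM_RIGHTS)
			memcpy(&fd, CMSG_DATA(cmsg), sizeof(int));
	return fd;
}
```
]]></programlisting>
    <para>
      The received descriptor is a new number in the receiving process, but
      it refers to the same open file as the sender's descriptor.
    </para>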
  </section>
  <xi:include href="ProtocolInterfaces.xml" xmlns:xi="http://www.w3.org/2001/XInclude"/>
  <section id="sect-Protocol-Versioning">
    <title>Versioning</title>
    <para>
      Every interface is versioned and every protocol object implements a
      particular version of its interface.  For global objects, the maximum
      version supported by the server is advertised with the global, and the
      actual version of the created protocol object is determined by the
      version argument passed to wl_registry.bind().  For objects that are
      not globals, their version is inferred from the object that created
      them.
    </para>
    <para>
      In order to keep things sane, this has a few implications for
      interface versions:
      <itemizedlist>
	<listitem>
	  <para>
	    The object creation hierarchy must be a tree.  Otherwise,
	    object versions inferred from the parent object become much
	    more difficult to track properly.
	  </para>
	</listitem>
	<listitem>
	  <para>
	    When the version of an interface increases, so does the version
	    of its parent (recursively until you get to a global interface).
	  </para>
	</listitem>
	<listitem>
	  <para>
	    A global interface's version number acts like a counter for all
	    of its child interfaces.  Whenever a child interface gets
	    modified, the global parent's interface version number also
	    increases (see above).  The child interface then takes on the
	    same version number as the new version of its parent global
	    interface.
	  </para>
	</listitem>
      </itemizedlist>
    </para>
    <para>
      To illustrate the above, consider the wl_compositor interface.  It
      has two children, wl_surface and wl_region.  As of Wayland version
      1.2, wl_surface and wl_compositor are both at version 3.  If
      something is added to the wl_region interface, both wl_region and
      wl_compositor will get bumped to version 4.  If, afterwards,
      wl_surface is changed, both wl_compositor and wl_surface will be at
      version 5.  In this way the global interface version is used as a
      sort of "counter" for all of its child interfaces.  This makes it
      very simple to know the version of the child given the version of its
      parent: the child is at the highest possible interface version that
      is less than or equal to its parent's version.
    </para>
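    <para>
      The rule in the last sentence can be written as a small lookup.  The
      sketch below assumes a hypothetical table listing, in ascending order,
      the versions at which each child interface changed.  For the example
      above, it assumes that wl_region existed only at version 1 before the
      bump to 4, and that wl_surface had versions 1 to 3 before the bump to 5.
    </para>
    <programlisting language="C"><![CDATA[
```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Return the highest entry of changed_at[] (ascending versions at which a
 * child interface changed; hypothetical data) that does not exceed
 * parent_version. Returns 0 if the child did not exist yet. */
static uint32_t child_interface_version(const uint32_t *changed_at, size_t n,
					uint32_t parent_version)
{
	uint32_t best = 0;

	assert(changed_at != NULL || n == 0);
	for (size_t i = 0; i < n && changed_at[i] <= parent_version; i++)
		best = changed_at[i];
	return best;
}
```
]]></programlisting>
    <para>
      With these assumed tables, a wl_compositor at version 3 implies
      wl_region at version 1.  A wl_compositor at version 5 implies wl_region
      at version 4 and wl_surface at version 5.
    </para>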
    <para>
      It is worth noting a particular exception to the above versioning
      scheme.  The wl_display (and, by extension, wl_registry) interface
      cannot change because it is the core protocol object.  Its version
      is never advertised, and there is no mechanism to request a different
      version.
    </para>
  </section>
  <section id="sect-Protocol-Connect-Time">
    <title>Connect Time</title>
    <para>
      There is no fixed connection setup information.  Instead, the server
      emits multiple events at connect time to indicate the presence and
      properties of global objects: outputs, compositor, input devices.
    </para>
  </section>
  <section id="sect-Protocol-Security-and-Authentication">
    <title>Security and Authentication</title>
    <para>
      <itemizedlist>
	<listitem>
	  <para>
	    Mostly about access to underlying buffers: need a new DRM auth
	    mechanism (the grant-to ioctl idea); need to check the command stream?
	  </para>
	</listitem>
	<listitem>
	  <para>
	    Getting the server socket depends on the compositor type.  It could
	    be a system-wide name, it could be obtained through fd passing on the
	    session D-Bus, or the client could be forked by the compositor with
	    the fd already opened.
	  </para>
	</listitem>
      </itemizedlist>
    </para>
  </section>
  <section id="sect-Protocol-Creating-Objects">
    <title>Creating Objects</title>
    <para>
      Each object has a unique ID.  The IDs are allocated by the entity
      creating the object (either client or server).  IDs allocated by the
      client are in the range [1, 0xfeffffff] while IDs allocated by the
      server are in the range [0xff000000, 0xffffffff].  The 0 ID is
      reserved to represent a null or non-existent object.
    </para>
    <para>
      For efficiency purposes, the IDs are densely packed in the sense that
      the ID N will not be used until N-1 has been used.  Any ID allocation
      algorithm that does not maintain this property is incompatible with
      the implementation in libwayland.
    </para>
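    <para>
      The following sketch shows a hypothetical client-side allocator that
      respects both the ranges and the density requirement.  It reuses the
      most recently released ID first and otherwise hands out the next
      never-used ID, so ID N is never used before N-1.  This is one possible
      approach and is not libwayland's code.  It also ignores the
      wl_display.delete_id acknowledgement that a real client waits for
      before reusing an ID.
    </para>
    <programlisting language="C"><![CDATA[
```c
#include <assert.h>
#include <stdint.h>

#define CLIENT_ID_MIN 1u          /* 0 is reserved for null */
#define CLIENT_ID_MAX 0xfeffffffu
#define SERVER_ID_MIN 0xff000000u /* server range runs to 0xffffffff */
#define MAX_FREED 64              /* hypothetical fixed capacity */

static int id_is_client(uint32_t id) { return id >= CLIENT_ID_MIN && id <= CLIENT_ID_MAX; }
static int id_is_server(uint32_t id) { return id >= SERVER_ID_MIN; }

/* Hypothetical allocator state. */
struct id_alloc {
	uint32_t next;              /* lowest ID never handed out */
	uint32_t freed[MAX_FREED];  /* stack of released IDs */
	int nfreed;
};

static void id_alloc_init(struct id_alloc *a)
{
	a->next = CLIENT_ID_MIN;
	a->nfreed = 0;
}

/* Prefer a released ID; otherwise take the next one, keeping IDs dense. */
static uint32_t id_alloc_new(struct id_alloc *a)
{
	if (a->nfreed > 0)
		return a->freed[--a->nfreed];
	assert(a->next <= CLIENT_ID_MAX);
	return a->next++;
}

static void id_alloc_release(struct id_alloc *a, uint32_t id)
{
	assert(id_is_client(id) && a->nfreed < MAX_FREED);
	a->freed[a->nfreed++] = id;
}
```
]]></programlisting>
    <para>
      In this sketch the first allocation returns ID 1.  In libwayland, ID 1
      is the wl_display object.
    </para>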
  </section>
  <section id="sect-Protocol-Compositor">
    <title>Compositor</title>
    <para>
      The compositor is a global object, advertised at connect time.
    </para>
    <para>
      See <xref linkend="protocol-spec-wl_compositor"/> for the
      protocol description.
    </para>
  </section>
  <section id="sect-Protocol-Surface">
    <title>Surfaces</title>
    <para>
      A surface manages a rectangular grid of pixels that clients create
      for displaying their content to the screen.  Clients don't know
      the global position of their surfaces, and cannot access other
      clients' surfaces.
    </para>
    <para>
      Once the client has finished writing pixels, it 'commits' the
      buffer.  This permits the compositor to access the buffer and read
      the pixels.  When the compositor is finished, it releases the
      buffer back to the client.
    </para>
    <para>
      See <xref linkend="protocol-spec-wl_surface"/> for the protocol
      description.
    </para>
  </section>
  <section id="sect-Protocol-Input">
    <title>Input</title>
    <para>
      A seat represents a group of input devices including mice,
      keyboards and touchscreens.  It has a keyboard and pointer
      focus.  Seats are global objects.  Pointer events are delivered
      in surface-local coordinates.
    </para>
    <para>
      The compositor maintains an implicit grab when a button is
      pressed, to ensure that the corresponding button release
      event gets delivered to the same surface.  However, there is no way
      for clients to take an explicit grab.  Instead, surfaces can
      be mapped as 'popup', which combines transient window semantics
      with a pointer grab.
    </para>
    <para>
      To avoid race conditions, input events that are likely to
      trigger further requests (such as button presses, key events and
      pointer enter events) carry serial numbers.  Requests such as
      wl_shell_surface.set_popup require that the serial number of the
      triggering event is specified.  The server maintains a
      monotonically increasing counter for these serial numbers.
    </para>
    <para>
      Input events also carry timestamps with millisecond granularity.
      Their base is undefined, so they can't be compared against
      system time (as obtained with clock_gettime or gettimeofday).
      They can be compared with each other, though.  For instance, they
      can be used to identify sequences of button presses as double
      or triple clicks.
    </para>
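    <para>
      The following C fragment sketches such a comparison.  It treats
      timestamps as 32-bit unsigned millisecond counters, which is the
      <type>uint</type> time argument of events such as wl_pointer.button.  It
      uses unsigned subtraction, so the difference stays correct if the
      counter wraps around (roughly every 49.7 days).  The 400 ms threshold is
      a hypothetical value, since toolkits usually take it from user settings.
    </para>
    <programlisting language="C"><![CDATA[
```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical double-click threshold in milliseconds. */
#define DOUBLE_CLICK_MS 400u

/* Milliseconds from `earlier` to `later`. Unsigned 32-bit subtraction gives
 * the right answer even if the counter wrapped between the two events. */
static uint32_t elapsed_ms(uint32_t earlier, uint32_t later)
{
	return later - earlier;
}

/* True if the second button press followed the first one closely enough. */
static bool is_double_click(uint32_t first_press_time, uint32_t second_press_time)
{
	return elapsed_ms(first_press_time, second_press_time) <= DOUBLE_CLICK_MS;
}
```
]]></programlisting>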
    <para>
      See <xref linkend="protocol-spec-wl_seat"/> for the
      protocol description.
    </para>
    <para>
      Talk about:

      <itemizedlist>
	<listitem>
	  <para>
	    keyboard map, change events
	  </para>
	</listitem>
	<listitem>
	  <para>
	    xkb on Wayland
	  </para>
	</listitem>
	<listitem>
	  <para>
	    multi pointer Wayland
	  </para>
	</listitem>
      </itemizedlist>
    </para>
    <para>
      A surface can change the pointer image when the surface is the pointer
      focus of the input device.  Wayland doesn't automatically change the
      pointer image when a pointer enters a surface, but expects the
      application to set the cursor it wants in response to the pointer
      focus and motion events.  The rationale is that a client has to manage
      changing pointer images for UI elements within the surface in response
      to motion events anyway, so we'll make that the only mechanism for
      setting or changing the pointer image.  If the server receives a request
      to set the pointer image after the surface loses pointer focus, the
      request is ignored.  To the client this will look like it successfully
      set the pointer image.
    </para>
    <para>
      Setting the pointer image to NULL causes the cursor to be hidden.
    </para>
    <para>
      The compositor will revert the pointer image back to a default image
      when no surface has the pointer focus for that device.
    </para>
    <para>
      What if the pointer moves from one window which has set a special
      pointer image to a surface that doesn't set an image in response to
      the motion event?  The new surface will be stuck with the special
      pointer image.  We can't just revert the pointer image on leaving a
      surface, since if we immediately enter a surface that sets a different
      image, the image will flicker.  If a client does not set a pointer image
      when the pointer enters a surface, the pointer stays with the image set
      by the last surface that changed it, possibly even hidden.  Such a client
      is likely just broken.
    </para>
  </section>
  <section id="sect-Protocol-Output">
    <title>Output</title>
    <para>
      An output is a global object, advertised at connect time or as it
      comes and goes.
    </para>
    <para>
      See <xref linkend="protocol-spec-wl_output"/> for the protocol
      description.
    </para>
    <itemizedlist>
      <listitem>
	<para>
	  laid out in a big (compositor) coordinate system
	</para>
      </listitem>
      <listitem>
	<para>
	  basically xrandr over Wayland
	</para>
      </listitem>
      <listitem>
	<para>
	  geometry needs position in compositor coordinate system
	</para>
      </listitem>
      <listitem>
	<para>
	  events to advertise available modes, requests to move and change
	  modes
	</para>
      </listitem>
    </itemizedlist>
  </section>
  <section id="sect-Protocol-data-sharing">
    <title>Data sharing between clients</title>
    <para>
      The Wayland protocol provides clients with a mechanism for sharing
      data that allows the implementation of copy-paste and
      drag-and-drop.  The client providing the data creates a
      <function>wl_data_source</function> object, and the clients
      obtaining the data will see it as a <function>wl_data_offer</function>
      object.  This interface allows the clients to agree on a mutually
      supported MIME type and transfer the data via a file descriptor
      that is passed through the protocol.
    </para>
    <para>
      The next section explains the negotiation between data source and
      data offer objects.  <xref linkend="sect-Protocol-data-sharing-devices"/>
      explains how these objects are created and passed to different
      clients using the <function>wl_data_device</function> interface
      that implements copy-paste and drag-and-drop support.
    </para>
    <para>
      See <xref linkend="protocol-spec-wl_data_offer"/>,
      <xref linkend="protocol-spec-wl_data_source"/>,
      <xref linkend="protocol-spec-wl_data_device"/> and
      <xref linkend="protocol-spec-wl_data_device_manager"/> for
      protocol descriptions.
    </para>
    <para>
      MIME is defined in RFCs 2045-2049.  A
      <ulink url="https://www.iana.org/assignments/media-types/media-types.xhtml">
      registry of MIME types</ulink> is maintained by the Internet Assigned
      Numbers Authority (IANA).
    </para>
    <section>
      <title>Data negotiation</title>
      <para>
	A client providing data to other clients will create a <function>wl_data_source</function>
	object and advertise the MIME types for the formats it supports for
	that data through the <function>wl_data_source.offer</function>
	request.  On the receiving end, the data offer object will generate one
	<function>wl_data_offer.offer</function> event for each supported MIME
	type.
      </para>
      <para>
	The actual data transfer happens when the receiving client sends a
	<function>wl_data_offer.receive</function> request.  This request takes
	a MIME type and a file descriptor as arguments.  It generates a
	<function>wl_data_source.send</function> event on the sending client
	with the same arguments, and the sending client is expected to write its
	data to the given file descriptor using the chosen MIME type.
      </para>
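      <para>
	The transfer itself is ordinary file descriptor I/O.  A common pattern
	looks like this.  The receiving client creates a pipe, passes the write
	end with <function>wl_data_offer.receive</function>, closes its own copy
	of the write end and reads until end of file.  The sending client writes
	the data in its <function>wl_data_source.send</function> handler and then
	closes the descriptor.  The sketch below shows only the two I/O halves,
	with hypothetical helper names, and connects them with a local pipe
	instead of the protocol.
      </para>
      <programlisting language="C"><![CDATA[
```c
#include <assert.h>
#include <errno.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Sending side: write everything, then close so the reader sees EOF.
 * Returns 0 on success, -1 on error. */
static int write_all_and_close(int fd, const char *data, size_t len)
{
	assert(fd >= 0);
	while (len > 0) {
		ssize_t n = write(fd, data, len);
		if (n < 0) {
			if (errno == EINTR)
				continue;
			close(fd);
			return -1;
		}
		data += n;
		len -= (size_t) n;
	}
	return close(fd);
}

/* Receiving side: read until EOF or until buf is full.
 * Returns the number of bytes read, or -1 on error. */
static ssize_t read_until_eof(int fd, char *buf, size_t cap)
{
	size_t total = 0;

	while (total < cap) {
		ssize_t n = read(fd, buf + total, cap - total);
		if (n == 0)
			break;          /* EOF: the sender closed its end */
		if (n < 0) {
			if (errno == EINTR)
				continue;
			return -1;
		}
		total += (size_t) n;
	}
	return (ssize_t) total;     /* may be truncated if cap was reached */
}
```
]]></programlisting>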
    </section>
    <section id="sect-Protocol-data-sharing-devices">
      <title>Data devices</title>
      <para>
	Data devices glue data sources and offers together.  A data device is
	associated with a <function>wl_seat</function> and is obtained by the clients using the
	<function>wl_data_device_manager</function> factory object, which is also responsible for
	creating data sources.
      </para>
      <para>
	Clients are informed of new data offers through the
	<function>wl_data_device.data_offer</function> event.  After this
	event is generated, the data offer will advertise the available MIME
	types.  New data offers are introduced prior to their use for
	copy-paste or drag-and-drop.
      </para>
      <section>
	<title>Selection</title>
	<para>
	  Each data device has a selection data source.  Clients create a data
	  source object using the device manager and may set it as the
	  current selection for a given data device.  Whenever the current
	  selection changes, the client with keyboard focus receives a
	  <function>wl_data_device.selection</function> event.  This event is
	  also generated on a client immediately before it receives keyboard
	  focus.
	</para>
	<para>
	  The data offer is introduced with the
	  <function>wl_data_device.data_offer</function> event before the
	  selection event.
	</para>
      </section>
      <section>
	<title>Drag and Drop</title>
	<para>
	  A drag-and-drop operation is started using the
	  <function>wl_data_device.start_drag</function> request.  This
	  request causes a pointer grab that will generate enter, motion and
	  leave events on the data device.  A data source is supplied as an
	  argument to start_drag, and data offers associated with it are
	  supplied to clients' surfaces under the pointer in the
	  <function>wl_data_device.enter</function> event.  The data offer
	  is introduced to the client prior to the enter event with the
	  <function>wl_data_device.data_offer</function> event.
	</para>
	<para>
	  Clients are expected to provide feedback to the data sending client
	  by calling the <function>wl_data_offer.accept</function> request with
	  a MIME type they accept.  If none of the advertised MIME types is
	  supported by the receiving client, it should supply NULL to the
	  accept request.  The accept request causes the sending client to
	  receive a <function>wl_data_source.target</function> event with the
	  chosen MIME type.
	</para>
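	<para>
	  Choosing the MIME type to accept amounts to intersecting the offered
	  types with the receiving client's own preferences.  The sketch below
	  uses hypothetical names.  It returns the first preferred type that was
	  offered, or NULL, which is the value a client would then pass to
	  <function>wl_data_offer.accept</function>.
	</para>
	<programlisting language="C"><![CDATA[
```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Return the first MIME type from the receiver's preference list that the
 * source also offered (collected from wl_data_offer.offer events), or NULL
 * if there is no common type. */
static const char *choose_mime_type(const char *const *offered, size_t n_offered,
				    const char *const *preferred, size_t n_preferred)
{
	assert(offered != NULL || n_offered == 0);
	for (size_t i = 0; i < n_preferred; i++)
		for (size_t j = 0; j < n_offered; j++)
			if (strcmp(preferred[i], offered[j]) == 0)
				return offered[j];
	return NULL;
}
```
]]></programlisting>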
	<para>
	  When the drag ends, the receiving client receives a
	  <function>wl_data_device.drop</function> event, at which point it is
	  expected to transfer the data using the
	  <function>wl_data_offer.receive</function> request.
	</para>
      </section>
    </section>
  </section>
</chapter>