<?xml version="1.0"?>
<!DOCTYPE book PUBLIC "-//OASIS//DTD DocBook XML V4.3//EN"
               "http://www.oasis-open.org/docbook/xml/4.3/docbookx.dtd" [
  <!ENTITY % local.common.attrib "xmlns:xi  CDATA  #FIXED 'http://www.w3.org/2003/XInclude'">
  <!ENTITY version SYSTEM "version.xml">
]>
<chapter id="clusters">
  <title>Clusters</title>
  <section id="clusters-and-shaping">
    <title>Clusters and shaping</title>
    <para>
      In text shaping, a <emphasis>cluster</emphasis> is a sequence of
      characters that needs to be treated as a single, indivisible
      unit. A single letter or symbol can be a cluster of its
      own. Other clusters correspond to longer subsequences of the
      input code points &mdash; such as a ligature or conjunct form
      &mdash; and require the shaper to ensure that the cluster is not
      broken during the shaping process.
    </para>
    <para>
      A cluster is distinct from a <emphasis>grapheme</emphasis>,
      which is the smallest unit of meaning in a writing system or
      script.
    </para>
    <para>
      The definitions of the two terms are similar. However, clusters
      are only relevant for script shaping and glyph layout. In
      contrast, graphemes are a property of the underlying script, and
      are of interest when client programs implement orthographic
      or linguistic functionality.
    </para>
    <para>
      For example, two individual letters are often two separate
      graphemes. When two letters form a ligature, however, they
      combine into a single glyph. They are then part of the same
      cluster and are treated as a unit by the shaping engine &mdash;
      even though the two original, underlying letters remain separate
      graphemes.
    </para>
    <para>
      HarfBuzz is concerned with clusters, <emphasis>not</emphasis>
      with graphemes &mdash; although client programs using HarfBuzz
      may still care about graphemes for other reasons from time to time.
    </para>
    <para>
      During the shaping process, there are several shaping operations
      that may merge adjacent characters (for example, when two code
      points form a ligature or a conjunct form and are replaced by a
      single glyph) or split one character into several (for example,
      when decomposing a code point through the
      <literal>ccmp</literal> feature). Operations like these alter
      clusters; HarfBuzz tracks the changes to ensure that no clusters
      get lost or broken during shaping.
    </para>
    <para>
      HarfBuzz records cluster information independently from how
      shaping operations affect the individual glyphs returned in an
      output buffer. Consequently, a client program using HarfBuzz can
      utilize the cluster information to implement features such as:
    </para>
    <itemizedlist>
      <listitem>
        <para>
          Correctly positioning the cursor within a shaped text run,
          even when characters have formed ligatures, composed or
          decomposed, reordered, or undergone other shaping operations.
        </para>
      </listitem>
      <listitem>
        <para>
          Correctly highlighting a text selection that includes some,
          but not all, of the characters in a word.
        </para>
      </listitem>
      <listitem>
        <para>
          Applying text attributes (such as color or underlining) to
          part, but not all, of a word.
        </para>
      </listitem>
      <listitem>
        <para>
          Generating output document formats (such as PDF) with
          embedded text that can be fully extracted.
        </para>
      </listitem>
      <listitem>
        <para>
          Determining the mapping between input characters and output
          glyphs, such as which glyphs are ligatures.
        </para>
      </listitem>
      <listitem>
        <para>
          Performing line-breaking, justification, and other
          line-level or paragraph-level operations that must be done
          after shaping is complete, but which require examining
          character-level properties.
        </para>
      </listitem>
    </itemizedlist>
  </section>
  <section id="working-with-harfbuzz-clusters">
    <title>Working with HarfBuzz clusters</title>
    <para>
      When you add text to a HarfBuzz buffer, each code point must be
      assigned a <emphasis>cluster value</emphasis>.
    </para>
    <para>
      This cluster value is an arbitrary number; HarfBuzz uses it only
      to distinguish between clusters. Many client programs will use
      the index of each code point in the input text stream as the
      cluster value. This is for the sake of convenience; the actual
      value does not matter.
    </para>
    <para>
      Some of the shaping operations performed by HarfBuzz &mdash;
      such as reordering, composition, decomposition, and substitution
      &mdash; may alter the cluster values of some characters. The
      final cluster values in the buffer at the end of the shaping
      process will indicate to client programs which subsequences of
      glyphs represent a cluster and, therefore, must not be
      separated.
    </para>
    <para>
      In addition, client programs can query the final cluster values
      to discern other potentially important information about the
      glyphs in the output buffer (such as whether or not a ligature
      was formed).
    </para>
    <para>
      For example, if the initial sequence of cluster values was:
    </para>
    <programlisting>
      0,1,2,3,4
    </programlisting>
    <para>
      and the final sequence of cluster values is:
    </para>
    <programlisting>
      0,0,3,3
    </programlisting>
    <para>
      then there are two clusters in the output buffer: the first
      cluster includes the first two glyphs, and the second cluster
      includes the third and fourth glyphs. It is also evident that a
      ligature or conjunct has been formed, because there are fewer
      glyphs in the output buffer (four) than there were code points
      in the input buffer (five).
    </para>
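    <para>
      The counting in this example can be sketched in a few lines of
      C. This is only an illustration, not HarfBuzz code: the function
      name <literal>count_cluster_runs</literal> and the plain array
      of final cluster values are assumptions made for the sketch. It
      treats every run of equal, adjacent cluster values as one
      cluster.
    </para>
    <programlisting language="C"><![CDATA[
```c
#include <assert.h>
#include <stddef.h>

/* Counts the runs of equal, adjacent cluster values.
 * `clusters` is a hypothetical array holding the final cluster value
 * of each output glyph, in buffer order. */
size_t count_cluster_runs(const unsigned int *clusters, size_t glyph_count)
{
    size_t runs = 0;

    assert(clusters != NULL || glyph_count == 0);
    for (size_t i = 0; i < glyph_count; i++) {
        /* A new cluster starts at the first glyph and wherever the
         * cluster value differs from the previous glyph's value. */
        if (i == 0 || clusters[i] != clusters[i - 1])
            runs++;
    }
    return runs;
}
```
]]></programlisting>
    <para>
      Called on the final sequence 0,0,3,3 with a glyph count of four,
      this sketch reports two clusters, matching the description above.
    </para>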
    <para>
      Although client programs using HarfBuzz are free to assign
      initial cluster values in any manner they choose, HarfBuzz
      does offer some useful guarantees if the cluster values are
      assigned in a monotonic (either non-decreasing or non-increasing)
      order.
    </para>
    <para>
      For buffers in the left-to-right (LTR)
      or top-to-bottom (TTB) text flow direction,
      HarfBuzz will preserve the monotonic property: client programs
      are guaranteed that monotonically increasing initial cluster
      values will be returned as monotonically increasing final
      cluster values.
    </para>
    <para>
      For buffers in the right-to-left (RTL)
      or bottom-to-top (BTT) text flow direction,
      the directionality of the buffer itself is reversed for final
      output as a matter of design. Therefore, HarfBuzz inverts the
      monotonic property: client programs are guaranteed that
      monotonically increasing initial cluster values will be
      returned as monotonically <emphasis>decreasing</emphasis> final
      cluster values.
    </para>
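    <para>
      A client that relies on these guarantees might verify them on
      its shaped output. The following is a minimal sketch under the
      assumption that the final cluster values have been copied into a
      plain array; both function names are hypothetical and are not
      part of the HarfBuzz API.
    </para>
    <programlisting language="C"><![CDATA[
```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* True if the values never decrease (expected for LTR and TTB output). */
bool clusters_non_decreasing(const unsigned int *clusters, size_t count)
{
    assert(clusters != NULL || count == 0);
    for (size_t i = 1; i < count; i++) {
        if (clusters[i] < clusters[i - 1])
            return false;
    }
    return true;
}

/* True if the values never increase (expected for RTL and BTT output). */
bool clusters_non_increasing(const unsigned int *clusters, size_t count)
{
    assert(clusters != NULL || count == 0);
    for (size_t i = 1; i < count; i++) {
        if (clusters[i] > clusters[i - 1])
            return false;
    }
    return true;
}
```
]]></programlisting>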
    <para>
      Client programs can adjust how HarfBuzz handles clusters during
      shaping by setting the
      <literal>cluster_level</literal> of the
      buffer. HarfBuzz offers three <emphasis>levels</emphasis> of
      clustering support for this property:
    </para>
    <itemizedlist>
      <listitem>
        <para><emphasis>Level 0</emphasis> is the default.
        </para>
        <para>
          The distinguishing feature of level 0 behavior is that, at
          the beginning of processing the buffer, all code points that
          are categorized as <emphasis>marks</emphasis>,
          <emphasis>modifier symbols</emphasis>, or
          <emphasis>Emoji extended pictographic</emphasis> modifiers,
          as well as the <emphasis>Zero Width Joiner</emphasis> and
          <emphasis>Zero Width Non-Joiner</emphasis> code points, are
          assigned the cluster value of the closest preceding code
          point from a <emphasis>different</emphasis> category.
        </para>
        <para>
          In essence, whenever a base character is followed by a mark
          character or a sequence of mark characters, those marks are
          reassigned to the same initial cluster value as the base
          character. This reassignment is referred to as
          "merging" the affected clusters. This behavior is based on
          the Grapheme Cluster Boundary specification in <ulink
          url="https://www.unicode.org/reports/tr29/#Regex_Definitions">Unicode
          Technical Report 29</ulink>.
        </para>
        <para>
          This cluster level is suitable for code that also uses
          HarfBuzz cluster values as an approximation of the Unicode
          Grapheme Cluster Boundaries.
        </para>
        <para>
          Client programs can specify level 0 behavior for a buffer by
          setting its <literal>cluster_level</literal> to
          <literal>HB_BUFFER_CLUSTER_LEVEL_MONOTONE_GRAPHEMES</literal>.
        </para>
      </listitem>
      <listitem>
        <para>
          <emphasis>Level 1</emphasis> adjusts the level 0 behavior
          slightly to produce better results. Level 1 clustering is
          therefore recommended for code that does not need to remain
          backward compatible with the older HarfBuzz behavior.
        </para>
        <para>
          <emphasis>Level 1</emphasis> differs from level 0 by not merging the
          clusters of marks and other modifier code points with the
          preceding "base" code point's cluster. By preserving the
          separate cluster values of these marks and modifier code
          points, script shapers can perform additional operations
          that might lead to improved results (for example, coloring
          mark glyphs differently than their base).
        </para>
        <para>
          Client programs can specify level 1 behavior for a buffer by
          setting its <literal>cluster_level</literal> to
          <literal>HB_BUFFER_CLUSTER_LEVEL_MONOTONE_CHARACTERS</literal>.
        </para>
      </listitem>
      <listitem>
        <para>
          <emphasis>Level 2</emphasis> differs significantly in how it
          treats cluster values. In level 2, HarfBuzz never merges
          clusters.
        </para>
        <para>
          This difference can be seen most clearly when HarfBuzz processes
          ligature substitutions and glyph decompositions. In level 0
          and level 1, ligatures and glyph decomposition both involve
          merging clusters; in level 2, neither of these operations
          triggers a merge.
        </para>
        <para>
          Client programs can specify level 2 behavior for a buffer by
          setting its <literal>cluster_level</literal> to
          <literal>HB_BUFFER_CLUSTER_LEVEL_CHARACTERS</literal>.
        </para>
      </listitem>
    </itemizedlist>
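    <para>
      The initial mark merging that separates level 0 from level 1 can
      be modeled roughly as follows. This is a deliberately simplified
      sketch and not the HarfBuzz implementation: it assumes a
      hypothetical <literal>is_mark</literal> flag per code point that
      stands in for all of the categories listed under level 0 (marks,
      modifier symbols, Emoji modifiers, and the joiner code points).
      Under level 1, no such pass would run and the initial cluster
      values would stay as assigned.
    </para>
    <programlisting language="C"><![CDATA[
```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified model of the level 0 pre-pass: every code point flagged
 * as a mark takes the cluster value of the closest preceding code
 * point that is not a mark. Marks with no preceding base keep their
 * own cluster value. */
void merge_marks_into_bases(unsigned int *clusters, const bool *is_mark,
                            size_t count)
{
    bool have_base = false;
    unsigned int base_cluster = 0;

    assert(count == 0 || (clusters != NULL && is_mark != NULL));
    for (size_t i = 0; i < count; i++) {
        if (is_mark[i]) {
            if (have_base)
                clusters[i] = base_cluster;
        } else {
            have_base = true;
            base_cluster = clusters[i];
        }
    }
}
```
]]></programlisting>
    <para>
      For a base character followed by two marks and another base
      character, with initial cluster values 0,1,2,3, this model
      yields 0,0,0,3.
    </para>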
    <para>
      As mentioned earlier, client programs using HarfBuzz often
      assign initial cluster values in a buffer by reusing the indices
      of the code points in the input text. This gives a sequence of
      cluster values that is monotonically increasing (for example,
      0,1,2,3,4).
    </para>
    <para>
      It is not <emphasis>required</emphasis> that the cluster values
      in a buffer be monotonically increasing. However, if the initial
      cluster values in a buffer are monotonic and the buffer is
      configured to use cluster level 0 or 1, then HarfBuzz
      guarantees that the final cluster values in the shaped buffer
      will also be monotonic. No such guarantee is made for cluster
      level 2.
    </para>
    <para>
      In levels 0 and 1, HarfBuzz implements the following conceptual
      model for cluster values:
    </para>
    <itemizedlist spacing="compact">
      <listitem>
        <para>
          If the sequence of input cluster values is monotonic, the
          sequence of cluster values will remain monotonic.
        </para>
      </listitem>
      <listitem>
        <para>
          Each cluster value represents a single cluster.
        </para>
      </listitem>
      <listitem>
        <para>
          Each cluster contains one or more glyphs and one or more
          characters.
        </para>
      </listitem>
    </itemizedlist>
    <para>
      In practice, this model offers several benefits. Assuming that
      the initial cluster values were monotonically increasing
      and distinct before shaping began, then, in the final output:
    </para>
    <itemizedlist spacing="compact">
      <listitem>
        <para>
          All adjacent glyphs having the same final cluster
          value belong to the same cluster.
        </para>
      </listitem>
      <listitem>
        <para>
          Each character belongs to the cluster that has the highest
          cluster value <emphasis>not larger than</emphasis> its
          initial cluster value.
        </para>
      </listitem>
    </itemizedlist>
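    <para>
      The second rule gives a direct way to find the cluster that a
      given input character ended up in. The sketch below is an
      illustration only and assumes left-to-right output with
      non-decreasing final cluster values held in a plain array; the
      function name is hypothetical. It returns the index of the first
      glyph of the matching cluster, or the glyph count if no cluster
      value is small enough.
    </para>
    <programlisting language="C"><![CDATA[
```c
#include <assert.h>
#include <stddef.h>

/* Finds the cluster with the highest final cluster value that is not
 * larger than `initial_cluster`. Assumes `final_clusters` is sorted in
 * non-decreasing order (LTR or TTB output). Returns the index of the
 * first glyph of that cluster, or `glyph_count` if there is none. */
size_t cluster_start_for_character(const unsigned int *final_clusters,
                                   size_t glyph_count,
                                   unsigned int initial_cluster)
{
    size_t start = glyph_count; /* sentinel: no matching cluster yet */

    assert(final_clusters != NULL || glyph_count == 0);
    for (size_t i = 0; i < glyph_count; i++) {
        if (final_clusters[i] > initial_cluster)
            break; /* values only grow from here on */
        /* Remember where each new run of equal values begins. */
        if (i == 0 || final_clusters[i] != final_clusters[i - 1])
            start = i;
    }
    return start;
}
```
]]></programlisting>
    <para>
      With final cluster values 0,0,3,3, the character whose initial
      cluster value was 2 maps to the cluster starting at glyph index
      0, while the characters with initial values 3 and 4 both map to
      the cluster starting at glyph index 2.
    </para>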
  </section>

  <section id="a-clustering-example-for-levels-0-and-1">
    <title>A clustering example for levels 0 and 1</title>
    <para>
      The basic shaping operations affect clusters in a predictable
      manner when using level 0 or level 1:
    </para>
    <itemizedlist>
      <listitem>
        <para>
          When two or more clusters <emphasis>merge</emphasis>, the
          resulting merged cluster takes as its cluster value the
          <emphasis>minimum</emphasis> of the incoming cluster values.
        </para>
      </listitem>
      <listitem>
        <para>
          When a cluster <emphasis>decomposes</emphasis>, all of the
          resulting child clusters inherit as their cluster value the
          cluster value of the parent cluster.
        </para>
      </listitem>
      <listitem>
        <para>
          When a character is <emphasis>reordered</emphasis>, the
          reordered character and all clusters that the character
          moves past as part of the reordering are merged into one cluster.
        </para>
      </listitem>
    </itemizedlist>
    <para>
      The functionality, guarantees, and benefits of level 0 and level
      1 behavior can be seen with some examples. First, let us examine
      what happens with cluster values when shaping involves cluster
      merging with ligatures and decomposition.
    </para>

    <para>
      Let's say we start with the following character sequence (top row) and
      initial cluster values (bottom row):
    </para>
    <programlisting>
      A,B,C,D,E
      0,1,2,3,4
    </programlisting>
    <para>
      During shaping, HarfBuzz maps these characters to glyphs from
      the font. For simplicity, let us assume that each character maps
      to the corresponding, identical-looking glyph:
    </para>
    <programlisting>
      A,B,C,D,E
      0,1,2,3,4
    </programlisting>
    <para>
      Now if, for example, <literal>B</literal> and <literal>C</literal>
      form a ligature, then the clusters to which they belong
      &quot;merge&quot;. This merged cluster takes for its cluster
      value the minimum of all the cluster values of the clusters that
      went into the ligature. In this case, we get:
    </para>
    <programlisting>
      A,BC,D,E
      0,1 ,3,4
    </programlisting>
    <para>
      because 1 is the minimum of the set {1,2}, which were the
      cluster values of <literal>B</literal> and
      <literal>C</literal>.
    </para>
    <para>
      Next, let us say that the <literal>BC</literal> ligature glyph
      decomposes into three components, and <literal>D</literal> also
      decomposes into two components. Whenever a cluster decomposes,
      its components each inherit the cluster value of their parent:
    </para>
    <programlisting>
      A,BC0,BC1,BC2,D0,D1,E
      0,1  ,1  ,1  ,3 ,3 ,4
    </programlisting>
    <para>
      Next, if <literal>BC2</literal> and <literal>D0</literal> form a
      ligature, then their clusters (cluster values 1 and 3) merge into
      <literal>min(1,3) = 1</literal>:
    </para>
    <programlisting>
      A,BC0,BC1,BC2D0,D1,E
      0,1  ,1  ,1    ,1 ,4
    </programlisting>
    <para>
      Note that the entirety of cluster 3 merges into cluster 1, not
      just the <literal>D0</literal> glyph. This reflects the fact
      that the cluster <emphasis>must</emphasis> be treated as an
      indivisible unit.
    </para>
416*2d1272b8SAndroid Build Coastguard Worker    <para>
417*2d1272b8SAndroid Build Coastguard Worker      At this point, cluster 1 means: the character sequence
418*2d1272b8SAndroid Build Coastguard Worker      <literal>BCD</literal> is represented by glyphs
419*2d1272b8SAndroid Build Coastguard Worker      <literal>BC0,BC1,BC2D0,D1</literal> and cannot be broken down any
420*2d1272b8SAndroid Build Coastguard Worker      further.
421*2d1272b8SAndroid Build Coastguard Worker    </para>
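    <para>
      The walk-through above can be modeled with a short Python
      sketch. This is a simplified illustration and not HarfBuzz's
      implementation: glyphs are plain <literal>(name, cluster)</literal>
      tuples, and the helper names <literal>merge_clusters</literal>,
      <literal>ligate</literal>, and <literal>decompose</literal> are
      hypothetical names chosen for this example.
    </para>
    <programlisting language="python"><![CDATA[
def merge_clusters(glyphs, start, end):
    """Merge glyphs[start:end] into one cluster, using the minimum value.

    The range is first widened so that it never splits an existing
    cluster; this is why D1 follows D0 into cluster 1.
    """
    first, last = glyphs[start][1], glyphs[end - 1][1]
    while start > 0 and glyphs[start - 1][1] == first:
        start -= 1
    while end < len(glyphs) and glyphs[end][1] == last:
        end += 1
    lowest = min(cluster for _, cluster in glyphs[start:end])
    for i in range(start, end):
        glyphs[i] = (glyphs[i][0], lowest)


def ligate(glyphs, start, end):
    """Replace glyphs[start:end] with a single ligature glyph."""
    merge_clusters(glyphs, start, end)
    name = "".join(n for n, _ in glyphs[start:end])
    glyphs[start:end] = [(name, glyphs[start][1])]


def decompose(glyphs, index, count):
    """Split one glyph into `count` components that inherit its cluster."""
    name, cluster = glyphs[index]
    glyphs[index:index + 1] = [(f"{name}{i}", cluster) for i in range(count)]


glyphs = [("A", 0), ("B", 1), ("C", 2), ("D", 3), ("E", 4)]
ligate(glyphs, 1, 3)      # B + C    -> BC
decompose(glyphs, 1, 3)   # BC       -> BC0, BC1, BC2
decompose(glyphs, 4, 2)   # D        -> D0, D1
ligate(glyphs, 3, 5)      # BC2 + D0 -> BC2D0
result = glyphs
]]></programlisting>
    <para>
      After the last step, <literal>result</literal> holds the glyph
      names and cluster values of the final listing above.
    </para>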
  </section>
  <section id="reordering-in-levels-0-and-1">
    <title>Reordering in levels 0 and 1</title>
    <para>
      Another common operation in some shapers is glyph
      reordering. To maintain a monotonic cluster sequence
      when glyph reordering takes place, HarfBuzz merges the clusters
      of everything in the reordering sequence.
    </para>
    <para>
      For example, let us again start with the character sequence (top
      row) and initial cluster values (bottom row):
    </para>
    <programlisting>
      A,B,C,D,E
      0,1,2,3,4
    </programlisting>
    <para>
      If <literal>D</literal> is reordered to the position immediately
      before <literal>B</literal>, then HarfBuzz merges the
      <literal>B</literal>, <literal>C</literal>, and
      <literal>D</literal> clusters &mdash; all the clusters between
      the final position of the reordered glyph and its original
      position. This means that we get:
    </para>
    <programlisting>
      A,D,B,C,E
      0,1,1,1,4
    </programlisting>
    <para>
      as the final cluster sequence.
    </para>
    <para>
      Merging this many clusters is not ideal, but it is the only
      sensible way for HarfBuzz to maintain the guarantee that the
      sequence of cluster values remains monotonic and to retain the
      true relationship between glyphs and characters.
    </para>
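    <para>
      As a rough illustration of the reordering rule, the following
      Python sketch moves one glyph backward and assigns the minimum
      cluster value to everything between its new and old
      positions. The function name <literal>move_glyph</literal> is
      hypothetical, and the sketch assumes that no neighboring glyph
      outside the moved range already shares a cluster with a glyph
      inside it.
    </para>
    <programlisting language="python"><![CDATA[
def move_glyph(glyphs, src, dst):
    """Move glyphs[src] back to position dst (dst < src).

    Every glyph between the two positions, inclusive, ends up in one
    merged cluster whose value is the minimum of the originals.
    """
    lowest = min(cluster for _, cluster in glyphs[dst:src + 1])
    glyphs.insert(dst, glyphs.pop(src))
    for i in range(dst, src + 1):
        glyphs[i] = (glyphs[i][0], lowest)
    return glyphs


start = [("A", 0), ("B", 1), ("C", 2), ("D", 3), ("E", 4)]
result = move_glyph(start, 3, 1)   # move D to just before B
]]></programlisting>
    <para>
      The cluster values in <literal>result</literal> are
      <literal>0,1,1,1,4</literal>, which is still a monotonic
      sequence even though the glyph order has changed.
    </para>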
  </section>
  <section id="the-distinction-between-levels-0-and-1">
    <title>The distinction between levels 0 and 1</title>
    <para>
      The preceding examples demonstrate the main effects of using
      cluster levels 0 and 1. The only difference between the two
      levels is this: in level 0, at the very beginning of the shaping
      process, HarfBuzz merges the cluster of each base character
      with the clusters of all Unicode marks (combining or not) and
      modifiers that follow it.
    </para>
    <para>
      For example, let us start with the following character sequence
      (top row) and accompanying initial cluster values (bottom row):
    </para>
    <programlisting>
      A,acute,B
      0,1    ,2
    </programlisting>
    <para>
      The <literal>acute</literal> is a Unicode mark. If HarfBuzz is
      using cluster level 0 on this sequence, then the
      <literal>A</literal> and <literal>acute</literal> clusters will
      merge, and the result will become:
    </para>
    <programlisting>
      A,acute,B
      0,0    ,2
    </programlisting>
    <para>
      This merger is performed before any other script-shaping
      steps.
    </para>
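    <para>
      The initial merging step can be approximated in a few lines of
      Python. This sketch makes two assumptions that are not taken
      from HarfBuzz: it treats any character whose Unicode general
      category starts with <literal>M</literal> as a mark, and it
      ignores modifiers entirely. The function name
      <literal>initial_clusters</literal> is hypothetical.
    </para>
    <programlisting language="python"><![CDATA[
import unicodedata


def initial_clusters(text, level):
    """Return one cluster value per character of `text`.

    Clusters start out as character indices. At level 0 only, each
    mark is folded into the cluster of the character before it.
    """
    clusters = list(range(len(text)))
    if level == 0:
        for i in range(1, len(text)):
            if unicodedata.category(text[i]).startswith("M"):
                clusters[i] = clusters[i - 1]
    return clusters


text = "A\u0301B"                    # A, combining acute accent, B
level0 = initial_clusters(text, 0)   # mark joins the cluster of A
level1 = initial_clusters(text, 1)   # every character keeps its own cluster
]]></programlisting>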
    <para>
      This initial cluster merging is the default behavior of the
      Windows shaping engine, and the old HarfBuzz codebase copied
      that behavior to maintain compatibility. Consequently, it has
      remained the default behavior in the new HarfBuzz codebase.
    </para>
    <para>
      But this initial cluster-merging behavior makes it impossible
      for client programs to implement some features (such as
      coloring diacritic marks differently from their base
      characters). That is why, in level 1, HarfBuzz does not perform
      the initial merging step.
    </para>
    <para>
      For client programs that rely on HarfBuzz cluster values to
      perform cursor positioning, level 0 is more convenient. But
      relying on cluster boundaries for cursor positioning is wrong: cursor
      positions should be determined based on Unicode grapheme
      boundaries, not on shaping-cluster boundaries. As such, using
      level 1 clustering behavior is recommended.
    </para>
    <para>
      One final facet of levels 0 and 1 is worth noting. HarfBuzz
      currently does not allow any
      <emphasis>multiple-substitution</emphasis> GSUB lookups to
      replace a glyph with zero glyphs (in other words, to delete a
      glyph).
    </para>
    <para>
      In some other situations, however, glyphs can be deleted. In
      those cases, if the glyph being deleted is the last glyph of its
      cluster, HarfBuzz makes sure to merge the deleted glyph's
      cluster with a neighboring cluster.
    </para>
    <para>
      This is done primarily to make sure that the starting cluster of the
      text always has the cluster index pointing to the start of the text
      for the run; more than one client program currently relies on this
      guarantee.
    </para>
    <para>
      Incidentally, Apple's CoreText does something different to
      maintain the same promise: it inserts a glyph with id 65535 at
      the beginning of the glyph string if the glyph corresponding to
      the first character in the run was deleted. HarfBuzz might do
      something similar in the future.
    </para>
  </section>
  <section id="level-2">
    <title>Level 2</title>
    <para>
      HarfBuzz's level 2 cluster behavior uses a significantly
      different model from that of levels 0 and 1.
    </para>
    <para>
      The level 2 behavior is easy to describe, but it may be
      difficult to understand in practical terms. In brief, level 2
      performs no merging of clusters whatsoever.
    </para>
    <para>
      This means that there is no initial base-and-mark merging step
      (as is done in level 0), and it means that reordering moves and
      ligature substitutions do not trigger a cluster merge.
    </para>
    <para>
      Only one shaping operation directly affects clusters when using
      level 2:
    </para>
    <itemizedlist>
      <listitem>
        <para>
          When a cluster <emphasis>decomposes</emphasis>, all of the
          resulting child clusters inherit the cluster value of the
          parent cluster.
        </para>
      </listitem>
    </itemizedlist>
    <para>
      When glyphs do form a ligature (or when some other feature
      substitutes multiple glyphs with one glyph), the cluster value
      of the first glyph is retained as the cluster value for the
      resulting ligature.
    </para>
    <para>
      This occurrence sounds similar to a cluster merge, but it is
      different. In particular, no subsequent characters &mdash;
      including marks and modifiers &mdash; are affected. They retain
      their previous cluster values.
    </para>
    <para>
      Level 2 cluster behavior is ultimately less complex than level 0
      or level 1, but there are several cases for which processing
      cluster values produced at level 2 may be tricky.
    </para>
    <section id="ligatures-with-combining-marks-in-level-2">
      <title>Ligatures with combining marks in level 2</title>
      <para>
        The first example of how HarfBuzz's level 2 cluster behavior
        can be tricky is when the text to be shaped includes combining
        marks attached to ligatures.
      </para>
      <para>
        Let us start with an input sequence with the following
        characters (top row) and initial cluster values (bottom row):
      </para>
      <programlisting>
        A,acute,B,breve,C,circumflex
        0,1    ,2,3    ,4,5
      </programlisting>
      <para>
        If the sequence <literal>A,B,C</literal> forms a ligature,
        then these are the cluster values HarfBuzz will return under
        the various cluster levels:
      </para>
      <para>
        Level 0:
      </para>
      <programlisting>
        ABC,acute,breve,circumflex
        0  ,0    ,0    ,0
      </programlisting>
      <para>
        Level 1:
      </para>
      <programlisting>
        ABC,acute,breve,circumflex
        0  ,0    ,0    ,5
      </programlisting>
      <para>
        Level 2:
      </para>
      <programlisting>
        ABC,acute,breve,circumflex
        0  ,1    ,3    ,5
      </programlisting>
      <para>
        Making sense of the level 2 result is the hardest for a client
        program, because there is nothing in the cluster values that
        indicates that <literal>B</literal> and <literal>C</literal>
        formed a ligature with <literal>A</literal>.
      </para>
      <para>
        In contrast, the &quot;merged&quot; cluster values of the mark glyphs
        that are seen in the level 0 and level 1 output are evidence
        that a ligature substitution took place.
      </para>
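      <para>
        The three results above can be reproduced with a small Python
        model. It is a sketch under stated assumptions rather than a
        description of HarfBuzz internals: marks are identified by
        name through the hypothetical <literal>MARKS</literal> set,
        the ligature is assumed to skip over marks, and a merge is
        assumed to cover the whole span from the first to the last
        ligating base, widened so that it does not split an existing
        cluster.
      </para>
      <programlisting language="python"><![CDATA[
MARKS = {"acute", "breve", "circumflex"}   # hypothetical mark names


def merge_clusters(clusters, start, end):
    """Give clusters[start:end] the minimum value of the widened range."""
    while end < len(clusters) and clusters[end] == clusters[end - 1]:
        end += 1
    while start > 0 and clusters[start - 1] == clusters[start]:
        start -= 1
    clusters[start:end] = [min(clusters[start:end])] * (end - start)


def ligate_bases(names, level):
    """Form one ligature from all base glyphs; marks follow it in order."""
    clusters = list(range(len(names)))
    if level == 0:
        # Level 0 only: fold each mark into the cluster before it.
        for i in range(1, len(names)):
            if names[i] in MARKS:
                clusters[i] = clusters[i - 1]
    bases = [i for i, name in enumerate(names) if name not in MARKS]
    if level in (0, 1):
        # Levels 0 and 1 merge everything the ligature spans.
        merge_clusters(clusters, bases[0], bases[-1] + 1)
    # At level 2 nothing merges; the ligature keeps the first cluster.
    ligature = ("".join(names[i] for i in bases), clusters[bases[0]])
    marks = [(n, c) for n, c in zip(names, clusters) if n in MARKS]
    return [ligature] + marks


sequence = ["A", "acute", "B", "breve", "C", "circumflex"]
by_level = {level: ligate_bases(sequence, level) for level in (0, 1, 2)}
]]></programlisting>
      <para>
        In this model, only the level 2 entry of
        <literal>by_level</literal> leaves every mark with its
        original cluster value, which is exactly why the ligature is
        invisible in the level 2 cluster values.
      </para>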
    </section>
    <section id="reordering-in-level-2">
      <title>Reordering in level 2</title>
      <para>
        Another example of how HarfBuzz's level 2 cluster behavior
        can be tricky is when glyphs reorder. Consider an input sequence
        with the following characters (top row) and initial cluster
        values (bottom row):
      </para>
      <programlisting>
        A,B,C,D,E
        0,1,2,3,4
      </programlisting>
      <para>
        Now imagine <literal>D</literal> moves before
        <literal>B</literal> in a reordering operation. The cluster
        values will then be:
      </para>
      <programlisting>
        A,D,B,C,E
        0,3,1,2,4
      </programlisting>
      <para>
        Next, if <literal>D</literal> forms a ligature with
        <literal>B</literal>, the output is:
      </para>
      <programlisting>
        A,DB,C,E
        0,3 ,2,4
      </programlisting>
      <para>
        However, in a different scenario, in which the shaping rules
        of the script instead caused <literal>A</literal> and
        <literal>B</literal> to form a ligature
        <emphasis>before</emphasis> the <literal>D</literal> reordered, the
        result would be:
      </para>
      <programlisting>
        AB,D,C,E
        0 ,3,2,4
      </programlisting>
      <para>
        There is no way for a client program to differentiate between
        these two scenarios based on the cluster values
        alone. Consequently, client programs that use level 2 might
        need to undertake additional work in order to manage cursor
        positioning, text attributes, or other desired features.
      </para>
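      <para>
        The ambiguity can be seen by replaying both scenarios in a
        short Python sketch. The helpers <literal>move</literal> and
        <literal>ligate</literal> are hypothetical names for this
        illustration; they model only the level 2 rules described in
        this chapter (no merging on reordering, and a ligature keeps
        the cluster value of its first glyph).
      </para>
      <programlisting language="python"><![CDATA[
def move(glyphs, src, dst):
    """Return a copy with glyphs[src] moved to dst; clusters are untouched."""
    glyphs = list(glyphs)
    glyphs.insert(dst, glyphs.pop(src))
    return glyphs


def ligate(glyphs, index):
    """Join glyphs[index] and glyphs[index + 1]; keep the first cluster."""
    (first, cluster), (second, _) = glyphs[index], glyphs[index + 1]
    return glyphs[:index] + [(first + second, cluster)] + glyphs[index + 2:]


start = [("A", 0), ("B", 1), ("C", 2), ("D", 3), ("E", 4)]

# Scenario 1: D moves before B, then D and B form a ligature.
scenario_1 = ligate(move(start, 3, 1), 1)

# Scenario 2: A and B form a ligature, then D moves before C.
scenario_2 = move(ligate(start, 0), 2, 1)

clusters_1 = [cluster for _, cluster in scenario_1]
clusters_2 = [cluster for _, cluster in scenario_2]
]]></programlisting>
      <para>
        Both cluster lists come out as <literal>0,3,2,4</literal>
        even though the glyph names differ, so a client that looks
        only at cluster values cannot tell the scenarios apart.
      </para>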
    </section>
    <section id="other-considerations-in-level-2">
      <title>Other considerations in level 2</title>
      <para>
        Other problems can arise with ligatures under level 2, for
        example when the direction of the text is forced to
        the opposite of its natural direction (such as Arabic text
        that is forced into left-to-right directionality). But,
        generally speaking, these other scenarios are minor corner
        cases that are too obscure for most client programs to need to
        worry about.
      </para>
    </section>
  </section>
</chapter>