<?xml version="1.0"?>
<!DOCTYPE book PUBLIC "-//OASIS//DTD DocBook XML V4.3//EN"
               "http://www.oasis-open.org/docbook/xml/4.3/docbookx.dtd" [
  <!ENTITY % local.common.attrib "xmlns:xi CDATA #FIXED 'http://www.w3.org/2003/XInclude'">
  <!ENTITY version SYSTEM "version.xml">
]>
<chapter id="clusters">
  <title>Clusters</title>
  <section id="clusters-and-shaping">
    <title>Clusters and shaping</title>
    <para>
      In text shaping, a <emphasis>cluster</emphasis> is a sequence of
      characters that needs to be treated as a single, indivisible
      unit. A single letter or symbol can be a cluster of its
      own. Other clusters correspond to longer subsequences of the
      input code points (such as a ligature or conjunct form)
      and require the shaper to ensure that the cluster is not
      broken during the shaping process.
    </para>
    <para>
      A cluster is distinct from a <emphasis>grapheme</emphasis>,
      which is the smallest functional unit of a writing system or
      script.
    </para>
    <para>
      The definitions of the two terms are similar. However, clusters
      are only relevant for script shaping and glyph layout. In
      contrast, graphemes are a property of the underlying script, and
      are of interest when client programs implement orthographic
      or linguistic functionality.
    </para>
    <para>
      For example, two individual letters are often two separate
      graphemes. When two letters form a ligature, however, they
      combine into a single glyph. They are then part of the same
      cluster and are treated as a unit by the shaping engine,
      even though the two original, underlying letters remain separate
      graphemes.
    </para>
    <para>
      HarfBuzz is concerned with clusters, <emphasis>not</emphasis>
      with graphemes, although client programs using HarfBuzz
      may still care about graphemes for other reasons.
    </para>
    <para>
      During the shaping process, there are several shaping operations
      that may merge adjacent characters (for example, when two code
      points form a ligature or a conjunct form and are replaced by a
      single glyph) or split one character into several (for example,
      when decomposing a code point through the
      <literal>ccmp</literal> feature). Operations like these alter
      clusters; HarfBuzz tracks the changes to ensure that no clusters
      get lost or broken during shaping.
    </para>
    <para>
      HarfBuzz records cluster information independently from how
      shaping operations affect the individual glyphs returned in an
      output buffer. Consequently, a client program using HarfBuzz can
      utilize the cluster information to implement features such as:
    </para>
    <itemizedlist>
      <listitem>
        <para>
          Correctly positioning the cursor within a shaped text run,
          even when characters have formed ligatures, composed or
          decomposed, reordered, or undergone other shaping operations.
        </para>
      </listitem>
      <listitem>
        <para>
          Correctly highlighting a text selection that includes some,
          but not all, of the characters in a word.
        </para>
      </listitem>
      <listitem>
        <para>
          Applying text attributes (such as color or underlining) to
          part, but not all, of a word.
        </para>
      </listitem>
      <listitem>
        <para>
          Generating output document formats (such as PDF) with
          embedded text that can be fully extracted.
        </para>
      </listitem>
      <listitem>
        <para>
          Determining the mapping between input characters and output
          glyphs, such as which glyphs are ligatures.
        </para>
      </listitem>
      <listitem>
        <para>
          Performing line-breaking, justification, and other
          line-level or paragraph-level operations that must be done
          after shaping is complete, but which require examining
          character-level properties.
        </para>
      </listitem>
    </itemizedlist>
  </section>
  <section id="working-with-harfbuzz-clusters">
    <title>Working with HarfBuzz clusters</title>
    <para>
      When you add text to a HarfBuzz buffer, each code point must be
      assigned a <emphasis>cluster value</emphasis>.
    </para>
    <para>
      This cluster value is an arbitrary number; HarfBuzz uses it only
      to distinguish between clusters. Many client programs will use
      the index of each code point in the input text stream as the
      cluster value. This is for the sake of convenience; the actual
      value does not matter.
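      As a minimal sketch of that convention, the following Python
      model pairs each code point with its index. It is an
      illustration only and does not call the HarfBuzz API; the helper
      name <literal>assign_cluster_values</literal> is hypothetical.
      <programlisting language="python">
def assign_cluster_values(text):
    """Pair each code point with its index in the input text.

    Illustrative model only: the index serves as the cluster value
    purely for convenience; any other numbering would also work.
    """
    return [(ord(ch), index) for index, ch in enumerate(text)]


pairs = assign_cluster_values("ABCDE")
print([cluster for _, cluster in pairs])  # prints [0, 1, 2, 3, 4]
      </programlisting>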
    </para>
    <para>
      Some of the shaping operations performed by HarfBuzz
      (such as reordering, composition, decomposition, and substitution)
      may alter the cluster values of some characters. The
      final cluster values in the buffer at the end of the shaping
      process will indicate to client programs which subsequences of
      glyphs represent a cluster and, therefore, must not be
      separated.
    </para>
    <para>
      In addition, client programs can query the final cluster values
      to discern other potentially important information about the
      glyphs in the output buffer (such as whether or not a ligature
      was formed).
    </para>
    <para>
      For example, if the initial sequence of cluster values was:
    </para>
    <programlisting>
      0,1,2,3,4
    </programlisting>
    <para>
      and the final sequence of cluster values is:
    </para>
    <programlisting>
      0,0,3,3
    </programlisting>
    <para>
      then there are two clusters in the output buffer: the first
      cluster includes the first two glyphs, and the second cluster
      includes the third and fourth glyphs. It is also evident that a
      ligature or conjunct has been formed, because there are fewer
      glyphs in the output buffer (four) than there were code points
      in the input buffer (five).
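      The same reading of the output can be sketched in Python. This
      is an illustrative model rather than HarfBuzz API;
      <literal>group_clusters</literal> is a hypothetical helper that
      groups glyph positions by runs of equal adjacent cluster values.
      <programlisting language="python">
from itertools import groupby


def group_clusters(final_clusters):
    """Group glyph positions by runs of equal adjacent cluster values.

    Returns a list of (cluster_value, [glyph positions]) pairs.
    """
    positions = range(len(final_clusters))
    return [
        (value, list(run))
        for value, run in groupby(positions, key=lambda i: final_clusters[i])
    ]


initial = [0, 1, 2, 3, 4]
final = [0, 0, 3, 3]
print(group_clusters(final))      # prints [(0, [0, 1]), (3, [2, 3])]
print(len(initial) - len(final))  # prints 1
      </programlisting>
      The difference of one between the two lengths is what signals
      that a ligature or conjunct was formed.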
    </para>
    <para>
      Although client programs using HarfBuzz are free to assign
      initial cluster values in any manner they choose, HarfBuzz
      does offer some useful guarantees if the cluster values are
      assigned in a monotonic (either non-decreasing or non-increasing)
      order.
    </para>
    <para>
      For buffers in the left-to-right (LTR)
      or top-to-bottom (TTB) text flow direction,
      HarfBuzz will preserve the monotonic property: client programs
      are guaranteed that monotonically increasing initial cluster
      values will be returned as monotonically increasing final
      cluster values.
    </para>
    <para>
      For buffers in the right-to-left (RTL)
      or bottom-to-top (BTT) text flow direction,
      the directionality of the buffer itself is reversed for final
      output as a matter of design.
      Therefore, HarfBuzz inverts the
      monotonic property: client programs are guaranteed that
      monotonically increasing initial cluster values will be
      returned as monotonically <emphasis>decreasing</emphasis> final
      cluster values.
    </para>
    <para>
      Client programs can adjust how HarfBuzz handles clusters during
      shaping by setting the
      <literal>cluster_level</literal> of the
      buffer. HarfBuzz offers three <emphasis>levels</emphasis> of
      clustering support for this property:
    </para>
    <itemizedlist>
      <listitem>
        <para><emphasis>Level 0</emphasis> is the default.
        </para>
        <para>
          The distinguishing feature of level 0 behavior is that, at
          the beginning of processing the buffer, all code points that
          are categorized as <emphasis>marks</emphasis>,
          <emphasis>modifier symbols</emphasis>, or
          <emphasis>Emoji extended pictographic</emphasis> modifiers,
          as well as the <emphasis>Zero Width Joiner</emphasis> and
          <emphasis>Zero Width Non-Joiner</emphasis> code points, are
          assigned the cluster value of the closest preceding code
          point from a <emphasis>different</emphasis> category.
        </para>
        <para>
          In essence, whenever a base character is followed by a mark
          character or a sequence of mark characters, those marks are
          reassigned to the same initial cluster value as the base
          character. This reassignment is referred to as
          "merging" the affected clusters. This behavior is based on
          the Grapheme Cluster Boundary specification in <ulink
          url="https://www.unicode.org/reports/tr29/#Regex_Definitions">Unicode
          Technical Report 29</ulink>.
        </para>
        <para>
          This cluster level is suitable for code that also uses
          HarfBuzz cluster values as an approximation of the Unicode
          Grapheme Cluster Boundaries.
        </para>
        <para>
          Client programs can specify level 0 behavior for a buffer by
          setting its <literal>cluster_level</literal> to
          <literal>HB_BUFFER_CLUSTER_LEVEL_MONOTONE_GRAPHEMES</literal>.
        </para>
      </listitem>
      <listitem>
        <para>
          <emphasis>Level 1</emphasis> tweaks the level 0 behavior
          slightly to produce better results. Therefore, level 1
          clustering is recommended for code that is not required to
          maintain backward compatibility with older versions of HarfBuzz.
        </para>
        <para>
          <emphasis>Level 1</emphasis> differs from level 0 by not merging the
          clusters of marks and other modifier code points with the
          preceding "base" code point's cluster.
          By preserving the
          separate cluster values of these marks and modifier code
          points, script shapers can perform additional operations
          that might lead to improved results (for example, coloring
          mark glyphs differently than their base).
        </para>
        <para>
          Client programs can specify level 1 behavior for a buffer by
          setting its <literal>cluster_level</literal> to
          <literal>HB_BUFFER_CLUSTER_LEVEL_MONOTONE_CHARACTERS</literal>.
        </para>
      </listitem>
      <listitem>
        <para>
          <emphasis>Level 2</emphasis> differs significantly in how it
          treats cluster values. In level 2, HarfBuzz never merges
          clusters.
        </para>
        <para>
          This difference can be seen most clearly when HarfBuzz processes
          ligature substitutions and glyph decompositions. In level 0
          and level 1, ligatures and glyph decomposition both involve
          merging clusters; in level 2, neither of these operations
          triggers a merge.
        </para>
        <para>
          Client programs can specify level 2 behavior for a buffer by
          setting its <literal>cluster_level</literal> to
          <literal>HB_BUFFER_CLUSTER_LEVEL_CHARACTERS</literal>.
        </para>
      </listitem>
    </itemizedlist>
    <para>
      As mentioned earlier, client programs using HarfBuzz often
      assign initial cluster values in a buffer by reusing the indices
      of the code points in the input text. This gives a sequence of
      cluster values that is monotonically increasing (for example,
      0,1,2,3,4).
    </para>
    <para>
      It is not <emphasis>required</emphasis> that the cluster values
      in a buffer be monotonically increasing. However, if the initial
      cluster values in a buffer are monotonic and the buffer is
      configured to use cluster level 0 or 1, then HarfBuzz
      guarantees that the final cluster values in the shaped buffer
      will also be monotonic. No such guarantee is made for cluster
      level 2.
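      A client that relies on this guarantee may want to check it on
      its own data. The following Python sketch shows one way to test
      a sequence for monotonicity in either direction; it is a
      stand-alone model, and <literal>is_monotonic</literal> is a
      hypothetical helper, not part of HarfBuzz.
      <programlisting language="python">
def is_monotonic(cluster_values):
    """Return True if the values are non-decreasing or non-increasing."""
    values = list(cluster_values)
    non_decreasing = values == sorted(values)
    non_increasing = values == sorted(values, reverse=True)
    return non_decreasing or non_increasing


print(is_monotonic([0, 0, 3, 3]))  # prints True
print(is_monotonic([4, 1, 1, 0]))  # prints True
print(is_monotonic([0, 2, 1, 3]))  # prints False
      </programlisting>
      The second sequence is the kind of decreasing output described
      above for RTL and BTT buffers.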
    </para>
    <para>
      In levels 0 and 1, HarfBuzz implements the following conceptual
      model for cluster values:
    </para>
    <itemizedlist spacing="compact">
      <listitem>
        <para>
          If the sequence of input cluster values is monotonic, the
          sequence of cluster values will remain monotonic.
        </para>
      </listitem>
      <listitem>
        <para>
          Each cluster value represents a single cluster.
        </para>
      </listitem>
      <listitem>
        <para>
          Each cluster contains one or more glyphs and one or more
          characters.
        </para>
      </listitem>
    </itemizedlist>
    <para>
      In practice, this model offers several benefits.
      Assuming that
      the initial cluster values were monotonically increasing
      and distinct before shaping began, then, in the final output:
    </para>
    <itemizedlist spacing="compact">
      <listitem>
        <para>
          All adjacent glyphs having the same final cluster
          value belong to the same cluster.
        </para>
      </listitem>
      <listitem>
        <para>
          Each character belongs to the cluster that has the highest
          cluster value <emphasis>not larger than</emphasis> its
          initial cluster value.
        </para>
      </listitem>
    </itemizedlist>
  </section>

  <section id="a-clustering-example-for-levels-0-and-1">
    <title>A clustering example for levels 0 and 1</title>
    <para>
      The basic shaping operations affect clusters in a predictable
      manner when using level 0 or level 1:
    </para>
    <itemizedlist>
      <listitem>
        <para>
          When two or more clusters <emphasis>merge</emphasis>, the
          resulting merged cluster takes as its cluster value the
          <emphasis>minimum</emphasis> of the incoming cluster values.
        </para>
      </listitem>
      <listitem>
        <para>
          When a cluster <emphasis>decomposes</emphasis>, all of the
          resulting child clusters inherit as their cluster value the
          cluster value of the parent cluster.
        </para>
      </listitem>
      <listitem>
        <para>
          When a character is <emphasis>reordered</emphasis>, the
          reordered character and all clusters that the character
          moves past as part of the reordering are merged into one cluster.
        </para>
      </listitem>
    </itemizedlist>
    <para>
      The functionality, guarantees, and benefits of level 0 and level
      1 behavior can be seen with some examples. First, let us examine
      what happens with cluster values when shaping involves cluster
      merging with ligatures and decomposition.
    </para>

    <para>
      Let us say we start with the following character sequence (top row) and
      initial cluster values (bottom row):
    </para>
    <programlisting>
      A,B,C,D,E
      0,1,2,3,4
    </programlisting>
    <para>
      During shaping, HarfBuzz maps these characters to glyphs from
      the font. For simplicity, let us assume that each character maps
      to the corresponding, identical-looking glyph:
    </para>
    <programlisting>
      A,B,C,D,E
      0,1,2,3,4
    </programlisting>
    <para>
      Now if, for example, <literal>B</literal> and <literal>C</literal>
      form a ligature, then the clusters to which they belong
      "merge". This merged cluster takes for its cluster
      value the minimum of all the cluster values of the clusters that
      went into the ligature.
      In this case, we get:
    </para>
    <programlisting>
      A,BC,D,E
      0,1 ,3,4
    </programlisting>
    <para>
      because 1 is the minimum of the set {1,2}, which were the
      cluster values of <literal>B</literal> and
      <literal>C</literal>.
    </para>
    <para>
      Next, let us say that the <literal>BC</literal> ligature glyph
      decomposes into three components, and <literal>D</literal> also
      decomposes into two components.
      Whenever a cluster decomposes,
      its components each inherit the cluster value of their parent:
    </para>
    <programlisting>
      A,BC0,BC1,BC2,D0,D1,E
      0,1  ,1  ,1  ,3 ,3 ,4
    </programlisting>
    <para>
      Next, if <literal>BC2</literal> and <literal>D0</literal> form a
      ligature, then their clusters (cluster values 1 and 3) merge into
      <literal>min(1,3) = 1</literal>:
    </para>
    <programlisting>
      A,BC0,BC1,BC2D0,D1,E
      0,1  ,1  ,1    ,1 ,4
    </programlisting>
    <para>
      Note that the entirety of cluster 3 merges into cluster 1, not
      just the <literal>D0</literal> glyph. This reflects the fact
      that the cluster <emphasis>must</emphasis> be treated as an
      indivisible unit.
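      The steps above can be replayed with a small Python model. It is
      a simplified sketch of the level 0 and level 1 rules, not the
      HarfBuzz implementation; the function names
      (<literal>merge_clusters</literal>,
      <literal>form_ligature</literal>, <literal>decompose</literal>)
      and the glyph naming scheme are assumptions made for
      illustration.
      <programlisting language="python">
def merge_clusters(glyphs, start, end):
    """Merge every cluster touched by glyphs[start:end].

    The merged cluster takes the minimum of the incoming cluster
    values, and whole clusters are merged, not just the glyphs named.
    """
    touched = {cluster for _, cluster in glyphs[start:end]}
    merged = min(touched)
    return [
        (name, merged if cluster in touched else cluster)
        for name, cluster in glyphs
    ]


def form_ligature(glyphs, start, end):
    """Replace glyphs[start:end] with a single ligature glyph."""
    glyphs = merge_clusters(glyphs, start, end)
    name = "".join(n for n, _ in glyphs[start:end])
    return glyphs[:start] + [(name, glyphs[start][1])] + glyphs[end:]


def decompose(glyphs, index, parts):
    """Split one glyph into parts that inherit its cluster value."""
    name, cluster = glyphs[index]
    pieces = [(name + str(i), cluster) for i in range(parts)]
    return glyphs[:index] + pieces + glyphs[index + 1:]


glyphs = list(zip("ABCDE", range(5)))
glyphs = form_ligature(glyphs, 1, 3)  # B and C form the BC ligature
glyphs = decompose(glyphs, 1, 3)      # BC becomes BC0, BC1, BC2
glyphs = decompose(glyphs, 4, 2)      # D becomes D0, D1
glyphs = form_ligature(glyphs, 3, 5)  # BC2 and D0 form a ligature
print(glyphs)
# prints [('A', 0), ('BC0', 1), ('BC1', 1), ('BC2D0', 1), ('D1', 1), ('E', 4)]
      </programlisting>
      In this model, <literal>D1</literal> ends up with cluster value
      1 because the merge is applied to every glyph of a touched
      cluster.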
    </para>
    <para>
      At this point, cluster 1 means: the character sequence
      <literal>BCD</literal> is represented by glyphs
      <literal>BC0,BC1,BC2D0,D1</literal> and cannot be broken down any
      further.
    </para>
  </section>
  <section id="reordering-in-levels-0-and-1">
    <title>Reordering in levels 0 and 1</title>
    <para>
      Another common operation in some shapers is glyph
      reordering. In order to maintain a monotonic cluster sequence
      when glyph reordering takes place, HarfBuzz merges the clusters
      of everything in the reordering sequence.
    </para>
    <para>
      For example, let us again start with the character sequence (top
      row) and initial cluster values (bottom row):
    </para>
    <programlisting>
      A,B,C,D,E
      0,1,2,3,4
    </programlisting>
    <para>
      If <literal>D</literal> is reordered to the position immediately
      before <literal>B</literal>, then HarfBuzz merges the
      <literal>B</literal>, <literal>C</literal>, and
      <literal>D</literal> clusters, that is, all the clusters between
      the final position of the reordered glyph and its original
      position. This means that we get:
    </para>
    <programlisting>
      A,D,B,C,E
      0,1,1,1,4
    </programlisting>
    <para>
      as the final cluster sequence.
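      The same reordering rule can be sketched in Python. This is an
      illustrative model rather than HarfBuzz code: the
      <literal>reorder</literal> helper and the (name, cluster) tuple
      representation are assumptions made for this example.
      <programlisting language="python">
# Illustrative model only: reorder() is a hypothetical helper, not HarfBuzz API.
# A glyph is a (name, cluster) tuple; the buffer is a list of glyphs.

def reorder(glyphs, src, dst):
    """Move glyphs[src] to position dst and merge the clusters in between."""
    lo, hi = min(src, dst), max(src, dst)
    # Every cluster between the old and the new position takes part.
    touched = {cluster for _, cluster in glyphs[lo:hi + 1]}
    lowest = min(touched)
    moved = list(glyphs)
    moved.insert(dst, moved.pop(src))
    return [(name, lowest if cluster in touched else cluster)
            for name, cluster in moved]

buf = [("A", 0), ("B", 1), ("C", 2), ("D", 3), ("E", 4)]
print(reorder(buf, 3, 1))     # D moves to just before B
      </programlisting>
      Because all affected glyphs end up with the minimum value, the
      resulting sequence of cluster values stays monotonic no matter
      how far the glyph moved.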
    </para>
    <para>
      Merging this many clusters is not ideal, but it is the only
      sensible way for HarfBuzz to maintain the guarantee that the
      sequence of cluster values remains monotonic and to retain the
      true relationship between glyphs and characters.
    </para>
  </section>
  <section id="the-distinction-between-levels-0-and-1">
    <title>The distinction between levels 0 and 1</title>
    <para>
      The preceding examples demonstrate the main effects of using
      cluster levels 0 and 1. The only difference between the two
      levels is this: in level 0, at the very beginning of the shaping
      process, HarfBuzz merges the cluster of each base character
      with the clusters of all Unicode marks (combining or not) and
      modifiers that follow it.
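      A rough Python sketch of this initial step is given here. It is a
      simplification under stated assumptions, not HarfBuzz code: the
      <literal>initial_clusters</literal> helper is hypothetical,
      cluster values are taken to be character indices, and only
      characters in the Unicode mark categories are merged (modifiers
      are left out for brevity).
      <programlisting language="python">
import unicodedata

# Illustrative model only: initial_clusters() is hypothetical, not HarfBuzz API.

def initial_clusters(text, level):
    """Return one cluster value per character, as seen before shaping."""
    clusters = list(range(len(text)))
    if level == 0:
        for i, ch in enumerate(text):
            # General categories Mn, Mc, and Me are the Unicode marks.
            if i > 0 and unicodedata.category(ch).startswith("M"):
                clusters[i] = clusters[i - 1]
    return clusters

text = "e\u0301\u0323x"   # e, combining acute, combining dot below, x
print(initial_clusters(text, 0))   # marks join the cluster of their base
print(initial_clusters(text, 1))   # level 1 skips the merging step
      </programlisting>
      With two marks after the base, level 0 places all three
      characters in cluster 0, while level 1 leaves every character in
      its own cluster.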
    </para>
    <para>
      For example, let us start with the following character sequence
      (top row) and accompanying initial cluster values (bottom row):
    </para>
    <programlisting>
      A,acute,B
      0,1    ,2
    </programlisting>
    <para>
      The <literal>acute</literal> is a Unicode mark. If HarfBuzz is
      using cluster level 0 on this sequence, then the
      <literal>A</literal> and <literal>acute</literal> clusters will
      merge, and the result will become:
    </para>
    <programlisting>
      A,acute,B
      0,0    ,2
    </programlisting>
    <para>
      This merger is performed before any other script-shaping
      steps.
    </para>
    <para>
      This initial cluster merging is the default behavior of the
      Windows shaping engine, and the old HarfBuzz codebase copied
      that behavior to maintain compatibility.
      Consequently, it has
      remained the default behavior in the new HarfBuzz codebase.
    </para>
    <para>
      But this initial cluster-merging behavior makes it impossible
      for client programs to implement some features (such as
      coloring diacritic marks differently from their base
      characters). That is why, in level 1, HarfBuzz does not perform
      the initial merging step.
    </para>
    <para>
      For client programs that rely on HarfBuzz cluster values to
      perform cursor positioning, level 0 is more convenient. But
      relying on cluster boundaries for cursor positioning is wrong: cursor
      positions should be determined based on Unicode grapheme
      boundaries, not on shaping-cluster boundaries. As such, using
      level 1 clustering behavior is recommended.
    </para>
    <para>
      One final facet of levels 0 and 1 is worth noting. HarfBuzz
      currently does not allow any
      <emphasis>multiple-substitution</emphasis> GSUB lookups to
      replace a glyph with zero glyphs (in other words, to delete a
      glyph).
    </para>
    <para>
      But, in some other situations, glyphs can be deleted. In
      those cases, if the glyph being deleted is the last glyph of its
      cluster, HarfBuzz makes sure to merge the deleted glyph's
      cluster with a neighboring cluster.
    </para>
    <para>
      This is done primarily to make sure that the starting cluster of the
      text always has the cluster index pointing to the start of the text
      for the run; more than one client program currently relies on this
      guarantee.
    </para>
    <para>
      Incidentally, Apple's CoreText does something different to
      maintain the same promise: it inserts a glyph with id 65535 at
      the beginning of the glyph string if the glyph corresponding to
      the first character in the run was deleted. HarfBuzz might do
      something similar in the future.
    </para>
  </section>
  <section id="level-2">
    <title>Level 2</title>
    <para>
      HarfBuzz's level 2 cluster behavior uses a significantly
      different model from that of levels 0 and 1.
    </para>
    <para>
      The level 2 behavior is easy to describe, but it may be
      difficult to understand in practical terms. In brief, level 2
      performs no merging of clusters whatsoever.
    </para>
    <para>
      This means that there is no initial base-and-mark merging step
      (as is done in level 0), and it means that reordering moves and
      ligature substitutions do not trigger a cluster merge.
    </para>
    <para>
      Only one shaping operation directly affects clusters when using
      level 2:
    </para>
    <itemizedlist>
      <listitem>
        <para>
          When a cluster <emphasis>decomposes</emphasis>, all of the
          resulting child clusters inherit as their cluster value the
          cluster value of the parent cluster.
        </para>
      </listitem>
    </itemizedlist>
    <para>
      When glyphs do form a ligature (or when some other feature
      substitutes multiple glyphs with one glyph) the cluster value
      of the first glyph is retained as the cluster value for the
      resulting ligature.
    </para>
    <para>
      This sounds similar to a cluster merge, but it is
      different. In particular, no subsequent characters
      (including marks and modifiers) are affected. They retain
      their previous cluster values.
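      The difference can be sketched in Python. This is an
      illustrative model, not HarfBuzz code: the
      <literal>ligate_level2</literal> helper and the (name, cluster)
      tuple representation are assumptions made for this example.
      <programlisting language="python">
# Illustrative model only: ligate_level2() is hypothetical, not HarfBuzz API.
# A glyph is a (name, cluster) tuple; the buffer is a list of glyphs.

def ligate_level2(glyphs, indices):
    """Combine the glyphs at the given indices into one ligature glyph.

    The ligature keeps the cluster value of its first component, and no
    other glyph in the buffer has its cluster value changed.
    """
    first = indices[0]
    name = "".join(glyphs[i][0] for i in indices)
    ligature = (name, glyphs[first][1])
    # Drop the later components; the first slot receives the ligature.
    rest = [g for i, g in enumerate(glyphs) if i not in indices[1:]]
    rest[first] = ligature
    return rest

buf = [("A", 0), ("B", 1), ("acute", 2), ("C", 3)]
print(ligate_level2(buf, [0, 1]))   # A and B form a ligature
      </programlisting>
      In this example the <literal>acute</literal> mark keeps cluster
      value 2, even though the base it followed is now part of a
      ligature whose cluster value is 0.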
    </para>
    <para>
      Level 2 cluster behavior is ultimately less complex than level 0
      or level 1, but there are several cases for which processing
      cluster values produced at level 2 may be tricky.
    </para>
    <section id="ligatures-with-combining-marks-in-level-2">
      <title>Ligatures with combining marks in level 2</title>
      <para>
        The first example of how HarfBuzz's level 2 cluster behavior
        can be tricky is when the text to be shaped includes combining
        marks attached to ligatures.
      </para>
      <para>
        Let us start with an input sequence with the following
        characters (top row) and initial cluster values (bottom row):
      </para>
      <programlisting>
        A,acute,B,breve,C,circumflex
        0,1    ,2,3    ,4,5
      </programlisting>
      <para>
        If the sequence <literal>A,B,C</literal> forms a ligature,
        then these are the cluster values HarfBuzz will return under
        the various cluster levels:
      </para>
      <para>
        Level 0:
      </para>
      <programlisting>
        ABC,acute,breve,circumflex
        0  ,0    ,0    ,0
      </programlisting>
      <para>
        Level 1:
      </para>
      <programlisting>
        ABC,acute,breve,circumflex
        0  ,0    ,0    ,5
      </programlisting>
      <para>
        Level 2:
      </para>
      <programlisting>
        ABC,acute,breve,circumflex
        0  ,1    ,3    ,5
      </programlisting>
      <para>
        The level 2 result is the hardest for a client program to
        interpret, because there is nothing in the cluster values that
        indicates that <literal>B</literal> and <literal>C</literal>
        formed a ligature with <literal>A</literal>.
      </para>
      <para>
        In contrast, the "merged" cluster values of the mark glyphs
        that are seen in the level 0 and level 1 output are evidence
        that a ligature substitution took place.
      </para>
    </section>
    <section id="reordering-in-level-2">
      <title>Reordering in level 2</title>
      <para>
        Another example of how HarfBuzz's level 2 cluster behavior
        can be tricky is when glyphs reorder.
        Consider an input sequence
        with the following characters (top row) and initial cluster
        values (bottom row):
      </para>
      <programlisting>
        A,B,C,D,E
        0,1,2,3,4
      </programlisting>
      <para>
        Now imagine <literal>D</literal> moves before
        <literal>B</literal> in a reordering operation. The cluster
        values will then be:
      </para>
      <programlisting>
        A,D,B,C,E
        0,3,1,2,4
      </programlisting>
      <para>
        Next, if <literal>D</literal> forms a ligature with
        <literal>B</literal>, the output is:
      </para>
      <programlisting>
        A,DB,C,E
        0,3 ,2,4
      </programlisting>
      <para>
        However, in a different scenario, in which the shaping rules
        of the script instead caused <literal>A</literal> and
        <literal>B</literal> to form a ligature
        <emphasis>before</emphasis> <literal>D</literal> was reordered, the
        result would be:
      </para>
      <programlisting>
        AB,D,C,E
        0 ,3,2,4
      </programlisting>
      <para>
        There is no way for a client program to differentiate between
        these two scenarios based on the cluster values
        alone. Consequently, client programs that use level 2 might
        need to undertake additional work in order to manage cursor
        positioning, text attributes, or other desired features.
      </para>
    </section>
    <section id="other-considerations-in-level-2">
      <title>Other considerations in level 2</title>
      <para>
        There may be other problems encountered with ligatures under
        level 2, such as if the direction of the text is forced to
        the opposite of its natural direction (for example, Arabic text
        that is forced into left-to-right directionality).
        But,
        generally speaking, these other scenarios are minor corner
        cases that are too obscure for most client programs to need to
        worry about.
      </para>
    </section>
  </section>
</chapter>