1// Copyright 2009 The Go Authors. All rights reserved. 2// Use of this source code is governed by a BSD-style 3// license that can be found in the LICENSE file. 4 5/* 6 7Multi-precision division. Here be dragons. 8 9Given u and v, where u is n+m digits, and v is n digits (with no leading zeros), 10the goal is to return quo, rem such that u = quo*v + rem, where 0 ≤ rem < v. 11That is, quo = ⌊u/v⌋ where ⌊x⌋ denotes the floor (truncation to integer) of x, 12and rem = u - quo·v. 13 14 15Long Division 16 17Division in a computer proceeds the same as long division in elementary school, 18but computers are not as good as schoolchildren at following vague directions, 19so we have to be much more precise about the actual steps and what can happen. 20 21We work from most to least significant digit of the quotient, doing: 22 23 • Guess a digit q, the number of v to subtract from the current 24 section of u to zero out the topmost digit. 25 • Check the guess by multiplying q·v and comparing it against 26 the current section of u, adjusting the guess as needed. 27 • Subtract q·v from the current section of u. 28 • Add q to the corresponding section of the result quo. 29 30When all digits have been processed, the final remainder is left in u 31and returned as rem. 32 33For example, here is a sketch of dividing 5 digits by 3 digits (n=3, m=2). 34 35 q₂ q₁ q₀ 36 _________________ 37 v₂ v₁ v₀ ) u₄ u₃ u₂ u₁ u₀ 38 ↓ ↓ ↓ | | 39 [u₄ u₃ u₂]| | 40 - [ q₂·v ]| | 41 ----------- ↓ | 42 [ rem | u₁]| 43 - [ q₁·v ]| 44 ----------- ↓ 45 [ rem | u₀] 46 - [ q₀·v ] 47 ------------ 48 [ rem ] 49 50Instead of creating new storage for the remainders and copying digits from u 51as indicated by the arrows, we use u's storage directly as both the source 52and destination of the subtractions, so that the remainders overwrite 53successive overlapping sections of u as the division proceeds, using a slice 54of u to identify the current section. This avoids all the copying as well as 55shifting of remainders. 56 57Division of u with n+m digits by v with n digits (in base B) can in general 58produce at most m+1 digits, because: 59 60 • u < B^(n+m) [B^(n+m) has n+m+1 digits] 61 • v ≥ B^(n-1) [B^(n-1) is the smallest n-digit number] 62 • u/v < B^(n+m) / B^(n-1) [divide bounds for u, v] 63 • u/v < B^(m+1) [simplify] 64 65The first step is special: it takes the top n digits of u and divides them by 66the n digits of v, producing the first quotient digit and an n-digit remainder. 67In the example, q₂ = ⌊u₄u₃u₂ / v⌋. 68 69The first step divides n digits by n digits to ensure that it produces only a 70single digit. 71 72Each subsequent step appends the next digit from u to the remainder and divides 73those n+1 digits by the n digits of v, producing another quotient digit and a 74new n-digit remainder. 75 76Subsequent steps divide n+1 digits by n digits, an operation that in general 77might produce two digits. However, as used in the algorithm, that division is 78guaranteed to produce only a single digit. The dividend is of the form 79rem·B + d, where rem is a remainder from the previous step and d is a single 80digit, so: 81 82 • rem ≤ v - 1 [rem is a remainder from dividing by v] 83 • rem·B ≤ v·B - B [multiply by B] 84 • d ≤ B - 1 [d is a single digit] 85 • rem·B + d ≤ v·B - 1 [add] 86 • rem·B + d < v·B [change ≤ to <] 87 • (rem·B + d)/v < B [divide by v] 88 89 90Guess and Check 91 92At each step we need to divide n+1 digits by n digits, but this is for the 93implementation of division by n digits, so we can't just invoke a division 94routine: we _are_ the division routine. Instead, we guess at the answer and 95then check it using multiplication. If the guess is wrong, we correct it. 96 97How can this guessing possibly be efficient? It turns out that the following 98statement (let's call it the Good Guess Guarantee) is true. 99 100If 101 102 • q = ⌊u/v⌋ where u is n+1 digits and v is n digits, 103 • q < B, and 104 • the topmost digit of v = vₙ₋₁ ≥ B/2, 105 106then q̂ = ⌊uₙuₙ₋₁ / vₙ₋₁⌋ satisfies q ≤ q̂ ≤ q+2. (Proof below.) 107 108That is, if we know the answer has only a single digit and we guess an answer 109by ignoring the bottom n-1 digits of u and v, using a 2-by-1-digit division, 110then that guess is at least as large as the correct answer. It is also not 111too much larger: it is off by at most two from the correct answer. 112 113Note that in the first step of the overall division, which is an n-by-n-digit 114division, the 2-by-1 guess uses an implicit uₙ = 0. 115 116Note that using a 2-by-1-digit division here does not mean calling ourselves 117recursively. Instead, we use an efficient direct hardware implementation of 118that operation. 119 120Note that because q is u/v rounded down, q·v must not exceed u: u ≥ q·v. 121If a guess q̂ is too big, it will not satisfy this test. Viewed a different way, 122the remainder r̂ for a given q̂ is u - q̂·v, which must be positive. If it is 123negative, then the guess q̂ is too big. 124 125This gives us a way to compute q. First compute q̂ with 2-by-1-digit division. 126Then, while u < q̂·v, decrement q̂; this loop executes at most twice, because 127q̂ ≤ q+2. 128 129 130Scaling Inputs 131 132The Good Guess Guarantee requires that the top digit of v (vₙ₋₁) be at least B/2. 133For example in base 10, ⌊172/19⌋ = 9, but ⌊18/1⌋ = 18: the guess is wildly off 134because the first digit 1 is smaller than B/2 = 5. 135 136We can ensure that v has a large top digit by multiplying both u and v by the 137right amount. Continuing the example, if we multiply both 172 and 19 by 3, we 138now have ⌊516/57⌋, the leading digit of v is now ≥ 5, and sure enough 139⌊51/5⌋ = 10 is much closer to the correct answer 9. It would be easier here 140to multiply by 4, because that can be done with a shift. Specifically, we can 141always count the number of leading zeros i in the first digit of v and then 142shift both u and v left by i bits. 143 144Having scaled u and v, the value ⌊u/v⌋ is unchanged, but the remainder will 145be scaled: 172 mod 19 is 1, but 516 mod 57 is 3. We have to divide the remainder 146by the scaling factor (shifting right i bits) when we finish. 147 148Note that these shifts happen before and after the entire division algorithm, 149not at each step in the per-digit iteration. 150 151Note the effect of scaling inputs on the size of the possible quotient. 152In the scaled u/v, u can gain a digit from scaling; v never does, because we 153pick the scaling factor to make v's top digit larger but without overflowing. 154If u and v have n+m and n digits after scaling, then: 155 156 • u < B^(n+m) [B^(n+m) has n+m+1 digits] 157 • v ≥ B^n / 2 [vₙ₋₁ ≥ B/2, so vₙ₋₁·B^(n-1) ≥ B^n/2] 158 • u/v < B^(n+m) / (B^n / 2) [divide bounds for u, v] 159 • u/v < 2 B^m [simplify] 160 161The quotient can still have m+1 significant digits, but if so the top digit 162must be a 1. This provides a different way to handle the first digit of the 163result: compare the top n digits of u against v and fill in either a 0 or a 1. 164 165 166Refining Guesses 167 168Before we check whether u < q̂·v, we can adjust our guess to change it from 169q̂ = ⌊uₙuₙ₋₁ / vₙ₋₁⌋ into the refined guess ⌊uₙuₙ₋₁uₙ₋₂ / vₙ₋₁vₙ₋₂⌋. 170Although not mentioned above, the Good Guess Guarantee also promises that this 1713-by-2-digit division guess is more precise and at most one away from the real 172answer q. The improvement from the 2-by-1 to the 3-by-2 guess can also be done 173without n-digit math. 174 175If we have a guess q̂ = ⌊uₙuₙ₋₁ / vₙ₋₁⌋ and we want to see if it also equal to 176⌊uₙuₙ₋₁uₙ₋₂ / vₙ₋₁vₙ₋₂⌋, we can use the same check we would for the full division: 177if uₙuₙ₋₁uₙ₋₂ < q̂·vₙ₋₁vₙ₋₂, then the guess is too large and should be reduced. 178 179Checking uₙuₙ₋₁uₙ₋₂ < q̂·vₙ₋₁vₙ₋₂ is the same as uₙuₙ₋₁uₙ₋₂ - q̂·vₙ₋₁vₙ₋₂ < 0, 180and 181 182 uₙuₙ₋₁uₙ₋₂ - q̂·vₙ₋₁vₙ₋₂ = (uₙuₙ₋₁·B + uₙ₋₂) - q̂·(vₙ₋₁·B + vₙ₋₂) 183 [splitting off the bottom digit] 184 = (uₙuₙ₋₁ - q̂·vₙ₋₁)·B + uₙ₋₂ - q̂·vₙ₋₂ 185 [regrouping] 186 187The expression (uₙuₙ₋₁ - q̂·vₙ₋₁) is the remainder of uₙuₙ₋₁ / vₙ₋₁. 188If the initial guess returns both q̂ and its remainder r̂, then checking 189whether uₙuₙ₋₁uₙ₋₂ < q̂·vₙ₋₁vₙ₋₂ is the same as checking r̂·B + uₙ₋₂ < q̂·vₙ₋₂. 190 191If we find that r̂·B + uₙ₋₂ < q̂·vₙ₋₂, then we can adjust the guess by 192decrementing q̂ and adding vₙ₋₁ to r̂. We repeat until r̂·B + uₙ₋₂ ≥ q̂·vₙ₋₂. 193(As before, this fixup is only needed at most twice.) 194 195Now that q̂ = ⌊uₙuₙ₋₁uₙ₋₂ / vₙ₋₁vₙ₋₂⌋, as mentioned above it is at most one 196away from the correct q, and we've avoided doing any n-digit math. 197(If we need the new remainder, it can be computed as r̂·B + uₙ₋₂ - q̂·vₙ₋₂.) 198 199The final check u < q̂·v and the possible fixup must be done at full precision. 200For random inputs, a fixup at this step is exceedingly rare: the 3-by-2 guess 201is not often wrong at all. But still we must do the check. Note that since the 2023-by-2 guess is off by at most 1, it can be convenient to perform the final 203u < q̂·v as part of the computation of the remainder r = u - q̂·v. If the 204subtraction underflows, decremeting q̂ and adding one v back to r is enough to 205arrive at the final q, r. 206 207That's the entirety of long division: scale the inputs, and then loop over 208each output position, guessing, checking, and correcting the next output digit. 209 210For a 2n-digit number divided by an n-digit number (the worst size-n case for 211division complexity), this algorithm uses n+1 iterations, each of which must do 212at least the 1-by-n-digit multiplication q̂·v. That's O(n) iterations of 213O(n) time each, so O(n²) time overall. 214 215 216Recursive Division 217 218For very large inputs, it is possible to improve on the O(n²) algorithm. 219Let's call a group of n/2 real digits a (very) “wide digit”. We can run the 220standard long division algorithm explained above over the wide digits instead of 221the actual digits. This will result in many fewer steps, but the math involved in 222each step is more work. 223 224Where basic long division uses a 2-by-1-digit division to guess the initial q̂, 225the new algorithm must use a 2-by-1-wide-digit division, which is of course 226really an n-by-n/2-digit division. That's OK: if we implement n-digit division 227in terms of n/2-digit division, the recursion will terminate when the divisor 228becomes small enough to handle with standard long division or even with the 2292-by-1 hardware instruction. 230 231For example, here is a sketch of dividing 10 digits by 4, proceeding with 232wide digits corresponding to two regular digits. The first step, still special, 233must leave off a (regular) digit, dividing 5 by 4 and producing a 4-digit 234remainder less than v. The middle steps divide 6 digits by 4, guaranteed to 235produce two output digits each (one wide digit) with 4-digit remainders. 236The final step must use what it has: the 4-digit remainder plus one more, 2375 digits to divide by 4. 238 239 q₆ q₅ q₄ q₃ q₂ q₁ q₀ 240 _______________________________ 241 v₃ v₂ v₁ v₀ ) u₉ u₈ u₇ u₆ u₅ u₄ u₃ u₂ u₁ u₀ 242 ↓ ↓ ↓ ↓ ↓ | | | | | 243 [u₉ u₈ u₇ u₆ u₅]| | | | | 244 - [ q₆q₅·v ]| | | | | 245 ----------------- ↓ ↓ | | | 246 [ rem |u₄ u₃]| | | 247 - [ q₄q₃·v ]| | | 248 -------------------- ↓ ↓ | 249 [ rem |u₂ u₁]| 250 - [ q₂q₁·v ]| 251 -------------------- ↓ 252 [ rem |u₀] 253 - [ q₀·v ] 254 ------------------ 255 [ rem ] 256 257An alternative would be to look ahead to how well n/2 divides into n+m and 258adjust the first step to use fewer digits as needed, making the first step 259more special to make the last step not special at all. For example, using the 260same input, we could choose to use only 4 digits in the first step, leaving 261a full wide digit for the last step: 262 263 q₆ q₅ q₄ q₃ q₂ q₁ q₀ 264 _______________________________ 265 v₃ v₂ v₁ v₀ ) u₉ u₈ u₇ u₆ u₅ u₄ u₃ u₂ u₁ u₀ 266 ↓ ↓ ↓ ↓ | | | | | | 267 [u₉ u₈ u₇ u₆]| | | | | | 268 - [ q₆·v ]| | | | | | 269 -------------- ↓ ↓ | | | | 270 [ rem |u₅ u₄]| | | | 271 - [ q₅q₄·v ]| | | | 272 -------------------- ↓ ↓ | | 273 [ rem |u₃ u₂]| | 274 - [ q₃q₂·v ]| | 275 -------------------- ↓ ↓ 276 [ rem |u₁ u₀] 277 - [ q₁q₀·v ] 278 --------------------- 279 [ rem ] 280 281Today, the code in divRecursiveStep works like the first example. Perhaps in 282the future we will make it work like the alternative, to avoid a special case 283in the final iteration. 284 285Either way, each step is a 3-by-2-wide-digit division approximated first by 286a 2-by-1-wide-digit division, just as we did for regular digits in long division. 287Because the actual answer we want is a 3-by-2-wide-digit division, instead of 288multiplying q̂·v directly during the fixup, we can use the quick refinement 289from long division (an n/2-by-n/2 multiply) to correct q to its actual value 290and also compute the remainder (as mentioned above), and then stop after that, 291never doing a full n-by-n multiply. 292 293Instead of using an n-by-n/2-digit division to produce n/2 digits, we can add 294(not discard) one more real digit, doing an (n+1)-by-(n/2+1)-digit division that 295produces n/2+1 digits. That single extra digit tightens the Good Guess Guarantee 296to q ≤ q̂ ≤ q+1 and lets us drop long division's special treatment of the first 297digit. These benefits are discussed more after the Good Guess Guarantee proof 298below. 299 300 301How Fast is Recursive Division? 302 303For a 2n-by-n-digit division, this algorithm runs a 4-by-2 long division over 304wide digits, producing two wide digits plus a possible leading regular digit 1, 305which can be handled without a recursive call. That is, the algorithm uses two 306full iterations, each using an n-by-n/2-digit division and an n/2-by-n/2-digit 307multiplication, along with a few n-digit additions and subtractions. The standard 308n-by-n-digit multiplication algorithm requires O(n²) time, making the overall 309algorithm require time T(n) where 310 311 T(n) = 2T(n/2) + O(n) + O(n²) 312 313which, by the Bentley-Haken-Saxe theorem, ends up reducing to T(n) = O(n²). 314This is not an improvement over regular long division. 315 316When the number of digits n becomes large enough, Karatsuba's algorithm for 317multiplication can be used instead, which takes O(n^log₂3) = O(n^1.6) time. 318(Karatsuba multiplication is implemented in func karatsuba in nat.go.) 319That makes the overall recursive division algorithm take O(n^1.6) time as well, 320which is an improvement, but again only for large enough numbers. 321 322It is not critical to make sure that every recursion does only two recursive 323calls. While in general the number of recursive calls can change the time 324analysis, in this case doing three calls does not change the analysis: 325 326 T(n) = 3T(n/2) + O(n) + O(n^log₂3) 327 328ends up being T(n) = O(n^log₂3). Because the Karatsuba multiplication taking 329time O(n^log₂3) is itself doing 3 half-sized recursions, doing three for the 330division does not hurt the asymptotic performance. Of course, it is likely 331still faster in practice to do two. 332 333 334Proof of the Good Guess Guarantee 335 336Given numbers x, y, let us break them into the quotients and remainders when 337divided by some scaling factor S, with the added constraints that the quotient 338x/y and the high part of y are both less than some limit T, and that the high 339part of y is at least half as big as T. 340 341 x₁ = ⌊x/S⌋ y₁ = ⌊y/S⌋ 342 x₀ = x mod S y₀ = y mod S 343 344 x = x₁·S + x₀ 0 ≤ x₀ < S x/y < T 345 y = y₁·S + y₀ 0 ≤ y₀ < S T/2 ≤ y₁ < T 346 347And consider the two truncated quotients: 348 349 q = ⌊x/y⌋ 350 q̂ = ⌊x₁/y₁⌋ 351 352We will prove that q ≤ q̂ ≤ q+2. 353 354The guarantee makes no real demands on the scaling factor S: it is simply the 355magnitude of the digits cut from both x and y to produce x₁ and y₁. 356The guarantee makes only limited demands on T: it must be large enough to hold 357the quotient x/y, and y₁ must have roughly the same size. 358 359To apply to the earlier discussion of 2-by-1 guesses in long division, 360we would choose: 361 362 S = Bⁿ⁻¹ 363 T = B 364 x = u 365 x₁ = uₙuₙ₋₁ 366 x₀ = uₙ₋₂...u₀ 367 y = v 368 y₁ = vₙ₋₁ 369 y₀ = vₙ₋₂...u₀ 370 371These simpler variables avoid repeating those longer expressions in the proof. 372 373Note also that, by definition, truncating division ⌊x/y⌋ satisfies 374 375 x/y - 1 < ⌊x/y⌋ ≤ x/y. 376 377This fact will be used a few times in the proofs. 378 379Proof that q ≤ q̂: 380 381 q̂·y₁ = ⌊x₁/y₁⌋·y₁ [by definition, q̂ = ⌊x₁/y₁⌋] 382 > (x₁/y₁ - 1)·y₁ [x₁/y₁ - 1 < ⌊x₁/y₁⌋] 383 = x₁ - y₁ [distribute y₁] 384 385 So q̂·y₁ > x₁ - y₁. 386 Since q̂·y₁ is an integer, q̂·y₁ ≥ x₁ - y₁ + 1. 387 388 q̂ - q = q̂ - ⌊x/y⌋ [by definition, q = ⌊x/y⌋] 389 ≥ q̂ - x/y [⌊x/y⌋ < x/y] 390 = (1/y)·(q̂·y - x) [factor out 1/y] 391 ≥ (1/y)·(q̂·y₁·S - x) [y = y₁·S + y₀ ≥ y₁·S] 392 ≥ (1/y)·((x₁ - y₁ + 1)·S - x) [above: q̂·y₁ ≥ x₁ - y₁ + 1] 393 = (1/y)·(x₁·S - y₁·S + S - x) [distribute S] 394 = (1/y)·(S - x₀ - y₁·S) [-x = -x₁·S - x₀] 395 > -y₁·S / y [x₀ < S, so S - x₀ > 0; drop it] 396 ≥ -1 [y₁·S ≤ y] 397 398 So q̂ - q > -1. 399 Since q̂ - q is an integer, q̂ - q ≥ 0, or equivalently q ≤ q̂. 400 401Proof that q̂ ≤ q+2: 402 403 x₁/y₁ - x/y = x₁·S/y₁·S - x/y [multiply left term by S/S] 404 ≤ x/y₁·S - x/y [x₁S ≤ x] 405 = (x/y)·(y/y₁·S - 1) [factor out x/y] 406 = (x/y)·((y - y₁·S)/y₁·S) [move -1 into y/y₁·S fraction] 407 = (x/y)·(y₀/y₁·S) [y - y₁·S = y₀] 408 = (x/y)·(1/y₁)·(y₀/S) [factor out 1/y₁] 409 < (x/y)·(1/y₁) [y₀ < S, so y₀/S < 1] 410 ≤ (x/y)·(2/T) [y₁ ≥ T/2, so 1/y₁ ≤ 2/T] 411 < T·(2/T) [x/y < T] 412 = 2 [T·(2/T) = 2] 413 414 So x₁/y₁ - x/y < 2. 415 416 q̂ - q = ⌊x₁/y₁⌋ - q [by definition, q̂ = ⌊x₁/y₁⌋] 417 = ⌊x₁/y₁⌋ - ⌊x/y⌋ [by definition, q = ⌊x/y⌋] 418 ≤ x₁/y₁ - ⌊x/y⌋ [⌊x₁/y₁⌋ ≤ x₁/y₁] 419 < x₁/y₁ - (x/y - 1) [⌊x/y⌋ > x/y - 1] 420 = (x₁/y₁ - x/y) + 1 [regrouping] 421 < 2 + 1 [above: x₁/y₁ - x/y < 2] 422 = 3 423 424 So q̂ - q < 3. 425 Since q̂ - q is an integer, q̂ - q ≤ 2. 426 427Note that when x/y < T/2, the bounds tighten to x₁/y₁ - x/y < 1 and therefore 428q̂ - q ≤ 1. 429 430Note also that in the general case 2n-by-n division where we don't know that 431x/y < T, we do know that x/y < 2T, yielding the bound q̂ - q ≤ 4. So we could 432remove the special case first step of long division as long as we allow the 433first fixup loop to run up to four times. (Using a simple comparison to decide 434whether the first digit is 0 or 1 is still more efficient, though.) 435 436Finally, note that when dividing three leading base-B digits by two (scaled), 437we have T = B² and x/y < B = T/B, a much tighter bound than x/y < T. 438This in turn yields the much tighter bound x₁/y₁ - x/y < 2/B. This means that 439⌊x₁/y₁⌋ and ⌊x/y⌋ can only differ when x/y is less than 2/B greater than an 440integer. For random x and y, the chance of this is 2/B, or, for large B, 441approximately zero. This means that after we produce the 3-by-2 guess in the 442long division algorithm, the fixup loop essentially never runs. 443 444In the recursive algorithm, the extra digit in (2·⌊n/2⌋+1)-by-(⌊n/2⌋+1)-digit 445division has exactly the same effect: the probability of needing a fixup is the 446same 2/B. Even better, we can allow the general case x/y < 2T and the fixup 447probability only grows to 4/B, still essentially zero. 448 449 450References 451 452There are no great references for implementing long division; thus this comment. 453Here are some notes about what to expect from the obvious references. 454 455Knuth Volume 2 (Seminumerical Algorithms) section 4.3.1 is the usual canonical 456reference for long division, but that entire series is highly compressed, never 457repeating a necessary fact and leaving important insights to the exercises. 458For example, no rationale whatsoever is given for the calculation that extends 459q̂ from a 2-by-1 to a 3-by-2 guess, nor why it reduces the error bound. 460The proof that the calculation even has the desired effect is left to exercises. 461The solutions to those exercises provided at the back of the book are entirely 462calculations, still with no explanation as to what is going on or how you would 463arrive at the idea of doing those exact calculations. Nowhere is it mentioned 464that this test extends the 2-by-1 guess into a 3-by-2 guess. The proof of the 465Good Guess Guarantee is only for the 2-by-1 guess and argues by contradiction, 466making it difficult to understand how modifications like adding another digit 467or adjusting the quotient range affects the overall bound. 468 469All that said, Knuth remains the canonical reference. It is dense but packed 470full of information and references, and the proofs are simpler than many other 471presentations. The proofs above are reworkings of Knuth's to remove the 472arguments by contradiction and add explanations or steps that Knuth omitted. 473But beware of errors in older printings. Take the published errata with you. 474 475Brinch Hansen's “Multiple-length Division Revisited: a Tour of the Minefield” 476starts with a blunt critique of Knuth's presentation (among others) and then 477presents a more detailed and easier to follow treatment of long division, 478including an implementation in Pascal. But the algorithm and implementation 479work entirely in terms of 3-by-2 division, which is much less useful on modern 480hardware than an algorithm using 2-by-1 division. The proofs are a bit too 481focused on digit counting and seem needlessly complex, especially compared to 482the ones given above. 483 484Burnikel and Ziegler's “Fast Recursive Division” introduced the key insight of 485implementing division by an n-digit divisor using recursive calls to division 486by an n/2-digit divisor, relying on Karatsuba multiplication to yield a 487sub-quadratic run time. However, the presentation decisions are made almost 488entirely for the purpose of simplifying the run-time analysis, rather than 489simplifying the presentation. Instead of a single algorithm that loops over 490quotient digits, the paper presents two mutually-recursive algorithms, for 4912n-by-n and 3n-by-2n. The paper also does not present any general (n+m)-by-n 492algorithm. 493 494The proofs in the paper are remarkably complex, especially considering that 495the algorithm is at its core just long division on wide digits, so that the 496usual long division proofs apply essentially unaltered. 497*/ 498 499package big 500 501import "math/bits" 502 503// rem returns r such that r = u%v. 504// It uses z as the storage for r. 505func (z nat) rem(u, v nat) (r nat) { 506 if alias(z, u) { 507 z = nil 508 } 509 qp := getNat(0) 510 q, r := qp.div(z, u, v) 511 *qp = q 512 putNat(qp) 513 return r 514} 515 516// div returns q, r such that q = ⌊u/v⌋ and r = u%v = u - q·v. 517// It uses z and z2 as the storage for q and r. 518func (z nat) div(z2, u, v nat) (q, r nat) { 519 if len(v) == 0 { 520 panic("division by zero") 521 } 522 523 if u.cmp(v) < 0 { 524 q = z[:0] 525 r = z2.set(u) 526 return 527 } 528 529 if len(v) == 1 { 530 // Short division: long optimized for a single-word divisor. 531 // In that case, the 2-by-1 guess is all we need at each step. 532 var r2 Word 533 q, r2 = z.divW(u, v[0]) 534 r = z2.setWord(r2) 535 return 536 } 537 538 q, r = z.divLarge(z2, u, v) 539 return 540} 541 542// divW returns q, r such that q = ⌊x/y⌋ and r = x%y = x - q·y. 543// It uses z as the storage for q. 544// Note that y is a single digit (Word), not a big number. 545func (z nat) divW(x nat, y Word) (q nat, r Word) { 546 m := len(x) 547 switch { 548 case y == 0: 549 panic("division by zero") 550 case y == 1: 551 q = z.set(x) // result is x 552 return 553 case m == 0: 554 q = z[:0] // result is 0 555 return 556 } 557 // m > 0 558 z = z.make(m) 559 r = divWVW(z, 0, x, y) 560 q = z.norm() 561 return 562} 563 564// modW returns x % d. 565func (x nat) modW(d Word) (r Word) { 566 // TODO(agl): we don't actually need to store the q value. 567 var q nat 568 q = q.make(len(x)) 569 return divWVW(q, 0, x, d) 570} 571 572// divWVW overwrites z with ⌊x/y⌋, returning the remainder r. 573// The caller must ensure that len(z) = len(x). 574func divWVW(z []Word, xn Word, x []Word, y Word) (r Word) { 575 r = xn 576 if len(x) == 1 { 577 qq, rr := bits.Div(uint(r), uint(x[0]), uint(y)) 578 z[0] = Word(qq) 579 return Word(rr) 580 } 581 rec := reciprocalWord(y) 582 for i := len(z) - 1; i >= 0; i-- { 583 z[i], r = divWW(r, x[i], y, rec) 584 } 585 return r 586} 587 588// div returns q, r such that q = ⌊uIn/vIn⌋ and r = uIn%vIn = uIn - q·vIn. 589// It uses z and u as the storage for q and r. 590// The caller must ensure that len(vIn) ≥ 2 (use divW otherwise) 591// and that len(uIn) ≥ len(vIn) (the answer is 0, uIn otherwise). 592func (z nat) divLarge(u, uIn, vIn nat) (q, r nat) { 593 n := len(vIn) 594 m := len(uIn) - n 595 596 // Scale the inputs so vIn's top bit is 1 (see “Scaling Inputs” above). 597 // vIn is treated as a read-only input (it may be in use by another 598 // goroutine), so we must make a copy. 599 // uIn is copied to u. 600 shift := nlz(vIn[n-1]) 601 vp := getNat(n) 602 v := *vp 603 shlVU(v, vIn, shift) 604 u = u.make(len(uIn) + 1) 605 u[len(uIn)] = shlVU(u[0:len(uIn)], uIn, shift) 606 607 // The caller should not pass aliased z and u, since those are 608 // the two different outputs, but correct just in case. 609 if alias(z, u) { 610 z = nil 611 } 612 q = z.make(m + 1) 613 614 // Use basic or recursive long division depending on size. 615 if n < divRecursiveThreshold { 616 q.divBasic(u, v) 617 } else { 618 q.divRecursive(u, v) 619 } 620 putNat(vp) 621 622 q = q.norm() 623 624 // Undo scaling of remainder. 625 shrVU(u, u, shift) 626 r = u.norm() 627 628 return q, r 629} 630 631// divBasic implements long division as described above. 632// It overwrites q with ⌊u/v⌋ and overwrites u with the remainder r. 633// q must be large enough to hold ⌊u/v⌋. 634func (q nat) divBasic(u, v nat) { 635 n := len(v) 636 m := len(u) - n 637 638 qhatvp := getNat(n + 1) 639 qhatv := *qhatvp 640 641 // Set up for divWW below, precomputing reciprocal argument. 642 vn1 := v[n-1] 643 rec := reciprocalWord(vn1) 644 645 // Compute each digit of quotient. 646 for j := m; j >= 0; j-- { 647 // Compute the 2-by-1 guess q̂. 648 // The first iteration must invent a leading 0 for u. 649 qhat := Word(_M) 650 var ujn Word 651 if j+n < len(u) { 652 ujn = u[j+n] 653 } 654 655 // ujn ≤ vn1, or else q̂ would be more than one digit. 656 // For ujn == vn1, we set q̂ to the max digit M above. 657 // Otherwise, we compute the 2-by-1 guess. 658 if ujn != vn1 { 659 var rhat Word 660 qhat, rhat = divWW(ujn, u[j+n-1], vn1, rec) 661 662 // Refine q̂ to a 3-by-2 guess. See “Refining Guesses” above. 663 vn2 := v[n-2] 664 x1, x2 := mulWW(qhat, vn2) 665 ujn2 := u[j+n-2] 666 for greaterThan(x1, x2, rhat, ujn2) { // x1x2 > r̂ u[j+n-2] 667 qhat-- 668 prevRhat := rhat 669 rhat += vn1 670 // If r̂ overflows, then 671 // r̂ u[j+n-2]v[n-1] is now definitely > x1 x2. 672 if rhat < prevRhat { 673 break 674 } 675 // TODO(rsc): No need for a full mulWW. 676 // x2 += vn2; if x2 overflows, x1++ 677 x1, x2 = mulWW(qhat, vn2) 678 } 679 } 680 681 // Compute q̂·v. 682 qhatv[n] = mulAddVWW(qhatv[0:n], v, qhat, 0) 683 qhl := len(qhatv) 684 if j+qhl > len(u) && qhatv[n] == 0 { 685 qhl-- 686 } 687 688 // Subtract q̂·v from the current section of u. 689 // If it underflows, q̂·v > u, which we fix up 690 // by decrementing q̂ and adding v back. 691 c := subVV(u[j:j+qhl], u[j:], qhatv) 692 if c != 0 { 693 c := addVV(u[j:j+n], u[j:], v) 694 // If n == qhl, the carry from subVV and the carry from addVV 695 // cancel out and don't affect u[j+n]. 696 if n < qhl { 697 u[j+n] += c 698 } 699 qhat-- 700 } 701 702 // Save quotient digit. 703 // Caller may know the top digit is zero and not leave room for it. 704 if j == m && m == len(q) && qhat == 0 { 705 continue 706 } 707 q[j] = qhat 708 } 709 710 putNat(qhatvp) 711} 712 713// greaterThan reports whether the two digit numbers x1 x2 > y1 y2. 714// TODO(rsc): In contradiction to most of this file, x1 is the high 715// digit and x2 is the low digit. This should be fixed. 716func greaterThan(x1, x2, y1, y2 Word) bool { 717 return x1 > y1 || x1 == y1 && x2 > y2 718} 719 720// divRecursiveThreshold is the number of divisor digits 721// at which point divRecursive is faster than divBasic. 722const divRecursiveThreshold = 100 723 724// divRecursive implements recursive division as described above. 725// It overwrites z with ⌊u/v⌋ and overwrites u with the remainder r. 726// z must be large enough to hold ⌊u/v⌋. 727// This function is just for allocating and freeing temporaries 728// around divRecursiveStep, the real implementation. 729func (z nat) divRecursive(u, v nat) { 730 // Recursion depth is (much) less than 2 log₂(len(v)). 731 // Allocate a slice of temporaries to be reused across recursion, 732 // plus one extra temporary not live across the recursion. 733 recDepth := 2 * bits.Len(uint(len(v))) 734 tmp := getNat(3 * len(v)) 735 temps := make([]*nat, recDepth) 736 737 clear(z) 738 z.divRecursiveStep(u, v, 0, tmp, temps) 739 740 // Free temporaries. 741 for _, n := range temps { 742 if n != nil { 743 putNat(n) 744 } 745 } 746 putNat(tmp) 747} 748 749// divRecursiveStep is the actual implementation of recursive division. 750// It adds ⌊u/v⌋ to z and overwrites u with the remainder r. 751// z must be large enough to hold ⌊u/v⌋. 752// It uses temps[depth] (allocating if needed) as a temporary live across 753// the recursive call. It also uses tmp, but not live across the recursion. 754func (z nat) divRecursiveStep(u, v nat, depth int, tmp *nat, temps []*nat) { 755 // u is a subsection of the original and may have leading zeros. 756 // TODO(rsc): The v = v.norm() is useless and should be removed. 757 // We know (and require) that v's top digit is ≥ B/2. 758 u = u.norm() 759 v = v.norm() 760 if len(u) == 0 { 761 clear(z) 762 return 763 } 764 765 // Fall back to basic division if the problem is now small enough. 766 n := len(v) 767 if n < divRecursiveThreshold { 768 z.divBasic(u, v) 769 return 770 } 771 772 // Nothing to do if u is shorter than v (implies u < v). 773 m := len(u) - n 774 if m < 0 { 775 return 776 } 777 778 // We consider B digits in a row as a single wide digit. 779 // (See “Recursive Division” above.) 780 // 781 // TODO(rsc): rename B to Wide, to avoid confusion with _B, 782 // which is something entirely different. 783 // TODO(rsc): Look into whether using ⌈n/2⌉ is better than ⌊n/2⌋. 784 B := n / 2 785 786 // Allocate a nat for qhat below. 787 if temps[depth] == nil { 788 temps[depth] = getNat(n) // TODO(rsc): Can be just B+1. 789 } else { 790 *temps[depth] = temps[depth].make(B + 1) 791 } 792 793 // Compute each wide digit of the quotient. 794 // 795 // TODO(rsc): Change the loop to be 796 // for j := (m+B-1)/B*B; j > 0; j -= B { 797 // which will make the final step a regular step, letting us 798 // delete what amounts to an extra copy of the loop body below. 799 j := m 800 for j > B { 801 // Divide u[j-B:j+n] (3 wide digits) by v (2 wide digits). 802 // First make the 2-by-1-wide-digit guess using a recursive call. 803 // Then extend the guess to the full 3-by-2 (see “Refining Guesses”). 804 // 805 // For the 2-by-1-wide-digit guess, instead of doing 2B-by-B-digit, 806 // we use a (2B+1)-by-(B+1) digit, which handles the possibility that 807 // the result has an extra leading 1 digit as well as guaranteeing 808 // that the computed q̂ will be off by at most 1 instead of 2. 809 810 // s is the number of digits to drop from the 3B- and 2B-digit chunks. 811 // We drop B-1 to be left with 2B+1 and B+1. 812 s := (B - 1) 813 814 // uu is the up-to-3B-digit section of u we are working on. 815 uu := u[j-B:] 816 817 // Compute the 2-by-1 guess q̂, leaving r̂ in uu[s:B+n]. 818 qhat := *temps[depth] 819 clear(qhat) 820 qhat.divRecursiveStep(uu[s:B+n], v[s:], depth+1, tmp, temps) 821 qhat = qhat.norm() 822 823 // Extend to a 3-by-2 quotient and remainder. 824 // Because divRecursiveStep overwrote the top part of uu with 825 // the remainder r̂, the full uu already contains the equivalent 826 // of r̂·B + uₙ₋₂ from the “Refining Guesses” discussion. 827 // Subtracting q̂·vₙ₋₂ from it will compute the full-length remainder. 828 // If that subtraction underflows, q̂·v > u, which we fix up 829 // by decrementing q̂ and adding v back, same as in long division. 830 831 // TODO(rsc): Instead of subtract and fix-up, this code is computing 832 // q̂·vₙ₋₂ and decrementing q̂ until that product is ≤ u. 833 // But we can do the subtraction directly, as in the comment above 834 // and in long division, because we know that q̂ is wrong by at most one. 835 qhatv := tmp.make(3 * n) 836 clear(qhatv) 837 qhatv = qhatv.mul(qhat, v[:s]) 838 for i := 0; i < 2; i++ { 839 e := qhatv.cmp(uu.norm()) 840 if e <= 0 { 841 break 842 } 843 subVW(qhat, qhat, 1) 844 c := subVV(qhatv[:s], qhatv[:s], v[:s]) 845 if len(qhatv) > s { 846 subVW(qhatv[s:], qhatv[s:], c) 847 } 848 addAt(uu[s:], v[s:], 0) 849 } 850 if qhatv.cmp(uu.norm()) > 0 { 851 panic("impossible") 852 } 853 c := subVV(uu[:len(qhatv)], uu[:len(qhatv)], qhatv) 854 if c > 0 { 855 subVW(uu[len(qhatv):], uu[len(qhatv):], c) 856 } 857 addAt(z, qhat, j-B) 858 j -= B 859 } 860 861 // TODO(rsc): Rewrite loop as described above and delete all this code. 862 863 // Now u < (v<<B), compute lower bits in the same way. 864 // Choose shift = B-1 again. 865 s := B - 1 866 qhat := *temps[depth] 867 clear(qhat) 868 qhat.divRecursiveStep(u[s:].norm(), v[s:], depth+1, tmp, temps) 869 qhat = qhat.norm() 870 qhatv := tmp.make(3 * n) 871 clear(qhatv) 872 qhatv = qhatv.mul(qhat, v[:s]) 873 // Set the correct remainder as before. 874 for i := 0; i < 2; i++ { 875 if e := qhatv.cmp(u.norm()); e > 0 { 876 subVW(qhat, qhat, 1) 877 c := subVV(qhatv[:s], qhatv[:s], v[:s]) 878 if len(qhatv) > s { 879 subVW(qhatv[s:], qhatv[s:], c) 880 } 881 addAt(u[s:], v[s:], 0) 882 } 883 } 884 if qhatv.cmp(u.norm()) > 0 { 885 panic("impossible") 886 } 887 c := subVV(u[0:len(qhatv)], u[0:len(qhatv)], qhatv) 888 if c > 0 { 889 c = subVW(u[len(qhatv):], u[len(qhatv):], c) 890 } 891 if c > 0 { 892 panic("impossible") 893 } 894 895 // Done! 896 addAt(z, qhat.norm(), 0) 897} 898