1// Copyright 2009 The Go Authors. All rights reserved.
2// Use of this source code is governed by a BSD-style
3// license that can be found in the LICENSE file.
4
5/*
6
7Multi-precision division. Here be dragons.
8
9Given u and v, where u is n+m digits, and v is n digits (with no leading zeros),
10the goal is to return quo, rem such that u = quo*v + rem, where 0 ≤ rem < v.
11That is, quo = ⌊u/v⌋ where ⌊x⌋ denotes the floor (truncation to integer) of x,
12and rem = u - quo·v.
13
14
15Long Division
16
17Division in a computer proceeds the same as long division in elementary school,
18but computers are not as good as schoolchildren at following vague directions,
19so we have to be much more precise about the actual steps and what can happen.
20
21We work from most to least significant digit of the quotient, doing:
22
23 • Guess a digit q, the number of v to subtract from the current
24   section of u to zero out the topmost digit.
25 • Check the guess by multiplying q·v and comparing it against
26   the current section of u, adjusting the guess as needed.
27 • Subtract q·v from the current section of u.
28 • Add q to the corresponding section of the result quo.
29
30When all digits have been processed, the final remainder is left in u
31and returned as rem.
32
33For example, here is a sketch of dividing 5 digits by 3 digits (n=3, m=2).
34
35	                 q₂ q₁ q₀
36	         _________________
37	v₂ v₁ v₀ ) u₄ u₃ u₂ u₁ u₀
38	           ↓  ↓  ↓  |  |
39	          [u₄ u₃ u₂]|  |
40	        - [  q₂·v  ]|  |
41	        ----------- ↓  |
42	          [  rem  | u₁]|
43	        - [    q₁·v   ]|
44	           ----------- ↓
45	             [  rem  | u₀]
46	           - [    q₀·v   ]
47	              ------------
48	                [  rem   ]
49
50Instead of creating new storage for the remainders and copying digits from u
51as indicated by the arrows, we use u's storage directly as both the source
52and destination of the subtractions, so that the remainders overwrite
53successive overlapping sections of u as the division proceeds, using a slice
54of u to identify the current section. This avoids all the copying as well as
55shifting of remainders.
56
57Division of u with n+m digits by v with n digits (in base B) can in general
58produce at most m+1 digits, because:
59
60  • u < B^(n+m)               [B^(n+m) has n+m+1 digits]
61  • v ≥ B^(n-1)               [B^(n-1) is the smallest n-digit number]
62  • u/v < B^(n+m) / B^(n-1)   [divide bounds for u, v]
63  • u/v < B^(m+1)             [simplify]
64
65The first step is special: it takes the top n digits of u and divides them by
66the n digits of v, producing the first quotient digit and an n-digit remainder.
67In the example, q₂ = ⌊u₄u₃u₂ / v⌋.
68
69The first step divides n digits by n digits to ensure that it produces only a
70single digit.
71
72Each subsequent step appends the next digit from u to the remainder and divides
73those n+1 digits by the n digits of v, producing another quotient digit and a
74new n-digit remainder.
75
76Subsequent steps divide n+1 digits by n digits, an operation that in general
77might produce two digits. However, as used in the algorithm, that division is
78guaranteed to produce only a single digit. The dividend is of the form
79rem·B + d, where rem is a remainder from the previous step and d is a single
80digit, so:
81
82 • rem ≤ v - 1                 [rem is a remainder from dividing by v]
83 • rem·B ≤ v·B - B             [multiply by B]
84 • d ≤ B - 1                   [d is a single digit]
85 • rem·B + d ≤ v·B - 1         [add]
86 • rem·B + d < v·B             [change ≤ to <]
87 • (rem·B + d)/v < B           [divide by v]
88
89
90Guess and Check
91
92At each step we need to divide n+1 digits by n digits, but this is for the
93implementation of division by n digits, so we can't just invoke a division
94routine: we _are_ the division routine. Instead, we guess at the answer and
95then check it using multiplication. If the guess is wrong, we correct it.
96
97How can this guessing possibly be efficient? It turns out that the following
98statement (let's call it the Good Guess Guarantee) is true.
99
100If
101
102 • q = ⌊u/v⌋ where u is n+1 digits and v is n digits,
103 • q < B, and
104 • the topmost digit of v = vₙ₋₁ ≥ B/2,
105
106then q̂ = ⌊uₙuₙ₋₁ / vₙ₋₁⌋ satisfies q ≤ q̂ ≤ q+2. (Proof below.)
107
108That is, if we know the answer has only a single digit and we guess an answer
109by ignoring the bottom n-1 digits of u and v, using a 2-by-1-digit division,
110then that guess is at least as large as the correct answer. It is also not
111too much larger: it is off by at most two from the correct answer.
112
113Note that in the first step of the overall division, which is an n-by-n-digit
114division, the 2-by-1 guess uses an implicit uₙ = 0.
115
116Note that using a 2-by-1-digit division here does not mean calling ourselves
117recursively. Instead, we use an efficient direct hardware implementation of
118that operation.
119
120Note that because q is u/v rounded down, q·v must not exceed u: u ≥ q·v.
121If a guess q̂ is too big, it will not satisfy this test. Viewed a different way,
122the remainder r̂ for a given q̂ is u - q̂·v, which must be positive. If it is
123negative, then the guess q̂ is too big.
124
125This gives us a way to compute q. First compute q̂ with 2-by-1-digit division.
126Then, while u < q̂·v, decrement q̂; this loop executes at most twice, because
127q̂ ≤ q+2.
128
129
130Scaling Inputs
131
132The Good Guess Guarantee requires that the top digit of v (vₙ₋₁) be at least B/2.
133For example in base 10, ⌊172/19⌋ = 9, but ⌊18/1⌋ = 18: the guess is wildly off
134because the first digit 1 is smaller than B/2 = 5.
135
136We can ensure that v has a large top digit by multiplying both u and v by the
137right amount. Continuing the example, if we multiply both 172 and 19 by 3, we
138now have ⌊516/57⌋, the leading digit of v is now ≥ 5, and sure enough
139⌊51/5⌋ = 10 is much closer to the correct answer 9. It would be easier here
140to multiply by 4, because that can be done with a shift. Specifically, we can
141always count the number of leading zeros i in the first digit of v and then
142shift both u and v left by i bits.
143
144Having scaled u and v, the value ⌊u/v⌋ is unchanged, but the remainder will
145be scaled: 172 mod 19 is 1, but 516 mod 57 is 3. We have to divide the remainder
146by the scaling factor (shifting right i bits) when we finish.
147
148Note that these shifts happen before and after the entire division algorithm,
149not at each step in the per-digit iteration.
150
151Note the effect of scaling inputs on the size of the possible quotient.
152In the scaled u/v, u can gain a digit from scaling; v never does, because we
153pick the scaling factor to make v's top digit larger but without overflowing.
154If u and v have n+m and n digits after scaling, then:
155
156  • u < B^(n+m)               [B^(n+m) has n+m+1 digits]
157  • v ≥ B^n / 2               [vₙ₋₁ ≥ B/2, so vₙ₋₁·B^(n-1) ≥ B^n/2]
158  • u/v < B^(n+m) / (B^n / 2) [divide bounds for u, v]
159  • u/v < 2 B^m               [simplify]
160
161The quotient can still have m+1 significant digits, but if so the top digit
162must be a 1. This provides a different way to handle the first digit of the
163result: compare the top n digits of u against v and fill in either a 0 or a 1.
164
165
166Refining Guesses
167
168Before we check whether u < q̂·v, we can adjust our guess to change it from
169q̂ = ⌊uₙuₙ₋₁ / vₙ₋₁⌋ into the refined guess ⌊uₙuₙ₋₁uₙ₋₂ / vₙ₋₁vₙ₋₂⌋.
170Although not mentioned above, the Good Guess Guarantee also promises that this
1713-by-2-digit division guess is more precise and at most one away from the real
172answer q. The improvement from the 2-by-1 to the 3-by-2 guess can also be done
173without n-digit math.
174
175If we have a guess q̂ = ⌊uₙuₙ₋₁ / vₙ₋₁⌋ and we want to see if it also equal to
176⌊uₙuₙ₋₁uₙ₋₂ / vₙ₋₁vₙ₋₂⌋, we can use the same check we would for the full division:
177if uₙuₙ₋₁uₙ₋₂ < q̂·vₙ₋₁vₙ₋₂, then the guess is too large and should be reduced.
178
179Checking uₙuₙ₋₁uₙ₋₂ < q̂·vₙ₋₁vₙ₋₂ is the same as uₙuₙ₋₁uₙ₋₂ - q̂·vₙ₋₁vₙ₋₂ < 0,
180and
181
182	uₙuₙ₋₁uₙ₋₂ - q̂·vₙ₋₁vₙ₋₂ = (uₙuₙ₋₁·B + uₙ₋₂) - q̂·(vₙ₋₁·B + vₙ₋₂)
183	                          [splitting off the bottom digit]
184	                      = (uₙuₙ₋₁ - q̂·vₙ₋₁)·B + uₙ₋₂ - q̂·vₙ₋₂
185	                          [regrouping]
186
187The expression (uₙuₙ₋₁ - q̂·vₙ₋₁) is the remainder of uₙuₙ₋₁ / vₙ₋₁.
188If the initial guess returns both q̂ and its remainder r̂, then checking
189whether uₙuₙ₋₁uₙ₋₂ < q̂·vₙ₋₁vₙ₋₂ is the same as checking r̂·B + uₙ₋₂ < q̂·vₙ₋₂.
190
191If we find that r̂·B + uₙ₋₂ < q̂·vₙ₋₂, then we can adjust the guess by
192decrementing q̂ and adding vₙ₋₁ to r̂. We repeat until r̂·B + uₙ₋₂ ≥ q̂·vₙ₋₂.
193(As before, this fixup is only needed at most twice.)
194
195Now that q̂ = ⌊uₙuₙ₋₁uₙ₋₂ / vₙ₋₁vₙ₋₂⌋, as mentioned above it is at most one
196away from the correct q, and we've avoided doing any n-digit math.
197(If we need the new remainder, it can be computed as r̂·B + uₙ₋₂ - q̂·vₙ₋₂.)
198
199The final check u < q̂·v and the possible fixup must be done at full precision.
200For random inputs, a fixup at this step is exceedingly rare: the 3-by-2 guess
201is not often wrong at all. But still we must do the check. Note that since the
2023-by-2 guess is off by at most 1, it can be convenient to perform the final
203u < q̂·v as part of the computation of the remainder r = u - q̂·v. If the
204subtraction underflows, decremeting q̂ and adding one v back to r is enough to
205arrive at the final q, r.
206
207That's the entirety of long division: scale the inputs, and then loop over
208each output position, guessing, checking, and correcting the next output digit.
209
210For a 2n-digit number divided by an n-digit number (the worst size-n case for
211division complexity), this algorithm uses n+1 iterations, each of which must do
212at least the 1-by-n-digit multiplication q̂·v. That's O(n) iterations of
213O(n) time each, so O(n²) time overall.
214
215
216Recursive Division
217
218For very large inputs, it is possible to improve on the O(n²) algorithm.
219Let's call a group of n/2 real digits a (very) “wide digit”. We can run the
220standard long division algorithm explained above over the wide digits instead of
221the actual digits. This will result in many fewer steps, but the math involved in
222each step is more work.
223
224Where basic long division uses a 2-by-1-digit division to guess the initial q̂,
225the new algorithm must use a 2-by-1-wide-digit division, which is of course
226really an n-by-n/2-digit division. That's OK: if we implement n-digit division
227in terms of n/2-digit division, the recursion will terminate when the divisor
228becomes small enough to handle with standard long division or even with the
2292-by-1 hardware instruction.
230
231For example, here is a sketch of dividing 10 digits by 4, proceeding with
232wide digits corresponding to two regular digits. The first step, still special,
233must leave off a (regular) digit, dividing 5 by 4 and producing a 4-digit
234remainder less than v. The middle steps divide 6 digits by 4, guaranteed to
235produce two output digits each (one wide digit) with 4-digit remainders.
236The final step must use what it has: the 4-digit remainder plus one more,
2375 digits to divide by 4.
238
239	                       q₆ q₅ q₄ q₃ q₂ q₁ q₀
240	            _______________________________
241	v₃ v₂ v₁ v₀ ) u₉ u₈ u₇ u₆ u₅ u₄ u₃ u₂ u₁ u₀
242	              ↓  ↓  ↓  ↓  ↓  |  |  |  |  |
243	             [u₉ u₈ u₇ u₆ u₅]|  |  |  |  |
244	           - [    q₆q₅·v    ]|  |  |  |  |
245	           ----------------- ↓  ↓  |  |  |
246	                [    rem    |u₄ u₃]|  |  |
247	              - [     q₄q₃·v      ]|  |  |
248	              -------------------- ↓  ↓  |
249	                      [    rem    |u₂ u₁]|
250	                    - [     q₂q₁·v      ]|
251	                    -------------------- ↓
252	                            [    rem    |u₀]
253	                          - [     q₀·v     ]
254	                          ------------------
255	                               [    rem    ]
256
257An alternative would be to look ahead to how well n/2 divides into n+m and
258adjust the first step to use fewer digits as needed, making the first step
259more special to make the last step not special at all. For example, using the
260same input, we could choose to use only 4 digits in the first step, leaving
261a full wide digit for the last step:
262
263	                       q₆ q₅ q₄ q₃ q₂ q₁ q₀
264	            _______________________________
265	v₃ v₂ v₁ v₀ ) u₉ u₈ u₇ u₆ u₅ u₄ u₃ u₂ u₁ u₀
266	              ↓  ↓  ↓  ↓  |  |  |  |  |  |
267	             [u₉ u₈ u₇ u₆]|  |  |  |  |  |
268	           - [    q₆·v   ]|  |  |  |  |  |
269	           -------------- ↓  ↓  |  |  |  |
270	             [    rem    |u₅ u₄]|  |  |  |
271	           - [     q₅q₄·v      ]|  |  |  |
272	           -------------------- ↓  ↓  |  |
273	                   [    rem    |u₃ u₂]|  |
274	                 - [     q₃q₂·v      ]|  |
275	                 -------------------- ↓  ↓
276	                         [    rem    |u₁ u₀]
277	                       - [     q₁q₀·v      ]
278	                       ---------------------
279	                               [    rem    ]
280
281Today, the code in divRecursiveStep works like the first example. Perhaps in
282the future we will make it work like the alternative, to avoid a special case
283in the final iteration.
284
285Either way, each step is a 3-by-2-wide-digit division approximated first by
286a 2-by-1-wide-digit division, just as we did for regular digits in long division.
287Because the actual answer we want is a 3-by-2-wide-digit division, instead of
288multiplying q̂·v directly during the fixup, we can use the quick refinement
289from long division (an n/2-by-n/2 multiply) to correct q to its actual value
290and also compute the remainder (as mentioned above), and then stop after that,
291never doing a full n-by-n multiply.
292
293Instead of using an n-by-n/2-digit division to produce n/2 digits, we can add
294(not discard) one more real digit, doing an (n+1)-by-(n/2+1)-digit division that
295produces n/2+1 digits. That single extra digit tightens the Good Guess Guarantee
296to q ≤ q̂ ≤ q+1 and lets us drop long division's special treatment of the first
297digit. These benefits are discussed more after the Good Guess Guarantee proof
298below.
299
300
301How Fast is Recursive Division?
302
303For a 2n-by-n-digit division, this algorithm runs a 4-by-2 long division over
304wide digits, producing two wide digits plus a possible leading regular digit 1,
305which can be handled without a recursive call. That is, the algorithm uses two
306full iterations, each using an n-by-n/2-digit division and an n/2-by-n/2-digit
307multiplication, along with a few n-digit additions and subtractions. The standard
308n-by-n-digit multiplication algorithm requires O(n²) time, making the overall
309algorithm require time T(n) where
310
311	T(n) = 2T(n/2) + O(n) + O(n²)
312
313which, by the Bentley-Haken-Saxe theorem, ends up reducing to T(n) = O(n²).
314This is not an improvement over regular long division.
315
316When the number of digits n becomes large enough, Karatsuba's algorithm for
317multiplication can be used instead, which takes O(n^log₂3) = O(n^1.6) time.
318(Karatsuba multiplication is implemented in func karatsuba in nat.go.)
319That makes the overall recursive division algorithm take O(n^1.6) time as well,
320which is an improvement, but again only for large enough numbers.
321
322It is not critical to make sure that every recursion does only two recursive
323calls. While in general the number of recursive calls can change the time
324analysis, in this case doing three calls does not change the analysis:
325
326	T(n) = 3T(n/2) + O(n) + O(n^log₂3)
327
328ends up being T(n) = O(n^log₂3). Because the Karatsuba multiplication taking
329time O(n^log₂3) is itself doing 3 half-sized recursions, doing three for the
330division does not hurt the asymptotic performance. Of course, it is likely
331still faster in practice to do two.
332
333
334Proof of the Good Guess Guarantee
335
336Given numbers x, y, let us break them into the quotients and remainders when
337divided by some scaling factor S, with the added constraints that the quotient
338x/y and the high part of y are both less than some limit T, and that the high
339part of y is at least half as big as T.
340
341	x₁ = ⌊x/S⌋        y₁ = ⌊y/S⌋
342	x₀ = x mod S      y₀ = y mod S
343
344	x  = x₁·S + x₀    0 ≤ x₀ < S    x/y < T
345	y  = y₁·S + y₀    0 ≤ y₀ < S    T/2 ≤ y₁ < T
346
347And consider the two truncated quotients:
348
349	q = ⌊x/y⌋
350	q̂ = ⌊x₁/y₁⌋
351
352We will prove that q ≤ q̂ ≤ q+2.
353
354The guarantee makes no real demands on the scaling factor S: it is simply the
355magnitude of the digits cut from both x and y to produce x₁ and y₁.
356The guarantee makes only limited demands on T: it must be large enough to hold
357the quotient x/y, and y₁ must have roughly the same size.
358
359To apply to the earlier discussion of 2-by-1 guesses in long division,
360we would choose:
361
362	S  = Bⁿ⁻¹
363	T  = B
364	x  = u
365	x₁ = uₙuₙ₋₁
366	x₀ = uₙ₋₂...u₀
367	y  = v
368	y₁ = vₙ₋₁
369	y₀ = vₙ₋₂...u₀
370
371These simpler variables avoid repeating those longer expressions in the proof.
372
373Note also that, by definition, truncating division ⌊x/y⌋ satisfies
374
375	x/y - 1 < ⌊x/y⌋ ≤ x/y.
376
377This fact will be used a few times in the proofs.
378
379Proof that q ≤ q̂:
380
381	q̂·y₁ = ⌊x₁/y₁⌋·y₁                      [by definition, q̂ = ⌊x₁/y₁⌋]
382	     > (x₁/y₁ - 1)·y₁                  [x₁/y₁ - 1 < ⌊x₁/y₁⌋]
383	     = x₁ - y₁                         [distribute y₁]
384
385	So q̂·y₁ > x₁ - y₁.
386	Since q̂·y₁ is an integer, q̂·y₁ ≥ x₁ - y₁ + 1.
387
388	q̂ - q = q̂ - ⌊x/y⌋                      [by definition, q = ⌊x/y⌋]
389	      ≥ q̂ - x/y                        [⌊x/y⌋ < x/y]
390	      = (1/y)·(q̂·y - x)                [factor out 1/y]
391	      ≥ (1/y)·(q̂·y₁·S - x)             [y = y₁·S + y₀ ≥ y₁·S]
392	      ≥ (1/y)·((x₁ - y₁ + 1)·S - x)    [above: q̂·y₁ ≥ x₁ - y₁ + 1]
393	      = (1/y)·(x₁·S - y₁·S + S - x)    [distribute S]
394	      = (1/y)·(S - x₀ - y₁·S)          [-x = -x₁·S - x₀]
395	      > -y₁·S / y                      [x₀ < S, so S - x₀ > 0; drop it]
396	      ≥ -1                             [y₁·S ≤ y]
397
398	So q̂ - q > -1.
399	Since q̂ - q is an integer, q̂ - q ≥ 0, or equivalently q ≤ q̂.
400
401Proof that q̂ ≤ q+2:
402
403	x₁/y₁ - x/y = x₁·S/y₁·S - x/y          [multiply left term by S/S]
404	            ≤ x/y₁·S - x/y             [x₁S ≤ x]
405	            = (x/y)·(y/y₁·S - 1)       [factor out x/y]
406	            = (x/y)·((y - y₁·S)/y₁·S)  [move -1 into y/y₁·S fraction]
407	            = (x/y)·(y₀/y₁·S)          [y - y₁·S = y₀]
408	            = (x/y)·(1/y₁)·(y₀/S)      [factor out 1/y₁]
409	            < (x/y)·(1/y₁)             [y₀ < S, so y₀/S < 1]
410	            ≤ (x/y)·(2/T)              [y₁ ≥ T/2, so 1/y₁ ≤ 2/T]
411	            < T·(2/T)                  [x/y < T]
412	            = 2                        [T·(2/T) = 2]
413
414	So x₁/y₁ - x/y < 2.
415
416	q̂ - q = ⌊x₁/y₁⌋ - q                    [by definition, q̂ = ⌊x₁/y₁⌋]
417	      = ⌊x₁/y₁⌋ - ⌊x/y⌋                [by definition, q = ⌊x/y⌋]
418	      ≤ x₁/y₁ - ⌊x/y⌋                  [⌊x₁/y₁⌋ ≤ x₁/y₁]
419	      < x₁/y₁ - (x/y - 1)              [⌊x/y⌋ > x/y - 1]
420	      = (x₁/y₁ - x/y) + 1              [regrouping]
421	      < 2 + 1                          [above: x₁/y₁ - x/y < 2]
422	      = 3
423
424	So q̂ - q < 3.
425	Since q̂ - q is an integer, q̂ - q ≤ 2.
426
427Note that when x/y < T/2, the bounds tighten to x₁/y₁ - x/y < 1 and therefore
428q̂ - q ≤ 1.
429
430Note also that in the general case 2n-by-n division where we don't know that
431x/y < T, we do know that x/y < 2T, yielding the bound q̂ - q ≤ 4. So we could
432remove the special case first step of long division as long as we allow the
433first fixup loop to run up to four times. (Using a simple comparison to decide
434whether the first digit is 0 or 1 is still more efficient, though.)
435
436Finally, note that when dividing three leading base-B digits by two (scaled),
437we have T = B² and x/y < B = T/B, a much tighter bound than x/y < T.
438This in turn yields the much tighter bound x₁/y₁ - x/y < 2/B. This means that
439⌊x₁/y₁⌋ and ⌊x/y⌋ can only differ when x/y is less than 2/B greater than an
440integer. For random x and y, the chance of this is 2/B, or, for large B,
441approximately zero. This means that after we produce the 3-by-2 guess in the
442long division algorithm, the fixup loop essentially never runs.
443
444In the recursive algorithm, the extra digit in (2·⌊n/2⌋+1)-by-(⌊n/2⌋+1)-digit
445division has exactly the same effect: the probability of needing a fixup is the
446same 2/B. Even better, we can allow the general case x/y < 2T and the fixup
447probability only grows to 4/B, still essentially zero.
448
449
450References
451
452There are no great references for implementing long division; thus this comment.
453Here are some notes about what to expect from the obvious references.
454
455Knuth Volume 2 (Seminumerical Algorithms) section 4.3.1 is the usual canonical
456reference for long division, but that entire series is highly compressed, never
457repeating a necessary fact and leaving important insights to the exercises.
458For example, no rationale whatsoever is given for the calculation that extends
459q̂ from a 2-by-1 to a 3-by-2 guess, nor why it reduces the error bound.
460The proof that the calculation even has the desired effect is left to exercises.
461The solutions to those exercises provided at the back of the book are entirely
462calculations, still with no explanation as to what is going on or how you would
463arrive at the idea of doing those exact calculations. Nowhere is it mentioned
464that this test extends the 2-by-1 guess into a 3-by-2 guess. The proof of the
465Good Guess Guarantee is only for the 2-by-1 guess and argues by contradiction,
466making it difficult to understand how modifications like adding another digit
467or adjusting the quotient range affects the overall bound.
468
469All that said, Knuth remains the canonical reference. It is dense but packed
470full of information and references, and the proofs are simpler than many other
471presentations. The proofs above are reworkings of Knuth's to remove the
472arguments by contradiction and add explanations or steps that Knuth omitted.
473But beware of errors in older printings. Take the published errata with you.
474
475Brinch Hansen's “Multiple-length Division Revisited: a Tour of the Minefield”
476starts with a blunt critique of Knuth's presentation (among others) and then
477presents a more detailed and easier to follow treatment of long division,
478including an implementation in Pascal. But the algorithm and implementation
479work entirely in terms of 3-by-2 division, which is much less useful on modern
480hardware than an algorithm using 2-by-1 division. The proofs are a bit too
481focused on digit counting and seem needlessly complex, especially compared to
482the ones given above.
483
484Burnikel and Ziegler's “Fast Recursive Division” introduced the key insight of
485implementing division by an n-digit divisor using recursive calls to division
486by an n/2-digit divisor, relying on Karatsuba multiplication to yield a
487sub-quadratic run time. However, the presentation decisions are made almost
488entirely for the purpose of simplifying the run-time analysis, rather than
489simplifying the presentation. Instead of a single algorithm that loops over
490quotient digits, the paper presents two mutually-recursive algorithms, for
4912n-by-n and 3n-by-2n. The paper also does not present any general (n+m)-by-n
492algorithm.
493
494The proofs in the paper are remarkably complex, especially considering that
495the algorithm is at its core just long division on wide digits, so that the
496usual long division proofs apply essentially unaltered.
497*/
498
499package big
500
501import "math/bits"
502
503// rem returns r such that r = u%v.
504// It uses z as the storage for r.
505func (z nat) rem(u, v nat) (r nat) {
506	if alias(z, u) {
507		z = nil
508	}
509	qp := getNat(0)
510	q, r := qp.div(z, u, v)
511	*qp = q
512	putNat(qp)
513	return r
514}
515
516// div returns q, r such that q = ⌊u/v⌋ and r = u%v = u - q·v.
517// It uses z and z2 as the storage for q and r.
518func (z nat) div(z2, u, v nat) (q, r nat) {
519	if len(v) == 0 {
520		panic("division by zero")
521	}
522
523	if u.cmp(v) < 0 {
524		q = z[:0]
525		r = z2.set(u)
526		return
527	}
528
529	if len(v) == 1 {
530		// Short division: long optimized for a single-word divisor.
531		// In that case, the 2-by-1 guess is all we need at each step.
532		var r2 Word
533		q, r2 = z.divW(u, v[0])
534		r = z2.setWord(r2)
535		return
536	}
537
538	q, r = z.divLarge(z2, u, v)
539	return
540}
541
542// divW returns q, r such that q = ⌊x/y⌋ and r = x%y = x - q·y.
543// It uses z as the storage for q.
544// Note that y is a single digit (Word), not a big number.
545func (z nat) divW(x nat, y Word) (q nat, r Word) {
546	m := len(x)
547	switch {
548	case y == 0:
549		panic("division by zero")
550	case y == 1:
551		q = z.set(x) // result is x
552		return
553	case m == 0:
554		q = z[:0] // result is 0
555		return
556	}
557	// m > 0
558	z = z.make(m)
559	r = divWVW(z, 0, x, y)
560	q = z.norm()
561	return
562}
563
564// modW returns x % d.
565func (x nat) modW(d Word) (r Word) {
566	// TODO(agl): we don't actually need to store the q value.
567	var q nat
568	q = q.make(len(x))
569	return divWVW(q, 0, x, d)
570}
571
572// divWVW overwrites z with ⌊x/y⌋, returning the remainder r.
573// The caller must ensure that len(z) = len(x).
574func divWVW(z []Word, xn Word, x []Word, y Word) (r Word) {
575	r = xn
576	if len(x) == 1 {
577		qq, rr := bits.Div(uint(r), uint(x[0]), uint(y))
578		z[0] = Word(qq)
579		return Word(rr)
580	}
581	rec := reciprocalWord(y)
582	for i := len(z) - 1; i >= 0; i-- {
583		z[i], r = divWW(r, x[i], y, rec)
584	}
585	return r
586}
587
588// div returns q, r such that q = ⌊uIn/vIn⌋ and r = uIn%vIn = uIn - q·vIn.
589// It uses z and u as the storage for q and r.
590// The caller must ensure that len(vIn) ≥ 2 (use divW otherwise)
591// and that len(uIn) ≥ len(vIn) (the answer is 0, uIn otherwise).
592func (z nat) divLarge(u, uIn, vIn nat) (q, r nat) {
593	n := len(vIn)
594	m := len(uIn) - n
595
596	// Scale the inputs so vIn's top bit is 1 (see “Scaling Inputs” above).
597	// vIn is treated as a read-only input (it may be in use by another
598	// goroutine), so we must make a copy.
599	// uIn is copied to u.
600	shift := nlz(vIn[n-1])
601	vp := getNat(n)
602	v := *vp
603	shlVU(v, vIn, shift)
604	u = u.make(len(uIn) + 1)
605	u[len(uIn)] = shlVU(u[0:len(uIn)], uIn, shift)
606
607	// The caller should not pass aliased z and u, since those are
608	// the two different outputs, but correct just in case.
609	if alias(z, u) {
610		z = nil
611	}
612	q = z.make(m + 1)
613
614	// Use basic or recursive long division depending on size.
615	if n < divRecursiveThreshold {
616		q.divBasic(u, v)
617	} else {
618		q.divRecursive(u, v)
619	}
620	putNat(vp)
621
622	q = q.norm()
623
624	// Undo scaling of remainder.
625	shrVU(u, u, shift)
626	r = u.norm()
627
628	return q, r
629}
630
631// divBasic implements long division as described above.
632// It overwrites q with ⌊u/v⌋ and overwrites u with the remainder r.
633// q must be large enough to hold ⌊u/v⌋.
634func (q nat) divBasic(u, v nat) {
635	n := len(v)
636	m := len(u) - n
637
638	qhatvp := getNat(n + 1)
639	qhatv := *qhatvp
640
641	// Set up for divWW below, precomputing reciprocal argument.
642	vn1 := v[n-1]
643	rec := reciprocalWord(vn1)
644
645	// Compute each digit of quotient.
646	for j := m; j >= 0; j-- {
647		// Compute the 2-by-1 guess q̂.
648		// The first iteration must invent a leading 0 for u.
649		qhat := Word(_M)
650		var ujn Word
651		if j+n < len(u) {
652			ujn = u[j+n]
653		}
654
655		// ujn ≤ vn1, or else q̂ would be more than one digit.
656		// For ujn == vn1, we set q̂ to the max digit M above.
657		// Otherwise, we compute the 2-by-1 guess.
658		if ujn != vn1 {
659			var rhat Word
660			qhat, rhat = divWW(ujn, u[j+n-1], vn1, rec)
661
662			// Refine q̂ to a 3-by-2 guess. See “Refining Guesses” above.
663			vn2 := v[n-2]
664			x1, x2 := mulWW(qhat, vn2)
665			ujn2 := u[j+n-2]
666			for greaterThan(x1, x2, rhat, ujn2) { // x1x2 > r̂ u[j+n-2]
667				qhat--
668				prevRhat := rhat
669				rhat += vn1
670				// If r̂  overflows, then
671				// r̂ u[j+n-2]v[n-1] is now definitely > x1 x2.
672				if rhat < prevRhat {
673					break
674				}
675				// TODO(rsc): No need for a full mulWW.
676				// x2 += vn2; if x2 overflows, x1++
677				x1, x2 = mulWW(qhat, vn2)
678			}
679		}
680
681		// Compute q̂·v.
682		qhatv[n] = mulAddVWW(qhatv[0:n], v, qhat, 0)
683		qhl := len(qhatv)
684		if j+qhl > len(u) && qhatv[n] == 0 {
685			qhl--
686		}
687
688		// Subtract q̂·v from the current section of u.
689		// If it underflows, q̂·v > u, which we fix up
690		// by decrementing q̂ and adding v back.
691		c := subVV(u[j:j+qhl], u[j:], qhatv)
692		if c != 0 {
693			c := addVV(u[j:j+n], u[j:], v)
694			// If n == qhl, the carry from subVV and the carry from addVV
695			// cancel out and don't affect u[j+n].
696			if n < qhl {
697				u[j+n] += c
698			}
699			qhat--
700		}
701
702		// Save quotient digit.
703		// Caller may know the top digit is zero and not leave room for it.
704		if j == m && m == len(q) && qhat == 0 {
705			continue
706		}
707		q[j] = qhat
708	}
709
710	putNat(qhatvp)
711}
712
713// greaterThan reports whether the two digit numbers x1 x2 > y1 y2.
714// TODO(rsc): In contradiction to most of this file, x1 is the high
715// digit and x2 is the low digit. This should be fixed.
716func greaterThan(x1, x2, y1, y2 Word) bool {
717	return x1 > y1 || x1 == y1 && x2 > y2
718}
719
720// divRecursiveThreshold is the number of divisor digits
721// at which point divRecursive is faster than divBasic.
722const divRecursiveThreshold = 100
723
724// divRecursive implements recursive division as described above.
725// It overwrites z with ⌊u/v⌋ and overwrites u with the remainder r.
726// z must be large enough to hold ⌊u/v⌋.
727// This function is just for allocating and freeing temporaries
728// around divRecursiveStep, the real implementation.
729func (z nat) divRecursive(u, v nat) {
730	// Recursion depth is (much) less than 2 log₂(len(v)).
731	// Allocate a slice of temporaries to be reused across recursion,
732	// plus one extra temporary not live across the recursion.
733	recDepth := 2 * bits.Len(uint(len(v)))
734	tmp := getNat(3 * len(v))
735	temps := make([]*nat, recDepth)
736
737	clear(z)
738	z.divRecursiveStep(u, v, 0, tmp, temps)
739
740	// Free temporaries.
741	for _, n := range temps {
742		if n != nil {
743			putNat(n)
744		}
745	}
746	putNat(tmp)
747}
748
749// divRecursiveStep is the actual implementation of recursive division.
750// It adds ⌊u/v⌋ to z and overwrites u with the remainder r.
751// z must be large enough to hold ⌊u/v⌋.
752// It uses temps[depth] (allocating if needed) as a temporary live across
753// the recursive call. It also uses tmp, but not live across the recursion.
754func (z nat) divRecursiveStep(u, v nat, depth int, tmp *nat, temps []*nat) {
755	// u is a subsection of the original and may have leading zeros.
756	// TODO(rsc): The v = v.norm() is useless and should be removed.
757	// We know (and require) that v's top digit is ≥ B/2.
758	u = u.norm()
759	v = v.norm()
760	if len(u) == 0 {
761		clear(z)
762		return
763	}
764
765	// Fall back to basic division if the problem is now small enough.
766	n := len(v)
767	if n < divRecursiveThreshold {
768		z.divBasic(u, v)
769		return
770	}
771
772	// Nothing to do if u is shorter than v (implies u < v).
773	m := len(u) - n
774	if m < 0 {
775		return
776	}
777
778	// We consider B digits in a row as a single wide digit.
779	// (See “Recursive Division” above.)
780	//
781	// TODO(rsc): rename B to Wide, to avoid confusion with _B,
782	// which is something entirely different.
783	// TODO(rsc): Look into whether using ⌈n/2⌉ is better than ⌊n/2⌋.
784	B := n / 2
785
786	// Allocate a nat for qhat below.
787	if temps[depth] == nil {
788		temps[depth] = getNat(n) // TODO(rsc): Can be just B+1.
789	} else {
790		*temps[depth] = temps[depth].make(B + 1)
791	}
792
793	// Compute each wide digit of the quotient.
794	//
795	// TODO(rsc): Change the loop to be
796	//	for j := (m+B-1)/B*B; j > 0; j -= B {
797	// which will make the final step a regular step, letting us
798	// delete what amounts to an extra copy of the loop body below.
799	j := m
800	for j > B {
801		// Divide u[j-B:j+n] (3 wide digits) by v (2 wide digits).
802		// First make the 2-by-1-wide-digit guess using a recursive call.
803		// Then extend the guess to the full 3-by-2 (see “Refining Guesses”).
804		//
805		// For the 2-by-1-wide-digit guess, instead of doing 2B-by-B-digit,
806		// we use a (2B+1)-by-(B+1) digit, which handles the possibility that
807		// the result has an extra leading 1 digit as well as guaranteeing
808		// that the computed q̂ will be off by at most 1 instead of 2.
809
810		// s is the number of digits to drop from the 3B- and 2B-digit chunks.
811		// We drop B-1 to be left with 2B+1 and B+1.
812		s := (B - 1)
813
814		// uu is the up-to-3B-digit section of u we are working on.
815		uu := u[j-B:]
816
817		// Compute the 2-by-1 guess q̂, leaving r̂ in uu[s:B+n].
818		qhat := *temps[depth]
819		clear(qhat)
820		qhat.divRecursiveStep(uu[s:B+n], v[s:], depth+1, tmp, temps)
821		qhat = qhat.norm()
822
823		// Extend to a 3-by-2 quotient and remainder.
824		// Because divRecursiveStep overwrote the top part of uu with
825		// the remainder r̂, the full uu already contains the equivalent
826		// of r̂·B + uₙ₋₂ from the “Refining Guesses” discussion.
827		// Subtracting q̂·vₙ₋₂ from it will compute the full-length remainder.
828		// If that subtraction underflows, q̂·v > u, which we fix up
829		// by decrementing q̂ and adding v back, same as in long division.
830
831		// TODO(rsc): Instead of subtract and fix-up, this code is computing
832		// q̂·vₙ₋₂ and decrementing q̂ until that product is ≤ u.
833		// But we can do the subtraction directly, as in the comment above
834		// and in long division, because we know that q̂ is wrong by at most one.
835		qhatv := tmp.make(3 * n)
836		clear(qhatv)
837		qhatv = qhatv.mul(qhat, v[:s])
838		for i := 0; i < 2; i++ {
839			e := qhatv.cmp(uu.norm())
840			if e <= 0 {
841				break
842			}
843			subVW(qhat, qhat, 1)
844			c := subVV(qhatv[:s], qhatv[:s], v[:s])
845			if len(qhatv) > s {
846				subVW(qhatv[s:], qhatv[s:], c)
847			}
848			addAt(uu[s:], v[s:], 0)
849		}
850		if qhatv.cmp(uu.norm()) > 0 {
851			panic("impossible")
852		}
853		c := subVV(uu[:len(qhatv)], uu[:len(qhatv)], qhatv)
854		if c > 0 {
855			subVW(uu[len(qhatv):], uu[len(qhatv):], c)
856		}
857		addAt(z, qhat, j-B)
858		j -= B
859	}
860
861	// TODO(rsc): Rewrite loop as described above and delete all this code.
862
863	// Now u < (v<<B), compute lower bits in the same way.
864	// Choose shift = B-1 again.
865	s := B - 1
866	qhat := *temps[depth]
867	clear(qhat)
868	qhat.divRecursiveStep(u[s:].norm(), v[s:], depth+1, tmp, temps)
869	qhat = qhat.norm()
870	qhatv := tmp.make(3 * n)
871	clear(qhatv)
872	qhatv = qhatv.mul(qhat, v[:s])
873	// Set the correct remainder as before.
874	for i := 0; i < 2; i++ {
875		if e := qhatv.cmp(u.norm()); e > 0 {
876			subVW(qhat, qhat, 1)
877			c := subVV(qhatv[:s], qhatv[:s], v[:s])
878			if len(qhatv) > s {
879				subVW(qhatv[s:], qhatv[s:], c)
880			}
881			addAt(u[s:], v[s:], 0)
882		}
883	}
884	if qhatv.cmp(u.norm()) > 0 {
885		panic("impossible")
886	}
887	c := subVV(u[0:len(qhatv)], u[0:len(qhatv)], qhatv)
888	if c > 0 {
889		c = subVW(u[len(qhatv):], u[len(qhatv):], c)
890	}
891	if c > 0 {
892		panic("impossible")
893	}
894
895	// Done!
896	addAt(z, qhat.norm(), 0)
897}
898