Every byte allocated has a probability p of being sampled; in theory we can think of each byte undergoing a Bernoulli trial. The reason we use a random sampling approach, as opposed to sampling every nth byte, is that a fixed stride could line up with regular allocation patterns in the program and systematically over- or under-represent them.
To scale the sampled bytes back up to the true number of bytes allocated, we multiply by 1 / p, i.e. if we sample a byte with probability 10%, then each sampled byte represents 10 bytes allocated.
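One way to see that this scaling is unbiased: each of an allocation's bytes is counted with probability p and then weighted by 1 / p, so the expected number of reported bytes is

$$
\mathbb{E}[\text{reported bytes}] = \text{size} \cdot p \cdot \frac{1}{p} = \text{size}.
$$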
Sampling an allocation then works as follows:

1. We look at an allocation.
2. If the allocation is large enough that there is a greater than 99% chance of it being sampled at least once, we return the size of the allocation directly.
3. If the size of the allocation is smaller, then we compute the number of times we would draw a sample if we sampled each byte with the given sampling rate:
   * In practice we do this by keeping track of the arrival time of the next sample. When an allocation happens, we subtract its size from the arrival time of the next sample, and check whether we have brought it below zero. If so, we draw new arrival times from the exponential distribution until the running value is again above 0. The number of times we had to draw from the exponential distribution is the number of samples within the allocation.
   * We multiply the number of samples we drew within the allocation by the sampling rate to get our estimate of the allocation's size. A sketch of this bookkeeping follows below.
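A minimal sketch of that bookkeeping, assuming the sampling rate is given in bytes; the names here are illustrative rather than taken from heapprofd's actual implementation:

```cpp
#include <cstdint>
#include <random>

// `rate` is the sampling rate in bytes, i.e. the mean number of bytes
// between samples (per-byte sampling probability p = 1 / rate).
class Sampler {
 public:
  explicit Sampler(double rate)
      : rate_(rate), exp_(1.0 / rate), next_sample_(exp_(rng_)) {}

  // Returns the estimated (scaled) size for this allocation:
  // 0 for most small allocations, otherwise samples * rate.
  uint64_t SampleSize(uint64_t alloc_size) {
    next_sample_ -= static_cast<double>(alloc_size);
    if (next_sample_ > 0)  // Fast path: no sample falls in this allocation.
      return 0;
    uint64_t samples = 0;
    while (next_sample_ <= 0) {  // Draw new arrival times until we pass 0.
      samples++;
      next_sample_ += exp_(rng_);
    }
    return samples * static_cast<uint64_t>(rate_);
  }

 private:
  double rate_;
  std::default_random_engine rng_{std::random_device{}()};
  std::exponential_distribution<double> exp_;
  double next_sample_;
};
```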
We could instead draw directly from the Poisson/binomial distribution to see how many samples we get for a given allocation size, but this is not as computationally efficient. This is because we do not sample most allocations at all, due to their small size; the exponential draw approach lets us skip them cheaply, as for a non-sample we only need to decrement a counter.
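For comparison, the direct approach would look roughly like the sketch below (illustrative only, not the production code); it needs a random draw for every allocation, including the many that end up with zero samples:

```cpp
#include <cstdint>
#include <random>

// Number of sampled bytes in an allocation of `alloc_size` bytes when each
// byte is sampled with probability 1 / rate: approximately Poisson
// distributed with mean alloc_size / rate.
uint64_t PoissonSampleCount(uint64_t alloc_size, double rate,
                            std::default_random_engine& rng) {
  std::poisson_distribution<uint64_t> dist(static_cast<double>(alloc_size) /
                                           rate);
  return dist(rng);  // One draw per allocation, even when the result is 0.
}
```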
If we sample at some rate 1 / p, then to set p reasonably we need to know the probability of sampling an allocation of a given size, as well as the expected error when we do sample it. If we set p = 1 then we sample everything and make no error.
We will sample an allocation with probability Exponential(sampling rate) < allocation size. This is equivalent to the probability that we do not fail to sample all bytes within the allocation if we sample bytes at our sampling rate.
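Writing the sampling rate (in bytes) as r and the allocation size as s, and taking the exponential distribution with mean r, this works out to

$$
P(\text{sampled}) = P\big(\mathrm{Exponential}(r) < s\big) = 1 - e^{-s/r}.
$$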
Because the exponential distribution is memoryless, we can add together subsequent allocations and look at the probability of sampling their sum. This means that if we have an allocation of 1MB and then another of 1MB, the probability that at least one of them is sampled is the same as the probability of sampling a single 2MB allocation.
We can see from the chart below that if we increase our sampling rate 16X, from 32KiB to 512KiB, we still have a 95% chance of sampling anything above 1.5MiB. If we increase it 64X, to 2048KiB, we still have an 80% chance of sampling anything above 3.5MiB.
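These figures follow directly from the formula above:

$$
1 - e^{-1.5\,\mathrm{MiB}/512\,\mathrm{KiB}} = 1 - e^{-3} \approx 0.95,
\qquad
1 - e^{-3.5\,\mathrm{MiB}/2048\,\mathrm{KiB}} = 1 - e^{-1.75} \approx 0.83.
$$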
We can also consider the expected error we'll make at given allocation sizes and sampling rates. Note that even if we have two allocations of the same type occurring at different times, they have the same expected error as a single allocation of their combined size. This is because the exponential distribution we use is memoryless.
For expected error we report something like the [mean absolute percentage error]. This just means we measure how wrong we might be, in percent, relative to the true size of the allocation. This is what people usually care about, and it is useful as a gauge of how wrong on average we might be with our estimates.
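Concretely, writing the true allocation size as s and our sampled estimate as ŝ, this is

$$
\mathrm{MAPE} = \mathbb{E}\!\left[\frac{|\hat{s} - s|}{s}\right].
$$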
An allocation that receives no sample at all does not show up in the profile, so the more informative quantity is the expected error assuming we have at least one sample within the allocation.
Assuming that we take at least one sample of an allocation, what error might we expect? We can answer that using the following chart, which shows the expected error conditional on at least one sample being taken. This is the expected error we would see for the allocations that do show up in the profile.
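A chart like this can also be approximated with a small simulation. The sketch below (with arbitrarily chosen example numbers) estimates the mean absolute percentage error conditional on at least one sample being taken:

```cpp
#include <cmath>
#include <cstdio>
#include <random>

int main() {
  const double rate = 32 * 1024.0;        // Sampling rate: 32KiB.
  const double alloc_size = 64 * 1024.0;  // Allocation size under test.
  const int trials = 1000000;

  std::default_random_engine rng(42);
  // The number of samples within an allocation is approximately
  // Poisson-distributed with mean alloc_size / rate.
  std::poisson_distribution<int> samples_dist(alloc_size / rate);

  double total_error = 0;
  int sampled_trials = 0;
  for (int i = 0; i < trials; i++) {
    int samples = samples_dist(rng);
    if (samples == 0) continue;        // Unsampled: absent from the profile.
    double estimate = samples * rate;  // Scale the sample count back up.
    total_error += std::fabs(estimate - alloc_size) / alloc_size;
    sampled_trials++;
  }
  std::printf("Expected error given >= 1 sample: %.1f%%\n",
              100.0 * total_error / sampled_trials);
}
```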
We previously (before Android 12) returned the size of the allocation exactly whenever it was larger than the sampling rate.
The most obvious impact is that with the old approach we would expect to sample every allocation at least as large as the sampling rate with certainty. For example, if the sampling rate is 10KB and we have an allocation that's 10KB, we sample it with certainty, but if instead that 10KB is split among two 5KB allocations, we sample it with a probability strictly less than one, even though the number of bytes allocated is the same. If we didn't return allocations greater than the sampling rate every time, both cases would behave the same.
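To make the asymmetry concrete, these numbers follow from the formula above: at a 10KB sampling rate, exponential sampling sees at least one sample within 10KB of allocated bytes with probability

$$
1 - e^{-10/10} = 1 - e^{-1} \approx 63\%,
$$

whether those bytes come from a single 10KB allocation or from two 5KB allocations. Always returning allocations at or above the sampling rate pushes the single 10KB case to 100% and breaks that symmetry.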
We altered the cutoff at which we simply return the allocation size in order to reduce this discontinuity. Now the cutoff is determined by the size at which we have a >99% chance of sampling an allocation at least once, given the sampling rate we're using.
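Solving the sampling-probability formula for that cutoff c, with sampling rate r:

$$
1 - e^{-c/r} > 0.99 \iff c > r \ln 100 \approx 4.6\,r,
$$

so a 32KiB sampling rate, for example, puts the cutoff at roughly 147KiB.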