1*6dbdd20aSAndroid Build Coastguard Worker# heapprofd: Sampling for Memory Profiles 2*6dbdd20aSAndroid Build Coastguard Worker 3*6dbdd20aSAndroid Build Coastguard Worker_tomlewin, fmayer **·** 2021-04-14_ 4*6dbdd20aSAndroid Build Coastguard Worker 5*6dbdd20aSAndroid Build Coastguard Worker## Background 6*6dbdd20aSAndroid Build Coastguard Worker 7*6dbdd20aSAndroid Build Coastguard WorkerA heap profiler associates memory allocations with the callstacks on which they 8*6dbdd20aSAndroid Build Coastguard Workerhappen ([example visualization]). It is prohibitively expensive to handle every 9*6dbdd20aSAndroid Build Coastguard Workerallocation done by a program, so the [Android Heap Profiler] employs a sampling 10*6dbdd20aSAndroid Build Coastguard Workerapproach that handles a statistically meaningful subset of allocations. This 11*6dbdd20aSAndroid Build Coastguard Workerdocument explores the statistical properties thereof. 12*6dbdd20aSAndroid Build Coastguard Worker 13*6dbdd20aSAndroid Build Coastguard Worker## Conceptual motivation 14*6dbdd20aSAndroid Build Coastguard WorkerConceptually the sampling is implemented such that each byte has some 15*6dbdd20aSAndroid Build Coastguard Workerprobability p of being sampled. In theory we can think of each byte undergoing a 16*6dbdd20aSAndroid Build Coastguard WorkerBernoulli trial. The reason we use a random sampling approach, as opposed to 17*6dbdd20aSAndroid Build Coastguard Workertaking every nth byte, is that there may be regular allocation patterns in the 18*6dbdd20aSAndroid Build Coastguard Workercode that may be missed by a correspondingly regular sampling. 19*6dbdd20aSAndroid Build Coastguard Worker 20*6dbdd20aSAndroid Build Coastguard WorkerTo scale the sampled bytes to the correct scale, we multiply by 1 / p, i.e. if 21*6dbdd20aSAndroid Build Coastguard Workerwe sample a byte with probability 10%, then each byte sampled represents 10 22*6dbdd20aSAndroid Build Coastguard Workerbytes allocated. 23*6dbdd20aSAndroid Build Coastguard Worker 24*6dbdd20aSAndroid Build Coastguard Worker 25*6dbdd20aSAndroid Build Coastguard Worker## Implementation 26*6dbdd20aSAndroid Build Coastguard Worker 27*6dbdd20aSAndroid Build Coastguard WorkerIn practice, the [algorithm] works as follows: 28*6dbdd20aSAndroid Build Coastguard Worker 29*6dbdd20aSAndroid Build Coastguard Worker1. We look at an allocation 30*6dbdd20aSAndroid Build Coastguard Worker 31*6dbdd20aSAndroid Build Coastguard Worker2. If the size of the allocation is large enough that there’s a greater than 99% 32*6dbdd20aSAndroid Build Coastguard Worker chance of it being sampled at least once, we return the size of the 33*6dbdd20aSAndroid Build Coastguard Worker allocation directly. This is a performance optimization. 34*6dbdd20aSAndroid Build Coastguard Worker 35*6dbdd20aSAndroid Build Coastguard Worker3. If the size of the allocation is smaller, then we compute the number of times 36*6dbdd20aSAndroid Build Coastguard Worker we would draw a sample if we sampled each byte with the given sampling rate: 37*6dbdd20aSAndroid Build Coastguard Worker 38*6dbdd20aSAndroid Build Coastguard Worker * In practice we do this by keeping track of the arrival time of the next 39*6dbdd20aSAndroid Build Coastguard Worker sample. When an allocation happens, we subtract its size from the arrival 40*6dbdd20aSAndroid Build Coastguard Worker time of the next sample, and check whether we have brought it below zero. We 41*6dbdd20aSAndroid Build Coastguard Worker then repeatedly draw from the exponential distribution (which is the 42*6dbdd20aSAndroid Build Coastguard Worker interarrival time of Poisson) until the arrival time is brought back 43*6dbdd20aSAndroid Build Coastguard Worker above 0. The amount of times we had to draw from the exponential 44*6dbdd20aSAndroid Build Coastguard Worker distribution is the number of samples the allocation should count as. 45*6dbdd20aSAndroid Build Coastguard Worker 46*6dbdd20aSAndroid Build Coastguard Worker * We multiply the number of samples we drew within the allocation by the 47*6dbdd20aSAndroid Build Coastguard Worker sampling rate to get an estimate of the size of the allocation 48*6dbdd20aSAndroid Build Coastguard Worker 49*6dbdd20aSAndroid Build Coastguard WorkerWe instead draw directly from the Poisson/binomial distribution to see how many 50*6dbdd20aSAndroid Build Coastguard Workersamples we get for a given allocation size, but this is not as computationally 51*6dbdd20aSAndroid Build Coastguard Workerefficient. This is because we don’t sample most of the allocations, due to their 52*6dbdd20aSAndroid Build Coastguard Workersmall size and our low sampling rate. This means it’s more efficient to use the 53*6dbdd20aSAndroid Build Coastguard Workerexponential draw approach, as for a non-sample, we only need to decrement a 54*6dbdd20aSAndroid Build Coastguard Workercounter. Sampling from the Poisson for every allocation (rather than drawing 55*6dbdd20aSAndroid Build Coastguard Workerfrom exponential for every sample) is more expensive. 56*6dbdd20aSAndroid Build Coastguard Worker 57*6dbdd20aSAndroid Build Coastguard Worker## Theoretical performance 58*6dbdd20aSAndroid Build Coastguard Worker 59*6dbdd20aSAndroid Build Coastguard WorkerIf we sample at some rate 1 / p, then to set p reasonably we need to know what 60*6dbdd20aSAndroid Build Coastguard Workerour probability of sampling an allocation of any size is, as well as our 61*6dbdd20aSAndroid Build Coastguard Workerexpected error when we sample it. If we set p = 1 then we sample everything and 62*6dbdd20aSAndroid Build Coastguard Workerhave no error. Reducing the sampling rate costs us coverage and accuracy. 63*6dbdd20aSAndroid Build Coastguard Worker 64*6dbdd20aSAndroid Build Coastguard Worker### Sampling probabilities 65*6dbdd20aSAndroid Build Coastguard Worker 66*6dbdd20aSAndroid Build Coastguard WorkerWe will sample an allocation with probability Exponential(sampling rate) < 67*6dbdd20aSAndroid Build Coastguard Workerallocation size. This is equivalent to the probability that we do not fail to 68*6dbdd20aSAndroid Build Coastguard Workersample all bytes within the allocation if we sample bytes at our sampling rate. 69*6dbdd20aSAndroid Build Coastguard Worker 70*6dbdd20aSAndroid Build Coastguard WorkerBecause the exponential distribution is memoryless, we can add together 71*6dbdd20aSAndroid Build Coastguard Workerallocations that are the same even if they occur apart for the purposes of 72*6dbdd20aSAndroid Build Coastguard Workerprobability. This means that if we have an allocation of 1MB and then another of 73*6dbdd20aSAndroid Build Coastguard Worker1MB, the probability of us taking at least one sample is the same as the 74*6dbdd20aSAndroid Build Coastguard Workerprobability of us taking at least one sample of a contiguous 2MB allocation. 75*6dbdd20aSAndroid Build Coastguard Worker 76*6dbdd20aSAndroid Build Coastguard WorkerWe can see from the chart below that if we 16X our sampling rate from 32KiB to 77*6dbdd20aSAndroid Build Coastguard Worker512KiB we still have a 95% chance of sampling anything above 1.5MiB. If we 64X 78*6dbdd20aSAndroid Build Coastguard Workerit to 2048KiB we still have an 80% chance to sample anything above 3.5MiB. 79*6dbdd20aSAndroid Build Coastguard Worker 80*6dbdd20aSAndroid Build Coastguard Worker 81*6dbdd20aSAndroid Build Coastguard Worker 82*6dbdd20aSAndroid Build Coastguard Worker### Expected errors 83*6dbdd20aSAndroid Build Coastguard WorkerWe can also consider the expected error we’ll make at given allocation sizes and 84*6dbdd20aSAndroid Build Coastguard Workersampling rates. Like before it doesn’t matter where the allocation happens — if 85*6dbdd20aSAndroid Build Coastguard Workerwe have two allocations of the same type occurring at different times they have 86*6dbdd20aSAndroid Build Coastguard Workerthe same statistical properties as a single allocation with size equal to their 87*6dbdd20aSAndroid Build Coastguard Workersum. This is because the exponential distribution we use is memoryless. 88*6dbdd20aSAndroid Build Coastguard Worker 89*6dbdd20aSAndroid Build Coastguard Worker 90*6dbdd20aSAndroid Build Coastguard WorkerFor expected error we report something like the [mean absolute percentage 91*6dbdd20aSAndroid Build Coastguard Workererror]. This just means we see how wrong we might be in percent relative to the 92*6dbdd20aSAndroid Build Coastguard Workertrue allocation size, and then scale these results according to their 93*6dbdd20aSAndroid Build Coastguard Workerprobability of occurrence. This is an estimator that has some issues (i.e. it’s 94*6dbdd20aSAndroid Build Coastguard Workerbiased such that underestimates are more penalised) but it’s easy to reason 95*6dbdd20aSAndroid Build Coastguard Workerabout and it’s useful as a gauge of how wrong on average we might be with our 96*6dbdd20aSAndroid Build Coastguard Workerestimates. I would recommend just reading this as analogous to a standard 97*6dbdd20aSAndroid Build Coastguard Workerdeviation. 98*6dbdd20aSAndroid Build Coastguard Worker 99*6dbdd20aSAndroid Build Coastguard Worker 100*6dbdd20aSAndroid Build Coastguard WorkerCharts below show both the expected error and the conditional expected error: 101*6dbdd20aSAndroid Build Coastguard Workerthe expected error assuming we have at least one sample within the allocation. 102*6dbdd20aSAndroid Build Coastguard WorkerThere’s periodicity in both in line with the sampling rate used. The thing to 103*6dbdd20aSAndroid Build Coastguard Workertake away is that, while the estimates are unbiased such that their expected 104*6dbdd20aSAndroid Build Coastguard Workervalue is equal to their real value, substantial errors are still possible. 105*6dbdd20aSAndroid Build Coastguard Worker 106*6dbdd20aSAndroid Build Coastguard Worker 107*6dbdd20aSAndroid Build Coastguard Worker 108*6dbdd20aSAndroid Build Coastguard WorkerAssuming that we take at least one sample of an allocation, what error might we 109*6dbdd20aSAndroid Build Coastguard Workerexpect? We can answer that using the following chart, which shows expected error 110*6dbdd20aSAndroid Build Coastguard Workerconditional on at least one sample being taken. This is the expected error we 111*6dbdd20aSAndroid Build Coastguard Workercan expect for the things that end up in our heap profile. It’s important to 112*6dbdd20aSAndroid Build Coastguard Workeremphasise that this expected error methodology is not exact, and also that the 113*6dbdd20aSAndroid Build Coastguard Workerunderlying sampling remains unbiased — the expected value is the true value. 114*6dbdd20aSAndroid Build Coastguard WorkerThis should be considered more as a measure akin to standard deviation. 115*6dbdd20aSAndroid Build Coastguard Worker 116*6dbdd20aSAndroid Build Coastguard Worker 117*6dbdd20aSAndroid Build Coastguard Worker 118*6dbdd20aSAndroid Build Coastguard Worker## Performance Considerations 119*6dbdd20aSAndroid Build Coastguard Worker### STL Distributions 120*6dbdd20aSAndroid Build Coastguard Worker 121*6dbdd20aSAndroid Build Coastguard WorkerBenchmarking of the STL distributions on a Pixel 4 reinforces our approach of 122*6dbdd20aSAndroid Build Coastguard Workerestimating the geometric distribution using an exponential distribution, as its 123*6dbdd20aSAndroid Build Coastguard Workerperformance is ~8x better ([full results]). 124*6dbdd20aSAndroid Build Coastguard Worker 125*6dbdd20aSAndroid Build Coastguard Worker 126*6dbdd20aSAndroid Build Coastguard WorkerDraw sample from exponential distribution with p = 1 / 32000: 127*6dbdd20aSAndroid Build Coastguard Worker 128*6dbdd20aSAndroid Build Coastguard WorkerARM64: 21.3ns (sd 0.066ns, negligibly small, smaller than a CPU cycle) 129*6dbdd20aSAndroid Build Coastguard Worker 130*6dbdd20aSAndroid Build Coastguard WorkerARM32: 33.2ns (sd 0.011ns, negligibly small, smaller than a CPU cycle) 131*6dbdd20aSAndroid Build Coastguard Worker 132*6dbdd20aSAndroid Build Coastguard Worker 133*6dbdd20aSAndroid Build Coastguard WorkerDraw sample from geometric distribution with p = 1 / 32000: 134*6dbdd20aSAndroid Build Coastguard Worker 135*6dbdd20aSAndroid Build Coastguard WorkerARM64: 169ns (sd 0.023ns, negligibly small, smaller than a CPU cycle) (7.93x) 136*6dbdd20aSAndroid Build Coastguard Worker 137*6dbdd20aSAndroid Build Coastguard WorkerARM32: 222ns (sd 0.118ns, negligibly small, smaller than a CPU cycle) (6.69x) 138*6dbdd20aSAndroid Build Coastguard Worker 139*6dbdd20aSAndroid Build Coastguard Worker## Appendix 140*6dbdd20aSAndroid Build Coastguard Worker 141*6dbdd20aSAndroid Build Coastguard Worker### Improvements made 142*6dbdd20aSAndroid Build Coastguard Worker 143*6dbdd20aSAndroid Build Coastguard WorkerWe previously (before Android 12) returned the size of the allocation accurately 144*6dbdd20aSAndroid Build Coastguard Workerand immediately if the allocation size was greater than our sampling rate. This 145*6dbdd20aSAndroid Build Coastguard Workerhad several impacts. 146*6dbdd20aSAndroid Build Coastguard Worker 147*6dbdd20aSAndroid Build Coastguard Worker 148*6dbdd20aSAndroid Build Coastguard WorkerThe most obvious impact is that with the old approach we would expect to sample 149*6dbdd20aSAndroid Build Coastguard Workeran allocation equal in size to our sampling rate ~63% of the time, rather than 150*6dbdd20aSAndroid Build Coastguard Worker100%. This means there is a discontinuity in coverage between an allocation a 151*6dbdd20aSAndroid Build Coastguard Workerbyte smaller than our sampling rate, and one a byte larger. This is still 152*6dbdd20aSAndroid Build Coastguard Workerunbiased from a statistical perspective, but it’s important to note. 153*6dbdd20aSAndroid Build Coastguard Worker 154*6dbdd20aSAndroid Build Coastguard Worker 155*6dbdd20aSAndroid Build Coastguard WorkerAnother unusual impact is that the sampling rate depends not only on the size of 156*6dbdd20aSAndroid Build Coastguard Workerthe memory being used, but also how it’s allocated. If our sampling rate were 157*6dbdd20aSAndroid Build Coastguard Worker10KB, and we have an allocation that’s 10KB, we sample it with certainty. If 158*6dbdd20aSAndroid Build Coastguard Workerinstead that 10KB is split among two 5KB allocations, we sample it with 159*6dbdd20aSAndroid Build Coastguard Workerprobability 63%. Again this doesn’t bias our results, but it means that our 160*6dbdd20aSAndroid Build Coastguard Workermeasurements of memory where there are many small allocations might be noisier 161*6dbdd20aSAndroid Build Coastguard Workerthan ones where there are a few large allocations, even if total memory used is 162*6dbdd20aSAndroid Build Coastguard Workerthe same. If we didn’t return allocations greater than the sampling rate every 163*6dbdd20aSAndroid Build Coastguard Workertime, then the memorylessness property of the exponential distribution would 164*6dbdd20aSAndroid Build Coastguard Workermean our method is insensitive to how the memory is split amongst allocations, 165*6dbdd20aSAndroid Build Coastguard Workerwhich seems a desirable property. 166*6dbdd20aSAndroid Build Coastguard Worker 167*6dbdd20aSAndroid Build Coastguard Worker 168*6dbdd20aSAndroid Build Coastguard WorkerWe altered the cutoff at which we simply returned the allocation size. 169*6dbdd20aSAndroid Build Coastguard WorkerPreviously, as discussed, the cutoff was at the sampling rate, which led to a 170*6dbdd20aSAndroid Build Coastguard Workerdiscontinuity. Now the cutoff is determined by the size at which we have a >99% 171*6dbdd20aSAndroid Build Coastguard Workerchance of sampling an allocation at least once, given the sampling rate we’re 172*6dbdd20aSAndroid Build Coastguard Workerusing. This resolves the above issues. 173*6dbdd20aSAndroid Build Coastguard Worker 174*6dbdd20aSAndroid Build Coastguard Worker[algorithm]: 175*6dbdd20aSAndroid Build Coastguard Worker https://cs.android.com/android/platform/superproject/main/+/main:external/perfetto/src/profiling/memory/sampler.h 176*6dbdd20aSAndroid Build Coastguard Worker[example visualization]: /docs/images/native-heap-prof.png 177*6dbdd20aSAndroid Build Coastguard Worker[Android Heap Profiler]: /docs/design-docs/heapprofd-design 178*6dbdd20aSAndroid Build Coastguard Worker[mean absolute percentage error]: https://en.wikipedia.org/wiki/Mean_absolute_percentage_error 179*6dbdd20aSAndroid Build Coastguard Worker[full results]: https://gist.github.com/fmayer/3aafcaf58f8db09714ba09aa4ac2b1ac 180