xref: /aosp_15_r20/external/perfetto/docs/design-docs/heapprofd-sampling.md (revision 6dbdd20afdafa5e3ca9b8809fa73465d530080dc)
1*6dbdd20aSAndroid Build Coastguard Worker# heapprofd: Sampling for Memory Profiles
2*6dbdd20aSAndroid Build Coastguard Worker
3*6dbdd20aSAndroid Build Coastguard Worker_tomlewin, fmayer **·** 2021-04-14_
4*6dbdd20aSAndroid Build Coastguard Worker
5*6dbdd20aSAndroid Build Coastguard Worker## Background
6*6dbdd20aSAndroid Build Coastguard Worker
7*6dbdd20aSAndroid Build Coastguard WorkerA heap profiler associates memory allocations with the callstacks on which they
8*6dbdd20aSAndroid Build Coastguard Workerhappen ([example visualization]). It is prohibitively expensive to handle every
9*6dbdd20aSAndroid Build Coastguard Workerallocation done by a program, so the [Android Heap Profiler] employs a sampling
10*6dbdd20aSAndroid Build Coastguard Workerapproach that handles a statistically meaningful subset of allocations. This
11*6dbdd20aSAndroid Build Coastguard Workerdocument explores the statistical properties thereof.
12*6dbdd20aSAndroid Build Coastguard Worker
13*6dbdd20aSAndroid Build Coastguard Worker## Conceptual motivation
14*6dbdd20aSAndroid Build Coastguard WorkerConceptually the sampling is implemented such that each byte has some
15*6dbdd20aSAndroid Build Coastguard Workerprobability p of being sampled. In theory we can think of each byte undergoing a
16*6dbdd20aSAndroid Build Coastguard WorkerBernoulli trial. The reason we use a random sampling approach, as opposed to
17*6dbdd20aSAndroid Build Coastguard Workertaking every nth byte, is that there may be regular allocation patterns in the
18*6dbdd20aSAndroid Build Coastguard Workercode that may be missed by a correspondingly regular sampling.
19*6dbdd20aSAndroid Build Coastguard Worker
20*6dbdd20aSAndroid Build Coastguard WorkerTo scale the sampled bytes to the correct scale, we multiply by 1 / p, i.e. if
21*6dbdd20aSAndroid Build Coastguard Workerwe sample a byte with probability 10%, then each byte sampled represents 10
22*6dbdd20aSAndroid Build Coastguard Workerbytes allocated.
23*6dbdd20aSAndroid Build Coastguard Worker
24*6dbdd20aSAndroid Build Coastguard Worker
25*6dbdd20aSAndroid Build Coastguard Worker## Implementation
26*6dbdd20aSAndroid Build Coastguard Worker
27*6dbdd20aSAndroid Build Coastguard WorkerIn practice, the [algorithm] works as follows:
28*6dbdd20aSAndroid Build Coastguard Worker
29*6dbdd20aSAndroid Build Coastguard Worker1. We look at an allocation
30*6dbdd20aSAndroid Build Coastguard Worker
31*6dbdd20aSAndroid Build Coastguard Worker2. If the size of the allocation is large enough that there’s a greater than 99%
32*6dbdd20aSAndroid Build Coastguard Worker   chance of it being sampled at least once, we return the size of the
33*6dbdd20aSAndroid Build Coastguard Worker   allocation directly. This is a performance optimization.
34*6dbdd20aSAndroid Build Coastguard Worker
35*6dbdd20aSAndroid Build Coastguard Worker3. If the size of the allocation is smaller, then we compute the number of times
36*6dbdd20aSAndroid Build Coastguard Worker   we would draw a sample if we sampled each byte with the given sampling rate:
37*6dbdd20aSAndroid Build Coastguard Worker
38*6dbdd20aSAndroid Build Coastguard Worker  * In practice we do this by keeping track of the arrival time of the next
39*6dbdd20aSAndroid Build Coastguard Worker    sample. When an allocation happens, we subtract its size from the arrival
40*6dbdd20aSAndroid Build Coastguard Worker    time of the next sample, and check whether we have brought it below zero. We
41*6dbdd20aSAndroid Build Coastguard Worker    then repeatedly draw from the exponential distribution (which is the
42*6dbdd20aSAndroid Build Coastguard Worker    interarrival time of Poisson) until the arrival time is brought back
43*6dbdd20aSAndroid Build Coastguard Worker    above 0. The amount of times we had to draw from the exponential
44*6dbdd20aSAndroid Build Coastguard Worker    distribution is the number of samples the allocation should count as.
45*6dbdd20aSAndroid Build Coastguard Worker
46*6dbdd20aSAndroid Build Coastguard Worker  * We multiply the number of samples we drew within the allocation by the
47*6dbdd20aSAndroid Build Coastguard Worker    sampling rate to get an estimate of the size of the allocation
48*6dbdd20aSAndroid Build Coastguard Worker
49*6dbdd20aSAndroid Build Coastguard WorkerWe instead draw directly from the Poisson/binomial distribution to see how many
50*6dbdd20aSAndroid Build Coastguard Workersamples we get for a given allocation size, but this is not as computationally
51*6dbdd20aSAndroid Build Coastguard Workerefficient. This is because we don’t sample most of the allocations, due to their
52*6dbdd20aSAndroid Build Coastguard Workersmall size and our low sampling rate. This means it’s more efficient to use the
53*6dbdd20aSAndroid Build Coastguard Workerexponential draw approach, as for a non-sample, we only need to decrement a
54*6dbdd20aSAndroid Build Coastguard Workercounter. Sampling from the Poisson for every allocation (rather than drawing
55*6dbdd20aSAndroid Build Coastguard Workerfrom exponential for every sample) is more expensive.
56*6dbdd20aSAndroid Build Coastguard Worker
57*6dbdd20aSAndroid Build Coastguard Worker## Theoretical performance
58*6dbdd20aSAndroid Build Coastguard Worker
59*6dbdd20aSAndroid Build Coastguard WorkerIf we sample at some rate 1 / p, then to set p reasonably we need to know what
60*6dbdd20aSAndroid Build Coastguard Workerour probability of sampling an allocation of any size is, as well as our
61*6dbdd20aSAndroid Build Coastguard Workerexpected error when we sample it. If we set p = 1 then we sample everything and
62*6dbdd20aSAndroid Build Coastguard Workerhave no error. Reducing the sampling rate costs us coverage and accuracy.
63*6dbdd20aSAndroid Build Coastguard Worker
64*6dbdd20aSAndroid Build Coastguard Worker### Sampling probabilities
65*6dbdd20aSAndroid Build Coastguard Worker
66*6dbdd20aSAndroid Build Coastguard WorkerWe will sample an allocation with probability Exponential(sampling rate) <
67*6dbdd20aSAndroid Build Coastguard Workerallocation size. This is equivalent to the probability that we do not fail to
68*6dbdd20aSAndroid Build Coastguard Workersample all bytes within the allocation if we sample bytes at our sampling rate.
69*6dbdd20aSAndroid Build Coastguard Worker
70*6dbdd20aSAndroid Build Coastguard WorkerBecause the exponential distribution is memoryless, we can add together
71*6dbdd20aSAndroid Build Coastguard Workerallocations that are the same even if they occur apart for the purposes of
72*6dbdd20aSAndroid Build Coastguard Workerprobability. This means that if we have an allocation of 1MB and then another of
73*6dbdd20aSAndroid Build Coastguard Worker1MB, the probability of us taking at least one sample is the same as the
74*6dbdd20aSAndroid Build Coastguard Workerprobability of us taking at least one sample of a contiguous 2MB allocation.
75*6dbdd20aSAndroid Build Coastguard Worker
76*6dbdd20aSAndroid Build Coastguard WorkerWe can see from the chart below that if we 16X our sampling rate from 32KiB to
77*6dbdd20aSAndroid Build Coastguard Worker512KiB we still have a 95% chance of sampling anything above 1.5MiB. If we 64X
78*6dbdd20aSAndroid Build Coastguard Workerit to 2048KiB we still have an 80% chance to sample anything above 3.5MiB.
79*6dbdd20aSAndroid Build Coastguard Worker
80*6dbdd20aSAndroid Build Coastguard Worker![](/docs/images/heapprofd-sampling/one-sample.png)
81*6dbdd20aSAndroid Build Coastguard Worker
82*6dbdd20aSAndroid Build Coastguard Worker### Expected errors
83*6dbdd20aSAndroid Build Coastguard WorkerWe can also consider the expected error we’ll make at given allocation sizes and
84*6dbdd20aSAndroid Build Coastguard Workersampling rates. Like before it doesn’t matter where the allocation happens — if
85*6dbdd20aSAndroid Build Coastguard Workerwe have two allocations of the same type occurring at different times they have
86*6dbdd20aSAndroid Build Coastguard Workerthe same statistical properties as a single allocation with size equal to their
87*6dbdd20aSAndroid Build Coastguard Workersum. This is because the exponential distribution we use is memoryless.
88*6dbdd20aSAndroid Build Coastguard Worker
89*6dbdd20aSAndroid Build Coastguard Worker
90*6dbdd20aSAndroid Build Coastguard WorkerFor expected error we report something like the [mean absolute percentage
91*6dbdd20aSAndroid Build Coastguard Workererror]. This just means we see how wrong we might be in percent relative to the
92*6dbdd20aSAndroid Build Coastguard Workertrue allocation size, and then scale these results according to their
93*6dbdd20aSAndroid Build Coastguard Workerprobability of occurrence. This is an estimator that has some issues (i.e. it’s
94*6dbdd20aSAndroid Build Coastguard Workerbiased such that underestimates are more penalised) but it’s easy to reason
95*6dbdd20aSAndroid Build Coastguard Workerabout and it’s useful as a gauge of how wrong on average we might be with our
96*6dbdd20aSAndroid Build Coastguard Workerestimates. I would recommend just reading this as analogous to a standard
97*6dbdd20aSAndroid Build Coastguard Workerdeviation.
98*6dbdd20aSAndroid Build Coastguard Worker
99*6dbdd20aSAndroid Build Coastguard Worker
100*6dbdd20aSAndroid Build Coastguard WorkerCharts below show both the expected error and the conditional expected error:
101*6dbdd20aSAndroid Build Coastguard Workerthe expected error assuming we have at least one sample within the allocation.
102*6dbdd20aSAndroid Build Coastguard WorkerThere’s periodicity in both in line with the sampling rate used. The thing to
103*6dbdd20aSAndroid Build Coastguard Workertake away is that, while the estimates are unbiased such that their expected
104*6dbdd20aSAndroid Build Coastguard Workervalue is equal to their real value, substantial errors are still possible.
105*6dbdd20aSAndroid Build Coastguard Worker
106*6dbdd20aSAndroid Build Coastguard Worker![](/docs/images/heapprofd-sampling/expected-error.png)
107*6dbdd20aSAndroid Build Coastguard Worker
108*6dbdd20aSAndroid Build Coastguard WorkerAssuming that we take at least one sample of an allocation, what error might we
109*6dbdd20aSAndroid Build Coastguard Workerexpect? We can answer that using the following chart, which shows expected error
110*6dbdd20aSAndroid Build Coastguard Workerconditional on at least one sample being taken. This is the expected error we
111*6dbdd20aSAndroid Build Coastguard Workercan expect for the things that end up in our heap profile. It’s important to
112*6dbdd20aSAndroid Build Coastguard Workeremphasise that this expected error methodology is not exact, and also that the
113*6dbdd20aSAndroid Build Coastguard Workerunderlying sampling remains unbiased — the expected value is the true value.
114*6dbdd20aSAndroid Build Coastguard WorkerThis should be considered more as a measure akin to standard deviation.
115*6dbdd20aSAndroid Build Coastguard Worker
116*6dbdd20aSAndroid Build Coastguard Worker![](/docs/images/heapprofd-sampling/conditional-expected-error.png)
117*6dbdd20aSAndroid Build Coastguard Worker
118*6dbdd20aSAndroid Build Coastguard Worker## Performance Considerations
119*6dbdd20aSAndroid Build Coastguard Worker### STL Distributions
120*6dbdd20aSAndroid Build Coastguard Worker
121*6dbdd20aSAndroid Build Coastguard WorkerBenchmarking of the STL distributions on a Pixel 4 reinforces our approach of
122*6dbdd20aSAndroid Build Coastguard Workerestimating the geometric distribution using an exponential distribution, as its
123*6dbdd20aSAndroid Build Coastguard Workerperformance is ~8x better ([full results]).
124*6dbdd20aSAndroid Build Coastguard Worker
125*6dbdd20aSAndroid Build Coastguard Worker
126*6dbdd20aSAndroid Build Coastguard WorkerDraw sample from exponential distribution with p = 1 / 32000:
127*6dbdd20aSAndroid Build Coastguard Worker
128*6dbdd20aSAndroid Build Coastguard WorkerARM64: 21.3ns (sd 0.066ns, negligibly small, smaller than a CPU cycle)
129*6dbdd20aSAndroid Build Coastguard Worker
130*6dbdd20aSAndroid Build Coastguard WorkerARM32: 33.2ns (sd 0.011ns, negligibly small, smaller than a CPU cycle)
131*6dbdd20aSAndroid Build Coastguard Worker
132*6dbdd20aSAndroid Build Coastguard Worker
133*6dbdd20aSAndroid Build Coastguard WorkerDraw sample from geometric distribution with p = 1 / 32000:
134*6dbdd20aSAndroid Build Coastguard Worker
135*6dbdd20aSAndroid Build Coastguard WorkerARM64: 169ns (sd 0.023ns, negligibly small, smaller than a CPU cycle) (7.93x)
136*6dbdd20aSAndroid Build Coastguard Worker
137*6dbdd20aSAndroid Build Coastguard WorkerARM32: 222ns (sd 0.118ns, negligibly small, smaller than a CPU cycle) (6.69x)
138*6dbdd20aSAndroid Build Coastguard Worker
139*6dbdd20aSAndroid Build Coastguard Worker## Appendix
140*6dbdd20aSAndroid Build Coastguard Worker
141*6dbdd20aSAndroid Build Coastguard Worker### Improvements made
142*6dbdd20aSAndroid Build Coastguard Worker
143*6dbdd20aSAndroid Build Coastguard WorkerWe previously (before Android 12) returned the size of the allocation accurately
144*6dbdd20aSAndroid Build Coastguard Workerand immediately if the allocation size was greater than our sampling rate. This
145*6dbdd20aSAndroid Build Coastguard Workerhad several impacts.
146*6dbdd20aSAndroid Build Coastguard Worker
147*6dbdd20aSAndroid Build Coastguard Worker
148*6dbdd20aSAndroid Build Coastguard WorkerThe most obvious impact is that with the old approach we would expect to sample
149*6dbdd20aSAndroid Build Coastguard Workeran allocation equal in size to our sampling rate ~63% of the time, rather than
150*6dbdd20aSAndroid Build Coastguard Worker100%. This means there is a discontinuity in coverage between an allocation a
151*6dbdd20aSAndroid Build Coastguard Workerbyte smaller than our sampling rate, and one a byte larger. This is still
152*6dbdd20aSAndroid Build Coastguard Workerunbiased from a statistical perspective, but it’s important to note.
153*6dbdd20aSAndroid Build Coastguard Worker
154*6dbdd20aSAndroid Build Coastguard Worker
155*6dbdd20aSAndroid Build Coastguard WorkerAnother unusual impact is that the sampling rate depends not only on the size of
156*6dbdd20aSAndroid Build Coastguard Workerthe memory being used, but also how it’s allocated. If our sampling rate were
157*6dbdd20aSAndroid Build Coastguard Worker10KB, and we have an allocation that’s 10KB, we sample it with certainty. If
158*6dbdd20aSAndroid Build Coastguard Workerinstead that 10KB is split among two 5KB allocations, we sample it with
159*6dbdd20aSAndroid Build Coastguard Workerprobability 63%. Again this doesn’t bias our results, but it means that our
160*6dbdd20aSAndroid Build Coastguard Workermeasurements of memory where there are many small allocations might be noisier
161*6dbdd20aSAndroid Build Coastguard Workerthan ones where there are a few large allocations, even if total memory used is
162*6dbdd20aSAndroid Build Coastguard Workerthe same. If we didn’t return allocations greater than the sampling rate every
163*6dbdd20aSAndroid Build Coastguard Workertime, then the memorylessness property of the exponential distribution would
164*6dbdd20aSAndroid Build Coastguard Workermean our method is insensitive to how the memory is split amongst allocations,
165*6dbdd20aSAndroid Build Coastguard Workerwhich seems a desirable property.
166*6dbdd20aSAndroid Build Coastguard Worker
167*6dbdd20aSAndroid Build Coastguard Worker
168*6dbdd20aSAndroid Build Coastguard WorkerWe altered the cutoff at which we simply returned the allocation size.
169*6dbdd20aSAndroid Build Coastguard WorkerPreviously, as discussed, the cutoff was at the sampling rate, which led to a
170*6dbdd20aSAndroid Build Coastguard Workerdiscontinuity. Now the cutoff is determined by the size at which we have a >99%
171*6dbdd20aSAndroid Build Coastguard Workerchance of sampling an allocation at least once, given the sampling rate we’re
172*6dbdd20aSAndroid Build Coastguard Workerusing. This resolves the above issues.
173*6dbdd20aSAndroid Build Coastguard Worker
174*6dbdd20aSAndroid Build Coastguard Worker[algorithm]:
175*6dbdd20aSAndroid Build Coastguard Worker  https://cs.android.com/android/platform/superproject/main/+/main:external/perfetto/src/profiling/memory/sampler.h
176*6dbdd20aSAndroid Build Coastguard Worker[example visualization]: /docs/images/native-heap-prof.png
177*6dbdd20aSAndroid Build Coastguard Worker[Android Heap Profiler]: /docs/design-docs/heapprofd-design
178*6dbdd20aSAndroid Build Coastguard Worker[mean absolute percentage error]: https://en.wikipedia.org/wiki/Mean_absolute_percentage_error
179*6dbdd20aSAndroid Build Coastguard Worker[full results]: https://gist.github.com/fmayer/3aafcaf58f8db09714ba09aa4ac2b1ac
180