# Debugging memory usage on Android

## Prerequisites

* A host running macOS or Linux.
* [ADB](https://developer.android.com/studio/command-line/adb) installed and
  in PATH.
* A device running Android 11+.

If you are profiling your own app and are not running a userdebug build of
Android, your app needs to be marked as profileable or
debuggable in its manifest. See the [heapprofd documentation](
/docs/data-sources/native-heap-profiler.md#heapprofd-targets) for more
details on which applications can be targeted.

## dumpsys meminfo

A good place to start investigating the memory usage of a process is
`dumpsys meminfo`, which gives a high-level overview of how much of the various
types of memory a process is using.

```bash
$ adb shell dumpsys meminfo com.android.systemui

Applications Memory Usage (in Kilobytes):
Uptime: 2030149 Realtime: 2030149

** MEMINFO in pid 1974 [com.android.systemui] **
                   Pss  Private  Private  SwapPss      Rss     Heap     Heap     Heap
                 Total    Dirty    Clean    Dirty    Total     Size    Alloc     Free
                ------   ------   ------   ------   ------   ------   ------   ------
  Native Heap    16840    16804        0     6764    19428    34024    25037     5553
  Dalvik Heap     9110     9032        0      136    13164    36444     9111    27333

[more stuff...]
```

Looking at the "Private Dirty" column of Dalvik Heap (= Java Heap) and
Native Heap, we can see that SystemUI uses about 9M on the Java heap and
about 17M on the native heap.

## Linux memory management

But what do *clean*, *dirty*, *Rss*, *Pss* and *Swap* actually mean? To answer
this question, we need to delve into Linux memory management a bit.

From the kernel's point of view, memory is split into equally sized blocks
called *pages*. These are generally 4KiB.

Pages are organized in virtually contiguous ranges called VMAs
(Virtual Memory Areas).
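
On Linux, the VMAs of a process can be listed by reading `/proc/PID/maps`,
which prints one VMA per line. As a minimal sketch of how an address range
translates into pages, the snippet below parses two such lines. The sample
lines are made up for illustration (not captured from a device), and the 4 KiB
page size is an assumption taken from the text above.

```python
from dataclasses import dataclass

PAGE_SIZE = 4096  # assumption: 4 KiB pages


@dataclass
class Vma:
    start: int
    end: int
    perms: str
    path: str  # empty when the line has no pathname column

    @property
    def pages(self) -> int:
        # Number of pages covered by the virtual address range.
        return (self.end - self.start) // PAGE_SIZE


def parse_maps_line(line: str) -> Vma:
    # Line format: start-end perms offset dev inode [pathname]
    fields = line.split(maxsplit=5)
    start, end = (int(x, 16) for x in fields[0].split("-"))
    path = fields[5].strip() if len(fields) > 5 else ""
    return Vma(start, end, fields[1], path)


# Made-up sample lines, for illustration only.
SAMPLE = [
    "7f2c4a000000-7f2c4a021000 rw-p 00000000 00:00 0",
    "7f2c4b000000-7f2c4b1c5000 r-xp 00000000 fd:01 1234 /system/lib64/libc.so",
]

for vma in map(parse_maps_line, SAMPLE):
    print(f"{vma.pages:5d} pages  {vma.perms}  {vma.path or '(no path)'}")
```

The page count here is the *virtual* size of the range; it says nothing yet
about how many of those pages are backed by physical memory.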

VMAs are created when a process requests a new pool of memory pages through
the [mmap() system call](https://man7.org/linux/man-pages/man2/mmap.2.html).
Applications rarely call mmap() directly. Those calls are typically mediated by
the allocator (`malloc()`/`operator new()`) for native processes, or by the
Android Runtime for Java apps.

VMAs can be of two types: file-backed and anonymous.

**File-backed VMAs** are a view of a file in memory. They are obtained by
passing a file descriptor to `mmap()`. The kernel will serve page faults on the
VMA through the passed file, so reading through a pointer into the VMA becomes
the equivalent of a `read()` on the file.
File-backed VMAs are used, for instance, by the dynamic linker (`ld`) when
executing new processes or dynamically loading libraries, or by the Android
framework, when loading a new .dex library or accessing resources in the APK.

**Anonymous VMAs** are memory-only areas not backed by any file. This is the way
allocators request dynamic memory from the kernel. Anonymous VMAs are obtained
by calling `mmap(... MAP_ANONYMOUS ...)`.

Physical memory is only allocated, at page granularity, once the application
reads from or writes to a VMA. If you allocate 32 MiB worth of pages but only
touch one byte, your process' memory usage will only go up by 4KiB. You will
have increased your process' *virtual memory* by 32 MiB, but its resident
*physical memory* by 4 KiB.

When optimizing memory use of programs, we are interested in reducing their
footprint in *physical memory*. High *virtual memory* use is generally not a
cause for concern on modern platforms (except if you run out of address space,
which is very hard on 64-bit systems).

We call the amount of a process' memory that is resident in *physical memory*
its **RSS** (Resident Set Size). Not all resident memory is equal though.
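
The 32 MiB example above can be checked with a few lines of arithmetic. This is
a deliberately simplified model, assuming 4 KiB pages and ignoring allocator
overhead, page tables and any read-ahead the kernel may do; `resident_bytes` is
a hypothetical helper written for this sketch, not a system API.

```python
PAGE_SIZE = 4 * 1024  # assumption: 4 KiB pages


def resident_bytes(touched_offsets, page_size=PAGE_SIZE):
    """Physical memory needed to back the touched byte offsets of a mapping."""
    # A page becomes resident the first time any byte in it is touched.
    touched_pages = {offset // page_size for offset in touched_offsets}
    return len(touched_pages) * page_size


virtual_size = 32 * 1024 * 1024  # the mapping reserves 32 MiB of address space

one_byte = resident_bytes([0])
same_page = resident_bytes([0, 100, 4095])
every_page = resident_bytes(range(0, virtual_size, PAGE_SIZE))

print(f"virtual: {virtual_size} bytes")
print(f"one byte touched: {one_byte} bytes resident")
print(f"three bytes in one page: {same_page} bytes resident")
print(f"one byte in every page: {every_page} bytes resident")
```

Touching a single byte in every page makes the whole mapping resident, which is
why sparse access patterns over a large buffer can cost as much physical memory
as filling it.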

From a memory-consumption viewpoint, individual pages within a VMA can have the
following states:

* **Resident**: the page is mapped to a physical memory page. Resident pages can
  be in two states:
    * **Clean** (only for file-backed pages): the contents of the page are the
      same as the contents on disk. The kernel can evict clean pages more
      easily in case of memory pressure. This is because, if they are needed
      again, the kernel knows it can re-create their contents by reading them
      from the underlying file.
    * **Dirty**: the contents of the page diverge from the disk, or (in most
      cases), the page has no disk backing (i.e. it's _anonymous_). Dirty pages
      cannot be evicted because doing so would cause data loss. However, they
      can be swapped out to disk or ZRAM, if present.
* **Swapped**: a dirty page can be written to the swap file on disk (on most Linux
  desktop distributions) or compressed (on Android and CrOS through
  [ZRAM](https://source.android.com/devices/tech/perf/low-ram#zram)). The page
  will stay swapped until a new page fault on its virtual address happens, at
  which point the kernel will bring it back into main memory.
* **Not present**: no page fault ever happened on the page, or the page was
  clean and was later evicted.

It is generally more important to reduce the amount of _dirty_ memory, as it
cannot be reclaimed like _clean_ memory and, on Android, even if swapped to
ZRAM, will still eat part of the system memory budget.
This is why we looked at *Private Dirty* in the `dumpsys meminfo` example.

*Shared* memory can be mapped into more than one process. This means VMAs in
different processes refer to the same physical memory. This typically happens
with file-backed memory of commonly used libraries (e.g., libc.so,
framework.dex) or, more rarely, when a process `fork()`s and a child process
inherits dirty memory from its parent.

This introduces the concept of **PSS** (Proportional Set Size). In **PSS**,
memory that is resident in multiple processes is proportionally attributed to
each of them. If we map one 4KiB page into four processes, each of their
**PSS** will increase by 1KiB.

#### Recap

* Dynamically allocated memory, whether allocated through C's `malloc()`, C++'s
  `operator new()` or Java's `new X()`, always starts as _anonymous_ and
  _dirty_, unless it is never used.
* If this memory is not read/written for a while, or in case of memory pressure,
  it gets swapped out to ZRAM and becomes _swapped_.
* Anonymous memory, whether _resident_ (and hence _dirty_) or _swapped_, is
  always a resource hog and should be avoided if unnecessary.
* File-mapped memory comes from code (Java or native), libraries and resources
  and is almost always _clean_. Clean memory also erodes the system memory
  budget, but application developers typically have less control over it.

## Memory over time

`dumpsys meminfo` is good for getting a snapshot of the current memory usage,
but even very short memory spikes can lead to low-memory situations, which will
lead to [LMKs](#lmk). We have two tools to investigate situations like this:

* RSS High Watermark.
* Memory tracepoints.

### RSS High Watermark

We can get a lot of information from the `/proc/[pid]/status` file, including
memory information. `VmHWM` shows the maximum RSS usage the process has seen
since it was started. This value is kept updated by the kernel.

```bash
$ adb shell cat '/proc/$(pidof com.android.systemui)/status'
[...]
VmHWM:    256972 kB
VmRSS:    195272 kB
RssAnon:   30184 kB
RssFile:  164420 kB
RssShmem:    668 kB
VmSwap:    43960 kB
[...]
```

### Memory tracepoints

NOTE: For detailed instructions about the memory tracepoints see the
      [Data sources > Memory > Counters and events](
      /docs/data-sources/memory-counters.md) page.

We can use Perfetto to get information about memory management events from the
kernel.

```bash
$ adb shell perfetto \
  -c - --txt \
  -o /data/misc/perfetto-traces/trace \
<<EOF

buffers: {
    size_kb: 8960
    fill_policy: DISCARD
}
buffers: {
    size_kb: 1280
    fill_policy: DISCARD
}
data_sources: {
    config {
        name: "linux.process_stats"
        target_buffer: 1
        process_stats_config {
            scan_all_processes_on_start: true
        }
    }
}
data_sources: {
    config {
        name: "linux.ftrace"
        ftrace_config {
            ftrace_events: "mm_event/mm_event_record"
            ftrace_events: "kmem/rss_stat"
            ftrace_events: "kmem/ion_heap_grow"
            ftrace_events: "kmem/ion_heap_shrink"
        }
    }
}
duration_ms: 30000

EOF
```

While it is running, take a photo if you are following along.

Pull the file using `adb pull /data/misc/perfetto-traces/trace ~/mem-trace`
and upload it to the [Perfetto UI](https://ui.perfetto.dev). This will show
overall stats about system [ION](#ion) usage, and per-process stats to
expand. Scroll down to (or Ctrl-F for) `com.google.android.GoogleCamera` and
expand it. This will show a timeline of various memory stats for the camera.

We can see that around 2/3 into the trace, the memory spiked (in the
mem.rss.anon track). This is where I took a photo. This is a good way to see
how the memory usage of an application reacts to different triggers.

## Which tool to use

If you want to drill down into _anonymous_ memory allocated by Java code,
labeled by `dumpsys meminfo` as `Dalvik Heap`, see the
[Analyzing the Java Heap](#java-hprof) section.

If you want to drill down into _anonymous_ memory allocated by native code,
labeled by `dumpsys meminfo` as `Native Heap`, see the
[Analyzing the Native Heap](#heapprofd) section. Note that it is common to end
up with native memory usage even if your app doesn't have any C/C++ code. This
is because some framework APIs (e.g. Regex) are implemented internally in
native code.

If you want to drill down into file-mapped memory, the best option is to use
`adb shell showmap PID` (on Android) or to inspect `/proc/PID/smaps`.

## {#lmk} Low-memory kills

When an Android device becomes low on memory, a daemon called `lmkd` will
start killing processes in order to free up memory. Devices' strategies differ,
but in general processes will be killed in order of descending `oom_score_adj`
score (i.e. background apps and processes first, foreground processes last).

Apps on Android are not killed when switching away from them. They instead
remain *cached* even after the user finishes using them. This is to make
subsequent starts of the app faster. Such apps will generally be killed
first (because they have a higher `oom_score_adj`).

We can collect information about LMKs and `oom_score_adj` using Perfetto.

```bash
$ adb shell perfetto \
  -c - --txt \
  -o /data/misc/perfetto-traces/trace \
<<EOF

buffers: {
    size_kb: 8960
    fill_policy: DISCARD
}
buffers: {
    size_kb: 1280
    fill_policy: DISCARD
}
data_sources: {
    config {
        name: "linux.process_stats"
        target_buffer: 1
        process_stats_config {
            scan_all_processes_on_start: true
        }
    }
}
data_sources: {
    config {
        name: "linux.ftrace"
        ftrace_config {
            ftrace_events: "lowmemorykiller/lowmemory_kill"
            ftrace_events: "oom/oom_score_adj_update"
            ftrace_events: "ftrace/print"
            atrace_apps: "lmkd"
        }
    }
}
duration_ms: 60000

EOF
```

Pull the file using `adb pull /data/misc/perfetto-traces/trace ~/oom-trace`
and upload it to the [Perfetto UI](https://ui.perfetto.dev).

We can see that the OOM score of Camera gets reduced (making it less likely
to be killed) when it is opened, and gets increased again once it is closed.

## {#heapprofd} Analyzing the Native Heap

**Native Heap Profiles require Android 10.**

NOTE: For detailed instructions about the native heap profiler and
      troubleshooting see the [Data sources > Heap profiler](
      /docs/data-sources/native-heap-profiler.md) page.

Applications usually get memory through `malloc` or C++'s `new` rather than
directly getting it from the kernel. The allocator makes sure that your memory
is handled more efficiently (i.e. there are not many gaps) and that the
overhead from asking the kernel remains low.

We can log the native allocations and frees that a process does using
*heapprofd*. The resulting profile can be used to attribute memory usage
to particular function callstacks, supporting a mix of both native and Java
code. The profile *will only show allocations done while it was running*; any
allocations done before will not be shown.

### {#capture-profile-native} Capturing the profile

Use the `tools/heap_profile` script to profile a process. If you are having
trouble, make sure you are using the [latest version](
https://raw.githubusercontent.com/google/perfetto/main/tools/heap_profile).
See all the arguments using `tools/heap_profile -h`, or use the defaults
and just profile a process (e.g. `system_server`):

```bash
$ tools/heap_profile -n system_server

Profiling active. Press Ctrl+C to terminate.
You may disconnect your device.

Wrote profiles to /tmp/profile-1283e247-2170-4f92-8181-683763e17445 (symlink /tmp/heap_profile-latest)
These can be viewed using pprof. Googlers: head to pprof/ and upload them.
```

When you see *Profiling active*, play around with the phone a bit. When you
are done, press Ctrl-C to end the profile. For this tutorial, I opened a
couple of apps.

### Viewing the data

Then upload the `raw-trace` file from the output directory to the
[Perfetto UI](https://ui.perfetto.dev) and click on the diamond marker that
appears.

The tabs that are available are:

* **Unreleased malloc size**: how many bytes were allocated but not freed at
  this callstack at the moment the dump was created.
* **Total malloc size**: how many bytes were allocated (including ones freed at
  the moment of the dump) at this callstack.
* **Unreleased malloc count**: how many allocations without matching frees were
  done at this callstack.
* **Total malloc count**: how many allocations (including ones with matching
  frees) were done at this callstack.

The default view will show you all allocations that were done while the
profile was running but that weren't freed (the **Unreleased malloc size** tab).

We can see that a lot of memory gets allocated in paths through
`AssetManager.applyStyle`.
To get the total memory that was allocated
this way, we can enter "applyStyle" into the Focus textbox. This will only
show callstacks where some frame matches "applyStyle".

From this we have a clear idea of where in the code we have to look. From the
code we can see how that memory is being used and whether we actually need all
of it.

## {#java-hprof} Analyzing the Java Heap

**Java Heap Dumps require Android 11.**

NOTE: For detailed instructions about capturing Java heap dumps and
      troubleshooting see the [Data sources > Java heap dumps](
      /docs/data-sources/java-heap-profiler.md) page.

### {#capture-profile-java} Dumping the Java heap

We can get a snapshot of the graph of all the Java objects that constitute the
Java heap. We use the `tools/java_heap_dump` script. If you are having trouble,
make sure you are using the [latest version](
https://raw.githubusercontent.com/google/perfetto/main/tools/java_heap_dump).

```bash
$ tools/java_heap_dump -n com.android.systemui

Dumping Java Heap.
Wrote profile to /tmp/tmpup3QrQprofile
This can be viewed using https://ui.perfetto.dev.
```

### Viewing the Data

Upload the trace to the [Perfetto UI](https://ui.perfetto.dev) and click on
the diamond marker that appears.

This will present a set of flamegraph views as explained below.

#### "Size" and "Objects" tabs

These views show the memory attributed to the shortest path to a
garbage-collection root. In general an object is reachable by many paths; we
only show the shortest, as that reduces the complexity of the data displayed and
is generally the highest-signal. The rightmost `[merged]` stack is the sum of
all objects that are too small to be displayed.

* **Size**: how many bytes are retained via this path to the GC root.
* **Objects**: how many objects are retained via this path to the GC root.
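
To illustrate what "attributed to the shortest path to a GC root" means, here
is a minimal sketch that runs a breadth-first search over a tiny heap graph and
sums sizes along the resulting tree. The graph, the object names and the sizes
are hypothetical, and this is not a description of how Perfetto implements the
view internally.

```python
from collections import deque

# Hypothetical heap graph: object -> objects it references.
REFS = {
    "root": ["Activity", "Cache"],
    "Activity": ["Bitmap"],
    "Cache": ["Entry"],
    "Entry": ["Bitmap"],
    "Bitmap": [],
}
# Hypothetical self sizes in bytes.
SIZES = {"root": 0, "Activity": 100, "Cache": 50, "Entry": 20, "Bitmap": 4000}


def shortest_path_parents(refs, root):
    """BFS from the root; each object keeps the parent that reached it first."""
    parent = {root: None}
    queue = deque([root])
    while queue:
        obj = queue.popleft()
        for child in refs[obj]:
            if child not in parent:
                parent[child] = obj
                queue.append(child)
    return parent


def depth(parent, obj):
    """Number of edges between obj and the root in the BFS tree."""
    d = 0
    while parent[obj] is not None:
        obj = parent[obj]
        d += 1
    return d


def cumulative_sizes(parent, sizes):
    """Self size plus the size of everything below each object in the BFS tree."""
    total = dict(sizes)
    # Process the deepest objects first so child totals are final when added.
    for obj in sorted(parent, key=lambda o: depth(parent, o), reverse=True):
        if parent[obj] is not None:
            total[parent[obj]] += total[obj]
    return total


parent = shortest_path_parents(REFS, "root")
totals = cumulative_sizes(parent, SIZES)
for name in ("Activity", "Cache"):
    print(name, totals[name])
```

`Bitmap` is reachable through both `Activity` and `Cache -> Entry`, but only the
shorter path through `Activity` is shown, so its bytes are counted under
`Activity` and not under `Cache`.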

As with native heap profiles, if we only want to see the paths that contain some
string, we can use the Focus feature, which here filters by the names of the
classes. If we want to see everything that could be caused by notifications, we
can put "notification" in the Focus box.

We aggregate the paths per class name, so if there are multiple objects of the
same type retained by a `java.lang.Object[]`, we will show one element as its
child, as you can see in the leftmost stack above. This also applies to the
dominator tree paths as described below.

#### "Dominated Size" and "Dominated Objects" tabs

Another way to present the heap graph as a flamegraph (a tree) is to show its
[dominator tree](/docs/analysis/stdlib-docs.autogen#memory-heap_graph_dominator_tree).
In a heap graph, an object `a` dominates an object `b` if `b` is reachable from
the root only via paths that go through `a`. The dominators of an object form a
chain from the root and the object is exclusively retained by all objects on this
chain. For all reachable objects in the graph those chains form a tree, i.e. the
dominator tree.

We aggregate the tree paths per class name, and each element (tree node)
represents a set of objects that have the same class name and position in the
dominator tree.

* **Dominated Size**: how many bytes are exclusively retained by the objects in
  a node.
* **Dominated Objects**: how many objects are exclusively retained by the
  objects in a node.
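
The dominator definition above can be tried out on a small graph. The sketch
below uses the textbook iterative data-flow formulation of dominators, chosen
for brevity; it is not necessarily the algorithm Perfetto uses. The graph and
sizes are hypothetical, every object is assumed to be reachable from the root,
and "dominated size" is taken here to mean an object's own size plus the size
of everything it dominates.

```python
# Hypothetical heap graph: object -> objects it references.
REFS = {
    "root": ["A", "B"],
    "A": ["C"],
    "B": ["C", "D"],
    "C": [],
    "D": [],
}
# Hypothetical self sizes in bytes.
SIZES = {"root": 0, "A": 10, "B": 20, "C": 300, "D": 40}


def dominators(refs, root):
    """Return, for every object, the set of objects that dominate it.

    Assumes every object in refs is reachable from the root.
    """
    nodes = set(refs)
    preds = {n: set() for n in nodes}
    for src, dsts in refs.items():
        for dst in dsts:
            preds[dst].add(src)

    # Start from "everything dominates everything" and shrink to a fixed point.
    dom = {n: set(nodes) for n in nodes}
    dom[root] = {root}
    changed = True
    while changed:
        changed = False
        for n in nodes - {root}:
            # An object is dominated by itself and by whatever dominates
            # all of the objects that reference it.
            new = {n} | set.intersection(*(dom[p] for p in preds[n]))
            if new != dom[n]:
                dom[n] = new
                changed = True
    return dom


def dominated_size(dom, sizes, obj):
    """Bytes of all objects that have obj among their dominators."""
    return sum(size for n, size in sizes.items() if obj in dom[n])


dom = dominators(REFS, "root")
for name in ("A", "B"):
    print(name, dominated_size(dom, SIZES, name))
```

`C` is referenced by both `A` and `B`, so neither of them retains it
exclusively and its 300 bytes are only counted for the root. `D` is referenced
only by `B`, so it contributes to the dominated size of `B`.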