<!--
Project: /_project.yaml
Book: /_book.yaml

{% include "_versions.html" %}
-->

<!--
  Copyright 2020 The Android Open Source Project

  Licensed under the Apache License, Version 2.0 (the "License");
  you may not use this file except in compliance with the License.
  You may obtain a copy of the License at

      http://www.apache.org/licenses/LICENSE-2.0

  Unless required by applicable law or agreed to in writing, software
  distributed under the License is distributed on an "AS IS" BASIS,
  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  See the License for the specific language governing permissions and
  limitations under the License.
-->

# Android Live-LocK Daemon (llkd)

Android 10 <!-- {{ androidQVersionNumber }} --> includes the Android Live-LocK Daemon
(`llkd`), which is designed to catch and mitigate kernel deadlocks. The `llkd`
component provides a default standalone implementation, but you can
alternatively integrate the `llkd` code into another service, either as part of
the main loop or as a separate thread.

## Detection scenarios <!-- {:#detection-scenarios} -->

The `llkd` has two detection scenarios: Persistent D or Z state, and persistent
stack signature.

### Persistent D or Z state <!-- {:#persistent-d-or-z-state} -->

If a thread is in D (uninterruptible sleep) or Z (zombie) state with no forward
progress for longer than `ro.llk.timeout_ms` or `ro.llk.[D|Z].timeout_ms`, the
`llkd` kills the process (or parent process). If a subsequent scan shows the
same process continues to exist, the `llkd` confirms a live-lock condition and
panics the kernel in a manner that provides the most detailed bug report for the
condition.
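
The kill-then-panic escalation described above can be modeled with a small per-thread tracker. This is an illustrative sketch only, not the `llkd` source: the `thread_track` struct, the `llk_action` enum, and `track_sample()` are hypothetical names, and "forward progress" is reduced to a single counter supplied by the caller.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-thread record; not the llkd data structure. */
typedef struct {
    char state;         /* last sampled state, for example 'D', 'Z', 'S' */
    uint64_t progress;  /* last sampled forward-progress counter (assumed) */
    uint64_t stuck_ms;  /* time spent in D or Z without progress */
    bool kill_sent;     /* true once the process has been killed */
} thread_track;

typedef enum { ACTION_NONE, ACTION_KILL, ACTION_PANIC } llk_action;

/* Feed one sample taken elapsed_ms after the previous sample. */
llk_action track_sample(thread_track *t, char state, uint64_t progress,
                        uint64_t elapsed_ms, uint64_t timeout_ms) {
    bool stuck_state = (state == 'D' || state == 'Z');
    bool unchanged = stuck_state && state == t->state &&
                     progress == t->progress;

    t->state = state;
    t->progress = progress;

    if (!unchanged) {
        /* State changed or progress was made: start over. */
        t->stuck_ms = 0;
        t->kill_sent = false;
        return ACTION_NONE;
    }
    if (t->kill_sent) {
        /* Still present and still stuck on the scan after the kill. */
        return ACTION_PANIC;
    }
    t->stuck_ms += elapsed_ms;
    if (t->stuck_ms > timeout_ms) {
        t->kill_sent = true;
        return ACTION_KILL;
    }
    return ACTION_NONE;
}
```

With a two-minute sample interval and a ten-minute timeout, this model requests a kill on the first sample where the accumulated stuck time exceeds ten minutes, and a panic on the following sample if the thread is still there and unchanged.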

The `llkd` includes a self watchdog that alarms if `llkd` locks up. The watchdog
period is double the expected time to flow through the main loop, and sampling
occurs every `ro.llk.check_ms`.

### Persistent stack signature <!-- {:#persistent-stack-signature} -->

For userdebug releases, the `llkd` can detect kernel live-locks using persistent
stack signature checking. If a thread in any state except Z has a persistent
listed `ro.llk.stack` kernel symbol that is reported for longer than
`ro.llk.timeout_ms` or `ro.llk.stack.timeout_ms`, the `llkd` kills the process
(even if there is forward scheduling progress). If a subsequent scan shows the
same process continues to exist, the `llkd` confirms a live-lock condition and
panics the kernel in a manner that provides the most detailed bug report for the
condition.

Note: Because forward scheduling progress is allowed, the `llkd` does not
perform [ABA detection](https://en.wikipedia.org/wiki/ABA_problem){:.external}.

The `llkd` check persists continuously when the live-lock condition exists and
looks for the composed strings `" symbol+0x"` or `" symbol.cfi+0x"` in the
`/proc/pid/stack` file on Linux. The list of symbols is in `ro.llk.stack` and
defaults to the comma-separated list of
"`cma_alloc,__get_user_pages,bit_wait_io,wait_on_page_bit_killable`".
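
The string match described above can be sketched as follows. This is a minimal illustration rather than the `llkd` implementation: `stack_has_symbol()` is a hypothetical helper, and it assumes the caller has already read the contents of `/proc/pid/stack` into a NUL-terminated buffer.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/*
 * Return true if stack_text contains " symbol+0x" or " symbol.cfi+0x".
 * The leading space and trailing "+0x" keep a listed symbol from matching
 * as a substring of a longer function name.
 */
bool stack_has_symbol(const char *stack_text, const char *symbol) {
    char needle[128];
    int n;

    n = snprintf(needle, sizeof(needle), " %s+0x", symbol);
    if (n < 0 || (size_t)n >= sizeof(needle)) {
        return false; /* symbol too long for this sketch's buffer */
    }
    if (strstr(stack_text, needle) != NULL) {
        return true;
    }

    n = snprintf(needle, sizeof(needle), " %s.cfi+0x", symbol);
    if (n < 0 || (size_t)n >= sizeof(needle)) {
        return false;
    }
    return strstr(stack_text, needle) != NULL;
}
```

For example, a stack line such as `[<0>] cma_alloc+0x1c/0x2a0` (an illustrative line, not captured output) matches the symbol `cma_alloc`, while `my_cma_alloc+0x1c` and `cma_alloc_extra+0x1c` do not.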

Symbols should be rare and short-lived enough that on a typical system the
function is seen only once in a sample over the timeout period of
`ro.llk.stack.timeout_ms` (samples occur every `ro.llk.check_ms`). Due to lack
of ABA protection, this is the only way to prevent a false trigger. The symbol
function must appear below the function calling the lock that could contend. If
the lock is below or in the symbol function, the symbol appears in all affected
processes, not just the one that caused the lockup.

## Coverage <!-- {:#coverage} -->

The default implementation of `llkd` does not monitor `init`, `[kthreadd]`, or
`[kthreadd]` spawns. For the `llkd` to cover `[kthreadd]`-spawned threads:

* Drivers must not remain in a persistent D state,

OR

* Drivers must have mechanisms to recover the thread should it be killed
  externally. For example, use `wait_event_interruptible()` instead of
  `wait_event()`.

If one of the above conditions is met, the `llkd` ignorelist can be adjusted to
cover kernel components. Stack symbol checking involves an additional process
ignore list to prevent sepolicy violations on services that block `ptrace`
operations.

## Android properties <!-- {:#android-properties} -->

The `llkd` responds to several Android properties (listed below).

* Properties named `prop_ms` are in milliseconds.
* Properties that use comma (,) separator for lists use a leading separator to
  preserve the default entry, then add or subtract entries with optional plus
  (+) and minus (-) prefixes respectively. For these lists, the string "false"
  is synonymous with an empty list, and blank or missing entries resort to the
  specified default value.
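
One reading of the list rules above can be sketched as a small resolver. This is an assumption-laden illustration, not the parser used by `llkd`: the `name_list` type and the `resolve_list()` and `list_contains()` functions are hypothetical, and the fixed capacities are chosen only to keep the example short.

```c
#include <stdbool.h>
#include <string.h>

#define MAX_ENTRIES 32
#define MAX_NAME 64

/* Hypothetical fixed-capacity set of names. */
typedef struct {
    char names[MAX_ENTRIES][MAX_NAME];
    int count;
} name_list;

static int list_find(const name_list *l, const char *name) {
    for (int i = 0; i < l->count; i++) {
        if (strcmp(l->names[i], name) == 0) return i;
    }
    return -1;
}

static void list_add(name_list *l, const char *name) {
    if (*name == '\0' || list_find(l, name) >= 0) return;
    if (l->count >= MAX_ENTRIES || strlen(name) >= MAX_NAME) return;
    strcpy(l->names[l->count++], name);
}

static void list_remove(name_list *l, const char *name) {
    int i = list_find(l, name);
    if (i < 0) return;
    l->count--;
    /* Move the last entry into the freed slot (order is not preserved). */
    if (i != l->count) memcpy(l->names[i], l->names[l->count], MAX_NAME);
}

/* Apply comma-separated entries; "-x" removes x, "+x" or "x" adds x. */
static void list_apply(name_list *l, const char *csv) {
    while (*csv != '\0') {
        size_t len = strcspn(csv, ",");
        char entry[MAX_NAME];
        if (len > 0 && len < MAX_NAME) {
            memcpy(entry, csv, len);
            entry[len] = '\0';
            if (entry[0] == '-') {
                list_remove(l, entry + 1);
            } else {
                list_add(l, entry[0] == '+' ? entry + 1 : entry);
            }
        }
        csv += len;
        if (*csv == ',') csv++;
    }
}

/* Resolve a property value against its default list. */
void resolve_list(name_list *out, const char *value, const char *defaults) {
    out->count = 0;
    if (value == NULL || *value == '\0') {
        list_apply(out, defaults);           /* blank or missing: defaults */
        return;
    }
    if (strcmp(value, "false") == 0) return; /* "false": empty list */
    if (value[0] == ',') list_apply(out, defaults); /* keep defaults */
    list_apply(out, value);
}

bool list_contains(const name_list *l, const char *name) {
    return list_find(l, name) >= 0;
}
```

Under these assumptions, the value `,+foo,-init` applied to the defaults `init,lmkd,llkd` yields `foo`, `lmkd`, and `llkd`, whereas the value `bar` (no leading separator) replaces the defaults entirely.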

### ro.config.low_ram <!-- {:#ro-config-low-ram} -->

Device is configured with limited memory.

### ro.debuggable <!-- {:#ro-debuggable} -->

Device is configured for userdebug or eng build.

### ro.llk.sysrq_t <!-- {:#ro-llk-sysrq-t} -->

If property is "eng", the default is not `ro.config.low_ram` or `ro.debuggable`.
If true, dump all threads (`sysrq t`).

### ro.llk.enable <!-- {:#ro-llk-enable} -->

Allow live-lock daemon to be enabled. Default is false.

### llk.enable <!-- {:#llk-enable} -->

Evaluated for eng builds. Default is `ro.llk.enable`.

### ro.khungtask.enable <!-- {:#ro-khungtask-enable} -->

Allow `[khungtask]` daemon to be enabled. Default is false.

### khungtask.enable <!-- {:#khungtask-enable} -->

Evaluated for eng builds. Default is `ro.khungtask.enable`.

### ro.llk.mlockall <!-- {:#ro-llk-mlockall} -->

Enable call to `mlockall()`. Default is false.

### ro.khungtask.timeout <!-- {:#ro-khungtask-timeout} -->

`[khungtask]` maximum time limit. Default is 12 minutes.

### ro.llk.timeout_ms <!-- {:#ro-llk-timeout-ms} -->

D or Z maximum time limit. Default is 10 minutes. Double this value to set the
alarm watchdog for `llkd`.

### ro.llk.D.timeout_ms <!-- {:#ro-llk-D-timeout-ms} -->

D maximum time limit. Default is `ro.llk.timeout_ms`.

### ro.llk.Z.timeout_ms <!-- {:#ro-llk-Z-timeout-ms} -->

Z maximum time limit. Default is `ro.llk.timeout_ms`.

### ro.llk.stack.timeout_ms <!-- {:#ro-llk-stack-timeout-ms} -->

Maximum time limit for persistent stack symbol checks. Default is
`ro.llk.timeout_ms`. **Active only on userdebug or eng builds**.

### ro.llk.check_ms <!-- {:#ro-llk-check-ms} -->

Interval between samples of threads for D or Z. Default is two minutes.

### ro.llk.stack <!-- {:#ro-llk-stack} -->

Checks for kernel stack symbols that if persistently present can indicate a
subsystem is locked up. Default is
`cma_alloc,__get_user_pages,bit_wait_io,wait_on_page_bit_killable`
comma-separated list of kernel symbols. The check doesn't do forward scheduling
ABA except by polling every `ro.llk.check_ms` over the period
`ro.llk.stack.timeout_ms`, so stack symbols should be exceptionally rare and
fleeting (it is highly unlikely for a symbol to show up persistently in all
samples of the stack). Checks for a match for `" symbol+0x"` or
`" symbol.cfi+0x"` in stack expansion. **Available only on userdebug or eng
builds**; security concerns on user builds result in limited privileges that
prevent this check.

### ro.llk.ignorelist.process <!-- {:#ro-llk-ignorelist-process} -->

The `llkd` does not watch the specified processes. Default is `0,1,2` (`kernel`,
`init`, and `[kthreadd]`) plus process names
`init,[kthreadd],[khungtaskd],lmkd,llkd,watchdogd,[watchdogd],[watchdogd/0],...,[watchdogd/get_nprocs-1]`.
A process can be a `comm`, `cmdline`, or `pid` reference. An automated default
can be larger than the current maximum property size of 92.

Note: `false` is an extremely unlikely process to want to ignore.

### ro.llk.ignorelist.parent <!-- {:#ro-llk-ignorelist-parent} -->

The `llkd` does not watch processes that have the specified parent(s). Default
is `0,2,adbd&[setsid]` (`kernel`, `[kthreadd]`, and `adbd` only for zombie
`setsid`). An ampersand (&) separator specifies that the parent is ignored only
in combination with the target child process. Ampersand was selected because it
is never part of a process name; however, a `setprop` in the shell requires the
ampersand to be escaped or quoted, although the `init rc` file where this is
normally specified does not have this issue. A parent or target process can be a
`comm`, `cmdline`, or `pid` reference.
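
The parent-and-child pairing can be illustrated with a small matcher for a single list entry. This is a simplified sketch, not the `llkd` code: `parent_entry_matches()` is a hypothetical helper that compares plain strings and leaves out both the zombie-state condition and the `comm`/`cmdline`/`pid` lookup.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Match one ignorelist entry against a (parent, child) pair.
 * "parent"        ignores every child of that parent.
 * "parent&child"  ignores the parent only for that specific child.
 */
bool parent_entry_matches(const char *entry, const char *parent,
                          const char *child) {
    const char *amp = strchr(entry, '&');
    if (amp == NULL) {
        return strcmp(entry, parent) == 0;
    }
    size_t parent_len = (size_t)(amp - entry);
    return strlen(parent) == parent_len &&
           strncmp(entry, parent, parent_len) == 0 &&
           strcmp(amp + 1, child) == 0;
}
```

With the default entry `adbd&[setsid]`, a `[setsid]` child of `adbd` is ignored, while any other child of `adbd` is still watched.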

### ro.llk.ignorelist.uid <!-- {:#ro-llk-ignorelist-uid} -->

The `llkd` does not watch processes that match the specified uid(s).
Comma-separated list of uid numbers or names. Default is empty or false.

### ro.llk.ignorelist.process.stack <!-- {:#ro-llk-ignorelist-process-stack} -->

The `llkd` does not monitor the specified subset of processes for live lock stack
signatures. Default is process names
`init,lmkd.llkd,llkd,keystore,keystore2,ueventd,apexd,logd`. Prevents the sepolicy
violation associated with processes that block `ptrace` (as these can't be
checked). **Active only on userdebug and eng builds**. For details on build
types, refer to [Building Android](/setup/build/building#choose-a-target).

## Architectural concerns <!-- {:#architectural-concerns} -->

* Properties are limited to 92 characters. However, this limit does not apply to
  defaults defined in the `include/llkd.h` file in the sources.
* The built-in `[khungtask]` daemon is too generic and trips on driver code that
  sits around in D state too much. Switching drivers to sleep, or S state,
  would make task(s) killable; drivers would then need to be able to resurrect
  them on an as-needed basis.

## Library interface (optional) <!-- {:#library-interface-optional} -->

You can optionally incorporate the `llkd` into another privileged daemon using
the following C interface from the `libllkd` component:

```
#include "llkd.h"
bool llkInit(const char* threadname); /* return true if enabled */
unsigned llkCheckMilliseconds(void);  /* ms to sleep for next check */
```

If a threadname is provided, a thread automatically spawns; otherwise, the
caller must call `llkCheckMilliseconds` in its main loop. The function returns
the period of time before the next expected call to this handler.
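
A host daemon that runs the checks in its own main loop might look roughly like the sketch below. Several things here are assumptions for illustration: `host_daemon_loop()` and its iteration cap are hypothetical, passing `NULL` as the threadname is assumed to mean "do not spawn a thread", and the two `llk*` function bodies are stand-in stubs so the example builds without `libllkd`.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdbool.h>
#include <stddef.h>
#include <time.h>

/*
 * Stand-in stubs (NOT the real implementation) so this sketch builds on its
 * own. In a real daemon, remove them, include "llkd.h", and link libllkd.
 */
bool llkInit(const char* threadname) {
    (void)threadname;
    return true; /* pretend the daemon is enabled */
}
unsigned llkCheckMilliseconds(void) {
    return 0;    /* pretend the next check is due immediately */
}

/*
 * Hypothetical host main loop. Runs at most max_iterations checks and
 * returns how many were performed (0 if llkInit reports disabled).
 */
unsigned host_daemon_loop(unsigned max_iterations) {
    unsigned checks = 0;

    /* Assumption: a NULL threadname means the caller drives the checks. */
    if (!llkInit(NULL)) {
        return 0;
    }
    while (checks < max_iterations) {
        unsigned ms = llkCheckMilliseconds();
        checks++;

        /* The host daemon's other main-loop work would go here. */

        struct timespec delay;
        delay.tv_sec = ms / 1000;
        delay.tv_nsec = (long)(ms % 1000) * 1000000L;
        nanosleep(&delay, NULL); /* wait until the next check is due */
    }
    return checks;
}
```

A production loop would typically run indefinitely and fold the returned delay into whatever wait primitive the host daemon already uses, rather than sleeping unconditionally.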