Demonstrations of exitsnoop.

This Linux tool traces all process terminations and their reasons. It
    - is implemented using BPF, which requires CAP_SYS_ADMIN, and
      should therefore be invoked with sudo
    - traces the sched_process_exit tracepoint in kernel/exit.c
    - includes processes run by root and by all other users
    - includes processes in containers
    - includes processes that become zombies

The following example shows the termination of the 'sleep' and 'bash' commands
when run in a loop that is interrupted with Ctrl-C from the terminal:

# ./exitsnoop.py > exitlog &
[1] 18997
# for((i=65;i<100;i+=5)); do bash -c "sleep 1.$i;exit $i"; done
^C
# fg
./exitsnoop.py > exitlog
^C
# cat exitlog
PCOMM            PID    PPID   TID    AGE(s)  EXIT_CODE
sleep            19004  19003  19004  1.65    0
bash             19003  17656  19003  1.65    code 65
sleep            19007  19006  19007  1.70    0
bash             19006  17656  19006  1.70    code 70
sleep            19010  19009  19010  1.75    0
bash             19009  17656  19009  1.75    code 75
sleep            19014  19013  19014  0.23    signal 2 (INT)
bash             19013  17656  19013  0.23    signal 2 (INT)

#

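The output shows the process/command name (PCOMM), the process ID (PID),
the parent process that will be notified of the exit (PPID), the thread ID
(TID), the AGE of the process in seconds with hundredth-of-a-second
resolution, and the reason for the exit (EXIT_CODE): 0 for a successful
exit, "code N" for a non-zero exit value, or "signal N (NAME)" for a fatal
signal.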
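
The three EXIT_CODE forms above can be reproduced from a raw exit_code value.
The Python sketch below is a hypothetical illustration, not exitsnoop's own
code: format_exit_code() is an invented name, and it assumes the bit layout
described later under "About process termination in Linux". The core-dump
bit is ignored, since the example output does not show how it is displayed.

```python
import signal


def format_exit_code(exit_code: int) -> str:
    """Hypothetical formatter mimicking the EXIT_CODE column shown above.

    Assumes the task_struct exit_code layout: exit value in bits 15:8,
    signal number in bits 6:0, core-dump flag in bit 7 (ignored here).
    """
    sig = exit_code & 0x7F  # bits 6:0: terminating signal, 0 for a normal exit
    if sig == 0:
        value = (exit_code >> 8) & 0xFF  # bits 15:8: the value given to exit()
        return "0" if value == 0 else f"code {value}"
    try:
        name = signal.Signals(sig).name[3:]  # "SIGINT" -> "INT"
    except ValueError:
        name = "?"  # a signal number Python's signal module does not name
    return f"signal {sig} ({name})"


for raw in (0, 65 << 8, 2):
    print(format_exit_code(raw))
```

Run as-is, it prints 0, code 65 and signal 2 (INT), one per line.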

The -t option adds a timestamp column, which shows local time by default;
the --utc option shows the time in UTC instead.  The --label option adds a
column indicating the tool that generated the output, 'exit' by default.
If other tools follow this format, their outputs can be merged into a single
trace, in increasing time order, with a simple lexical sort, and each line
is labeled to indicate the event, e.g. 'exec', 'open', 'exit', etc.  Time
is displayed with millisecond resolution.  The -x option shows only
non-zero exits and fatal signals, excluding processes that exit with code 0:

# ./exitsnoop.py -t --utc -x --label= > exitlog &
[1] 18289
# for((i=65;i<100;i+=5)); do bash -c "sleep 1.$i;exit $i"; done
^C
# fg
./exitsnoop.py -t --utc -x --label= > exitlog
^C
# cat exitlog
TIME-UTC     LABEL PCOMM            PID    PPID   TID    AGE(s)  EXIT_CODE
13:20:22.997 exit  bash             18300  17656  18300  1.65    code 65
13:20:24.701 exit  bash             18303  17656  18303  1.70    code 70
13:20:26.456 exit  bash             18306  17656  18306  1.75    code 75
13:20:28.260 exit  bash             18310  17656  18310  1.80    code 80
13:20:30.113 exit  bash             18313  17656  18313  1.85    code 85
13:20:31.495 exit  sleep            18318  18317  18318  1.38    signal 2 (INT)
13:20:31.495 exit  bash             18317  17656  18317  1.38    signal 2 (INT)
#
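
The merge-by-sort idea described above can be sketched in a few lines of
Python. merge_traces() is a hypothetical helper, and the 'exec' lines are
invented sample data standing in for another tool's output; only the two
'exit' lines come from the run above.

```python
def merge_traces(*logs):
    """Merge the data lines of several per-tool logs into one time-ordered list.

    Assumes every line starts with the same zero-padded HH:MM:SS.mmm field,
    so plain string order equals time order (within a single day). Header
    lines such as "TIME-UTC ..." should be dropped first: they would sort
    after the timestamps.
    """
    return sorted(line for log in logs for line in log)


# Hypothetical sample data: two exit lines from the run above, plus
# invented 'exec' lines from an imaginary tool using the same format.
exit_log = [
    "13:20:22.997 exit  bash             18300  17656  18300  1.65    code 65",
    "13:20:24.701 exit  bash             18303  17656  18303  1.70    code 70",
]
exec_log = [
    "13:20:21.347 exec  bash             18300  17656  18300",
    "13:20:23.001 exec  bash             18303  17656  18303",
]

for line in merge_traces(exit_log, exec_log):
    print(line)
```

Because the timestamp carries no date, a trace that crosses midnight would
need a date-aware sort key instead of a plain string sort.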

USAGE message:

# ./exitsnoop.py -h
usage: exitsnoop.py [-h] [-t] [--utc] [-p PID] [--label LABEL] [-x] [--per-thread]

Trace all process termination (exit, fatal signal)

optional arguments:
  -h, --help         show this help message and exit
  -t, --timestamp    include timestamp (local time default)
  --utc              include timestamp in UTC (-t implied)
  -p PID, --pid PID  trace this PID only
  --label LABEL      label each line
  -x, --failed       trace only fails, exclude exit(0)
  --per-thread       trace per thread termination

examples:
    exitsnoop                # trace all process termination
    exitsnoop -x             # trace only fails, exclude exit(0)
    exitsnoop -t             # include timestamps (local time)
    exitsnoop --utc          # include timestamps (UTC)
    exitsnoop -p 181         # only trace PID 181
    exitsnoop --label=exit   # label each output line with 'exit'
    exitsnoop --per-thread   # trace per thread termination

Exit status:

    0 EX_OK        Success
    2              argparse error
   70 EX_SOFTWARE  syntax error detected by compiler, or
                   verifier error from kernel
   77 EX_NOPERM    Need sudo (CAP_SYS_ADMIN) for BPF() system call
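
Python programs can use these <sysexits.h> numbers without hard-coding them:
on Unix the os module exposes them as os.EX_OK, os.EX_SOFTWARE, os.EX_NOPERM,
and so on. The sketch below is hypothetical; exit_status_for() and its
mapping of exception types are assumptions for illustration, not how
exitsnoop decides its exit status.

```python
import os
from typing import Optional

# <sysexits.h> values from the table above, as exposed by Python's os module
# (Unix only). Status 2 has no sysexits name: argparse exits with 2 itself.
assert (os.EX_OK, os.EX_SOFTWARE, os.EX_NOPERM) == (0, 70, 77)


def exit_status_for(error: Optional[Exception]) -> int:
    """Hypothetical mapping from a startup failure to an exit status."""
    if error is None:
        return os.EX_OK
    if isinstance(error, PermissionError):
        return os.EX_NOPERM  # assumed: permission failures mean "need sudo"
    return os.EX_SOFTWARE  # assumed: anything else is a compile/verifier error


print(exit_status_for(None), exit_status_for(PermissionError()),
      exit_status_for(RuntimeError("verifier rejected program")))
```

Run as-is, it prints 0 77 70.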

About process termination in Linux
----------------------------------

A program/process on Linux terminates normally
    - by explicitly calling exit(int), or the underlying _exit(int)
      system call
    - in C/C++ by returning an int from main(),
      ...which is then used as the value for exit()
    - by reaching the end of main() without a return,
      ...which is equivalent to return 0 (C99 and C++)
  Notes:
    - Linux keeps only the least significant eight bits of the exit value
    - an exit value of 0 means success
    - by convention, an exit value of 1-255 means an error
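
The eight-bit truncation is easy to observe from a parent process. A minimal
Python sketch, assuming Linux and a Python 3 interpreter: the child is a
Python process rather than a C program, but CPython's sys.exit(n) ends in
the C library's exit(n), so the same rule applies. child_exit_value() is a
hypothetical helper.

```python
import subprocess
import sys


def child_exit_value(code: int) -> int:
    """Start a child Python process that calls sys.exit(code) and return the
    exit value its parent observes."""
    child = [sys.executable, "-c", f"import sys; sys.exit({code})"]
    return subprocess.run(child).returncode


print(child_exit_value(0))    # 0: success
print(child_exit_value(65))   # 65: fits in eight bits, kept as is
print(child_exit_value(300))  # 44: only 300 & 0xFF survives
```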

A process terminates abnormally if it
    - receives a signal that is not ignored or blocked, has no handler,
      and whose default action is to terminate the process, optionally
      with a core dump
    - is selected by the kernel's "Out of Memory Killer", which is
      equivalent to being sent SIGKILL (9), a signal that cannot be
      caught, ignored, or blocked
  Notes:
    - any signal can be sent asynchronously via the kill() system call
    - synchronous signals result from the CPU detecting a fault or trap
      during execution of the program; the kernel handler that is
      dispatched determines the cause and the corresponding signal.
      Examples are:
        - attempting to fetch data or instructions at invalid or
          privileged addresses (typically SIGSEGV)
        - attempting to divide by zero, or an unmasked floating point
          exception (SIGFPE)
        - hitting a breakpoint (SIGTRAP)
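
The asynchronous case can be reproduced with Python's subprocess module. A
minimal sketch, assuming Linux and a Python 3 interpreter; the negative
return code is subprocess's own convention for a fatal signal, not the raw
exit_code value described below.

```python
import signal
import subprocess
import sys

# A child Python process that only sleeps. It installs no SIGTERM handler,
# so SIGTERM's default action (terminate the process) applies.
child = subprocess.Popen([sys.executable, "-c", "import time; time.sleep(60)"])

# Asynchronous delivery: send_signal() sends the signal with os.kill().
child.send_signal(signal.SIGTERM)
status = child.wait()

# subprocess reports "killed by signal N" as a negative return code -N.
print(status)                        # -15
print(signal.Signals(-status).name)  # SIGTERM
```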

Linux keeps process termination information in 'exit_code', an int
within struct 'task_struct' defined in <linux/sched.h>:
    - if the process terminated normally:
        - the exit value is in bits 15:8
        - the least significant 8 bits of exit_code are zero (bits 7:0)
    - if the process terminated abnormally:
        - the signal number (>= 1) is in bits 6:0
        - bit 7 indicates a 'core dump' action; whether a core dump was
          actually written depends on the core file size limit (ulimit -c)
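
The encoding can be observed from user space, because the status word that
waitpid() returns to the parent reports termination with the same bit
layout. A minimal Python sketch, assuming Linux (it relies on os.fork());
raw_status() is a hypothetical helper.

```python
import os
import signal


def raw_status(action) -> int:
    """Fork a child that runs action(), then return the raw status word that
    waitpid() reports for it (same encoding as exit_code)."""
    pid = os.fork()
    if pid == 0:  # child
        try:
            action()
        finally:
            os._exit(0)  # never fall back into the parent's code
    _, status = os.waitpid(pid, 0)
    return status


normal = raw_status(lambda: os._exit(65))
killed = raw_status(lambda: os.kill(os.getpid(), signal.SIGKILL))

print(hex(normal))                          # 0x4100: 65 in bits 15:8
print((normal >> 8) & 0xFF, normal & 0xFF)  # 65 0
print(hex(killed))                          # 0x9: signal 9 in bits 6:0
print(killed & 0x7F, bool(killed & 0x80))   # 9 False (SIGKILL never dumps core)
```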

Success is indicated with an exit value of zero.
The meaning of a non-zero exit value depends on the program;
some programs document their exit values and their meaning.
This script uses the exit values defined in <sysexits.h>.

References:

   https://github.com/torvalds/linux/blob/master/kernel/exit.c
   https://github.com/torvalds/linux/blob/master/arch/x86/include/uapi/asm/signal.h
   https://code.woboq.org/userspace/glibc/misc/sysexits.h.html