xref: /aosp_15_r20/external/bcc/tools/funclatency_example.txt (revision 387f9dfdfa2baef462e92476d413c7bc2470293e)
1Demonstrations of funclatency, the Linux eBPF/bcc version.
2
3
4Timing the do_sys_open() kernel function until Ctrl-C:
5
6# ./funclatency do_sys_open
7Tracing do_sys_open... Hit Ctrl-C to end.
8^C
9     nsecs           : count     distribution
10       0 -> 1        : 0        |                                      |
11       2 -> 3        : 0        |                                      |
12       4 -> 7        : 0        |                                      |
13       8 -> 15       : 0        |                                      |
14      16 -> 31       : 0        |                                      |
15      32 -> 63       : 0        |                                      |
16      64 -> 127      : 0        |                                      |
17     128 -> 255      : 0        |                                      |
18     256 -> 511      : 0        |                                      |
19     512 -> 1023     : 0        |                                      |
20    1024 -> 2047     : 0        |                                      |
21    2048 -> 4095     : 124      |****************                      |
22    4096 -> 8191     : 291      |**************************************|
23    8192 -> 16383    : 36       |****                                  |
24   16384 -> 32767    : 16       |**                                    |
25   32768 -> 65535    : 8        |*                                     |
26   65536 -> 131071   : 0        |                                      |
27  131072 -> 262143   : 0        |                                      |
28  262144 -> 524287   : 0        |                                      |
29  524288 -> 1048575  : 0        |                                      |
30 1048576 -> 2097151  : 0        |                                      |
31 2097152 -> 4194303  : 1        |                                      |
32
33avg = 13746 nsecs, total: 6543360 nsecs, count: 476
34
35Detaching...
36
37The output shows a histogram of function latency (call time), measured from when
38the function began executing (was called) to when it finished (returned).
39
40This example output shows that most of the time, do_sys_open() took between
412048 and 65536 nanoseconds (2 to 65 microseconds). The peak of this distribution
42shows 291 calls of between 4096 and 8191 nanoseconds. There was also one
43occurrence, an outlier, in the 2 to 4 millisecond range.
44
45How this works: the function entry and return are traced using the kernel kprobe
46and kretprobe tracer. Timestamps are collected, the delta time calculated, which
47is the bucketized and stored as an in-kernel histogram for efficiency. The
48histogram is visible in the output: it's the "count" column; everything else is
49decoration. Only the count column is copied to user-level on output. This is an
50efficient way to time kernel functions and examine their latency distribution.
51
52
53Now trace a user function, pthread_mutex_lock in libpthread, to determine if
54there is considerable lock contention:
55
56# ./funclatency pthread:pthread_mutex_lock -p $(pidof contentions)
57Tracing 1 function for "pthread:pthread_mutex_lock"... Hit Ctrl-C to end.
58     nsecs               : count     distribution
59         0 -> 1          : 0        |                                        |
60         2 -> 3          : 0        |                                        |
61         4 -> 7          : 0        |                                        |
62         8 -> 15         : 0        |                                        |
63        16 -> 31         : 0        |                                        |
64        32 -> 63         : 0        |                                        |
65        64 -> 127        : 0        |                                        |
66       128 -> 255        : 0        |                                        |
67       256 -> 511        : 0        |                                        |
68       512 -> 1023       : 0        |                                        |
69      1024 -> 2047       : 0        |                                        |
70      2048 -> 4095       : 508967   |****************************************|
71      4096 -> 8191       : 70072    |*****                                   |
72      8192 -> 16383      : 27686    |**                                      |
73     16384 -> 32767      : 5075     |                                        |
74     32768 -> 65535      : 2318     |                                        |
75     65536 -> 131071     : 581      |                                        |
76    131072 -> 262143     : 38       |                                        |
77    262144 -> 524287     : 5        |                                        |
78    524288 -> 1048575    : 1        |                                        |
79   1048576 -> 2097151    : 9        |                                        |
80
81avg = 4317 nsecs, total: 2654426112 nsecs, count: 614752
82
83Detaching...
84
85It seems that most calls to pthread_mutex_lock completed rather quickly (in
86under 4us), but there were some cases of considerable contention, sometimes
87over a full millisecond.
88
89
90Run a quick-and-dirty profiler over all the functions in an executable:
91# ./funclatency /home/user/primes:* -p $(pidof primes) -F
92Tracing 15 functions for "/home/user/primes:*"... Hit Ctrl-C to end.
93^C
94
95Function = is_prime [6556]
96     nsecs               : count     distribution
97         0 -> 1          : 0        |                                        |
98         2 -> 3          : 0        |                                        |
99         4 -> 7          : 0        |                                        |
100         8 -> 15         : 0        |                                        |
101        16 -> 31         : 0        |                                        |
102        32 -> 63         : 0        |                                        |
103        64 -> 127        : 0        |                                        |
104       128 -> 255        : 0        |                                        |
105       256 -> 511        : 0        |                                        |
106       512 -> 1023       : 0        |                                        |
107      1024 -> 2047       : 0        |                                        |
108      2048 -> 4095       : 1495322  |****************************************|
109      4096 -> 8191       : 95744    |**                                      |
110      8192 -> 16383      : 9926     |                                        |
111     16384 -> 32767      : 3070     |                                        |
112     32768 -> 65535      : 1415     |                                        |
113     65536 -> 131071     : 112      |                                        |
114    131072 -> 262143     : 9        |                                        |
115    262144 -> 524287     : 3        |                                        |
116    524288 -> 1048575    : 0        |                                        |
117   1048576 -> 2097151    : 8        |                                        |
118
119Function = insert_result [6556]
120     nsecs               : count     distribution
121         0 -> 1          : 0        |                                        |
122         2 -> 3          : 0        |                                        |
123         4 -> 7          : 0        |                                        |
124         8 -> 15         : 0        |                                        |
125        16 -> 31         : 0        |                                        |
126        32 -> 63         : 0        |                                        |
127        64 -> 127        : 0        |                                        |
128       128 -> 255        : 0        |                                        |
129       256 -> 511        : 0        |                                        |
130       512 -> 1023       : 0        |                                        |
131      1024 -> 2047       : 0        |                                        |
132      2048 -> 4095       : 111047   |****************************************|
133      4096 -> 8191       : 3998     |*                                       |
134      8192 -> 16383      : 720      |                                        |
135     16384 -> 32767      : 238      |                                        |
136     32768 -> 65535      : 106      |                                        |
137     65536 -> 131071     : 5        |                                        |
138    131072 -> 262143     : 4        |                                        |
139
140avg = 3404 nsecs, total: 5862276096 nsecs, count: 1721727
141
142Detaching...
143
144From the results, we can see that the is_prime function has something resembling
145an exponential distribution -- very few primes take a very long time to test,
146while most numbers are verified as prime or composite in less than 4us. The
147insert_result function exhibits a similar phenomenon, likely due to contention
148over a shared results container.
149
150
151Now vfs_read() is traced, and a microseconds histogram printed:
152
153# ./funclatency -u vfs_read
154Tracing vfs_read... Hit Ctrl-C to end.
155^C
156     usecs           : count     distribution
157       0 -> 1        : 1143     |**************************************|
158       2 -> 3        : 420      |*************                         |
159       4 -> 7        : 159      |*****                                 |
160       8 -> 15       : 295      |*********                             |
161      16 -> 31       : 25       |                                      |
162      32 -> 63       : 5        |                                      |
163      64 -> 127      : 1        |                                      |
164     128 -> 255      : 0        |                                      |
165     256 -> 511      : 0        |                                      |
166     512 -> 1023     : 0        |                                      |
167    1024 -> 2047     : 1        |                                      |
168    2048 -> 4095     : 0        |                                      |
169    4096 -> 8191     : 5        |                                      |
170    8192 -> 16383    : 0        |                                      |
171   16384 -> 32767    : 0        |                                      |
172   32768 -> 65535    : 0        |                                      |
173   65536 -> 131071   : 7        |                                      |
174  131072 -> 262143   : 7        |                                      |
175  262144 -> 524287   : 3        |                                      |
176  524288 -> 1048575  : 7        |                                      |
177
178avg = 4229 nsecs, total: 8789145 nsecs, count: 2078
179
180Detaching...
181
182This shows a bimodal distribution. Many vfs_read() calls were faster than 15
183microseconds, however, there was also a small handful between 65 milliseconds
184and 1 second, seen at the bottom of the table. These are likely network reads
185from SSH, waiting on interactive keystrokes.
186
187
188Tracing do_nanosleep() in milliseconds:
189
190# ./funclatency -m do_nanosleep
191Tracing do_nanosleep... Hit Ctrl-C to end.
192^C
193     msecs           : count     distribution
194       0 -> 1        : 0        |                                      |
195       2 -> 3        : 0        |                                      |
196       4 -> 7        : 0        |                                      |
197       8 -> 15       : 0        |                                      |
198      16 -> 31       : 0        |                                      |
199      32 -> 63       : 0        |                                      |
200      64 -> 127      : 0        |                                      |
201     128 -> 255      : 0        |                                      |
202     256 -> 511      : 0        |                                      |
203     512 -> 1023     : 328      |**************************************|
204    1024 -> 2047     : 0        |                                      |
205    2048 -> 4095     : 0        |                                      |
206    4096 -> 8191     : 32       |***                                   |
207    8192 -> 16383    : 0        |                                      |
208   16384 -> 32767    : 0        |                                      |
209   32768 -> 65535    : 2        |                                      |
210
211avg = 1510 nsecs, total: 546816 nsecs, count: 326
212
213Detaching...
214
215This looks like it has found threads that are sleeping every 1, 5, and 60
216seconds.
217
218
219An interval can be provided using -i, and timestamps added using -T. For
220example, tracing vfs_read() latency in milliseconds and printing output
221every 5 seconds:
222
223# ./funclatency -mTi 5 vfs_read
224Tracing vfs_read... Hit Ctrl-C to end.
225
22620:10:08
227     msecs           : count     distribution
228       0 -> 1        : 1500     |*************************************+|
229       2 -> 3        : 3        |                                      |
230       4 -> 7        : 1        |                                      |
231       8 -> 15       : 2        |                                      |
232      16 -> 31       : 0        |                                      |
233      32 -> 63       : 0        |                                      |
234      64 -> 127      : 4        |                                      |
235     128 -> 255      : 3        |                                      |
236     256 -> 511      : 1        |                                      |
237     512 -> 1023     : 7        |                                      |
238
239avg = 5 nsecs, total: 8259 nsecs, count: 1521
240
24120:10:13
242     msecs           : count     distribution
243       0 -> 1        : 1251     |*************************************+|
244       2 -> 3        : 3        |                                      |
245       4 -> 7        : 2        |                                      |
246       8 -> 15       : 0        |                                      |
247      16 -> 31       : 2        |                                      |
248      32 -> 63       : 3        |                                      |
249      64 -> 127      : 5        |                                      |
250     128 -> 255      : 5        |                                      |
251     256 -> 511      : 3        |                                      |
252     512 -> 1023     : 6        |                                      |
253    1024 -> 2047     : 2        |                                      |
254
255avg = 9 nsecs, total: 11736 nsecs, count: 1282
256
25720:10:18
258     msecs           : count     distribution
259       0 -> 1        : 1265     |*************************************+|
260       2 -> 3        : 0        |                                      |
261       4 -> 7        : 5        |                                      |
262       8 -> 15       : 9        |                                      |
263      16 -> 31       : 7        |                                      |
264      32 -> 63       : 1        |                                      |
265      64 -> 127      : 2        |                                      |
266     128 -> 255      : 3        |                                      |
267     256 -> 511      : 5        |                                      |
268     512 -> 1023     : 5        |                                      |
269    1024 -> 2047     : 0        |                                      |
270    2048 -> 4095     : 1        |                                      |
271
272avg = 8 nsecs, total: 11219 nsecs, count: 1303
273
274^C
27520:10:20
276     msecs           : count     distribution
277       0 -> 1        : 249      |*************************************+|
278       2 -> 3        : 0        |                                      |
279       4 -> 7        : 0        |                                      |
280       8 -> 15       : 1        |                                      |
281      16 -> 31       : 0        |                                      |
282      32 -> 63       : 0        |                                      |
283      64 -> 127      : 0        |                                      |
284     128 -> 255      : 0        |                                      |
285     256 -> 511      : 0        |                                      |
286     512 -> 1023     : 1        |                                      |
287
288avg = 4 nsecs, total: 1029 nsecs, count: 251
289
290Detaching...
291
292
293A single process can be traced, which filters in-kernel for efficiency. Here,
294the vfs_read() function is timed as milliseconds for PID 17064, which is a
295bash shell:
296
297# ./funclatency -mp 17064 vfs_read
298Tracing vfs_read... Hit Ctrl-C to end.
299^C
300     msecs           : count     distribution
301       0 -> 1        : 1        |**                                    |
302       2 -> 3        : 0        |                                      |
303       4 -> 7        : 0        |                                      |
304       8 -> 15       : 1        |**                                    |
305      16 -> 31       : 2        |*****                                 |
306      32 -> 63       : 0        |                                      |
307      64 -> 127      : 13       |**************************************|
308     128 -> 255      : 10       |*****************************         |
309     256 -> 511      : 4        |***********                           |
310
311avg = 153 nsecs, total: 4765 nsecs, count: 31
312
313Detaching...
314
315The distribution between 64 and 511 milliseconds shows keystroke latency.
316
317
318The -F option can be used to print a histogram per function. Eg:
319
320# ./funclatency -uF 'vfs_r*'
321Tracing 5 functions for "vfs_r*"... Hit Ctrl-C to end.
322^C
323
324Function = vfs_read
325     usecs               : count     distribution
326         0 -> 1          : 1044     |****************************************|
327         2 -> 3          : 383      |**************                          |
328         4 -> 7          : 76       |**                                      |
329         8 -> 15         : 41       |*                                       |
330        16 -> 31         : 26       |                                        |
331        32 -> 63         : 0        |                                        |
332        64 -> 127        : 1        |                                        |
333       128 -> 255        : 0        |                                        |
334       256 -> 511        : 0        |                                        |
335       512 -> 1023       : 0        |                                        |
336      1024 -> 2047       : 0        |                                        |
337      2048 -> 4095       : 4        |                                        |
338      4096 -> 8191       : 2        |                                        |
339      8192 -> 16383      : 0        |                                        |
340     16384 -> 32767      : 0        |                                        |
341     32768 -> 65535      : 2        |                                        |
342     65536 -> 131071     : 5        |                                        |
343    131072 -> 262143     : 5        |                                        |
344    262144 -> 524287     : 3        |                                        |
345    524288 -> 1048575    : 7        |                                        |
346
347Function = vfs_rename
348     usecs               : count     distribution
349         0 -> 1          : 2        |****                                    |
350         2 -> 3          : 2        |****                                    |
351         4 -> 7          : 2        |****                                    |
352         8 -> 15         : 0        |                                        |
353        16 -> 31         : 6        |*************                           |
354        32 -> 63         : 18       |****************************************|
355
356avg = 5087 nsecs, total: 8287001 nsecs, count: 1629
357
358Detaching...
359
360
361
362USAGE message:
363
364# ./funclatency -h
365usage: funclatency [-h] [-p PID] [-i INTERVAL] [-T] [-u] [-m] [-F] [-r] [-v]
366                   pattern
367
368Time functions and print latency as a histogram
369
370positional arguments:
371  pattern               search expression for functions
372
373optional arguments:
374  -h, --help            show this help message and exit
375  -p PID, --pid PID     trace this PID only
376  -i INTERVAL, --interval INTERVAL
377                        summary interval, in seconds
378  -d DURATION, --duration DURATION
379			total duration of trace, in seconds
380  -T, --timestamp       include timestamp on output
381  -u, --microseconds    microsecond histogram
382  -m, --milliseconds    millisecond histogram
383  -F, --function        show a separate histogram per function
384  -r, --regexp          use regular expressions. Default is "*" wildcards
385                        only.
386  -l LEVEL, --level LEVEL
387                        set the level of nested or recursive functions
388  -v, --verbose         print the BPF program (for debugging purposes)
389
390examples:
391    ./funclatency do_sys_open       # time the do_sys_open() kernel function
392    ./funclatency c:read            # time the read() C library function
393    ./funclatency -u vfs_read       # time vfs_read(), in microseconds
394    ./funclatency -m do_nanosleep   # time do_nanosleep(), in milliseconds
395    ./funclatency -i 2 -d 10 c:open # output every 2 seconds, for duration 10s
396    ./funclatency -mTi 5 vfs_read   # output every 5 seconds, with timestamps
397    ./funclatency -p 181 vfs_read   # time process 181 only
398    ./funclatency 'vfs_fstat*'      # time both vfs_fstat() and vfs_fstatat()
399    ./funclatency 'c:*printf'       # time the *printf family of functions
400    ./funclatency -F 'vfs_r*'       # show one histogram per matched function
401