xref: /aosp_15_r20/external/libtracefs/Documentation/libtracefs-sqlhist.txt.1 (revision 287e80b3a36113050663245e7f2c00d274188f18)
1SQLHIST(1)
2==========
3
4NAME
5----
6sqlhist - Tool that uses SQL language to create / show creation of tracefs histograms and synthetic events.
7
8SYNOPSIS
9--------
10*sqlhist* ['OPTIONS'] ['SQL-select-command']
11
12DESCRIPTION
13-----------
14The sqlhist(1) will take an SQL like statement to create tracefs histograms and
15synthetic events that can perform various actions for various handling of the
16data.
17
18The tracefs file system interfaces with the Linux tracing infrastructure that
19has various dynamic and static events through out the kernel. Each of these
20events can have a "histogram" attached to it, where the fields of the event
21will define the buckets of the histogram.
22
23A synthetic event is a way to attach two separate events and use the fields
24and time stamps of those events to create a new dynamic event. This new
25dynamic event is call a synthetic event. The fields of each event can have
26simple calculations done on them where, for example, the delta between
27a field of one event to a field of the other event can be taken. This also
28works for the time stamps of the events where the time delta between the
29two events can also be extracted and placed into the synthetic event.
30
31Other actions can be done from the fields of the events. A snapshot can
32be taken of the kernel ring buffer a variable used in the synthetic
33event creating hits a max, or simply changes.
34
35The commands to create histograms and synthetic events are complex and
36not easy to remember. *sqlhist* is used to convert SQL syntax into the
37commands needed to create the histogram or synthetic event.
38
39The *SQL-select-command* is a SQL string defined by *tracefs_sql*(3).
40
41Note, this must be run as root (or sudo) as interacting with the tracefs
42directory requires root privilege, unless the *-t* option is given with
43a copy of the _tracefs_ directory and its events.
44
45The *sqlhist* is a simple program where its code actual exists in the
46*tracefs_sql*(3) man page.
47
48OPTIONS
49-------
50*-n* 'name'::
51    The name of the synthetic event to create. This event can then be
52    used like any other event, and enabled via *trace-cmd*(1).
53
54*-t* 'tracefs-dir'::
55    In order to test this out as non root user, a copy of the tracefs directory
56    can be used, and passing that directory with this option will allow
57    the program to work. Obviously, *-e* will not work as non-root because
58    it will not be able to execute.
59
60    # mkdir /tmp/tracing
61    # cp -r /sys/kernel/tracing/events /tmp/tracing
62    # exit
63    $ ./sqlhist -t /tmp/tracing ...
64
65*-e*::
66    Not only display the commands to create the histogram, but also execute them.
67    This requires root privilege.
68
69*-f* 'file'::
70    Instead of reading the SQL commands from the command line, read them from
71    _file_. If _file_ is '-' then read from standard input.
72
73*-m* 'var'::
74    Do the given action when the variable _var_ hits a new maximum. This can
75    not be used with *-c*.
76
77*-c* 'var'::
78    Do the given action when the variable _var_ changes its value. This can
79    not be used with *-m*.
80
81*-s*::
82    Perform a snapshot instead of calling the synthetic event.
83
84*-T*::
85    Perform both a snapshot and trace the synthetic event.
86
87*-S* 'fields[,fields]'::
88    Save the given fields. The fields must be fields of the "end" event given
89    in the *SQL-select-command*
90
91*-B* 'instance'::
92    For simple statements that only produce a histogram, the instance given here
93    will be where the histogram will be created. This is ignored for full synthetic
94    event creation, as sythetic events have a global affect on all tracing instances,
95    where as, histograms only affect a single instance.
96
97EXAMPLES
98--------
99
100Create the sqlhist executable:
101
102[source, c]
103--
104   man tracefs_sql | sed -ne '/^EXAMPLE/,/FILES/ { /EXAMPLE/d ; /FILES/d ; p}' > sqlhist.c
105   gcc -o sqlhist sqlhist.c `pkg-config --cflags --libs libtracefs`
106--
107
108As described above, for testing purposes, make a copy of the event directory:
109[source, c]
110--
111   $ mkdir /tmp/tracing
112   $ sudo cp -r /sys/kernel/tracing/events /tmp/tracing/
113   $ sudo chmod -R 0644 /tmp/tracing/
114--
115
116For an example of simple histogram output using the copy of the tracefs directory.
117[source, c]
118--
119   $ ./sqlhist -t /tmp/tracing/ 'SELECT CAST(call_site as SYM-OFFSET), bytes_req, CAST(bytes_alloc AS _COUNTER_) FROM kmalloc'
120--
121
122Produces the output:
123[source, c]
124--
125   echo 'hist:keys=call_site.sym-offset,bytes_req:vals=bytes_alloc' > /sys/kernel/tracing/events/kmem/kmalloc/trigger
126--
127
128Which could be used by root:
129[source, c]
130--
131   # echo 'hist:keys=call_site.sym-offset,bytes_req:vals=bytes_alloc' > /sys/kernel/tracing/events/kmem/kmalloc/trigger
132   # cat /sys/kernel/tracing/events/kmem/kmalloc/hist
133# event histogram
134#
135# trigger info: hist:keys=call_site.sym-offset,bytes_req:vals=hitcount,bytes_alloc:sort=hitcount:size=2048 [active]
136#
137
138{ call_site: [ffffffff813f8d8a] load_elf_phdrs+0x4a/0xb0                               , bytes_req:        728 } hitcount:          1  bytes_alloc:       1024
139{ call_site: [ffffffffc0c69e74] nf_ct_ext_add+0xd4/0x1d0 [nf_conntrack]                , bytes_req:        128 } hitcount:          1  bytes_alloc:        128
140{ call_site: [ffffffff818355e6] dma_resv_get_fences+0xf6/0x440                         , bytes_req:          8 } hitcount:          1  bytes_alloc:          8
141{ call_site: [ffffffffc06dc73f] intel_gt_get_buffer_pool+0x15f/0x290 [i915]            , bytes_req:        424 } hitcount:          1  bytes_alloc:        512
142{ call_site: [ffffffff813f8d8a] load_elf_phdrs+0x4a/0xb0                               , bytes_req:        616 } hitcount:          1  bytes_alloc:       1024
143{ call_site: [ffffffff8161a44c] __sg_alloc_table+0x11c/0x180                           , bytes_req:         32 } hitcount:          1  bytes_alloc:         32
144{ call_site: [ffffffffc070749d] shmem_get_pages+0xad/0x5d0 [i915]                      , bytes_req:         16 } hitcount:          1  bytes_alloc:         16
145{ call_site: [ffffffffc07507f5] intel_framebuffer_create+0x25/0x60 [i915]              , bytes_req:        408 } hitcount:          1  bytes_alloc:        512
146{ call_site: [ffffffffc06fc20f] eb_parse+0x34f/0x910 [i915]                            , bytes_req:        408 } hitcount:          1  bytes_alloc:        512
147{ call_site: [ffffffffc0700ebd] i915_gem_object_get_pages_internal+0x5d/0x270 [i915]   , bytes_req:         16 } hitcount:          1  bytes_alloc:         16
148{ call_site: [ffffffffc0771188] intel_frontbuffer_get+0x38/0x220 [i915]                , bytes_req:        400 } hitcount:          1  bytes_alloc:        512
149{ call_site: [ffffffff8161a44c] __sg_alloc_table+0x11c/0x180                           , bytes_req:        128 } hitcount:          1  bytes_alloc:        128
150{ call_site: [ffffffff813f8f45] load_elf_binary+0x155/0x1680                           , bytes_req:         28 } hitcount:          1  bytes_alloc:         32
151{ call_site: [ffffffffc07038c8] __assign_mmap_offset+0x208/0x3d0 [i915]                , bytes_req:        288 } hitcount:          1  bytes_alloc:        512
152{ call_site: [ffffffff813737b2] alloc_bprm+0x32/0x2f0                                  , bytes_req:        416 } hitcount:          1  bytes_alloc:        512
153{ call_site: [ffffffff813f9027] load_elf_binary+0x237/0x1680                           , bytes_req:         64 } hitcount:          1  bytes_alloc:         64
154{ call_site: [ffffffff8161a44c] __sg_alloc_table+0x11c/0x180                           , bytes_req:         64 } hitcount:          1  bytes_alloc:         64
155{ call_site: [ffffffffc040ffe7] drm_vma_node_allow+0x27/0xe0 [drm]                     , bytes_req:         40 } hitcount:          2  bytes_alloc:        128
156{ call_site: [ffffffff813cda98] __do_sys_timerfd_create+0x58/0x1c0                     , bytes_req:        336 } hitcount:          2  bytes_alloc:       1024
157{ call_site: [ffffffff818355e6] dma_resv_get_fences+0xf6/0x440                         , bytes_req:         40 } hitcount:          2  bytes_alloc:        128
158{ call_site: [ffffffff8139b75a] single_open+0x2a/0xa0                                  , bytes_req:         32 } hitcount:          2  bytes_alloc:         64
159{ call_site: [ffffffff815df715] bio_kmalloc+0x25/0x80                                  , bytes_req:        136 } hitcount:          2  bytes_alloc:        384
160{ call_site: [ffffffffc071e5cd] i915_vma_work+0x1d/0x50 [i915]                         , bytes_req:        416 } hitcount:          3  bytes_alloc:       1536
161{ call_site: [ffffffff81390d0d] alloc_fdtable+0x4d/0x100                               , bytes_req:         56 } hitcount:          3  bytes_alloc:        192
162{ call_site: [ffffffffc06ff65f] i915_gem_do_execbuffer+0x158f/0x2440 [i915]            , bytes_req:         16 } hitcount:          4  bytes_alloc:         64
163{ call_site: [ffffffff8137713c] alloc_pipe_info+0x5c/0x230                             , bytes_req:        384 } hitcount:          5  bytes_alloc:       2560
164{ call_site: [ffffffff813771b4] alloc_pipe_info+0xd4/0x230                             , bytes_req:        640 } hitcount:          5  bytes_alloc:       5120
165{ call_site: [ffffffff81834cdb] dma_resv_list_alloc+0x1b/0x40                          , bytes_req:         40 } hitcount:          6  bytes_alloc:        384
166{ call_site: [ffffffff81834cdb] dma_resv_list_alloc+0x1b/0x40                          , bytes_req:         56 } hitcount:          9  bytes_alloc:        576
167{ call_site: [ffffffff8120086e] tracing_map_sort_entries+0x9e/0x3e0                    , bytes_req:         24 } hitcount:         60  bytes_alloc:       1920
168
169Totals:
170    Hits: 122
171    Entries: 30
172    Dropped: 0
173--
174
175Note, although the examples use uppercase for the SQL keywords, they do not have
176to be. 'SELECT' could also be 'select' or even 'sElEcT'.
177
178By using the full SQL language, synthetic events can be made and processed.
179For example, using *sqlhist* along with *trace-cmd*(1), wake up latency can
180be recorded by creating a synthetic event by attaching the _sched_waking_
181and the _sched_switch_ events.
182
183[source, c]
184--
185  # sqlhist -n wakeup_lat -e -T -m lat 'SELECT end.next_comm AS comm, (end.TIMESTAMP_USECS - start.TIMESTAMP_USECS) AS lat FROM ' \
186    'sched_waking AS start JOIN sched_switch AS end ON start.pid = end.next_pid WHERE end.next_prio < 100 && end.next_comm == "cyclictest"'
187  # trace-cmd start -e all -e wakeup_lat -R stacktrace
188  # cyclictest -l 1000 -p80 -i250  -a -t -q -m -d 0 -b 1000 --tracemark
189  # trace-cmd show -s | tail -30
190          <idle>-0       [002] dNh4 23454.902246: sched_wakeup: comm=cyclictest pid=12272 prio=120 target_cpu=002
191          <idle>-0       [005] ...1 23454.902246: cpu_idle: state=4294967295 cpu_id=5
192          <idle>-0       [007] d..1 23454.902246: cpu_idle: state=0 cpu_id=7
193          <idle>-0       [002] dNh1 23454.902247: hrtimer_expire_exit: hrtimer=0000000037956dc2
194          <idle>-0       [005] d..1 23454.902248: cpu_idle: state=0 cpu_id=5
195          <idle>-0       [002] dNh1 23454.902248: write_msr: 6e0, value 4866ce957272
196          <idle>-0       [006] ...1 23454.902248: cpu_idle: state=4294967295 cpu_id=6
197          <idle>-0       [002] dNh1 23454.902249: local_timer_exit: vector=236
198          <idle>-0       [006] d..1 23454.902250: cpu_idle: state=0 cpu_id=6
199          <idle>-0       [002] .N.1 23454.902250: cpu_idle: state=4294967295 cpu_id=2
200          <idle>-0       [002] dN.1 23454.902251: rcu_utilization: Start context switch
201          <idle>-0       [002] dN.1 23454.902252: rcu_utilization: End context switch
202          <idle>-0       [001] ...1 23454.902252: cpu_idle: state=4294967295 cpu_id=1
203          <idle>-0       [002] dN.3 23454.902253: prandom_u32: ret=3692516021
204          <idle>-0       [001] d..1 23454.902254: cpu_idle: state=0 cpu_id=1
205          <idle>-0       [002] d..2 23454.902254: sched_switch: prev_comm=swapper/2 prev_pid=0 prev_prio=120 prev_state=R ==> next_comm=cyclictest next_pid=12275 next_prio=19
206          <idle>-0       [002] d..4 23454.902256: wakeup_lat: next_comm=cyclictest lat=17
207          <idle>-0       [002] d..5 23454.902258: <stack trace>
208 => trace_event_raw_event_synth
209 => action_trace
210 => event_hist_trigger
211 => event_triggers_call
212 => trace_event_buffer_commit
213 => trace_event_raw_event_sched_switch
214 => __traceiter_sched_switch
215 => __schedule
216 => schedule_idle
217 => do_idle
218 => cpu_startup_entry
219 => secondary_startup_64_no_verify
220--
221
222Here's the options for *sqlhist* explained:
223
224 *-n wakeup_lat* ::
225     Name the synthetic event to use *wakeup_lat*.
226
227 *-e*::
228     Execute the commands that are printed.
229
230 *-T*::
231     Perform both a trace action and then a snapshot action (swap the buffer into the static 'snapshot' buffer).
232
233 *-m lat*::
234     Trigger the actions whenever 'lat' hits a new maximum value.
235
236Now a breakdown of the SQL statement:
237[source, c]
238--
239 'SELECT end.next_comm AS comm, (end.TIMESTAMP_USECS - start.TIMESTAMP_USECS) AS lat FROM ' \
240    'sched_waking AS start JOIN sched_switch AS end ON start.pid = end.next_pid WHERE end.next_prio < 100 && end.next_comm == "cyclictest"'
241--
242 *end.next_comm AS comm*::
243   Save the 'sched_switch' field *next_comm* and place it into the *comm* field of the 'wakeup_lat' synthetic event.
244
245 *(end.TIMESTAMP_USECS - start.TIMESTAMP_USECS) AS lat*::
246   Take the delta of the time stamps from the 'sched_switch' event and the 'sched_waking' event.
247   As time stamps are usually recorded in nanoseconds, *TIMESTAMP* would give the full nanosecond time stamp,
248   but here, the *TIMESTAMP_USECS* will truncate it into microseconds. The value is saved in the
249   variable *lat*, which will also be recorded in the synthetic event.
250
251 *FROM 'sched_waking' AS start JOIN sched_switch AS end ON start.pid = end.next_pid*::
252   Create the synthetic event by joining _sched_waking_ to _sched_switch_, matching
253   the _sched_waking_ 'pid' field with the _sched_switch_ 'next_pid' field.
254   Also make *start* an alias for _sched_waking_ and *end* an alias for _sched_switch_
255   which then an use *start* and *end* as a subsitute for _sched_waking_ and _sched_switch_
256   respectively through out the rest of the SQL statement.
257
258 *WHERE end.next_prio < 100 && end.next_comm == "cyclictest"*::
259   Filter the logic where it executes only if _sched_waking_ 'next_prio' field
260   is less than 100. (Note, in the Kernel, priorities are inverse, and the real-time
261   priorities are represented from 0-100 where 0 is the highest priority).
262   Also only trace when the 'next_comm' (the task scheduling in) of the _sched_switch_
263   event has the name "cyclictest".
264
265For the *trace-cmd*(3) command:
266[source, c]
267--
268   trace-cmd start -e all -e wakeup_lat -R stacktrace
269--
270
271 *trace-cmd start*::
272   Enables tracing (does not record to a file).
273
274 *-e all*::
275   Enable all events
276
277 *-e wakeup_lat -R stacktrace*::
278   have the "wakeup_lat" event (our synthetic event) enable the *stacktrace* trigger, were
279   for every instance of the "wakeup_lat" event, a kernel stack trace will be recorded
280   in the ring buffer.
281
282After calling *cyclictest* (a real-time tool to measure wakeup latency), read the snapshot
283buffer.
284
285 *trace-cmd show -s*::
286   *trace-cmd show* reads the kernel ring buffer, and the *-s* option will read the *snapshot*
287   buffer instead of the normal one.
288
289[source, c]
290--
291 <idle>-0       [002] d..4 23454.902256: wakeup_lat: next_comm=cyclictest lat=17
292--
293  We see on the "wakeup_lat" event happened on CPU 2, with a wake up latency 17 microseconds.
294
295This can be extracted into a *trace.dat* file that *trace-cmd*(3) can read and do further
296analysis, as well as *kernelshark*.
297
298[source, c]
299--
300    # trace-cmd extract -s
301    # trace-cmd report --cpu 2 | tail -30
302          <idle>-0     [002] 23454.902238: prandom_u32:          ret=1633425088
303          <idle>-0     [002] 23454.902239: sched_wakeup:         cyclictest:12275 [19] CPU:002
304          <idle>-0     [002] 23454.902241: hrtimer_expire_exit:  hrtimer=0xffffbbd68286fe60
305          <idle>-0     [002] 23454.902241: hrtimer_cancel:       hrtimer=0xffffbbd6826efe70
306          <idle>-0     [002] 23454.902242: hrtimer_expire_entry: hrtimer=0xffffbbd6826efe70 now=23455294430750 function=hrtimer_wakeup/0x0
307          <idle>-0     [002] 23454.902243: sched_waking:         comm=cyclictest pid=12272 prio=120 target_cpu=002
308          <idle>-0     [002] 23454.902244: prandom_u32:          ret=1102749734
309          <idle>-0     [002] 23454.902246: sched_wakeup:         cyclictest:12272 [120] CPU:002
310          <idle>-0     [002] 23454.902247: hrtimer_expire_exit:  hrtimer=0xffffbbd6826efe70
311          <idle>-0     [002] 23454.902248: write_msr:            6e0, value 4866ce957272
312          <idle>-0     [002] 23454.902249: local_timer_exit:     vector=236
313          <idle>-0     [002] 23454.902250: cpu_idle:             state=4294967295 cpu_id=2
314          <idle>-0     [002] 23454.902251: rcu_utilization:      Start context switch
315          <idle>-0     [002] 23454.902252: rcu_utilization:      End context switch
316          <idle>-0     [002] 23454.902253: prandom_u32:          ret=3692516021
317          <idle>-0     [002] 23454.902254: sched_switch:         swapper/2:0 [120] R ==> cyclictest:12275 [19]
318          <idle>-0     [002] 23454.902256: wakeup_lat:           next_comm=cyclictest lat=17
319          <idle>-0     [002] 23454.902258: kernel_stack:         <stack trace >
320=> trace_event_raw_event_synth (ffffffff8121a0db)
321=> action_trace (ffffffff8121e9fb)
322=> event_hist_trigger (ffffffff8121ca8d)
323=> event_triggers_call (ffffffff81216c72)
324=> trace_event_buffer_commit (ffffffff811f7618)
325=> trace_event_raw_event_sched_switch (ffffffff8110fda4)
326=> __traceiter_sched_switch (ffffffff8110d449)
327=> __schedule (ffffffff81c02002)
328=> schedule_idle (ffffffff81c02c86)
329=> do_idle (ffffffff8111e898)
330=> cpu_startup_entry (ffffffff8111eba9)
331=> secondary_startup_64_no_verify (ffffffff81000107)
332--
333
334BUGS
335----
336
337As *sqlhist* is just example code from a man page, it is guaranteed to contain
338lots of bugs. For one thing, not all error paths are covered properly.
339
340SEE ALSO
341--------
342trace-cmd(1), tracefs_sql(3)
343
344AUTHOR
345------
346Written by Steven Rostedt, <[email protected]>
347
348RESOURCES
349---------
350https://git.kernel.org/pub/scm/utils/trace-cmd/trace-cmd.git/
351
352COPYING
353-------
354Copyright \(C) 2021 , Inc. Free use of this software is granted under
355the terms of the GNU Public License (GPL).
356
357