xref: /aosp_15_r20/external/libtracefs/Documentation/libtracefs-sql.txt (revision 287e80b3a36113050663245e7f2c00d274188f18)
1libtracefs(3)
2=============
3
4NAME
5----
6tracefs_sql - Create a synthetic event via an SQL statement
7
8SYNOPSIS
9--------
10[verse]
11--
12*#include <tracefs.h>*
13
14struct tracefs_synth pass:[*]*tracefs_sql*(struct tep_handle pass:[*]_tep_, const char pass:[*]_name_,
15					const char pass:[*]_sql_buffer_, char pass:[**]_err_);
16--
17
18DESCRIPTION
19-----------
20Synthetic events are dynamically created events that attach two existing events
21together via one or more matching fields between the two events. It can be used
22to find the latency between the events, or to simply pass fields of the first event
23on to the second event to display as one event.
24
25The Linux kernel interface to create synthetic events is complex, and there needs
26to be a better way to create synthetic events that is easy and can be understood
27via existing technology.
28
29If you think of each event as a table, where the fields are the column of the table
30and each instance of the event as a row, you can understand how SQL can be used
31to attach two events together and form another event (table). Utilizing the
32SQL *SELECT* *FROM* *JOIN* *ON* [ *WHERE* ] syntax, a synthetic event can easily
33be created from two different events.
34
35For simple SQL queries to make a histogram instead of a synthetic event, see
36HISTOGRAMS below.
37
38*tracefs_sql*() takes in a _tep_ handler (See _tep_local_events_(3)) that is used to
39verify the events within the _sql_buffer_ expression. The _name_ is the name of the
40synthetic event to create. If _err_ points to an address of a string, it will be filled
41with a detailed message on any type of parsing error, including fields that do not belong
42to an event, or if the events or fields are not properly compared.
43
44The example program below is a fully functional parser where it will create a synthetic
45event from a SQL syntax passed in via the command line or a file.
46
47The SQL format is as follows:
48
49*SELECT* <fields> *FROM* <start-event> *JOIN* <end-event> *ON* <matching-fields> *WHERE* <filter>
50
51Note, although the examples show the SQL commands in uppercase, they are not required to
52be so. That is, you can use "SELECT" or "select" or "sElEct".
53
54For example:
55[source,c]
56--
57SELECT syscalls.sys_enter_read.fd, syscalls.sys_exit_read.ret FROM syscalls.sys_enter_read
58   JOIN syscalls.sys_exit_read
59   ON syscalls.sys_enter_read.common_pid = syscalls.sys_exit_write.common_pid
60--
61
62Will create a synthetic event that with the fields:
63
64  u64 fd; s64 ret;
65
66Because the function takes a _tep_ handle, and usually all event names are unique, you can
67leave off the system (group) name of the event, and *tracefs_sql*() will discover the
68system for you.
69
70That is, the above statement would work with:
71
72[source,c]
73--
74SELECT sys_enter_read.fd, sys_exit_read.ret FROM sys_enter_read JOIN sys_exit_read
75   ON sys_enter_read.common_pid = sys_exit_write.common_pid
76--
77
78The *AS* keyword can be used to name the fields as well as to give an alias to the
79events, such that the above can be simplified even more as:
80
81[source,c]
82--
83SELECT start.fd, end.ret FROM sys_enter_read AS start JOIN sys_exit_read AS end ON start.common_pid = end.common_pid
84--
85
86The above aliases _sys_enter_read_ as *start* and _sys_exit_read_ as *end* and uses
87those aliases to reference the event throughout the statement.
88
89Using the *AS* keyword in the selection portion of the SQL statement will define what
90those fields will be called in the synthetic event.
91
92[source,c]
93--
94SELECT start.fd AS filed, end.ret AS return FROM sys_enter_read AS start JOIN sys_exit_read AS end
95   ON start.common_pid = end.common_pid
96--
97
98The above labels the _fd_ of _start_ as *filed* and the _ret_ of _end_ as *return* where
99the synthetic event that is created will now have the fields:
100
101  u64 filed; s64 return;
102
103The fields can also be calculated with results passed to the synthetic event:
104
105[source,c]
106--
107select start.truesize, end.len, (start.truesize - end.len) as diff from napi_gro_receive_entry as start
108   JOIN netif_receive_skb as end ON start.skbaddr = end.skbaddr
109--
110
111Which would show the *truesize* of the _napi_gro_receive_entry_ event, the actual
112_len_ of the content, shown by the _netif_receive_skb_, and the delta between
113the two and expressed by the field *diff*.
114
115The code also supports recording the timestamps at either event, and performing calculations
116on them. For wakeup latency, you have:
117
118[source,c]
119--
120select start.pid, (end.TIMESTAMP_USECS - start.TIMESTAMP_USECS) as lat from sched_waking as start
121   JOIN sched_switch as end ON start.pid = end.next_pid
122--
123
124The above will create a synthetic event that records the _pid_ of the task being woken up,
125and the time difference between the _sched_waking_ event and the _sched_switch_ event.
126The *TIMESTAMP_USECS* will truncate the time down to microseconds as the timestamp usually
127recorded in the tracing buffer has nanosecond resolution. If you do not want that
128truncation, use *TIMESTAMP* instead of *TIMESTAMP_USECS*.
129
130Finally, the *WHERE* clause can be added, that will let you add filters on either or both events.
131
132[source,c]
133--
134select start.pid, (end.TIMESTAMP_USECS - start.TIMESTAMP_USECS) as lat from sched_waking as start
135   JOIN sched_switch as end ON start.pid = end.next_pid
136   WHERE start.prio < 100 && (!(end.prev_pid < 1 || end.prev_prio > 100) || end.prev_pid == 0)
137--
138
139*NOTE*
140
141Although both events can be used together in the *WHERE* clause, they must not be mixed outside
142the top most "&&" statements. You can not OR (||) the events together, where a filter of one
143event is OR'd to a filter of the other event. This does not make sense, as the synthetic event
144requires both events to take place to be recorded. If one is filtered out, then the synthetic
145event does not execute.
146
147[source,c]
148--
149select start.pid, (end.TIMESTAMP_USECS - start.TIMESTAMP_USECS) as lat from sched_waking as start
150   JOIN sched_switch as end ON start.pid = end.next_pid
151   WHERE start.prio < 100 && end.prev_prio < 100
152--
153
154The above is valid.
155
156Where as the below is not.
157
158[source,c]
159--
160select start.pid, (end.TIMESTAMP_USECS - start.TIMESTAMP_USECS) as lat from sched_waking as start
161   JOIN sched_switch as end ON start.pid = end.next_pid
162   WHERE start.prio < 100 || end.prev_prio < 100
163--
164
165
166KEYWORDS AS EVENT FIELDS
167------------------------
168
169In some cases, an event may have a keyword. For example, regcache_drop_region has "from"
170as a field and the following will not work
171
172[source,c]
173--
174  select from from regcache_drop_region
175--
176
177In such cases, add a backslash to the conflicting field, and this will tell the parser
178that the "from" is a field and not a keyword:
179
180[source,c]
181--
182  select \from from regcache_drop_region
183--
184
185HISTOGRAMS
186----------
187
188Simple SQL statements without the *JOIN* *ON* may also be used, which will create a histogram
189instead. When doing this, the struct tracefs_hist descriptor can be retrieved from the
190returned synthetic event descriptor via the *tracefs_synth_get_start_hist*(3).
191
192In order to utilize the histogram types (see xxx) the CAST command of SQL can be used.
193
194That is:
195
196[source,c]
197--
198  select CAST(common_pid AS comm), CAST(id AS syscall) FROM sys_enter
199--
200
201Which produces:
202
203[source,c]
204--
205 # echo 'hist:keys=common_pid.execname,id.syscall' > events/raw_syscalls/sys_enter/trigger
206
207 # cat events/raw_syscalls/sys_enter/hist
208
209{ common_pid: bash            [     18248], id: sys_setpgid                   [109] } hitcount:          1
210{ common_pid: sendmail        [      1812], id: sys_read                      [  0] } hitcount:          1
211{ common_pid: bash            [     18247], id: sys_getpid                    [ 39] } hitcount:          1
212{ common_pid: bash            [     18247], id: sys_dup2                      [ 33] } hitcount:          1
213{ common_pid: gmain           [     13684], id: sys_inotify_add_watch         [254] } hitcount:          1
214{ common_pid: cat             [     18247], id: sys_access                    [ 21] } hitcount:          1
215{ common_pid: bash            [     18248], id: sys_getpid                    [ 39] } hitcount:          1
216{ common_pid: cat             [     18247], id: sys_fadvise64                 [221] } hitcount:          1
217{ common_pid: sendmail        [      1812], id: sys_openat                    [257] } hitcount:          1
218{ common_pid: less            [     18248], id: sys_munmap                    [ 11] } hitcount:          1
219{ common_pid: sendmail        [      1812], id: sys_close                     [  3] } hitcount:          1
220{ common_pid: gmain           [      1534], id: sys_poll                      [  7] } hitcount:          1
221{ common_pid: bash            [     18247], id: sys_execve                    [ 59] } hitcount:          1
222--
223
224Note, string fields may not be cast.
225
226The possible types to cast to are:
227
228*HEX* - convert the value to use hex and not decimal
229
230*SYM* - convert a pointer to symbolic (kallsyms values)
231
232*SYM-OFFSET* - convert a pointer to symbolic and include the offset.
233
234*SYSCALL* - convert the number to the mapped system call name
235
236*EXECNAME* or *COMM* - can only be used with the common_pid field. Will show the task
237name of the process.
238
239*LOG* or *LOG2* - bucket the key values in a log 2 values (1, 2, 3-4, 5-8, 9-16, 17-32, ...)
240
241The above fields are not case sensitive, and "LOG2" works as good as "log".
242
243A special CAST to _COUNTER_ or __COUNTER__ will make the field a value and not
244a key. For example:
245
246[source,c]
247--
248  SELECT common_pid, CAST(bytes_req AS _COUNTER_) FROM kmalloc
249--
250
251Which will create
252
253[source,c]
254--
255  echo 'hist:keys=common_pid:vals=bytes_req' > events/kmem/kmalloc/trigger
256
257  cat events/kmem/kmalloc/hist
258
259{ common_pid:       1812 } hitcount:          1  bytes_req:         32
260{ common_pid:       9111 } hitcount:          2  bytes_req:        272
261{ common_pid:       1768 } hitcount:          3  bytes_req:       1112
262{ common_pid:          0 } hitcount:          4  bytes_req:        512
263{ common_pid:      18297 } hitcount:         11  bytes_req:       2004
264--
265
266RETURN VALUE
267------------
268Returns 0 on success and -1 on failure. On failure, if _err_ is defined, it will be
269allocated to hold a detailed description of what went wrong if it the error was caused
270by a parsing error, or that an event, field does not exist or is not compatible with
271what it was combined with.
272
273CREATE A TOOL
274-------------
275
276The below example is a functional program that can be used to parse SQL commands into
277synthetic events.
278
279[source, c]
280--
281   man tracefs_sql | sed -ne '/^EXAMPLE/,/FILES/ { /EXAMPLE/d ; /FILES/d ; p}' > sqlhist.c
282   gcc -o sqlhist sqlhist.c `pkg-config --cflags --libs libtracefs`
283--
284
285Then you can run the above examples:
286
287[source, c]
288--
289  sudo ./sqlhist 'select start.pid, (end.TIMESTAMP_USECS - start.TIMESTAMP_USECS) as lat from sched_waking as start
290                  JOIN sched_switch as end ON start.pid = end.next_pid
291                  WHERE start.prio < 100 || end.prev_prio < 100'
292--
293EXAMPLE
294-------
295[source,c]
296--
297#include <stdio.h>
298#include <stdlib.h>
299#include <stdarg.h>
300#include <string.h>
301#include <errno.h>
302#include <unistd.h>
303#include <tracefs.h>
304
305static void usage(char **argv)
306{
307	fprintf(stderr, "usage: %s [-ed][-n name][-s][-S fields][-m var][-c var][-T][-t dir][-f file | sql-command-line]\n"
308		"  -n name - name of synthetic event 'Anonymous' if left off\n"
309		"  -t dir - use dir instead of /sys/kernel/tracing\n"
310		"  -e - execute the commands to create the synthetic event\n"
311		"  -m - trigger the action when var is a new max.\n"
312		"  -c - trigger the action when var changes.\n"
313		"  -s - used with -m or -c to do a snapshot of the tracing buffer\n"
314		"  -S - used with -m or -c to save fields of the end event (comma deliminated)\n"
315		"  -T - used with -m or -c to do both a snapshot and a trace\n"
316		"  -f file - read sql lines from file otherwise from the command line\n"
317		"            if file is '-' then read from standard input.\n",
318		argv[0]);
319	exit(-1);
320}
321
322enum action {
323	ACTION_DEFAULT		= 0,
324	ACTION_SNAPSHOT		= (1 << 0),
325	ACTION_TRACE		= (1 << 1),
326	ACTION_SAVE		= (1 << 2),
327	ACTION_MAX		= (1 << 3),
328	ACTION_CHANGE		= (1 << 4),
329};
330
331#define ACTIONS ((ACTION_MAX - 1))
332
333static int do_sql(const char *instance_name,
334		  const char *buffer, const char *name, const char *var,
335		  const char *trace_dir, bool execute, int action,
336		  char **save_fields)
337{
338	struct tracefs_synth *synth;
339	struct tep_handle *tep;
340	struct trace_seq seq;
341	enum tracefs_synth_handler handler;
342	char *err;
343	int ret;
344
345	if ((action & ACTIONS) && !var) {
346		fprintf(stderr, "Error: -s, -S and -T not supported without -m or -c");
347		exit(-1);
348	}
349
350	if (!name)
351		name = "Anonymous";
352
353	trace_seq_init(&seq);
354	tep = tracefs_local_events(trace_dir);
355	if (!tep) {
356		if (!trace_dir)
357			trace_dir = "tracefs directory";
358		perror(trace_dir);
359		exit(-1);
360	}
361
362	synth = tracefs_sql(tep, name, buffer, &err);
363	if (!synth) {
364		perror("Failed creating synthetic event!");
365		if (err)
366			fprintf(stderr, "%s", err);
367		free(err);
368		exit(-1);
369	}
370
371	if (tracefs_synth_complete(synth)) {
372		if (var) {
373			if (action & ACTION_MAX)
374				handler = TRACEFS_SYNTH_HANDLE_MAX;
375			else
376				handler = TRACEFS_SYNTH_HANDLE_CHANGE;
377
378			if (action & ACTION_SAVE) {
379				ret = tracefs_synth_save(synth, handler, var, save_fields);
380				if (ret < 0) {
381					err = "adding save";
382					goto failed_action;
383				}
384			}
385			if (action & ACTION_TRACE) {
386				/*
387				 * By doing the trace before snapshot, it will be included
388				 * in the snapshot.
389				 */
390				ret = tracefs_synth_trace(synth, handler, var);
391				if (ret < 0) {
392					err = "adding trace";
393					goto failed_action;
394				}
395			}
396			if (action & ACTION_SNAPSHOT) {
397				ret = tracefs_synth_snapshot(synth, handler, var);
398				if (ret < 0) {
399					err = "adding snapshot";
400 failed_action:
401					perror(err);
402					if (errno == ENODEV)
403						fprintf(stderr, "ERROR: '%s' is not a variable\n",
404							var);
405					exit(-1);
406				}
407			}
408		}
409		tracefs_synth_echo_cmd(&seq, synth);
410		if (execute) {
411			ret = tracefs_synth_create(synth);
412			if (ret < 0) {
413				fprintf(stderr, "%s\n", tracefs_error_last(NULL));
414				exit(-1);
415			}
416		}
417	} else {
418		struct tracefs_instance *instance = NULL;
419		struct tracefs_hist *hist;
420
421		hist = tracefs_synth_get_start_hist(synth);
422		if (!hist) {
423			perror("get_start_hist");
424			exit(-1);
425		}
426		if (instance_name) {
427			if (execute)
428				instance = tracefs_instance_create(instance_name);
429			else
430				instance = tracefs_instance_alloc(trace_dir,
431								  instance_name);
432			if (!instance) {
433				perror("Failed to create instance");
434				exit(-1);
435			}
436		}
437		tracefs_hist_echo_cmd(&seq, instance, hist, 0);
438		if (execute) {
439			ret = tracefs_hist_start(instance, hist);
440			if (ret < 0) {
441				fprintf(stderr, "%s\n", tracefs_error_last(instance));
442				exit(-1);
443			}
444		}
445	}
446
447	tracefs_synth_free(synth);
448
449	trace_seq_do_printf(&seq);
450	trace_seq_destroy(&seq);
451	return 0;
452}
453
454int main (int argc, char **argv)
455{
456	char *trace_dir = NULL;
457	char *buffer = NULL;
458	char buf[BUFSIZ];
459	int buffer_size = 0;
460	const char *file = NULL;
461	const char *instance = NULL;
462	bool execute = false;
463	char **save_fields = NULL;
464	const char *name;
465	const char *var;
466	int action = 0;
467	char *tok;
468	FILE *fp;
469	size_t r;
470	int c;
471	int i;
472
473	for (;;) {
474		c = getopt(argc, argv, "ht:f:en:m:c:sS:TB:");
475		if (c == -1)
476			break;
477
478		switch(c) {
479		case 'h':
480			usage(argv);
481		case 't':
482			trace_dir = optarg;
483			break;
484		case 'f':
485			file = optarg;
486			break;
487		case 'e':
488			execute = true;
489			break;
490		case 'm':
491			action |= ACTION_MAX;
492			var = optarg;
493			break;
494		case 'c':
495			action |= ACTION_CHANGE;
496			var = optarg;
497			break;
498		case 's':
499			action |= ACTION_SNAPSHOT;
500			break;
501		case 'S':
502			action |= ACTION_SAVE;
503			tok = strtok(optarg, ",");
504			while (tok) {
505				save_fields = tracefs_list_add(save_fields, tok);
506				tok = strtok(NULL, ",");
507			}
508			if (!save_fields) {
509				perror(optarg);
510				exit(-1);
511			}
512			break;
513		case 'T':
514			action |= ACTION_TRACE | ACTION_SNAPSHOT;
515			break;
516		case 'B':
517			instance = optarg;
518			break;
519		case 'n':
520			name = optarg;
521			break;
522		}
523	}
524
525	if ((action & (ACTION_MAX|ACTION_CHANGE)) == (ACTION_MAX|ACTION_CHANGE)) {
526		fprintf(stderr, "Can not use both -m and -c together\n");
527		exit(-1);
528	}
529	if (file) {
530		if (!strcmp(file, "-"))
531			fp = stdin;
532		else
533			fp = fopen(file, "r");
534		if (!fp) {
535			perror(file);
536			exit(-1);
537		}
538		while ((r = fread(buf, 1, BUFSIZ, fp)) > 0) {
539			buffer = realloc(buffer, buffer_size + r + 1);
540			strncpy(buffer + buffer_size, buf, r);
541			buffer_size += r;
542		}
543		fclose(fp);
544		if (buffer_size)
545			buffer[buffer_size] = '\0';
546	} else if (argc == optind) {
547		usage(argv);
548	} else {
549		for (i = optind; i < argc; i++) {
550			r = strlen(argv[i]);
551			buffer = realloc(buffer, buffer_size + r + 2);
552			if (i != optind)
553				buffer[buffer_size++] = ' ';
554			strcpy(buffer + buffer_size, argv[i]);
555			buffer_size += r;
556		}
557	}
558
559	do_sql(instance, buffer, name, var, trace_dir, execute, action, save_fields);
560	free(buffer);
561
562	return 0;
563}
564--
565
566FILES
567-----
568[verse]
569--
570*tracefs.h*
571	Header file to include in order to have access to the library APIs.
572*-ltracefs*
573	Linker switch to add when building a program that uses the library.
574--
575
576SEE ALSO
577--------
578*sqlhist*(1),
579*libtracefs*(3),
580*libtraceevent*(3),
581*trace-cmd*(1),
582*tracefs_synth_init*(3),
583*tracefs_synth_add_match_field*(3),
584*tracefs_synth_add_compare_field*(3),
585*tracefs_synth_add_start_field*(3),
586*tracefs_synth_add_end_field*(3),
587*tracefs_synth_append_start_filter*(3),
588*tracefs_synth_append_end_filter*(3),
589*tracefs_synth_create*(3),
590*tracefs_synth_destroy*(3),
591*tracefs_synth_free*(3),
592*tracefs_synth_echo_cmd*(3),
593*tracefs_hist_alloc*(3),
594*tracefs_hist_alloc_2d*(3),
595*tracefs_hist_alloc_nd*(3),
596*tracefs_hist_free*(3),
597*tracefs_hist_add_key*(3),
598*tracefs_hist_add_value*(3),
599*tracefs_hist_add_name*(3),
600*tracefs_hist_start*(3),
601*tracefs_hist_destory*(3),
602*tracefs_hist_add_sort_key*(3),
603*tracefs_hist_sort_key_direction*(3)
604
605AUTHOR
606------
607[verse]
608--
609*Steven Rostedt* <[email protected]>
610*Tzvetomir Stoyanov* <[email protected]>
611*sameeruddin shaik* <[email protected]>
612--
613REPORTING BUGS
614--------------
615Report bugs to  <[email protected]>
616
617LICENSE
618-------
619libtracefs is Free Software licensed under the GNU LGPL 2.1
620
621RESOURCES
622---------
623https://git.kernel.org/pub/scm/libs/libtrace/libtracefs.git/
624
625COPYING
626-------
627Copyright \(C) 2020 VMware, Inc. Free use of this software is granted under
628the terms of the GNU Public License (GPL).
629