Documentation/trace-cmd/trace-cmd-record.1.txt

TRACE-CMD-RECORD(1)
===================

NAME
----
trace-cmd-record - record a trace from the Ftrace Linux internal tracer

SYNOPSIS
--------
*trace-cmd record* ['OPTIONS'] ['command']

DESCRIPTION
-----------
The trace-cmd(1) record command will set up the Ftrace Linux kernel tracer to
record the specified plugins or events that happen while the 'command'
executes. If no command is given, then it will record until the user hits
Ctrl-C.

The record command of trace-cmd will set up the Ftrace tracer to start tracing
the various events or plugins that are given on the command line. It will then
create a number of tracing processes (one per CPU) that will start recording
from the kernel ring buffer straight into temporary files. When the command is
complete (or Ctrl-C is hit) all the files will be combined into a trace.dat
file that can later be read (see trace-cmd-report(1)).

OPTIONS
-------
*-p* 'tracer'::
    Specify a tracer. Tracers usually do more than just trace an event.
    Common tracers are: *function*, *function_graph*, *preemptirqsoff*,
    *irqsoff*, *preemptoff* and *wakeup*. A tracer must be supported by the
    running kernel. To see a list of available tracers, see trace-cmd-list(1).

*-e* 'event'::
    Specify an event to trace. Various static trace points have been added to
    the Linux kernel. They are grouped by subsystem where you can enable all
    events of a given subsystem or specify specific events to be enabled. The
    'event' is of the format "subsystem:event-name". You can also just specify
    the subsystem without the ':event-name' or the event-name without the
    "subsystem:". Using "-e sched_switch" will enable the "sched_switch" event
    where as, "-e sched" will enable all events under the "sched" subsystem.

    The 'event' can also contain glob expressions. That is, "*stat*" will
    select all events (or subsystems) that have the characters "stat" in their
    names.

    The keyword 'all' can be used to enable all events.

*-a*::
    Every event that is being recorded has its output format file saved
    in the output file to be able to display it later. But if other
    events are enabled in the trace without trace-cmd's knowledge, the
    formats of those events will not be recorded and trace-cmd report will
    not be able to display them. If this is the case, then specify the
    *-a* option and the format for all events in the system will be saved.

*-T*::
    Enable a stacktrace on each event. For example:

          <idle>-0     [003] 58549.289091: sched_switch:         kworker/0:1:0 [120] R ==> trace-cmd:2603 [120]
          <idle>-0     [003] 58549.289092: kernel_stack:         <stack trace>
=> schedule (ffffffff814b260e)
=> cpu_idle (ffffffff8100a38c)
=> start_secondary (ffffffff814ab828)

*--func-stack*::
    Enable a stack trace on all functions. Note this is only applicable
    for the "function" plugin tracer, and will only take effect if the
    -l option is used and succeeds in limiting functions. If the function
    tracer is not filtered, and the stack trace is enabled, you can live
    lock the machine.

*-f* 'filter'::
    Specify a filter for the previous event. This must come after a *-e*. This
    will filter what events get recorded based on the content of the event.
    Filtering is passed to the kernel directly so what filtering is allowed
    may depend on what version of the kernel you have. Basically, it will
    let you use C notation to check if an event should be processed or not.

----------------------------------------
    ==, >=, <=, >, <, &, |, && and ||
----------------------------------------

    The above are usually safe to use to compare fields.

*--no-filter*::
    Do not filter out the trace-cmd threads. By default, the threads are
    filtered out to not be traced by events. This option will have the trace-cmd
    threads also be traced.

*-R* 'trigger'::
    Specify a trigger for the previous event. This must come after a *-e*.
    This will add a given trigger to the given event. To only enable the trigger
    and not the event itself, then place the event after the *-v* option.

    See Documentation/trace/events.txt in the Linux kernel source for more
    information on triggers.

*-v*::
    This will cause all events specified after it on the command line to not
    be traced. This is useful for selecting a subsystem to be traced but to
    leave out various events. For Example: "-e sched -v -e "\*stat\*"" will
    enable all events in the sched subsystem except those that have "stat" in
    their names.

    Note: the *-v* option was taken from the way grep(1) inverts the following
    matches.

*-F*::
    This will filter only the executable that is given on the command line. If
    no command is given, then it will filter itself (pretty pointless).
    Using *-F* will let you trace only events that are caused by the given
    command.

*-P* 'pid'::
     Similar to *-F* but lets you specify a process ID to trace.

*-c*::
     Used with either *-F* (or *-P* if kernel supports it) to trace the process'
     children too.

*--user*::
     Execute the specified *command* as given user.

*-C* 'clock'::
     Set the trace clock to "clock".

     Use trace-cmd(1) list -C to see what clocks are available.

*-o* 'output-file'::
    By default, trace-cmd report will create a 'trace.dat' file. You can
    specify a different file to write to with the *-o* option.

*-l* 'function-name'::
    This will limit the 'function' and 'function_graph' tracers to only trace
    the given function name. More than one *-l* may be specified on the
    command line to trace more than one function. This supports both full
    regex(3) parsing, or basic glob parsing. If the filter has only alphanumeric,
    '_', '*', '?' and '.' characters, then it will be parsed as a basic glob.
    to force it to be a regex, prefix the filter with '^' or append it with '$'
    and it will then be parsed as a regex.

*-g* 'function-name'::
    This option is for the function_graph plugin. It will graph the given
    function. That is, it will only trace the function and all functions that
    it calls. You can have more than one *-g* on the command line.

*-n* 'function-name'::
    This has the opposite effect of *-l*. The function given with the *-n*
    option will not be traced. This takes precedence, that is, if you include
    the same function for both *-n* and *-l*, it will not be traced.

*-d*::
    Some tracer plugins enable the function tracer by default. Like the
    latency tracers. This option prevents the function tracer from being
    enabled at start up.

*-D*::
    The option *-d* will try to use the function-trace option to disable the
    function tracer (if available), otherwise it defaults to the proc file:
    /proc/sys/kernel/ftrace_enabled, but will not touch it if the function-trace
    option is available.  The *-D* option will disable both the ftrace_enabled
    proc file as well as the function-trace option if it exists.

    Note, this disable function tracing for all users, which includes users
    outside of ftrace tracers (stack_tracer, perf, etc).

*-O* 'option'::
    Ftrace has various options that can be enabled or disabled. This allows
    you to set them. Appending the text 'no' to an option disables it.
    For example: "-O nograph-time" will disable the "graph-time" Ftrace
    option.

*-s* 'interval'::
    The processes that trace-cmd creates to record from the ring buffer need
    to wake up to do the recording. Setting the 'interval' to zero will cause
    the processes to wakeup every time new data is written into the buffer.
    But since Ftrace is recording kernel activity, the act of this processes
    going back to sleep may cause new events into the ring buffer which will
    wake the process back up. This will needlessly add extra data into the
    ring buffer.

    The 'interval' metric is microseconds. The default is set to 1000 (1 ms).
    This is the time each recording process will sleep before waking up to
    record any new data that was written to the ring buffer.

*-r* 'priority'::
    The priority to run the capture threads at. In a busy system the trace
    capturing threads may be staved and events can be lost. This increases
    the priority of those threads to the real time (FIFO) priority.
    But use this option with care, it can also change the behaviour of
    the system being traced.

*-b* 'size'::
    This sets the ring buffer size to 'size' kilobytes. Because the Ftrace
    ring buffer is per CPU, this size is the size of each per CPU ring buffer
    inside the kernel. Using "-b 10000" on a machine with 4 CPUs will make
    Ftrace have a total buffer size of 40 Megs.

*-B* 'buffer-name'::
    If the kernel supports multiple buffers, this will add a buffer with
    the given name. If the buffer name already exists, that buffer is just
    reset and will not be deleted at the end of record execution. If the
    buffer is created, it will be removed at the end of execution (unless
    the *-k* is set, or 'start' command was used).

    After a buffer name is stated, all events added after that will be
    associated with that buffer. If no buffer is specified, or an event
    is specified before a buffer name, it will be associated with the
    main (toplevel) buffer.

     trace-cmd record -e sched -B block -e block -B time -e timer sleep 1

    The above is will enable all sched events in the main buffer. It will
    then create a 'block' buffer instance and enable all block events within
    that buffer. A 'time' buffer instance is created and all timer events
    will be enabled for that event.

*-m* 'size'::
    The max size in kilobytes that a per cpu buffer should be. Note, due
    to rounding to page size, the number may not be totally correct.
    Also, this is performed by switching between two buffers that are half
    the given size thus the output may not be of the given size even if
    much more was written.

    Use this to prevent running out of diskspace for long runs.

*-M* 'cpumask'::
    Set the cpumask for to trace. It only affects the last buffer instance
    given. If supplied before any buffer instance, then it affects the
    main buffer. The value supplied must be a hex number.

     trace-cmd record -p function -M c -B events13 -e all -M 5

    If the -M is left out, then the mask stays the same. To enable all
    CPUs, pass in a value of '-1'.

*-k*::
    By default, when trace-cmd is finished tracing, it will reset the buffers
    and disable all the tracing that it enabled. This option keeps trace-cmd
    from disabling the tracer and reseting the buffer. This option is useful for
    debugging trace-cmd.

    Note: usually trace-cmd will set the "tracing_on" file back to what it
    was before it was called. This option will leave that file set to zero.

*-i*::
    By default, if an event is listed that trace-cmd does not find, it
    will exit with an error. This option will just ignore events that are
    listed on the command line but are not found on the system.

*-N* 'host:port'::
    If another machine is running "trace-cmd listen", this option is used to
    have the data sent to that machine with UDP packets. Instead of writing
    to an output file, the data is sent off to a remote box. This is ideal for
    embedded machines with little storage, or having a single machine that
    will keep all the data in a single repository.

    Note: This option is not supported with latency tracer plugins:
      wakeup, wakeup_rt, irqsoff, preemptoff and preemptirqsoff

*-V* 'cid:port'::
    If recording on a guest VM and the host is running *trace-cmd listen* with
    the *-V* option as well, or if this is recording on the host, and a guest
    in running *trace-cmd listen* with the *-V* option, then connect to the
    listener (the same as connecting with the *-N* option via the network).
    This has the same limitations as the *-N* option above with respect to
    latency tracer plugins.

*-t*::
    This option is used with *-N*, when there's a need to send the live data
    with TCP packets instead of UDP. Although TCP is not nearly as fast as
    sending the UDP packets, but it may be needed if the network is not that
    reliable, the amount of data is not that intensive, and a guarantee is
    needed that all traced information is transfered successfully.

*-q* | *--quiet*::
    For use with recording an application. Suppresses normal output
    (except for errors) to allow only the application's output to be displayed.

*--date*::
    With the *--date* option, "trace-cmd" will write timestamps into the
    trace buffer after it has finished recording. It will then map the
    timestamp to gettimeofday which will allow wall time output from the
    timestamps reading the created 'trace.dat' file.

*--max-graph-depth* 'depth'::
    Set the maximum depth the function_graph tracer will trace into a function.
    A value of one will only show where userspace enters the kernel but not any
    functions called in the kernel. The default is zero, which means no limit.

*--cmdlines-size* 'size'::
    Set the number of entries the kernel tracing file "saved_cmdlines" can
    contain. This file is a circular buffer which stores the mapping between
    cmdlines and PIDs. If full, it leads to unresolved cmdlines ("<...>") within
    the trace. The kernel default value is 128.

*--module* 'module'::
    Filter a module's name in function tracing. It is equivalent to adding
    ':mod:module' after all other functions being filtered. If no other function
    filter is listed, then all modules functions will be filtered in the filter.

    '--module snd'  is equivalent to  '-l :mod:snd'

    '--module snd -l "*jack*"' is equivalent to '-l "*jack*:mod:snd"'

    '--module snd -n "*"' is equivalent to '-n :mod:snd'

*--proc-map*::
     Save the traced process address map into the trace.dat file. The traced
     processes can be specified using the option *-P*, or as a given 'command'.

*--profile*::
    With the *--profile* option, "trace-cmd" will enable tracing that can
    be used with trace-cmd-report(1) --profile option. If a tracer *-p* is
    not set, and function graph depth is supported by the kernel, then
    the function_graph tracer will be enabled with a depth of one (only
    show where userspace enters into the kernel). It will also enable
    various tracepoints with stack tracing such that the report can show
    where tasks have been blocked for the longest time.

    See trace-cmd-profile(1) for more details and examples.

*-G*::
    Set interrupt (soft and hard) events as global (associated to CPU
    instead of tasks). Only works for --profile.

*-H* 'event-hooks'::
    Add custom event matching to connect any two events together. When not
    used with *--profile*, it will save the parameter and this will be
    used by trace-cmd report --profile, too. That is:

    trace-cmd record -H hrtimer_expire_entry,hrtimer/hrtimer_expire_exit,hrtimer,sp
    trace-cmd report --profile

    Will profile hrtimer_expire_entry and hrtimer_expire_ext times.

    See trace-cmd-profile(1) for format.

*-S*:: (for --profile only)
    Only enable the tracer or events speficied on the command line.
    With this option, the function_graph tracer is not enabled, nor are
    any events (like sched_switch), unless they are specifically specified
    on the command line (i.e. -p function -e sched_switch -e sched_wakeup)

*--ts-offset offset*::
    Add an offset for the timestamp in the trace.dat file. This will add a
    offset option into the trace.dat file such that a trace-cmd report will
    offset all the timestamps of the events by the given offset. The offset
    is in raw units. That is, if the event timestamps are in nanoseconds
    the offset will also be in nanoseconds even if the displayed units are
    in microseconds.

*--tsync-interval*::
    Set the loop interval, in ms, for timestamps synchronization with guests:
        If a negative number is specified, timestamps synchronization is disabled
        If 0 is specified, no loop is performed - timestamps offset is calculated only twice,"
        at the beginning and at the end of the trace.
        Timestamps synchronization with guests works only if there is support for VSOCK.\n"

*--tsc2nsec*::
    Convert the current clock to nanoseconds, using tsc multiplier and shift from the Linux
        kernel's perf interface. This option does not change the trace clock, just assumes that
        the tsc multiplier and shift are applicable for the selected clock. You may use the
        "-C tsc2nsec" clock, if not sure what clock to select.
*--stderr*::
    Have output go to stderr instead of stdout, but the output of the command
    executed will not be changed. This is useful if you want to monitor the
    output of the command being executed, but not see the output from trace-cmd.

*--poll*::
    Waiting for data to be available on the trace ring-buffers may trigger
    IPIs. This might generate unacceptable trace noise when tracing low latency
    or real time systems. The poll option forces trace-cmd to use O_NONBLOCK.
    Traces are extracted by busy waiting, which will hog the CPUs, so only use
    when really needed.

*--name*::
    Give a specific name for the current agent being processed. Used after *-A* to
    give the guest being traced a name. Useful when using the vsocket ID instead of
    a name of the guest.

*--verbose*[='level']::
     Set the log level. Supported log levels are "none", "critical", "error", "warning",
     "info", "debug", "all" or their identifiers "0", "1", "2", "3", "4", "5", "6". Setting the log
     level to specific value enables all logs from that and all previous levels.
     The level will default to "info" if one is not specified.

     Example: enable all critical, error and warning logs

      trace-cmd record --verbose=warning

*--file-version*::
    Desired version of the output file. Supported versions are 6 or 7.

*--compression*::
    Compression of the trace output file, one of these strings can be passed:

    'any'  - auto select the best available compression algorithm

    'none' - do not compress the trace file

    'name' - the name of the desired compression algorithms. Available algorithms can be listed with
    trace-cmd list -c

EXAMPLES
--------

The basic way to trace all events:

------------------------------
 # trace-cmd record -e all ls > /dev/null
 # trace-cmd report
       trace-cmd-13541 [003] 106260.693809: filemap_fault: address=0x128122 offset=0xce
       trace-cmd-13543 [001] 106260.693809: kmalloc: call_site=81128dd4 ptr=0xffff88003dd83800 bytes_req=768 bytes_alloc=1024 gfp_flags=GFP_KERNEL|GFP_ZERO
              ls-13545 [002] 106260.693809: kfree: call_site=810a7abb ptr=0x0
              ls-13545 [002] 106260.693818: sys_exit_write:       0x1


------------------------------


To use the function tracer with sched switch tracing:

------------------------------
 # trace-cmd record -p function -e sched_switch ls > /dev/null
 # trace-cmd report
              ls-13587 [002] 106467.860310: function: hrtick_start_fair <-- pick_next_task_fair
              ls-13587 [002] 106467.860313: sched_switch: prev_comm=trace-cmd prev_pid=13587 prev_prio=120 prev_state=R ==> next_comm=trace-cmd next_pid=13583 next_prio=120
       trace-cmd-13585 [001] 106467.860314: function: native_set_pte_at <-- __do_fault
       trace-cmd-13586 [003] 106467.860314: function:             up_read <-- do_page_fault
              ls-13587 [002] 106467.860317: function:             __phys_addr <-- schedule
       trace-cmd-13585 [001] 106467.860318: function: _raw_spin_unlock <-- __do_fault
              ls-13587 [002] 106467.860320: function: native_load_sp0 <-- __switch_to
       trace-cmd-13586 [003] 106467.860322: function: down_read_trylock <-- do_page_fault


------------------------------

Here is a nice way to find what interrupts have the highest latency:
------------------------------------------
 # trace-cmd record -p function_graph -e irq_handler_entry  -l do_IRQ sleep 10
 # trace-cmd report
          <idle>-0     [000] 157412.933969: funcgraph_entry:                  |  do_IRQ() {
          <idle>-0     [000] 157412.933974: irq_handler_entry:    irq=48 name=eth0
          <idle>-0     [000] 157412.934004: funcgraph_exit:       + 36.358 us |  }
          <idle>-0     [000] 157413.895004: funcgraph_entry:                  |  do_IRQ() {
          <idle>-0     [000] 157413.895011: irq_handler_entry:    irq=48 name=eth0
          <idle>-0     [000] 157413.895026: funcgraph_exit:                        + 24.014 us |  }
          <idle>-0     [000] 157415.891762: funcgraph_entry:                  |  do_IRQ() {
          <idle>-0     [000] 157415.891769: irq_handler_entry:    irq=48 name=eth0
          <idle>-0     [000] 157415.891784: funcgraph_exit:       + 22.928 us |  }
          <idle>-0     [000] 157415.934869: funcgraph_entry:                  |  do_IRQ() {
          <idle>-0     [000] 157415.934874: irq_handler_entry:    irq=48 name=eth0
          <idle>-0     [000] 157415.934906: funcgraph_exit:       + 37.512 us |  }
          <idle>-0     [000] 157417.888373: funcgraph_entry:                  |  do_IRQ() {
          <idle>-0     [000] 157417.888381: irq_handler_entry:    irq=48 name=eth0
          <idle>-0     [000] 157417.888398: funcgraph_exit:       + 25.943 us |  }


------------------------------------------

An example of the profile:
------------------------------------------
 # trace-cmd record --profile sleep 1
 # trace-cmd report --profile --comm sleep
task: sleep-21611
  Event: sched_switch:R (1) Total: 99442 Avg: 99442 Max: 99442 Min:99442
     <stack> 1 total:99442 min:99442 max:99442 avg=99442
       => ftrace_raw_event_sched_switch (0xffffffff8105f812)
       => __schedule (0xffffffff8150810a)
       => preempt_schedule (0xffffffff8150842e)
       => ___preempt_schedule (0xffffffff81273354)
       => cpu_stop_queue_work (0xffffffff810b03c5)
       => stop_one_cpu (0xffffffff810b063b)
       => sched_exec (0xffffffff8106136d)
       => do_execve_common.isra.27 (0xffffffff81148c89)
       => do_execve (0xffffffff811490b0)
       => SyS_execve (0xffffffff811492c4)
       => return_to_handler (0xffffffff8150e3c8)
       => stub_execve (0xffffffff8150c699)
  Event: sched_switch:S (1) Total: 1000506680 Avg: 1000506680 Max: 1000506680 Min:1000506680
     <stack> 1 total:1000506680 min:1000506680 max:1000506680 avg=1000506680
       => ftrace_raw_event_sched_switch (0xffffffff8105f812)
       => __schedule (0xffffffff8150810a)
       => schedule (0xffffffff815084b8)
       => do_nanosleep (0xffffffff8150b22c)
       => hrtimer_nanosleep (0xffffffff8108d647)
       => SyS_nanosleep (0xffffffff8108d72c)
       => return_to_handler (0xffffffff8150e3c8)
       => tracesys_phase2 (0xffffffff8150c304)
  Event: sched_wakeup:21611 (1) Total: 30326 Avg: 30326 Max: 30326 Min:30326
     <stack> 1 total:30326 min:30326 max:30326 avg=30326
       => ftrace_raw_event_sched_wakeup_template (0xffffffff8105f653)
       => ttwu_do_wakeup (0xffffffff810606eb)
       => ttwu_do_activate.constprop.124 (0xffffffff810607c8)
       => try_to_wake_up (0xffffffff8106340a)
------------------------------------------

SEE ALSO
--------
trace-cmd(1), trace-cmd-report(1), trace-cmd-start(1), trace-cmd-stop(1),
trace-cmd-extract(1), trace-cmd-reset(1), trace-cmd-split(1),
trace-cmd-list(1), trace-cmd-listen(1), trace-cmd-profile(1)

AUTHOR
------
Written by Steven Rostedt, <[email protected]>

RESOURCES
---------
https://git.kernel.org/pub/scm/utils/trace-cmd/trace-cmd.git/

COPYING
-------
Copyright \(C) 2010 Red Hat, Inc. Free use of this software is granted under
the terms of the GNU Public License (GPL).