This post describes all the steps to reproduce the Trace Compass demonstration given at the Tracing Summit on August 20th, 2019 in San Diego.

Setup (prepare the workspace):

  1. Download and run the Trace Compass development snapshot

  2. Install the following incubator features (Tools -> Add-ons):

    • Trace Compass Scripting Javascript (Incubation)
    • Generic Callstack (Incubation)
    • Trace Compass uftrace (Incubation)
    • Trace Compass Filters (Incubation)
  3. Open the Global Filters view (Window -> Show View) and put it beside the Control view (below the Project Explorer), so that it is conveniently in its own corner of the screen.

  4. You may now save this perspective (Window -> Save Perspective As) to easily go back to this state (using Window -> Reset Perspective).

  5. Import the 2 trace sets: the uftrace + LTTng kernel traces of ‘ls’, and the LTTng UST + kernel traces of ‘vlc’, then create an experiment for each of them.

  6. Create a script: right-click the Tracing project, New -> File, name it filterFunctions.js.

  7. Copy-paste the content of filterFunctions.js into that file, then save and close it.

Demo one: Filter userspace + kernel trace

See the screencast of this demo

This demo will show a common workflow of userspace + kernel trace analysis, with the objective of improving the performance of the application. This trace set is very simple, but it allows a good demonstration of the approach. We’ll analyze the callstack, see which functions took the most time and investigate why.

The traces in this demo were taken using uftrace for the userspace side and LTTng for the kernel side. They trace the ls -R command on a directory.

  1. Open the demo_ls_2019 experiment.

  2. Expand the experiment, then the Views element, then the Callstack element, and double-click the Flame Graph (incubator) view.

  3. Observe the third level (row 2), which contains quite a few function calls of various durations. Hover over the largest of those and observe the duration statistics (mean, max, min, std dev). We see considerable variability in the ‘readdir’ function, with a mean duration of ~11us but a maximum of 10ms.

  4. Let’s write a filter for the outliers of this function. In the Global Filters view, write readdir && Duration > 1ms (careful with the case).

  5. Open the Flame Chart (incubator) view, under the Callstack analysis. Click on the Reset the Time Scale to Default icon (House Icon) and double-click on the ‘main’ element to bring it to window range.

  6. We immediately observe the longest elements and, scrolling down, we see that the thread was blocked during those long calls.

  7. To analyze why the thread was blocked, we open the Critical Path view, under the OS Execution Graph analysis, and move it beside the trace’s event table, so that the Resources, Critical Path and Flame Chart views are all visible.

  8. From the Flame Chart, right-click on any of the function lines in the tree on the left and select Follow 6065/6065.

  9. We observe that the thread was waiting for the network on the sshfs process. Indeed, one of the directories to list was an sshfs mount point, and we see it clearly here.

  10. Notice that the filter we entered earlier applies to all the views, but since readdir is not present in most of the kernel views, nothing is highlighted in them. We can either remove that filter and apply filters locally in each view by clicking the / on each view, or edit the global filter until we find the right one.

    • Let’s double-click the filter in the Global Filters view and edit it to TID == 6065 || (readdir && Duration > 1ms). This time, the kernel views are filtered, but we lost the filter on the Flame Chart since all of its items belong to TID 6065.
    • Let’s double-click the filter again and edit it to (!(Symbol present) && TID == 6065) || (readdir && Duration > 1ms). Now, we get everything filtered as expected with a single filter. It may take some time and trial and error to find the proper filter for everything, and once we have it, it’s hard to remember and retype it each time (see the sketch just below for how such a filter could be applied from a script).
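
Once the right expression has been found, it can also be assembled and applied programmatically through the scripting support installed earlier. Here is a minimal sketch in EASE JavaScript; the /TraceCompass/Filters module path and the applyGlobalFilter call are assumptions about that API, so refer to the scripting documentation for the exact names.

    // Sketch only: the module path and applyGlobalFilter are assumptions
    // about the incubator's scripting API, shown here to illustrate the idea.
    loadModule('/TraceCompass/Filters');

    var tid = 6065;            // thread followed in the Flame Chart
    var symbol = "readdir";    // function whose outliers we want to highlight
    var threshold = "1ms";     // duration threshold found by trial and error

    // Same compound expression as above, assembled once and reused.
    applyGlobalFilter("(!(Symbol present) && TID == " + tid + ") || ("
            + symbol + " && Duration > " + threshold + ")");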

Which leads us to the second demo.

Demo two: Script a workflow analysis of userspace + kernel

See the screencast of this demo

In this second demo, we’ll analyze a 30 second userspace + kernel trace of the vlc application playing from a network stream. The workflow shown in the first demo is much less obvious with larger traces: we have to navigate through many more threads and function calls. It is possible to script the previous workflow in such a way that we can simply execute the script on the trace, wait for its execution to terminate and have the results displayed for us.

  1. Reset the perspective to the default or just close all the Flame views.

  2. Open the demo_vlc_2019 experiment.

  3. Configure the symbols.

    • Right-click on the experiment and select Configure symbol.

    • To get the symbols for vlc, click on the LTTng - demo_vlc_ust tab, check Use custom target root directory, browse to the trace directory and select the files directory.

    • To get the other system symbols, click on the GNU nm - demo_vlc_ust tab, and add the kernel.syms file located in the same files directory of the trace.

  4. Open the filterFunctions.js script (you may need to click on one of the variables under the file to actually open it). Double-click on the tab of this file to view its content full screen. Notice that the symbol to search for is defined at the beginning of the script. The script uses the LTTng-UST CallStack (Incubator) analysis to parse all the function calls and compute statistics on the desired one. It then creates a filter using that duration and the TIDs concerned, and it opens all the required views. We can even create new views that are not available in Trace Compass (a rough sketch of the script’s structure is shown after this list).

  5. Right-click on the script, select Run As… -> EASE Script and wait for the script execution to terminate. You can read your emails, get a coffee, or even start navigating the trace in the meantime.

  6. At the end of the script execution, we see the Segment Store Table with all the function calls whose duration is more than 1/3 above the average. We can sort them and go to any of them. The Critical Path view also already displays one of the threads involved.

  7. The Console view shows the output of the script, i.e. the computed average and the filter that was applied to the views, in case we want to edit it.

    • Careful: do not deactivate the global filter, or the Segment Store Table view may cause a temporary hang.
  8. Arrange the views such that Resources, Critical Path and Flame Chart (incubator) are visible at the same time.

  9. In the Segment Store Table view, sort the segments by duration with the longest on top, click the longest one and zoom in on that section. Look at the 3 views concurrently:

    • The Flame Chart shows a long duration and scrolling down shows that the thread was blocked part of the time.

    • Looking at the Critical Path shows that the thread was waiting for alsa-sink and pulseaudio during that time.

    • Given that this long duration happened only once, near the beginning of the thread, it’s probably due to some audio initialization.

    • The second longest segment shows a similar pattern for the other vlc thread.

  10. Let’s look at the other above-average segments.

    • The Critical Path and the Flame Chart show that the thread was simply preempted for 200us during that time.

    • Looking at the Resources view, we see that there was no CPU contention at that moment; the thread was simply preempted by an alsa-sink thread.

    • It would take more insight into the application to further analyze this. There’s probably CPU affinity between those processes, or at least for this vlc thread, which seems to always run on CPU 0.

  11. Close the Segment Store Table view and remove the filter.

    • We can look at other instances of the DecoderDecode function call and see the same pattern of more or less short preemptions by alsa.
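
To give an idea of what the script looks like, here is a rough sketch of its structure in EASE JavaScript. This is not the actual filterFunctions.js: the module paths follow the Trace Compass scripting feature, but the analysis and segment accessors (getTraceAnalysis, getSegments, getTid), the duration unit handling and the applyGlobalFilter and openView calls are assumptions; the real script is the authoritative reference.

    // Rough sketch of the structure of filterFunctions.js, not the actual script.
    // Several calls below (marked "assumption") are guesses about the incubator's
    // scripting API, used only to illustrate the workflow.
    loadModule('/TraceCompass/Trace');
    loadModule('/TraceCompass/Analysis');
    loadModule('/TraceCompass/Filters');   // assumption: filter module path
    loadModule('/TraceCompass/View');      // assumption: view module path

    // Symbol to search for, defined at the beginning of the script.
    var functionName = "DecoderDecode";

    var trace = getActiveTrace();

    // Assumption: retrieve the callstack analysis and iterate its function-call
    // segments (symbol name, duration, TID).
    var analysis = getTraceAnalysis(trace, "LTTng-UST CallStack (Incubator)");

    var total = 0;
    var count = 0;
    var tids = [];
    var iterator = analysis.getSegments();       // assumption
    while (iterator.hasNext()) {
        var segment = iterator.next();
        if (segment.getName() != functionName) {
            continue;
        }
        total += segment.getLength();            // duration of this call, in ns
        count++;
        var tid = String(segment.getTid());      // assumption
        if (tids.indexOf(tid) < 0) {
            tids.push(tid);
        }
    }

    // Threshold: 1/3 above the average duration.
    var average = total / count;
    var threshold = average * 4 / 3;
    print("Average duration: " + average + " ns, threshold: " + threshold + " ns");

    // Build the same kind of compound filter as in demo one and apply it globally.
    // Assumption: the filter language accepts a raw duration value in ns here.
    var filter = "(!(Symbol present) && (TID == " + tids.join(" || TID == ") + "))"
            + " || (" + functionName + " && Duration > " + threshold + ")";
    print("Applying filter: " + filter);
    applyGlobalFilter(filter);                   // assumption

    // Open the views needed for the analysis (assumption: openView by name).
    openView("Flame Chart (incubator)");
    openView("Critical Path");
    openView("Segment Store Table");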

References