Async-Profiler and Flame Graphs - Cobalt Computing Services

Recently I decided to profile a Java application with async-profiler rather than the usual go-to options of JFR and perf. Async-profiler is a sampling profiler by Andrei Pangin, and one feature I liked is that it can produce mixed-stack flame graphs that give a more complete picture, e.g. Java stack traces together with Linux kernel frames. When I’m able to, I will add some sample graphs and output from the investigation I did using this tool, but for now here are a few points that may be useful for anyone else interested in using it.

A key feature of async-profiler is that it uses the HotSpot API AsyncGetCallTrace to obtain Java stack traces outside of safepoints. Threads don’t have to reach a safepoint before they can be sampled, which avoids the safepoint bias that waiting would introduce into the trace. There is also no need to create a symbol map file for the JVM, as there is with Linux perf, and it can output in flame graph or JFR format. The profiler can capture stack traces for events such as CPU, memory allocation, locks, Java method calls and Linux perf events. In addition, traces include native and JVM code, and CPU profiling does not sample idle threads.

The latest version of async-profiler can be downloaded from GitHub: https://github.com/async-profiler/async-profiler. In this setup I used version 3.0. The target was a Java Spring Boot application running in an AWS EKS cluster.

async-profiler: 3.0
openjdk version: 21.0.4 2024-07-16 LTS
kernel release: 5.10.216-204.855.amzn2.x86_64
containerd runtime: 1.7.11
kubernetes version: 1.3
Spring Boot: 3.3.4
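
As a rough sketch, the 3.0 release for this platform could be fetched and unpacked as follows. The URL pattern is an assumption based on how the project names its GitHub release assets, and release_url and the DOWNLOAD switch are names made up for this example, so check the releases page for the exact file before relying on it.

```shell
#!/bin/sh
# Build the (assumed) GitHub release URL for a given version and platform.
release_url() {
    version=$1    # e.g. 3.0
    platform=$2   # e.g. linux-x64
    printf 'https://github.com/async-profiler/async-profiler/releases/download/v%s/async-profiler-%s-%s.tar.gz\n' \
        "$version" "$version" "$platform"
}

url=$(release_url 3.0 linux-x64)
echo "release archive: $url"

# Only download when explicitly asked to, e.g. DOWNLOAD=1 sh fetch.sh
if [ "${DOWNLOAD:-0}" = 1 ]; then
    curl -fsSL -o /tmp/async-profiler.tar.gz "$url"
    # Expected to unpack into /tmp/async-profiler-3.0-linux-x64
    tar -xzf /tmp/async-profiler.tar.gz -C /tmp
fi
```

Without DOWNLOAD=1 the script only prints the URL, which makes it easy to check before anything is fetched.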

To get the profiler to run in this environment, it had to be made available inside the container with the correct ownership, and some Linux kernel settings had to be changed.

Install the profiler at the same path in the container as on the host, i.e. if it is in /tmp on the host then it also needs to be in /tmp in the container. One way to do this is to su to the user running the process and copy the profiler into that user’s home directory inside the container’s root filesystem. If you then connect to the container you can move the profiler to the appropriate directory, in our case /tmp.

su <user>
cp -r <path>/async-profiler-3.0-linux-x64 /run/containerd/io.containerd.runtime.v2.task/k8s.io/<cid>/rootfs/home/<user>
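
The <cid> in the path above is the container ID. One way to find it from the host, sketched below, is to pull the 64-character hex ID out of the JVM process's cgroup path. This assumes containerd puts the container ID in the cgroup name, which is common but depends on the cgroup driver. cid_from_cgroup and rootfs_dir are hypothetical helper names, and /run/containerd is the usual containerd state directory rather than something guaranteed on every node.

```shell
#!/bin/sh
# Extract a 64-character hex container ID from cgroup text on stdin.
cid_from_cgroup() {
    grep -oE '[0-9a-f]{64}' | tail -n 1
}

# Print the container's root filesystem as seen from the host
# (assumes the default containerd state directory).
rootfs_dir() {
    printf '/run/containerd/io.containerd.runtime.v2.task/k8s.io/%s/rootfs\n' "$1"
}

# Usage: sh find_rootfs.sh <pid>   (pid of the JVM as seen from the host)
if [ -n "${1:-}" ] && [ -r "/proc/$1/cgroup" ]; then
    cid=$(cid_from_cgroup < "/proc/$1/cgroup")
    if [ -n "$cid" ]; then
        rootfs_dir "$cid"
    fi
fi
```

If nothing is printed, the cgroup layout on the node probably differs from this assumption and the ID has to be looked up with the container runtime's own tooling.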

Make sure that the files in the container are owned by the correct user. Following the steps in the point above should ensure that this is the case.
Make kernel symbols available by making the following changes.

sysctl kernel.perf_event_paranoid=1
sysctl kernel.kptr_restrict=0
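
Before changing anything it can help to see what the node is currently set to. The sketch below reads the two values from /proc/sys and reports whether they are at least as permissive as the settings above. perf_settings_ok and read_setting are names invented for this example, and the thresholds simply mirror the two sysctl commands.

```shell
#!/bin/sh
# Succeeds when the two settings are permissive enough for kernel frames:
#   $1 = kernel.perf_event_paranoid (needs to be 1 or lower)
#   $2 = kernel.kptr_restrict       (needs to be 0)
perf_settings_ok() {
    [ "$1" -le 1 ] && [ "$2" -eq 0 ]
}

# Print the content of a /proc/sys file, or nothing if it cannot be read.
read_setting() {
    cat "$1" 2>/dev/null || true
}

paranoid=$(read_setting /proc/sys/kernel/perf_event_paranoid)
kptr=$(read_setting /proc/sys/kernel/kptr_restrict)

if [ -z "$paranoid" ] || [ -z "$kptr" ]; then
    echo "could not read kernel settings"
elif perf_settings_ok "$paranoid" "$kptr"; then
    echo "ok: perf_event_paranoid=$paranoid kptr_restrict=$kptr"
else
    echo "too restrictive: perf_event_paranoid=$paranoid kptr_restrict=$kptr"
fi
```

Values set with sysctl on the command line do not survive a reboot, so a check like this is also useful after a node has been replaced or restarted.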

Async-profiler can run either as an agent loaded at JVM startup via the -agentpath command line option, or directly from the host against a running JVM. In this case it was run directly from the host without restarting the application. Once connectivity to the JVM was established, the available profiling events could be listed using ./asprof list <pid>. In our configuration the following were available:

Basic events:
  cpu
  alloc
  lock
  wall
  itimer
  ctimer
Java method calls:
  ClassName.methodName
Perf events:
  page-faults
  context-switches
  cycles
  instructions
  cache-references
  cache-misses
  branch-instructions
  branch-misses
  bus-cycles
  L1-dcache-load-misses
  LLC-load-misses
  dTLB-load-misses
  rNNN
  pmu/event-descriptor/
  mem:breakpoint
  trace:tracepoint
  kprobe:func
  uprobe:path

The examples below profile the target application for 30 seconds and generate flame graphs for analysis. NOTE: in our case the output file was created in /tmp in the container, not on the host, so it had to be collected from the container after sampling was complete.

./asprof -e cpu -d 30 -f /tmp/out-cpu.html <pid>
./asprof -e wall -d 30 -f /tmp/out-wall.html <pid>
./asprof -e lock -d 30 -f /tmp/out-lock.html <pid>
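
The three runs above can be wrapped in a small loop. This is only a sketch: asprof_cmd, profile_events and the ASPROF, DURATION and DRY_RUN variables are names made up here, and the flags are exactly the ones used in the commands above.

```shell
#!/bin/sh
ASPROF=${ASPROF:-./asprof}    # path to the asprof launcher
DURATION=${DURATION:-30}      # seconds to sample per event

# Print the asprof command line for one event and one target pid.
asprof_cmd() {
    printf '%s -e %s -d %s -f /tmp/out-%s.html %s\n' \
        "$ASPROF" "$1" "$DURATION" "$1" "$2"
}

# Profile the same JVM once per event. With DRY_RUN=1 (the default in
# this sketch) the commands are only printed, not executed.
profile_events() {
    pid=$1
    shift
    for event in "$@"; do
        cmd=$(asprof_cmd "$event" "$pid")
        if [ "${DRY_RUN:-1}" = 1 ]; then
            echo "$cmd"
        else
            $cmd    # relies on word splitting, so paths must not contain spaces
        fi
    done
}

# Usage: DRY_RUN=0 sh profile.sh <pid>
if [ -n "${1:-}" ]; then
    profile_events "$1" cpu wall lock
fi
```

The events run one after another rather than at the same time, so three events at 30 seconds each take about a minute and a half.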

A CPU flame graph shows which code paths are consuming CPU cycles. The graph aggregates the thread stack traces captured during the sampling period. The width of a frame on the x axis represents the number of samples in which that stack frame was present (the left-to-right order does not represent time), and the y axis represents stack depth. The top edge shows what was on-CPU, with the widest frames being the most frequent in the sample period. The async-profiler flame graph uses colour to indicate where a frame comes from: green is Java methods, yellow is JVM code, red is native methods and orange is Linux kernel code.
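
The same information can be read without the picture. Besides HTML, asprof can write collapsed stacks (-o collapsed), where each line is a semicolon-separated stack followed by its sample count. The sketch below sums the samples per top-of-stack frame, which corresponds to the widths along the top edge of the flame graph. top_leaves is a name invented for this example and the sample input is made up purely for illustration.

```shell
#!/bin/sh
# Sum samples per leaf (top-of-stack) frame from collapsed stacks on stdin.
# Each input line is "frame1;frame2;...;leaf <count>".
top_leaves() {
    awk 'NF < 2 { next }               # skip blank or malformed lines
    {
        count = $NF                    # last field is the sample count
        sub(/ [0-9]+$/, "")            # drop the count, keep the stack
        n = split($0, frames, ";")     # frames[n] is the leaf frame
        leaf[frames[n]] += count
        total += count
    }
    END {
        for (f in leaf)
            printf "%d %.1f%% %s\n", leaf[f], 100 * leaf[f] / total, f
    }' | sort -rn
}

# Made-up sample input, for illustration only.
top_leaves <<'EOF'
main;handle;parse 3
main;handle;write 1
main;flush;parse 1
EOF
# prints:
# 4 80.0% parse
# 1 20.0% write
```

For a real profile the input would come from a collapsed output file, e.g. top_leaves < /tmp/out-cpu.collapsed, where that file name is just an example.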

Flame graphs can then be generated for different test runs and compared, for example a good trace against a bad one: is the slow run suddenly making new system calls?
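
One way to make that comparison mechanical, assuming both runs were also saved as collapsed stacks (-o collapsed), is to list the frames that appear only in the slow run. The helper names and file names below are hypothetical.

```shell
#!/bin/sh
# List the distinct frames in a collapsed-stacks file, one per line.
unique_frames() {
    sed 's/ [0-9][0-9]*$//' "$1" | tr ';' '\n' | sort -u
}

# Print frames that occur in the second (slow) file but not in the first.
new_frames() {
    good=$(mktemp) || return 1
    bad=$(mktemp) || { rm -f "$good"; return 1; }
    unique_frames "$1" > "$good"
    unique_frames "$2" > "$bad"
    comm -13 "$good" "$bad"    # keep lines unique to the second file
    rm -f "$good" "$bad"
}

# Usage: sh new_frames.sh good.collapsed bad.collapsed
if [ "$#" -eq 2 ]; then
    new_frames "$1" "$2"
fi
```

This only reports frames that are entirely new. A frame that exists in both runs but has grown much wider still needs the flame graphs side by side.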

There is some very good documentation in the README including troubleshooting tips and links to videos on how to use the profiler as well as lots of demos and walkthroughs of solving problems with flame graphs.

A final thing to note is that, when run against a live JVM, the profiler is loaded dynamically as a JVM Tool Interface (JVM TI) agent from the shared object file libasyncProfiler.so. When the library was loaded, a warning was printed to the application log with the following message:

"Dynamic loading of agents will be disallowed by default in a future release"

OpenJDK has started to print this message to warn users that dynamic loading of agents into a running JVM will be disallowed by default in a future release. Agents will then need to be explicitly allowed or specified in the command line arguments, which ensures that the owner of the application approves. See JEP 451 for further details.
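
In practice there are two ways to prepare for this, sketched below. The -XX:+EnableDynamicAgentLoading flag is the opt-in described in JEP 451, and the -agentpath form follows the agent syntax in the async-profiler README. The library location, app.jar and the helper function names are assumptions for illustration.

```shell
#!/bin/sh
# Hypothetical install location; adjust to wherever the profiler lives.
LIB=${LIB:-/tmp/async-profiler-3.0-linux-x64/lib/libasyncProfiler.so}

# Option 1: keep attaching at runtime, but opt in explicitly.
allow_attach_flag() {
    printf '%s\n' '-XX:+EnableDynamicAgentLoading'
}

# Option 2: load the profiler as an agent at JVM startup.
#   $1 = event to profile, $2 = output file
agentpath_flag() {
    printf '%s=start,event=%s,file=%s\n' "-agentpath:$LIB" "$1" "$2"
}

# Print example launch commands; app.jar is a placeholder.
echo "java $(allow_attach_flag) -jar app.jar"
echo "java $(agentpath_flag cpu /tmp/out-cpu.html) -jar app.jar"
```

The first option keeps the workflow used in this post, attaching to an already running JVM, while the second profiles from startup and needs an application restart.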