[Linux kernel] How to Use ftrace Efficiently (Function Tracer)

[Linux kernel] How to Use ftrace Efficiently (Function Tracer)

In this post, I’ll introduce how to use ftrace effectively to observe a process's call stack.

  • Linux kernel version: 6.1v
  • Machine: Armv8-A
  • Device: Raspberry pi

About ftrace

Many developers who want to deepen their understanding of the Linux kernel often find ftrace to be an excellent tool for learning. There are various approaches to mastering the kernel, but focusing on activities with ftrace can provide valuable insights, especially in:

  • Understanding the detailed workings of key subsystems within the Linux kernel.
  • Exploring the data structures of these subsystems in a clear, hands-on way.

Here are some notable advantages of ftrace:

  • In real-world projects, ftrace is used extensively to understand and diagnose system behavior.
  • Various tools provide user-friendly interfaces to make ftrace outputs easier to interpret.
  • ftrace is particularly useful for investigating performance issues within the kernel.

Efficient Ways to Use the Function Tracer

One practical way to set up ftrace is by using a shell script to save commonly used commands. Below is an example shell script to configure function tracing in the kernel:

#!/bin/bash

echo 0 > /sys/kernel/debug/tracing/tracing_on
sleep 1
echo "tracing_off"

echo 0 > /sys/kernel/debug/tracing/events/enable
sleep 1
echo "events disabled"

echo bcm2835_mmc_irq > /sys/kernel/debug/tracing/set_ftrace_filter
sleep 1
echo "set_ftrace_filter initialized"

echo function > /sys/kernel/debug/tracing/current_tracer
sleep 1
echo "function tracer enabled"
 
echo 1 > /sys/kernel/debug/tracing/events/printk/enable
echo 1 > /sys/kernel/debug/tracing/events/irq/irq_handler_entry/enable
echo 1 > /sys/kernel/debug/tracing/events/sched/sched_switch/enable
echo 1 > /sys/kernel/debug/tracing/events/sched/sched_wakeup/enable
sleep 1
echo "events enabled"

echo 1 > /sys/kernel/debug/tracing/options/func_stack_trace
echo 0 > /sys/kernel/debug/tracing/options/stacktrace
echo 1 > /sys/kernel/debug/tracing/options/sym-offset
echo "function stack trace enabled"

echo 1 > /sys/kernel/debug/tracing/tracing_on
echo "tracing_on"        

A particularly important line in this script is:

echo bcm2835_mmc_irq > /sys/kernel/debug/tracing/set_ftrace_filter        

With this command, ftrace will capture the call stack each time the bcm2835_mmc_irq function is called. Another critical line is:

echo 1 > /sys/kernel/debug/tracing/options/func_stack_trace        

This enables the func_stack_trace option, which displays the entire call stack when the specified function in set_ftrace_filter is triggered.

You can save this script as function_trace.sh and execute it as needed.

Analyzing the Output

After running function_trace.sh, you can collect the call stack from your device to see the trace output. An example of the trace output is below:

 <idle>-0       [000] d.h1.  2228.822487: irq_handler_entry: irq=37 name=mmc1
<idle>-0       [000] d.h1.  2228.822488: bcm2835_mmc_irq+0x4/0x680 <-__handle_irq_event_percpu+0x88/0x280
<idle>-0       [000] d.h1.  2228.822489: <stack trace>
 => bcm2835_mmc_irq+0x8/0x680
 => __handle_irq_event_percpu+0x88/0x280
 => handle_irq_event_percpu+0x20/0x68
 => handle_irq_event+0x50/0xa8
 => handle_fasteoi_irq+0xe4/0x198
 => generic_handle_domain_irq+0x34/0x50  
 => gic_handle_irq+0xa0/0xd8
 => call_on_irq_stack+0x2c/0x54
 => do_interrupt_handler+0xe0/0xf8
 => el1_interrupt+0x38/0x70
 => el1h_64_irq_handler+0x18/0x28
 => el1h_64_irq+0x64/0x68
 => finish_task_switch+0x98/0x290
 => __schedule+0x2ac/0x830
 => schedule_idle+0x28/0x50
 => do_idle+0xec/0x240
 => cpu_startup_entry+0x2c/0x38
 => rest_init+0xe4/0xe8
 => arch_post_acpi_subsys_init+0x0/0x28
 => start_kernel+0x6f8/0x730
 => __primary_switched+0xbc/0xc4        

From the output, we can deduce the following key points:

  • The interrupt, irq=37 name=mmc1, paused the current process running at finish_task_switch+0x98.
  • The irq name is "mmc1" and is assigned the number 37.
  • The notation d.h1. signifies that the code is running in an interrupt context, where execution should be fast. Due to the interrupt context, kernel functions that may cause delays, such as mutex_lock or schedule, should be avoided.

If this post is helpful to you, please follow me Austin Kim . In the next post, we’ll cover more practical tips for monitoring internal Linux kernel operations.

Best regards,

Austin Kim


10/27/2024

Hi Austin, please update your scripts to use the path /sys/kernel/tracing as I will be deprecating the /sys/kernel/debug/tracing path.

Like
Reply

I really appreciate your article — it helped me a lot in getting into the Linux kernel. I found one thing to be questionable. I was wondering why you called `sleep 1` between the settings.

Like
Reply

Very informative and services for the good 😊

Very helpful and I appreciate Austin for such a knowledgeable content.

To view or add a comment, sign in

More articles by Austin Kim

Others also viewed

Explore content categories