Using eBPF BCC for Non-Intrusive Analysis of Function Execution Time

Monitoring function execution time is crucial when developing and maintaining backend services: it lets us promptly identify performance bottlenecks, optimize code, and keep services stable and responsive. The traditional approach, however, is to add timing statistics to the code and report them, which, although effective, typically covers only functions already considered to be on the critical path.

Suppose we suddenly need to monitor the execution time of a function that previously wasn't a focus of attention. Modifying the code and redeploying the service for that is cumbersome and time-consuming. This is where eBPF (extended Berkeley Packet Filter) and BCC (BPF Compiler Collection) come in handy: with eBPF, we can dynamically insert probes to monitor function execution time without modifying code or redeploying the service. This greatly simplifies the monitoring process while keeping the impact on service performance low.
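To give a flavor of what this looks like in practice, below is a minimal BCC sketch modeled on BCC's funclatency tool. The binary path /path/to/server and the symbol handle_request are hypothetical placeholders; the script attaches a uprobe/uretprobe pair to the live binary and aggregates call durations into an in-kernel histogram.

#!/usr/bin/env python3
from time import sleep
from bcc import BPF

bpf_text = r"""
#include <uapi/linux/ptrace.h>

BPF_HASH(start, u32, u64);    // thread id -> entry timestamp
BPF_HISTOGRAM(latency_us);    // log2 histogram of call durations

int trace_entry(struct pt_regs *ctx) {
    u32 tid = bpf_get_current_pid_tgid();   // low 32 bits: thread id
    u64 ts = bpf_ktime_get_ns();
    start.update(&tid, &ts);
    return 0;
}

int trace_return(struct pt_regs *ctx) {
    u32 tid = bpf_get_current_pid_tgid();
    u64 *tsp = start.lookup(&tid);
    if (tsp == 0)
        return 0;                            // missed the entry event
    u64 delta_us = (bpf_ktime_get_ns() - *tsp) / 1000;
    latency_us.increment(bpf_log2l(delta_us));
    start.delete(&tid);
    return 0;
}
"""

b = BPF(text=bpf_text)
# Probes are attached dynamically -- no code change or redeploy needed.
b.attach_uprobe(name="/path/to/server", sym="handle_request", fn_name="trace_entry")
b.attach_uretprobe(name="/path/to/server", sym="handle_request", fn_name="trace_return")

print("Tracing handle_request()... hit Ctrl-C to end.")
try:
    while True:
        sleep(1)
except KeyboardInterrupt:
    pass
b["latency_us"].print_log2_hist("usecs")

Detaching is just as cheap: stop the script and the probes are gone, leaving the service untouched.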

In the following article, we will detail how to use eBPF BCC to analyze service function execution time non-intrusively, and demonstrate its powerful capabilities through practical examples.

Read More

Redis Deadlock Problem Caused by Stream Data Read and Write (2)

In Redis Issue Analysis: “Deadlock” Problem Caused by Stream Data Read and Write (1), we successfully reproduced the bug mentioned in the issue: the Redis server's CPU usage spiked, new connections could not be established, and existing connections could not perform any read or write operations. With the help of powerful eBPF profiling tools, we identified where the CPU time was mainly being consumed. Now, let's walk through the debugging process and the fix for this bug.

Debugging the bug

Since the Redis server process is still running, we can use GDB to attach to it and set breakpoints to inspect the actual execution flow. In the flame graph, we saw that the time-consuming handleClientsBlockedOnKey function contains a while loop. Since a sustained CPU spike is usually caused by an infinite loop, we can verify whether this while statement loops forever by setting breakpoints at line 565 (just before the loop) and line 569 (inside its body), then issuing continue multiple times and observing which breakpoint is hit (a sketch of the session follows the snippet below).

while ((ln = listNext(&li))) {
    client *receiver = listNodeValue(ln);
    robj *o = lookupKeyReadWithFlags(rl->db, rl->key, LOOKUP_NOEFFECTS);
    ...
}
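The GDB session could look roughly like this (a sketch; it assumes the quoted lines live in src/blocked.c, where handleClientsBlockedOnKey is defined, and that redis-server was built with debug symbols):

$ gdb -p $(pidof redis-server)
(gdb) break blocked.c:565        # just before the while loop
(gdb) break blocked.c:569        # inside the loop body
(gdb) continue                   # repeat this several times: if only the
(gdb) continue                   # inner breakpoint keeps being hit, the
(gdb) continue                   # loop is never exiting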

Read More

Redis Deadlock Problem Caused by Stream Data Read and Write (1)

In the Redis project, an issue titled “[BUG] Deadlock with streams on redis 7.2” (issue 12290) caught my attention. In this bug, the Redis server gets stuck in an infinite loop while processing specific client requests, which is extremely rare in a high-performance, high-reliability database system like Redis.

This issue is more than an ordinary bug report; following it is a learning process that explores Redis's internal mechanisms in depth. From the discovery of the problem, through the detailed reproduction steps and the in-depth analysis, to the proposed fix, every step is full of challenges and discoveries. Whether you are a Redis user or a developer curious about database internals, I believe you can gain valuable insights from this issue.

Before we start investigating this bug, let's briefly cover the relevant background: Redis's stream data type.
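For readers who haven't used streams before, here is a quick illustration through the redis-py client (a sketch; the stream name mystream and the field values are made up, and it assumes a Redis server running on localhost):

import redis

r = redis.Redis(host="localhost", port=6379)

# XADD appends an entry to the stream, creating it if necessary;
# "*" asks the server to auto-generate the entry ID.
entry_id = r.xadd("mystream", {"sensor": "1", "temp": "21.5"})
print("added entry", entry_id)

# XREAD returns entries with IDs greater than the one given;
# "0" means read from the very beginning of the stream.
print(r.xread({"mystream": "0"}))

# The blocking variant: wait up to 100 ms for entries newer than "$"
# (i.e. only entries added after this call). It is clients blocked on
# a stream key like this that the bug in this issue involves.
print(r.xread({"mystream": "$"}, block=100))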

Read More