LevelDB Explained - The Implementation Details of MemTable


In LevelDB, all write operations are first recorded in a Write-Ahead Log (WAL) to ensure durability. The data is then stored in a MemTable. The primary role of the MemTable is to store recently written data in an ordered fashion in memory. Once certain conditions are met, the data is flushed to disk in batches.

LevelDB maintains up to two MemTables in memory. One is writable and accepts new write requests. When it reaches a certain size threshold, it is converted into an immutable MemTable, and a background thread is triggered to write it to disk as an SSTable. At the same time, a new writable MemTable is created, so write operations continue without interruption.
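The write path and the rotation described above can be sketched as follows. This is a hypothetical Python illustration, not LevelDB's actual C++ code; the class names, the dict-based table, and the tiny size threshold are all assumptions for clarity (real LevelDB uses a skip list and a configurable `write_buffer_size`).

```python
class MemTableSketch:
    """Stand-in for LevelDB's MemTable (which uses a skip list)."""
    def __init__(self):
        self.entries = {}

    def put(self, key, value):
        self.entries[key] = value

    def approximate_size(self):
        return sum(len(k) + len(v) for k, v in self.entries.items())


class DBSketch:
    WRITE_BUFFER_SIZE = 64  # tiny threshold, for illustration only

    def __init__(self):
        self.wal = []                 # stand-in for the on-disk log
        self.mem = MemTableSketch()   # writable MemTable
        self.imm = None               # immutable MemTable, if any

    def put(self, key, value):
        self.wal.append((key, value))  # 1. record in the WAL for durability
        self.mem.put(key, value)       # 2. insert into the writable MemTable
        if self.mem.approximate_size() >= self.WRITE_BUFFER_SIZE:
            # 3. rotate: freeze the full MemTable and start a fresh one;
            # a background flush would turn self.imm into an SSTable.
            self.imm = self.mem
            self.mem = MemTableSketch()
```

The key ordering to notice is that the log append happens before the MemTable insert, so a crash after step 1 can still be replayed.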

When reading data, LevelDB first queries the writable MemTable. If the key is not found, it then queries the immutable MemTable, and finally the SSTable files on disk. In LevelDB, both the writable and the immutable MemTable are instances of the same MemTable class. In this article, we will examine the implementation details of that class.
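The lookup order above can be sketched in a few lines. This is an illustrative Python sketch using plain dicts as stand-ins for the tables; the real read path also consults snapshots and searches SSTables level by level.

```python
def get(memtable, immutable, sstables, key):
    """Newest data wins: check the writable MemTable first, then the
    frozen (immutable) one, then each on-disk SSTable in order."""
    for source in (memtable, immutable, *sstables):
        if source is not None and key in source:
            return source[key]
    return None  # key not present anywhere
```

Because newer sources are checked first, an updated value in the MemTable shadows an older value of the same key on disk.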


LevelDB Explained - Understanding Multi-Version Concurrency Control (MVCC)


In database systems, concurrent access is a common scenario. When multiple users read and write to a database simultaneously, ensuring the correctness of each person’s read and write results becomes a challenge that concurrency control mechanisms need to address.

Consider a simple money transfer scenario: account A initially has $1000 and needs to transfer $800 to account B. The transfer process includes two steps: deducting money from account A and adding money to account B. If someone queries the balances of accounts A and B between these two steps, what would they see?

Without any concurrency control, the query would reveal an anomaly: account A has been debited by $800, leaving only $200, while account B hasn’t yet received the transfer, still showing its original amount! This is a typical data inconsistency problem. To solve this issue, database systems need some form of concurrency control mechanism.
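One way to see how MVCC resolves this anomaly is a minimal versioned store: every commit stamps its writes with a timestamp, and a reader fixes a snapshot timestamp and only sees versions committed at or before it. The class and method names below are illustrative, not any particular database's API.

```python
class VersionedStore:
    """Minimal MVCC sketch: keep every version of each key,
    tagged with the commit timestamp that produced it."""
    def __init__(self):
        self.versions = {}  # key -> list of (commit_ts, value)
        self.clock = 0

    def commit(self, writes):
        # All writes of one transaction land atomically at one timestamp.
        self.clock += 1
        for key, value in writes.items():
            self.versions.setdefault(key, []).append((self.clock, value))
        return self.clock

    def read(self, key, snapshot_ts):
        # Return the newest version visible at the snapshot.
        for ts, value in reversed(self.versions.get(key, [])):
            if ts <= snapshot_ts:
                return value
        return None


store = VersionedStore()
store.commit({"A": 1000, "B": 500})
snapshot = store.clock                 # a reader takes its snapshot here
store.commit({"A": 200, "B": 1300})    # the $800 transfer commits atomically
```

A reader holding the old snapshot sees A = 1000 and B = 500, never the half-finished state: either the whole transfer is visible or none of it is.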


In-depth Experience with 3 MCP Servers via Cursor: Impressive but Not Yet Practical?


When Large Language Models (LLMs) first emerged, they primarily generated responses based on pre-trained data. These early models had two main drawbacks:

  1. They lacked knowledge of recent events. For instance, a model trained in March 2024 wouldn’t know about events in May 2024.
  2. They couldn’t utilize external tools. “Tools” here can be understood as function calls. For example, if I had a tool function to publish an article, I couldn’t use natural language to make the LLM call this function.

To address these issues, OpenAI was the first to introduce function calling capabilities in their models, as detailed in their blog post: Function calling and other API updates.

Background: Understanding Function Calling

With function calling, we can inform the model about the tools we have: what parameters each tool requires, what it can do, and what its output will be. When the model receives a task, it selects the appropriate tool and fills in the parameters. We then execute that tool and return the result. This process can be iterated, allowing the AI to decide its next step based on the tool's output.
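The iterative loop described above can be sketched as follows. Here `fake_model` is a stand-in for a real LLM API call, and the tool name and message format are illustrative assumptions, not any vendor's actual interface.

```python
import json

def get_weather(city):
    # Hypothetical tool: returns a canned result for illustration.
    return {"city": city, "forecast": "sunny"}

TOOLS = {"get_weather": get_weather}

def fake_model(messages):
    """Stub for an LLM. A real model would choose the tool and arguments
    from the conversation; this stub requests one tool call, then
    composes a final answer from the tool's output."""
    if not any(m["role"] == "tool" for m in messages):
        return {"tool_call": {"name": "get_weather",
                              "arguments": json.dumps({"city": "Paris"})}}
    result = json.loads(messages[-1]["content"])
    return {"content": f"The forecast is {result['forecast']}."}

def run(user_prompt):
    messages = [{"role": "user", "content": user_prompt}]
    while True:
        reply = fake_model(messages)
        call = reply.get("tool_call")
        if call is None:
            return reply["content"]  # final answer: the loop ends
        # Execute the requested tool and feed its output back to the model.
        output = TOOLS[call["name"]](**json.loads(call["arguments"]))
        messages.append({"role": "tool", "content": json.dumps(output)})
```

The loop terminates when the model stops requesting tools and returns plain content; real agent frameworks add limits on iteration count and error handling around the tool execution.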

I found an animated GIF online that illustrates the workflow after function calling was introduced:

Understanding the function calling process
