LevelDB Explained - Prefix Compression and Restart Points in BlockBuilder

2025/07/18

In LevelDB, SSTable (Sorted Strings Table) is the file format for storing key-value pairs. A previous article, LevelDB Explained - A Step by Step Guide to SSTable Build, introduced the creation process of SSTable files, where we learned that an SSTable file is composed of multiple data blocks, which are the fundamental units of the file.

These data blocks can be categorized into two types: key-value data blocks and filter data blocks. Accordingly, LevelDB implements two types of BlockBuilder classes to assemble them: BlockBuilder and FilterBlockBuilder. In this article, we’ll dive into the implementation of BlockBuilder.

First, let’s look at a simple diagram showing the storage structure of a DataBlock in LevelDB. The source for the diagram can be found at leveldb_datablock_en.dot.

LevelDB DataBlock Storage Structure

LevelDB Explained - A Step by Step Guide to SSTable Build

2025/06/27

0 Comment

In LevelDB, when the key-value pairs in the in-memory table (MemTable) reach a certain size, they are flushed to disk as an SSTable file. The disk files are also layered, with each layer containing multiple SSTable files. During runtime, LevelDB will merge and reorganize SSTable files at appropriate times, continuously compacting data into lower layers.

SSTable has a specific format for organizing data, aiming to ensure data is sorted and can be looked up quickly. So, how does SSTable store these key-value pairs internally, and how does it improve read and write performance? What are the optimization points in the implementation of the entire SSTable file?

In this article, we will carefully analyze the creation process of SSTable files, breaking it down step by step to see how it’s implemented. Before we begin, here’s a high-level diagram to give you a general idea.

Overview of SSTable file building steps

LevelDB Explained - The Implementation Details of a High-Performance LRU Cache

2025/06/13

0 Comment

In computer systems, caches are ubiquitous. From CPU caches to memory caches, from disk caches to network caches, they are everywhere. The core idea of a cache is to trade space for time by storing frequently accessed “hot” data in high-performance storage to improve performance. Since caching devices are expensive, their storage size is limited, which means some cached data must be evicted. The eviction policy is crucial here; an unreasonable policy might evict data that is about to be accessed, leading to a very low cache hit rate.