LevelDB Explained - Bloom Filter Implementation and Visualization

In LevelDB, data is stored in SSTable files. When using Get() to query a key, it may be necessary to read multiple blocks from the SST file. To reduce disk reads, LevelDB provides a FilterPolicy strategy. If it can determine that a key is not in the current SSTable file, it can skip reading that file, thus improving query efficiency.

LevelDB supports user-defined filter policies but provides a default Bloom filter implementation. A Bloom filter is a space-efficient data structure used to determine whether an element is a member of a set. It has a certain false positive rate but no false negatives. In simple terms, if a Bloom filter determines that an element does not exist, then the element definitely does not exist; if a Bloom filter determines that an element exists, then the element may not exist.

Read More

LevelDB Explained - Preparing the Development Environment

LevelDB is an excellent LSM Tree storage component developed in C++. Although its overall codebase is not large, its design is ingenious and worth studying. During the process of reading the source code, I have compiled a series of articles to gradually break down the implementation details of LevelDB. However, before reading the code, it’s best to prepare the entire development environment.

This article will start from the most basic step of pulling the code, recording my process of preparing the entire environment, including configuring the VSCode IDE and using the clangd plugin, as well as how to configure compilation options. Then, through a simple read and write code demo, we’ll briefly use LevelDB to gain a perceptual understanding of this library. Additionally, we’ll introduce how to run test cases. LevelDB’s test cases are well-written, and during the code reading process, we can use these cases to better understand the code.

Read More

LevelDB Explained - Posix File Operation Details

LevelDB supports running on various operating systems. To adapt to different operating systems, it needs to encapsulate some system calls, such as file operations, thread operations, time operations, etc. In the exposed include files, the env.h file defines various interfaces used by LevelDB. This includes the Env class, which encapsulates file operations, directory operations, etc., as well as some file abstract classes such as SequentialFile, WritableFile, and RandomAccessFile for sequential reading, random reading, and writing files.

Through abstract interfaces, LevelDB can run on different operating systems by implementing the corresponding Env subclass for each platform. This article takes the POSIX system environment as an example to first look at how the abstracted file operation-related interfaces are implemented.

Read More