LevelDB Explained - Arena, Random, CRC32, and More.

LevelDB implements several utility tools, such as the custom memory allocator Arena and the random number generation class Random. These implementations consider specific use cases, making optimizations and trade-offs that are worth studying. This article will mainly discuss the implementation of the following parts:

  • Memory management Arena, a simple and efficient memory allocation manager suitable for LevelDB;
  • Random number generator Random, a good linear congruential pseudorandom generation algorithm that uses bitwise operations instead of modulo to optimize execution efficiency.
  • CRC32 cyclic redundancy check, used to detect errors during data transmission or storage;
  • Integer encoding and decoding, used to store numbers in byte streams or parse numbers from byte streams.

In addition, there are some more complex utils components that will be discussed in separate articles, such as:

Read More

LevelDB Explained - How To Read and Write WAL Logs

LevelDB uses Write-Ahead Logging (WAL) to ensure data durability. When a write operation occurs, LevelDB first writes the data to the log file, and then applies it to the in-memory data structure (such as MemTable). When the system or database restarts after a crash, LevelDB checks the records in the WAL log file. By reading and replaying these log records, LevelDB can rebuild the data state that had not been fully written to disk when the crash occurred.

LevelDB WAL Log Writing Process

The overall WAL log-related operation process is as follows:

  1. LevelDB first writes the data to the WAL log. This ensures that the data won’t be lost even in the event of a system crash.
  2. The data is written to the MemTable in memory, which is a fast memory operation.
  3. LevelDB confirms the write completion to the client.
  4. Over time, when the MemTable is full, it is flushed to SSTable files on disk.
  5. Once the MemTable has been successfully flushed to SSTable, the corresponding WAL log can be cleared.

Let’s take a detailed look at the implementation.

Read More

LevelDB Explained - Understanding Advanced C++ Techniques

The overall code of LevelDB is quite understandable, without using many esoteric C++ techniques. However, there are some implementations that are relatively uncommon, such as flexible arrays, symbol exporting for linking, and Pimpl class design. This article will review these advanced C++ techniques to help better understand the implementation of LevelDB.

Flexible Arrays

In the LRUHandle structure definition in util/cache.cc, there’s a flexible array member char key_data[1], used to implement variable-length data structures in C/C++.

1
2
3
4
5
6
7
8
9
struct LRUHandle {
// ...
char key_data[1]; // Beginning of key

Slice key() const {
assert(next != this);
return Slice(key_data, key_length);
}
};

Read More