LevelDB Explained - How to implement SkipList

In LevelDB, the data in the memory MemTable is stored in a SkipList to support fast insertion. Skip lists are probabilistic data structures proposed by William Pugh in his paper Skip Lists: A Probabilistic Alternative to Balanced Trees. They are somewhat similar to ordered linked lists but can have multiple levels, trading space for time, allowing for fast query, insertion, and deletion operations with an average time complexity of $ O(\log n) $. Compared to some balanced trees, the code implementation is relatively simple and performance is stable, making it widely applicable.

Inspirational approach to skip list implementation

So, what are the principles behind skip lists? How are they implemented in LevelDB? What are the highlights and optimizations in LevelDB’s skip list implementation? How does it support single-threaded writing and concurrent reading of skip lists? This article will delve into these aspects from the principles and implementation of skip lists. Finally, it provides a visualization page that intuitively shows the construction and overall structure of skip lists.

Read More

Claude3.5's System Prompts - No Apologies, Face Blind, Hallucinate...

Recently, Anthropic released the system prompts for the Claude3.5 model, which are very worth learning from. The entire prompt is written in English, quite lengthy, and constrains many behaviors of the model. Let’s take a look together.

Claude3.5 System Prompts

Read More

LevelDB Explained - Arena, Random, CRC32, and More.

LevelDB implements several utility tools, such as the custom memory allocator Arena and the random number generation class Random. These implementations consider specific use cases, making optimizations and trade-offs that are worth studying. This article will mainly discuss the implementation of the following parts:

  • Memory management Arena, a simple and efficient memory allocation manager suitable for LevelDB;
  • Random number generator Random, a good linear congruential pseudorandom generation algorithm that uses bitwise operations instead of modulo to optimize execution efficiency.
  • CRC32 cyclic redundancy check, used to detect errors during data transmission or storage;
  • Integer encoding and decoding, used to store numbers in byte streams or parse numbers from byte streams.

In addition, there are some more complex utils components that will be discussed in separate articles, such as:

Read More