LevelDB Explained - The Implementation Details of MemTable


In LevelDB, all write operations are first recorded in a Write-Ahead Log (WAL) to ensure durability. The data is then stored in a MemTable. The primary role of the MemTable is to store recently written data in an ordered fashion in memory. Once certain conditions are met, the data is flushed to disk in batches.

LevelDB maintains up to two MemTables in memory. One is writable and accepts new write requests. When it reaches a certain size threshold, it is converted into an immutable MemTable, and a background thread is triggered to write it to disk as an SSTable. At the same time, a new writable MemTable is created, so write operations continue without interruption.
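The write path and the rotation described above can be sketched as follows. This is a hypothetical Python illustration, not LevelDB's actual C++ code; the class names, the dict-based table, and the tiny size threshold are all assumptions for clarity (real LevelDB uses a skip list and a configurable `write_buffer_size`).

```python
class MemTableSketch:
    """Stand-in for LevelDB's MemTable (which uses a skip list)."""
    def __init__(self):
        self.entries = {}

    def put(self, key, value):
        self.entries[key] = value

    def approximate_size(self):
        return sum(len(k) + len(v) for k, v in self.entries.items())


class DBSketch:
    WRITE_BUFFER_SIZE = 64  # tiny threshold, for illustration only

    def __init__(self):
        self.wal = []                 # stand-in for the on-disk log
        self.mem = MemTableSketch()   # writable MemTable
        self.imm = None               # immutable MemTable, if any

    def put(self, key, value):
        self.wal.append((key, value))  # 1. record in the WAL for durability
        self.mem.put(key, value)       # 2. insert into the writable MemTable
        if self.mem.approximate_size() >= self.WRITE_BUFFER_SIZE:
            # 3. rotate: freeze the full MemTable and start a fresh one;
            # a background flush would turn self.imm into an SSTable.
            self.imm = self.mem
            self.mem = MemTableSketch()
```

The key ordering to notice is that the log append happens before the MemTable insert, so a crash after step 1 can still be replayed.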

When reading data, LevelDB first queries the writable MemTable. If the key is not found, it then queries the immutable MemTable, and finally the SSTable files on disk. In LevelDB, both the writable and the immutable MemTable are instances of the same MemTable class. In this article, we will examine the implementation details of that class.
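The lookup order above can be sketched in a few lines. This is an illustrative Python sketch using plain dicts as stand-ins for the tables; the real read path also consults snapshots and searches SSTables level by level.

```python
def get(memtable, immutable, sstables, key):
    """Newest data wins: check the writable MemTable first, then the
    frozen (immutable) one, then each on-disk SSTable in order."""
    for source in (memtable, immutable, *sstables):
        if source is not None and key in source:
            return source[key]
    return None  # key not present anywhere
```

Because newer sources are checked first, an updated value in the MemTable shadows an older value of the same key on disk.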


LevelDB Explained - Understanding Multi-Version Concurrency Control (MVCC)


In database systems, concurrent access is a common scenario. When multiple users read and write to a database simultaneously, ensuring the correctness of each person’s read and write results becomes a challenge that concurrency control mechanisms need to address.

Consider a simple money transfer scenario: account A initially has $1000 and needs to transfer $800 to account B. The transfer process includes two steps: deducting money from account A and adding money to account B. If someone queries the balances of accounts A and B between these two steps, what would they see?

Without any concurrency control, the query would reveal an anomaly: account A has been debited by $800, leaving only $200, while account B hasn’t yet received the transfer, still showing its original amount! This is a typical data inconsistency problem. To solve this issue, database systems need some form of concurrency control mechanism.
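One way to see how MVCC resolves this anomaly is a minimal versioned store: every commit stamps its writes with a timestamp, and a reader fixes a snapshot timestamp and only sees versions committed at or before it. The class and method names below are illustrative, not any particular database's API.

```python
class VersionedStore:
    """Minimal MVCC sketch: keep every version of each key,
    tagged with the commit timestamp that produced it."""
    def __init__(self):
        self.versions = {}  # key -> list of (commit_ts, value)
        self.clock = 0

    def commit(self, writes):
        # All writes of one transaction land atomically at one timestamp.
        self.clock += 1
        for key, value in writes.items():
            self.versions.setdefault(key, []).append((self.clock, value))
        return self.clock

    def read(self, key, snapshot_ts):
        # Return the newest version visible at the snapshot.
        for ts, value in reversed(self.versions.get(key, [])):
            if ts <= snapshot_ts:
                return value
        return None


store = VersionedStore()
store.commit({"A": 1000, "B": 500})
snapshot = store.clock                 # a reader takes its snapshot here
store.commit({"A": 200, "B": 1300})    # the $800 transfer commits atomically
```

A reader holding the old snapshot sees A = 1000 and B = 500, never the half-finished state: either the whole transfer is visible or none of it is.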


In-depth Experience with 3 MCP Servers via Cursor: Impressive but Not Yet Practical?


When Large Language Models (LLMs) first emerged, they primarily generated responses based on pre-trained data. These early models had two main drawbacks:

  1. They lacked knowledge of recent events. For instance, a model trained in March 2024 wouldn’t know about events in May 2024.
  2. They couldn’t utilize external tools. “Tools” here can be understood as function calls. For example, if I had a tool function to publish an article, I couldn’t use natural language to make the LLM call this function.

To address these issues, OpenAI was the first to introduce function calling capabilities in their models, as detailed in their blog post: Function calling and other API updates.

Background: Understanding Function Calling

With function calling, we can inform the model about the tools we have: what parameters each tool requires, what it can do, and what its output will be. When the model receives a task, it selects the appropriate tool and fills in the parameters. We then execute that tool and return the result. This process can be iterated, allowing the AI to decide its next step based on the tool's output.
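The iterative loop described above can be sketched as follows. Here `fake_model` is a stand-in for a real LLM API call, and the tool name and message format are illustrative assumptions, not any vendor's actual interface.

```python
import json

def get_weather(city):
    # Hypothetical tool: returns a canned result for illustration.
    return {"city": city, "forecast": "sunny"}

TOOLS = {"get_weather": get_weather}

def fake_model(messages):
    """Stub for an LLM. A real model would choose the tool and arguments
    from the conversation; this stub requests one tool call, then
    composes a final answer from the tool's output."""
    if not any(m["role"] == "tool" for m in messages):
        return {"tool_call": {"name": "get_weather",
                              "arguments": json.dumps({"city": "Paris"})}}
    result = json.loads(messages[-1]["content"])
    return {"content": f"The forecast is {result['forecast']}."}

def run(user_prompt):
    messages = [{"role": "user", "content": user_prompt}]
    while True:
        reply = fake_model(messages)
        call = reply.get("tool_call")
        if call is None:
            return reply["content"]  # final answer: the loop ends
        # Execute the requested tool and feed its output back to the model.
        output = TOOLS[call["name"]](**json.loads(call["arguments"]))
        messages.append({"role": "tool", "content": json.dumps(output)})
```

The loop terminates when the model stops requesting tools and returns plain content; real agent frameworks add limits on iteration count and error handling around the tool execution.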

I found an animated GIF online that illustrates the workflow after function calling was introduced:

Understanding the function calling process
