-
Notifications
You must be signed in to change notification settings - Fork 571
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
BEP-365: Support multi-database based on data pattern
- Loading branch information
Showing
2 changed files
with
146 additions
and
0 deletions.
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,146 @@ | ||
<pre> | ||
BEP: 365 | ||
Title: Support multi-database based on data pattern | ||
Status: Draft | ||
Type: Standards | ||
Created: 2024-03-15 | ||
</pre> | ||
|
||
|
||
# BEP-365: Support multi-database based on data pattern | ||
|
||
- [BEP-365: Support multi-database based on data pattern](#bep-364-support-multi-database-based-on-data-pattern) | ||
- [1. Summary](#1-summary) | ||
- [2. Motivation](#2-motivation) | ||
- [3. Specification](#3-specification) | ||
- [3.1 Multi-Databases](#31-multi-databases) | ||
- [3.2 Folder Structure](#32-folder-structure) | ||
- [4. Rationale](#4-rationale) | ||
- [4.1 Block DataBase](#41-block-database) | ||
- [4.2 Trie DataBase](#42-trie-database) | ||
- [4.3 Original DataBase](#43-original-database) | ||
- [5. Compatibility](#5-compatibility) | ||
- [6. Test](#6-test) | ||
- [7. License](#7-license) | ||
|
||
|
||
## 1. Summary | ||
|
||
This BEP proposes an efficient storage solution on the BSC(BNB Smart Chain) which can improve the performance and maintainability significantly by splitting blockchain data into multi-databases based on data patterns. | ||
|
||
## 2. Motivation | ||
|
||
Currently, all of the BSC node's data, except for historical block and state data, is stored in a single key-value database instance, with different types of data segregated by different prefixes, as shown in the table below. With the rapid increase in the amount of data, several problems are being faced: | ||
|
||
- Mixed storage of data with different patterns in a single database results in inefficient performance and additional compaction overhead; | ||
- Less and Less efficient querying as the single database size increases continuously; | ||
- Concurrent reading and writing of a single database cannot fully utilize disk bandwidth; | ||
- Not friendly to using multiple disks for data storage | ||
- Inability to customize database parameters to optimize read/write performance for data from different patterns | ||
|
||
| Database | Category | Key Scheme | Size | | ||
| --------------- | ---------------------------------- | --------------------------------------------------- | ----------- | | ||
| KV Store | **Headers** | headerPrefix + num + hash | 72.28MiB | | ||
| | **Bodies** | blockBodyPrefix + num + hash | 12.40GiB | | ||
| | **Receipt lists** | blockReceiptsPrefix + num + hash | 7.73GiB | | ||
| | **Difficulties** | headerPrefix + num+ hash + headerTDSuffix | 4.03MiB | | ||
| | **Block number -> hash** | headerNumberPrefix + hash | 3.61MiB | | ||
| | **Block hash -> number** | headerPrefix + num + headerHashSuffix | 1.33GiB | | ||
| | **Transaction index** | txLookupPrefix + hash | 176.04GiB | | ||
| | **Bloombit index** | bloomBitsPrefix + bit + section + hash | 8.12GiB | | ||
| | **Contract codes** | CodePrefix + code hash | 20.23GiB | | ||
| | **Path trie state lookups** | stateIDPrefix + state root | 3.52MiB | | ||
| | **Path trie account nodes** | trieNodeAccountPrefix + hexPath | 40.34GiB | | ||
| | **Path trie storage nodes** | trieNodeStoragePrefix + accountHash + hexPath | 473.95GiB | | ||
| | **Account snapshot** | SnapshotAccountPrefix + account hash | 13.17GiB | | ||
| | **Storage snapshot** | SnapshotStoragePrefix + account hash + storage hash | 246.98GiB | | ||
| Ancient (Chain) | **Bodies** | - | 797.23GiB | | ||
| | **Receipts** | - | 664.62GiB | | ||
| | **Diffs** | - | 356.49MiB | | ||
| | **Headers** | - | 20.21GiB | | ||
| | **Hashes** | - | 1.23GiB | | ||
| Ancient (State) | **Account Data** | - | 1.52GiB | | ||
| | **Storage Data** | - | 1.63GiB | | ||
| | **History Meta** | - | 248.81MiB | | ||
| | **Account Index** | - | 2.03GiB | | ||
| | **Storage Index** | - | 3.65GiB | | ||
|
||
This BEP aims to address these issues and provide a more efficient and high-performance way to store and maintain block and state data. | ||
|
||
## 3. Specification | ||
|
||
### 3.1 Multi-Databases | ||
|
||
Chaindata will be divided into three stores, BlockStore, TrieStore, and Others, according to data schema and read/write behavior. After database separation, the data layout obtained by `db inspect` is as the below table shows: | ||
|
||
- Block Database: Block-related data is stored in this store, including _headers_, _bodies_, _receipts_, _difficulties_, _number-to-hash indexes_, _hash-to-number indexes_, and historical block data. | ||
- Trie Database: All trie nodes of the current state and historical state data of nearly 9w blocks are stored here. | ||
- Original Database: The remaining data will be stored in this store, including _snapshot_, _txIndex_, _contract code_, and other metadata, etc. | ||
|
||
| Database | Type | Category | Data Pattern | | ||
| -------------------- | -------- | ----------------------- | ----------------------------------------- | | ||
| **BlockDatabase** | KV Store | Headers | Sequential (except “Block hash-> number”) | | ||
| | | Bodies | | | ||
| | | Receipt lists | | | ||
| | | Difficulties | | | ||
| | | Block number -> hash | | | ||
| | | Block hash -> number | | | ||
| | Ancient | Bodies | | | ||
| | | Receipts | | | ||
| | | Diffs | | | ||
| | | Headers | | | ||
| | | Hashes | | | ||
| **TrieDataBase** | KV Store | Path trie account nodes | Path Hash | | ||
| | | Path trie storage nodes | | | ||
| | Ancient | Account Data | Sequential | | ||
| | | Storage Data | | | ||
| | | History Meta | | | ||
| | | Account Index | | | ||
| | | Storage Index | | | ||
| **OriginalDatabase** | KV Store | Account snapshot | Hash | | ||
| | | Storage snapshot | | | ||
| | | Transaction index | | | ||
| | | Bloombit index | | | ||
| | | Contract codes | | | ||
|
||
### 3.2 Folder Structure | ||
|
||
The folder structure for multiple databases is shown below. The original database is located within the `chaindata/` folder, and new `block/` and `state/` folders have been introduced to store block and trie data. Additionally, there is an `ancient` folder for storing historical data under each of these directories. | ||
|
||
[folder structure](./assets/bep-365/folder.png) | ||
|
||
## 4. Rationale | ||
|
||
Separate databases according to data schema and read/write behavior can improve the performance, scalability, and maintainability of chain nodes. | ||
|
||
### 4.1 Block DataBase | ||
|
||
Diving deeper into the data flow of block-related data, recent blocks are first stored in a key-value (KV) database and then appended sequentially to the ancient database and deleted from the KV database when reaching the ancient threshold, which wastes some disk bandwidth. | ||
|
||
Originally the latest 90K blocks will be retained in the key-value database by default, regarded as "non-finalized" in the context of Proof-of-work. Now with Proof-of-Stake-Authority and fast-finality, there’s no need to keep so many blocks for blockchain reorganization. It makes more sense to rely on the finalized block for chain freezing directly after fast-finality is enabled on the BSC Mainnet. This means we can keep less recent blocks before migrating them to the ancient database for blockchain reorganization. | ||
|
||
In summation, the block data retained in the kv database will be reduced to 20-30 blocks after using the finalized block as the chain freeze indicator, this part of the data can be stored in memory, but considering the robustness and stability of the node, using a separated kv database will be more reasonable and probability most of the block data are deleted when flush from memtable to sst file and then append them to the ancient database. Therefore, unnecessary bandwidth consumption can be reduced. | ||
|
||
### 4.2 Trie DataBase | ||
|
||
The trie node data constitutes roughly half of the overall key-value database volume and grows rapidly. Latency to read and write trie node data significantly impacts the block execution and verification performance. Improving its read and write speed can contribute to an overall improvement in blockchain performance. | ||
|
||
Through in-depth analysis of the data model and read/write behavior of Trie node data, it is evident that Trie nodes exhibit significant overwrite operations in path mode. The keys are relatively ordered along the paths. If this data is stored with other hash-keyed data in the same key-value database, it would significantly amplify the cost of database compaction, leading to senseless bandwidth consumption. If splitting the database for trie data, the independent db can process compaction with a more simplified LSM hierarchy, which would reduce the read/write latency of the entire database and improve the performance. | ||
|
||
### 4.3 Original DataBase | ||
|
||
After separating the block and trie data, the remaining data is stored in the original database containing the account/storage snapshot, txIndex, and contract code. Due to the reduction in the amount of data, the depth of the LSM tree is reduced, thus the read/write performance of the original database will be improved. | ||
|
||
It is worth mentioning that during the execution phase of the blockchain, there is frequent access to snapshot data. This reduction of reading latency of snapshot data will contribute significantly to overall execution performance. | ||
|
||
## 5. Compatibility | ||
|
||
There are no compatibility issues and no hard fork is required, but the node needs to be resync. | ||
|
||
## 6. Test | ||
|
||
TBD | ||
|
||
## 7. License | ||
|
||
The content is licensed under [CC0](https://creativecommons.org/publicdomain/zero/1.0/). |
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.