Multithreaded BGSave #1483

Open
madolson opened this issue Dec 24, 2024 · 3 comments
@madolson
Member

One of the bottlenecks that we often observe in AWS is the time it takes to do a full sync, either when replacing a node or bootstrapping new nodes into a cluster. We can partially solve this bottleneck by supporting the use of additional threads for a multi-threaded save and restore. The save can be trivially parallelized by having each thread own either a subset of the slots (for cluster mode) or a subset of the hash table buckets (for standalone). The restore is a bit less trivial, but by properly setting the initial state of the dictionary, we should be able to safely load data into it from multiple threads.
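The bucket-partitioning idea can be sketched roughly as follows. This is a minimal Python illustration, not Valkey code; the names (`save_partition`, `parallel_save`, `NUM_THREADS`) and the `hash(k) % n` partitioning are invented for the example.

```python
# Sketch: each worker serializes a disjoint subset of the keyspace, so the
# workers never touch the same entries. Restore merges the independent blobs;
# in a real engine, a pre-sized hash table would let the loads also run
# concurrently since the key sets are disjoint.
import concurrent.futures
import pickle

NUM_THREADS = 4

def save_partition(keyspace, thread_id, num_threads):
    """Serialize only the keys whose bucket falls to this thread."""
    part = {k: v for k, v in keyspace.items()
            if hash(k) % num_threads == thread_id}
    return pickle.dumps(part)

def parallel_save(keyspace, num_threads=NUM_THREADS):
    """Run one serializer per partition; returns one blob per thread."""
    with concurrent.futures.ThreadPoolExecutor(num_threads) as pool:
        futures = [pool.submit(save_partition, keyspace, i, num_threads)
                   for i in range(num_threads)]
        return [f.result() for f in futures]

def parallel_restore(blobs):
    """Merge the independent partitions back into one keyspace."""
    restored = {}
    for blob in blobs:
        restored.update(pickle.loads(blob))
    return restored
```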

Each of these threads can use its own TCP connection for transmitting the full-sync data. The primary and replica will need to negotiate the number of RDB full sync connections to use.
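One plausible negotiation policy (a hedged sketch; the function and parameter names are assumptions, not part of any agreed protocol) is for each side to advertise a maximum and settle on the smaller of the two:

```python
def negotiate_sync_connections(primary_max, replica_requested):
    """Pick the number of full-sync connections both sides can support.
    Falls back to a single connection (the status quo) if either side
    advertises zero or a nonsensical value."""
    return max(1, min(primary_max, replica_requested))
```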

@ranshid
Member

ranshid commented Dec 24, 2024

> Each of these threads can use its own TCP connection for transmitting the full-sync data. The primary and replica will need to negotiate the number of RDB full sync connections to use.

Does your suggestion here imply that the feature will only work for rdb-channel-enabled primary/replica pairs?

I think we can also look at ways to use a single multiplexed connection to carry all traffic on one stream. For example, QUIC can logically separate the data sent by each thread, or we can logically multiplex the slot/bucket data using dedicated headers that precede each data chunk (similar to how the CRC is handled).
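The header-per-chunk multiplexing could look something like this. A minimal sketch, assuming a made-up frame layout (16-bit shard id plus 32-bit payload length); the real wire format would be whatever the protocol negotiation settles on.

```python
# Sketch: frame each shard's data chunk with a (shard_id, length) header so
# multiple writer threads can share one ordered byte stream, and the reader
# can demultiplex the chunks back to per-shard loaders.
import io
import struct

HEADER = struct.Struct("!HI")  # network order: shard id (u16) + length (u32)

def mux_chunk(shard_id, payload):
    """Prepend the demux header to one chunk of shard data."""
    return HEADER.pack(shard_id, len(payload)) + payload

def demux(stream):
    """Yield (shard_id, payload) frames from a single byte stream."""
    while True:
        hdr = stream.read(HEADER.size)
        if not hdr:
            return  # clean end of stream
        shard_id, length = HEADER.unpack(hdr)
        yield shard_id, stream.read(length)
```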

@madolson
Member Author

> I think we can also look at ways to use a single multiplexed connection to carry all traffic on one stream. For example, QUIC can logically separate the data sent by each thread, or we can logically multiplex the slot/bucket data using dedicated headers that precede each data chunk (similar to how the CRC is handled).

I suppose this is related: #1300.

@PingXie
Member

PingXie commented Jan 8, 2025

I like this idea!

By the way, what you described in this issue is the diskless full sync. I think there's also value in speeding up the RDB snapshot (to disk). We'll likely need to use a "scatter-gather" solution that writes different parts of the keyspace in parallel to separate files and then concatenates them to form the final RDB file.
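The scatter-gather approach could be sketched like this. A minimal illustration under assumed names (`write_part`, `scatter_gather_save`), ignoring the real RDB header/footer and checksum handling, which the final concatenation step would also need to account for.

```python
# Sketch: write each keyspace partition to its own temp file in parallel
# ("scatter"), then concatenate the parts into the final file ("gather").
import concurrent.futures
import os
import tempfile

def write_part(path, data):
    """Write one serialized partition to its own file."""
    with open(path, "wb") as f:
        f.write(data)
    return path

def scatter_gather_save(parts, final_path):
    """Scatter: one file per partition, written concurrently.
    Gather: append the parts in order to form the final snapshot file."""
    tmpdir = tempfile.mkdtemp()
    paths = [os.path.join(tmpdir, f"part-{i}.tmp") for i in range(len(parts))]
    with concurrent.futures.ThreadPoolExecutor() as pool:
        list(pool.map(write_part, paths, parts))
    with open(final_path, "wb") as out:
        for p in paths:
            with open(p, "rb") as f:
                out.write(f.read())
            os.unlink(p)
    return final_path
```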

@xbasel self-assigned this Jan 9, 2025