Make Persister Download Snapshot Files from Object Store Concurrently or in Parallel #24604

Open
Tracked by #25533
mgattozzi opened this issue Jan 25, 2024 · 3 comments

@mgattozzi
Contributor

mgattozzi commented Jan 25, 2024

See #24604 (comment) below

In #24588 I created the initial implementation of the persister, which reads and writes Segment Info Files, the Catalog, and Parquet data to and from the Object Store. One flaw is that it downloads the Segment Info Files one at a time, as seen here:

for item in &list[0..end] {
    let bytes = self.object_store.get(&item.location).await?.bytes().await?;
    output.push(serde_json::from_slice(&bytes)?);
}

On a slow network, these sequential round trips add up and slow down every operation that needs to load the segment data.

In the PR I tried to make the downloads concurrent. The code should have worked in theory, but it was thwarted by the lifetimes generated by the #[async_trait] proc macro. Using Rayon's par_iter() method did not work either. While async fn in traits is now stable, it has its own issues that were hard to untangle with the codebase at this time.
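For illustration only, here is a minimal std-only sketch of the general shape: start every fetch up front, then collect the results in list order. It is not the persister code. `fetch` and `fetch_all` are hypothetical names, `fetch` stands in for the real `object_store.get(...)` call, and the sketch uses scoped threads rather than async, so it sidesteps the #[async_trait] lifetime problem instead of solving it.

```rust
use std::thread;

/// Hypothetical stand-in for the real object store download.
/// A real implementation would perform a network request here.
fn fetch(location: &str) -> Result<Vec<u8>, String> {
    if location.is_empty() {
        return Err("empty location".to_string());
    }
    Ok(format!("contents of {location}").into_bytes())
}

/// Fetch every location on its own scoped thread and return the
/// results in the same order as `locations`. If any fetch fails,
/// the first error in list order is returned.
fn fetch_all(locations: &[&str]) -> Result<Vec<Vec<u8>>, String> {
    thread::scope(|scope| {
        // Spawn all fetches first so they run at the same time.
        // Assumption: the list is short; real code would cap the
        // number of in-flight requests.
        let handles: Vec<_> = locations
            .iter()
            .map(|&location| scope.spawn(move || fetch(location)))
            .collect();

        // Join in spawn order, which preserves the input ordering.
        handles
            .into_iter()
            .map(|handle| {
                handle
                    .join()
                    .unwrap_or_else(|_| Err("fetch thread panicked".to_string()))
            })
            .collect::<Result<Vec<_>, _>>()
    })
}
```

Calling `fetch_all(&["a.json", "b.json"])` returns the two payloads in the order requested, even though the fetches overlap. An async version would follow the same spawn-then-collect shape with a bounded number of in-flight requests.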

This is a known bottleneck, so I've opened this issue to track it and any progress made on addressing it.

@mgattozzi mgattozzi added the v3 label Jan 25, 2024
@hiltontj
Contributor

Although things have changed a fair bit since this was opened (we now have snapshots instead of segments), we are still loading the snapshot files in sequence here:

for item in &list[0..end] {
    let bytes = self.object_store.get(&item.location).await?.bytes().await?;
    output.push(serde_json::from_slice(&bytes)?);
}

@hiltontj hiltontj changed the title from "Make PersisterImpl Download Segment Info Files from Object Store Concurrently or in Parallel" to "Make Persister Download Snapshot Files from Object Store Concurrently or in Parallel" Nov 12, 2024
@pauldix
Member

pauldix commented Nov 12, 2024

This is definitely something we'll want to do. We'll also want to fetch WAL files in parallel. Maybe we should create an epic around optimizing startup time?

@hiltontj
Contributor

I created an epic here: #25533

Development

No branches or pull requests

3 participants