Reduce memory consumption of very high-resolution merges #408

Merged: 7 commits, Mar 23, 2024

@nvictus (Member) commented Mar 20, 2024

This PR addresses high memory consumption when a large number of very high-resolution coolers are merged. It should improve the performance not only of cooler merge but also of cooler cload pairs and cooler load.

When pre-calculating offsets for the merge execution plan, we were loading (and concatenating) the bin1_offset index of every input into memory. This isn't an issue for typical coolers, but the combined indexes can become prohibitively large with many inputs at high resolution: a single index vector can be ~2GB at 10bp resolution for the human genome.
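As a rough check on the ~2GB figure, here is a back-of-envelope sketch. The genome length (~3.1 Gbp), the 8-byte offset width, and the count of 200 inputs are assumptions chosen for illustration, not values taken from this PR.

```python
# Back-of-envelope size of one bin1_offset index at 10 bp resolution.
# Assumptions (not from the PR): ~3.1 Gbp genome, 8-byte integer offsets.
GENOME_LENGTH_BP = 3_100_000_000
BIN_SIZE_BP = 10
BYTES_PER_OFFSET = 8

n_bins = -(-GENOME_LENGTH_BP // BIN_SIZE_BP)  # ceiling division
n_offsets = n_bins + 1                        # one offset per bin plus a terminal entry
index_bytes = n_offsets * BYTES_PER_OFFSET


def total_index_gib(n_inputs: int) -> float:
    """Memory needed to hold every input's index at the same time, in GiB."""
    return n_inputs * index_bytes / 2**30


print(f"one index:  {index_bytes / 2**30:.1f} GiB")
print(f"200 inputs: {total_index_gib(200):.0f} GiB")
```

Under these assumptions a single index is a little over 2 GiB, so holding hundreds of them simultaneously quickly exceeds the memory of a typical machine.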

  • We now use lazy HDF5 datasets and load each bin1_offset index incrementally during merge execution planning. This drastically reduces memory use for merges involving, e.g., hundreds of datasets.
  • We also expose the merge buffer argument to cooler cload pairs and cooler load, and the max-merge option to cooler load, to give the user more control over maximum memory consumption during the actual merge epochs.
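The idea behind the first point can be sketched as follows. This is a simplified illustration, not the code in this PR: `combined_offsets`, `plan_epochs`, and `chunk` are hypothetical names, and plain Python lists stand in for the on-disk indexes. Any object that reads data only when sliced (an `h5py.Dataset`, for instance) could be passed in their place.

```python
from typing import Iterator, List, Sequence, Tuple


def combined_offsets(
    indexes: Sequence[Sequence[int]], chunk: int
) -> Iterator[Tuple[int, int]]:
    """Yield (bin_id, summed offset), reading `chunk` entries per input at a time."""
    n = len(indexes[0])
    for lo in range(0, n, chunk):
        hi = min(lo + chunk, n)
        totals = [0] * (hi - lo)
        for index in indexes:
            # Slicing is the only place data is read. With a lazy on-disk
            # dataset, only hi - lo values per input are in memory here.
            for i, offset in enumerate(index[lo:hi]):
                totals[i] += int(offset)
        for i, total in enumerate(totals):
            yield lo + i, total


def plan_epochs(
    indexes: Sequence[Sequence[int]], mergebuf: int, chunk: int = 1024
) -> List[Tuple[int, int]]:
    """Greedily split bins into [start, stop) ranges whose combined pixel
    count stays within `mergebuf` whenever a single bin allows it."""
    epochs = []
    start_bin, start_total = 0, 0
    prev_bin, prev_total = 0, 0
    for bin_id, total in combined_offsets(indexes, chunk):
        # Pixels in [start_bin, bin_id) across all inputs: total - start_total.
        if total - start_total > mergebuf and prev_bin > start_bin:
            epochs.append((start_bin, prev_bin))
            start_bin, start_total = prev_bin, prev_total
        prev_bin, prev_total = bin_id, total
    if prev_bin > start_bin:
        epochs.append((start_bin, prev_bin))
    return epochs


# Two toy inputs with 4 bins each (5 cumulative offsets per index).
a = [0, 2, 4, 6, 8]
b = [0, 1, 2, 3, 4]
print(plan_epochs([a, b], mergebuf=6, chunk=2))
```

Peak memory during planning scales with `chunk` times the number of inputs rather than with the full index length, while the resulting plan does not depend on the chunk size.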

@nvictus nvictus requested review from Phlya and thomas-reimonn March 20, 2024 23:24
@nvictus nvictus merged commit a1b6cb0 into open2c:master Mar 23, 2024
9 checks passed
1 participant