Improve in-memory virtual map mode to run garbage collection for merged copies #17331

Open
artemananiev opened this issue Jan 11, 2025 · 1 comment · May be fixed by #17405

Comments

@artemananiev
Member

This is a follow-up to #15448. It covers a corner case that was not addressed in that feature.

Assume there are many copies (fast copy versions) of a virtual map, and one of them is never released for some reason:

  • Since that copy is not released, it cannot be flushed or merged
  • The next copy after it can still be merged, and so can the one after that
  • Copies keep being merged until, at some version, a copy grows so large that its size exceeds the flush threshold and it can no longer be merged
  • The next copy after that one can be merged again
  • And so on

So in the end, the list of copies in the virtual pipeline looks like this (newest to oldest):

  • mutable copy
  • immutable copy version N, containing changes from versions N - X + 1 to N
  • immutable copy version N - X, containing changes from versions N - 2X + 1 to N - X
  • ...
  • immutable copy version M, which is never released
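
To make the build-up above concrete, here is a rough simulation. It is only a sketch under simplifying assumptions that are not in the codebase: every version adds the same amount of data (`sizePerVersion`), merging simply sums sizes, and a copy that exceeds the threshold stays in the pipeline because the older copy M is still unreleased. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical model of the pipeline build-up; not the real virtual pipeline code. */
final class PipelineBuildupSketch {

    /** Returns the versions of the immutable copies left in the pipeline, oldest first. */
    static List<Integer> stuckVersions(int unreleased, int latest, long sizePerVersion, long flushThreshold) {
        List<Integer> pipeline = new ArrayList<>();
        pipeline.add(unreleased); // version M: never released, so never flushed or merged
        long accumulated = 0;
        for (int version = unreleased + 1; version <= latest; version++) {
            // Assumption: merging carries all earlier changes forward into this version.
            accumulated += sizePerVersion;
            if (accumulated > flushThreshold) {
                // Too large to merge any further, so this copy stays in the pipeline.
                pipeline.add(version);
                accumulated = 0; // the next copy starts accumulating from scratch
            }
        }
        return pipeline;
    }

    public static void main(String[] args) {
        // M = 10, N = 22, each version adds 300, threshold is 1000, so X = 4.
        System.out.println(stuckVersions(10, 22, 300, 1000)); // prints [10, 14, 18, 22]
    }
}
```

With these numbers every fourth version ends up stuck, which matches the N, N - X, ..., M pattern in the list above.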

The problem is that all these intermediate copies may contain many obsolete mutations. The current in-memory mode implementation never runs garbage collection for these copies, but it should.

This ticket is to improve in-memory mode for virtual maps as follows:

  • Each copy is first checked to see whether its size exceeds the flush threshold
  • If it does, garbage collection is run for this copy; otherwise, GC is skipped
  • Then the size of the copy is checked again. If it still exceeds the threshold, the copy is flushed
  • Otherwise, it is merged into the next version
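
The proposed steps could be sketched roughly as follows. This is not the actual pipeline code: `CopyHandlingSketch`, `Action`, and `decide` are hypothetical names, and garbage collection is modeled as a function that takes the current estimated size and returns the size after GC.

```java
import java.util.function.LongUnaryOperator;

/** Hypothetical sketch of the proposed per-copy handling; not the real implementation. */
final class CopyHandlingSketch {

    enum Action { MERGE, FLUSH }

    /**
     * Decides what to do with a copy.
     *
     * @param estimatedSize  current size of the copy
     * @param flushThreshold size above which a copy must be flushed rather than merged
     * @param garbageCollect assumed GC model: maps the size before GC to the size after GC
     */
    static Action decide(long estimatedSize, long flushThreshold, LongUnaryOperator garbageCollect) {
        // Step 1: a copy within the threshold is merged, and GC is skipped.
        if (estimatedSize <= flushThreshold) {
            return Action.MERGE;
        }
        // Step 2: the copy is too large, so run GC to drop obsolete mutations.
        long sizeAfterGc = garbageCollect.applyAsLong(estimatedSize);
        // Steps 3 and 4: check the size again; flush only if it is still too large.
        return sizeAfterGc > flushThreshold ? Action.FLUSH : Action.MERGE;
    }

    public static void main(String[] args) {
        long threshold = 1_000;
        // Assumption for the demo only: GC removes half of the copy's data.
        LongUnaryOperator halve = size -> size / 2;
        System.out.println(decide(800, threshold, halve));   // prints MERGE (no GC needed)
        System.out.println(decide(1_500, threshold, halve)); // prints MERGE (750 after GC)
        System.out.println(decide(3_000, threshold, halve)); // prints FLUSH (1500 after GC)
    }
}
```

The point of the second check is that a copy is flushed only when GC cannot bring it back under the threshold.
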
@artemananiev
Member Author

Bumping priority to Critical, since it will need to be backported to 0.58
