Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

✨ Mark and Sweep Garbage Collection for DD Package #644

Open
burgholzer opened this issue Jul 4, 2024 · 0 comments
Open

✨ Mark and Sweep Garbage Collection for DD Package #644

burgholzer opened this issue Jul 4, 2024 · 0 comments
Labels
c++ Anything related to C++ code DD Anything related to the DD package enhancement New feature or request good first issue Good for newcomers refactor Anything related to code refactoring

Comments

@burgholzer
Copy link
Member

What's the problem this feature will solve?

As off now, our decision diagram package uses manual reference counting for garbage collection, i.e., every DD Node contains a 32bit field tracking a reference count.
The reference count is increased and decreases before and after manipulating decision diagrams.

For example, in classical simulation, the reference count of the initial state (DD) is increased. Then, the quantum operations of the computation are applied one after the other while always increasing the reference count of the new state and decreasing the reference count of the old state.

After each adjustment, the garbage collection routine is called, which checks all the unique unique tables for whether they have reached their (configurable) garbage collection limit.
If one of them reaches their limit, the real collection is started and the respective compute tables are cleared.

On the positive side, this kind of reference counting always allows one to determine the amount of "dead" nodes in the unique tables, i.e., nodes that have a reference count of 0.
If only a single active DD is maintained at a time, this directly gives the size of that decision diagram in O(1) complexity. It also allows to directly judge the percentage of active vs dead nodes in the unique tables, which can be used as the basis for a decision to grow the unique table or to conduct garbage collection. However, the latter scenario is hypothetical at the moment, as the tables can currently not be resized.

On the negative side,

  • the reference count has a negative impact on the memory footprint of each DD node. Space savings in each node can be extremely valuable and drastically increase performance.
  • the reference count has a negative runtime impact in the sense that constantly increasing and decreasing it consumes a measurable amount of time that would be better spent in actual computations most of the time
  • the way we currently update the reference count during simulation is not very favourable to keeping compute table entries around. Since we never increase the reference count of the quantum operation DDs when performing the simulation, any matrix node will always be considered "dead". So even if we were to adopt something like Compute Table Reset Improvements #335, it would always collect the whole MxV multiplication table if there are too many matrix nodes.

In essence, keeping track of the reference count doesn't really get us much, while it leads to several performance throttles.

Describe the solution you'd like

The literature on Binary Decision Diagrams (BDDs) has established two kinds of garbage collection schemes: reference counting, as we adopted it, or a mark-and-sweep approach, where there is no dedicated reference count field in each node and no continuous reference counting.
Instead, at periodic intervals, the population of the unique tables is checked (our garbage collection check), and if a table contains too many nodes, all active DDs are traversed once in a depth-first fashion where every node is marked. (This mark can probably be optimized/hidden in one of the pointer bits of the node)
Subsequently, the unique tables are crawled and any non-marked node is collected.
Afterwards, the active nodes are unmarked.

This only requires two traversals of the active decision diagram at the point of initiating the mark-and-sweep. The reference counting approach would have at least traversed as much of the diagrams over the course of the reference counting procedure. So this is an easy win for runtime.
As for the memory footprint, this also completely eliminates the need for the ref member in nodes. The "mark" can be hidden in the bits of a pointer or in a flag array (similar to what we already use for matrices) if the alignment of the node structs permit adding such flags without increasing their size (effectively making use of the padding due to alignment requirements).

Naturally, this is a breaking change and will require adaptation through this library as well as mqt-dddim and mqt-qcec. However, the degree of changes should be moderately low.

As always with these kinds of changes, it is important to benchmark the overall performance of the resulting package. A perfect use case for our DD package benchmarking tool (which might require a few adaptations as well based on the proposed changes).

@burgholzer burgholzer added enhancement New feature or request good first issue Good for newcomers refactor Anything related to code refactoring DD Anything related to the DD package c++ Anything related to C++ code labels Jul 4, 2024
@burgholzer burgholzer added this to the DD Package Improvements milestone Jul 4, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
c++ Anything related to C++ code DD Anything related to the DD package enhancement New feature or request good first issue Good for newcomers refactor Anything related to code refactoring
Projects
Status: Todo
Status: Todo
Development

No branches or pull requests

1 participant