✨ Mark and Sweep Garbage Collection for DD Package #644
Labels
c++
Anything related to C++ code
DD
Anything related to the DD package
enhancement
New feature or request
good first issue
Good for newcomers
refactor
Anything related to code refactoring
Milestone
What's the problem this feature will solve?
As off now, our decision diagram package uses manual reference counting for garbage collection, i.e., every DD
Node
contains a 32bit field tracking a reference count.The reference count is increased and decreases before and after manipulating decision diagrams.
For example, in classical simulation, the reference count of the initial state (DD) is increased. Then, the quantum operations of the computation are applied one after the other while always increasing the reference count of the new state and decreasing the reference count of the old state.
After each adjustment, the garbage collection routine is called, which checks all the unique unique tables for whether they have reached their (configurable) garbage collection limit.
If one of them reaches their limit, the real collection is started and the respective compute tables are cleared.
On the positive side, this kind of reference counting always allows one to determine the amount of "dead" nodes in the unique tables, i.e., nodes that have a reference count of 0.
If only a single active DD is maintained at a time, this directly gives the size of that decision diagram in
O(1)
complexity. It also allows to directly judge the percentage of active vs dead nodes in the unique tables, which can be used as the basis for a decision to grow the unique table or to conduct garbage collection. However, the latter scenario is hypothetical at the moment, as the tables can currently not be resized.On the negative side,
In essence, keeping track of the reference count doesn't really get us much, while it leads to several performance throttles.
Describe the solution you'd like
The literature on Binary Decision Diagrams (BDDs) has established two kinds of garbage collection schemes: reference counting, as we adopted it, or a mark-and-sweep approach, where there is no dedicated reference count field in each node and no continuous reference counting.
Instead, at periodic intervals, the population of the unique tables is checked (our garbage collection check), and if a table contains too many nodes, all active DDs are traversed once in a depth-first fashion where every node is marked. (This mark can probably be optimized/hidden in one of the pointer bits of the node)
Subsequently, the unique tables are crawled and any non-marked node is collected.
Afterwards, the active nodes are unmarked.
This only requires two traversals of the active decision diagram at the point of initiating the mark-and-sweep. The reference counting approach would have at least traversed as much of the diagrams over the course of the reference counting procedure. So this is an easy win for runtime.
As for the memory footprint, this also completely eliminates the need for the
ref
member in nodes. The "mark" can be hidden in the bits of a pointer or in a flag array (similar to what we already use for matrices) if the alignment of the node structs permit adding such flags without increasing their size (effectively making use of the padding due to alignment requirements).Naturally, this is a breaking change and will require adaptation through this library as well as
mqt-dddim
andmqt-qcec
. However, the degree of changes should be moderately low.As always with these kinds of changes, it is important to benchmark the overall performance of the resulting package. A perfect use case for our DD package benchmarking tool (which might require a few adaptations as well based on the proposed changes).
The text was updated successfully, but these errors were encountered: