-
Notifications
You must be signed in to change notification settings - Fork 1.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Cranelift: consider adding a loop-peeling optimization pass #10111
Comments
If we don't care about cleaning up conditional code that only happens in the first iteration of the loop, and can statically be shown as such after the peeling, and we only care about being able to LICM side-effectful-but-idempotent instructions, then we could also just investigate extending aegraph elaboration to support LICM'ing the idempotent parts of the side-effectful skeleton when that wouldn't move one side-effectful instruction across another one. But that kinda feels a little too specific and one-off for my tastes, in comparison to the peeling which effectively gives us that and also some more stuff all at once (ie is more general/powerful and probably not very different in terms of engineering effort?). |
I'd be interested in working through some example IR you've see to motivate this -- the main reason being that my preference is actually the opposite, toward better LICM rather than general peeling. The main reasons are that:
In contrast, "better LICM" feels much more tractable to me: it doesn't require bulk CFG manipulation (only instruction placement/scheduling which we're already doing) and gets the same concrete benefits on e.g. repeated trap checks without paying the large associated cost (duplicating the loop body). |
That is, a pass that unrolls one iteration of the loop body just before the actual loop. This is often beneficial because loop bodies might contain conditional code that only executes on the first iteration.
There could be additional benefit for Cranelift in particular: right now, Cranelift will deduplicate various kinds side-effecting-but-idempotent instructions (such as
trapz
s) but will not move those instructions outside of their original blocks and hoist them just above the loop even when it would be valid to do so. But if we have already unrolled the first iteration of a loop, then that unrolled first iteration will dominate the rest of the loop and when we dedupe those instructions from inside the loop body to their unrolled counterparts, we are effectively doing exactly that code motion we wanted to do before but could not.Random tidbits:
Could we do this at the same time as the aegraphs pass? I think so? But it would be in the same traversal, not actually in ISLE, similar to eg the alias analysis optimization.
We will need some heuristics for when to peel loops or not. Don't think we want the code bloat of peeling every single loop in the whole program. Maybe just innermost loops? Maybe just loops of up to a certain size? Lots of experimenting to be done in this regard.
The text was updated successfully, but these errors were encountered: