-
Notifications
You must be signed in to change notification settings - Fork 1.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
C++: Improve AliasedSSA performance #17225
Conversation
( | ||
def.getAnAllocation() = use.getAnAllocation() or | ||
not exists(def.getAnAllocation()) or | ||
not exists(def.getAnAllocation()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess you meant for this to be use.getAnAllocation()
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, I think x instanceof EntireAllocationMemoryLocation
implies exists(x.getAnAllocation())
. So those two last cases aren't really doing anything anyway.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And finally, if both def
and use
are EntireAllocationMemoryLocation
s and they have the same virtual variable (as they do in this case), I think def.getAnAllocation() = use.getAnAllocation()
always holds. So if this speeds up anything I'm pretty sure it's simply because you're permuting some joins?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess you meant for this to be
use.getAnAllocation()
?
Yes. I wasn't 100% sure these lines were required, but based on nearby cases I thought they might be. You seem to think not, so I'll delete them.
I think def.getAnAllocation() = use.getAnAllocation() always holds.
I'll check again, but I'm fairly sure I had found a case where it didn't (and that fact exploded performance). That might imply there's a deeper bug to be fixed and this change would only patch it. I'll investigate further...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And finally, if both def and use are EntireAllocationMemoryLocations and they have the same virtual variable (as they do in this case), I think def.getAnAllocation() = use.getAnAllocation() always holds.
This has been a bit of a pain to investigate, but it's definitely not the case on the AcademySoftwareFoundation/openexr
snapshot or the MRVA top 10. There are thousands of exceptions (possibly many more, I had to arbitrarily constrain the query so it would execute in reasonable time). Many times the issue seems to occur when a [template?] function takes multiple pointer arguments, between the memory locations of those arguments. Other cases seem to relate to dynamic allocations.
The question I'm trying to figure out next is why these cases occur - and whether they should be fixed at the source, or if we in fact should not be assuming def.getAnAllocation() = use.getAnAllocation()
holds here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I still have a limited understanding of how this works, but it seems to me that if two or more allocations in the same function escape, they end up in the same virtual variable (TAllAliasedMemory
for the function) but with different allocations (EntireAllocationMemoryLocation
being smaller than TAllAliasedMemory
). So from the point of view of getExtentOverlap
, I think it's right that we have to check that def.getAnAllocation() = use.getAnAllocation()
for EntireAllocationMemoryLocation
s.
So I tentatively think my change here is correct. And I've removed the two lines you pointed out are redundant now. I think the next step is to try to come up with a test case where the difference matters.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I still have a limited understanding of how this works, but it seems to me that if two or more allocations in the same function escape, they end up in the same virtual variable (TAllAliasedMemory for the function) but with different allocations (EntireAllocationMemoryLocation being smaller than TAllAliasedMemory).
Yes, that is all correct (I refer you to the slides I did at the Oxford offsite. I discusses the concept of virtual variables). A virtual variable is a memory location that represents a set of memory locations. AllAliasedMemory
is a virtual variable that groups together all memory locations that escape in a given function. Those memory locations may have different allocations. For example, something like:
void test() {
int* p = new int;
int* q = new int;
escape(&p);
escape(&q);
// now both p and q escape, but they have different allocations
}
EntireAllocationMemoryLocation
is the memory location that represents a parameter indirection or the memory that is pointed to by a dynamic allocation.
I do think that this PR makes the IR alias analysis more incorrect, but I haven't been able to think of how we may show this in a testcase. Granted, I also haven't given it a lot of thought yet.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, so we're happy that def.getAnAllocation() = use.getAnAllocation()
should affect results in theory, it's more than just a performance hint.
The code you wrote in the comment immediately above is very similar to some of the real world cases I've looked at. I've not been able to find an existing test that exposes this behaviour though (perhaps I need to write a new one?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@MathiasVP do you have any idea how best to expose the behaviour of this predicate in tests? Short of writing a direct test for getExtentOverlap
itself, which I suppose is an option.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A value numbering test is probably the easiest way to get hard proofs of the change in behavior since:
- That's a library that's directly used in queries, and
- That's a library that makes full use of alias analysis
... I've tried and so far failed to write a test exposing the behaviour of I have confirmed how it performs locally with a small range of projects and dataflow queries:
I've also started a DCA run as further verification. |
Dataflow doesn't use the IR ssa analysis. So I'm not surprised you're not seeing any differences 😂 If you want to find a query thst exercises the changes here, you can pick a query that uses the MustFlow library |
That can't be right, there's a large reproducible speedup of
(this time running the 'after' version first; no meaningful change on the two other projects I tried it on though) The DCA run showed nothing of interest. |
That's because the alias analysis/the IR-based ssa is cached as part of IR construction. So it's running as one big stage. But that doesn't mean that those predicates have any influence on dataflow/taint-tracking 🙂 Dataflow and taint-tracking totally ignore all operands and instruction that are considered by alis analysis. |
I've just pushed two commits adding a (direct) test and then showing the difference this change makes. I'm much less experienced with the alias analysis logic than you are, but I would say that the |
It's caused by the additional logic in |
@MathiasVP I'm waiting for your opinion about the test I added in the last two commits. I'm keen to get this performance issue solved but I'm struggling with lack of knowledge of the internals of this library. |
| ssa.cpp:446:34:446:34 | *a | ssa.cpp:446:34:446:34 | *a[0..?)<unknown> | MustTotallyOverlap | | ||
| ssa.cpp:446:34:446:34 | *a | ssa.cpp:446:34:446:34 | ?*a | MustTotallyOverlap | | ||
| ssa.cpp:446:34:446:34 | *a | ssa.cpp:446:34:446:34 | ?*a[0..?)<unknown> | MustTotallyOverlap | | ||
| ssa.cpp:446:34:446:34 | *a | ssa.cpp:446:43:446:43 | *b | MustExactlyOverlap | |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am confused by this. Why are *a
and *b
exactly overlapping? That means that a write *a
will be fully read by a read of *b
, but looking at the test you added that is clearly not the case. So I think something is wrong here. Although, I dont think your fix is the right one. I will look a bit into this today!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note that these results come from the commit with my fix removed (i.e. as main
), so it's an existing problem. I would argue that my change improves things by removing these rows, but I wouldn't argue that removing them is necessarily the best fix possible.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I agree this is not the fault of your changes. It has been around for a very long time. I managed to dig up an old comment from Jonas on one of my first PRs here: #2772 (comment) 😄
I do agree that your changes makes this slightly better in this specific case, indeed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If it turns out that escaped EntireAllocationMemoryLocation
always exactly overlap as specified here then I think I don't fully understand EntireAllocationMemoryLocation
. As written in #2696 it's supposed to represent "access to the entire contiguous allocation, without knowing exactly how big that allocation is", and I don't see why two such accesses should exactly overlap. It may be worth discussing this with Dave since he was the one that introduced these memory locations
That may also point us in the direction of the "right solution".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@dbartol we're looking at the "// EntireAllocationMemoryLocation exactly overlaps itself." case in AliasedSSA.qll
. I really don't have the language nailed down - what exactly is a EntireAllocationMemoryLocation
, wha's the difference between the location and the allocation for that matter, what is MustExactlyOverlap
and ... should they exactly overlap?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, sorry, missed that. That's a bug.
The obvious fix would be to check that the allocation also matches before saying that the two EntireAllocationMemoryLocation
s overlap. That should be safe, but I think there's an underlying problem: I believe the current code assumes that no two EntireAllocationMemoryLocation
s share the same virtual variable. By design, we unsoundly assume that the memory pointed to by two different pointer parameters doesn't overlap, so we create a separate virtual variable for each indirect parameter.
In the suspicious test case, are the two memory accesses accessing the same indirect parameter? Or are we conflating the EntireAllocationMemoryLocation
s from two different indirect parameters? If it's the latter, I think that means we've managed to assign at least one of the EntireMemoryAllocation
objects to multiple virtual variables.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I still need some more help on this, I've been thrown in at the deep end here. I think @dbartol is saying that my change is correct, but that there might be other places where a similar change is required to distinguish EntireAllocationMemoryLocation
s within the same virtual variable? Do you have any idea where? I guess I can search all uses of EntireAllocationMemoryLocation
if this is what we're looking for?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, Dave is saying that the change should be safe, but that the change shouldn't be necessary to do here since the current code assumes that two EntireAllocationMemoryLocations
never have the same virtual variable.
I'm not sure that's true, though: An EntireAllocationMemoryLocation
is an AllocationMemoryLocation
(see here), and the virtual variable of an AllocationMemoryLocation
is AllAliasedMemory
when the allocation escapes (see here). So the assumption that two EntireAllocationMemoryLocation
never share the same virtual variable is definitely false.
So either we need:
- @dbartol to explain to us which code makes this assumption (and fix that code), or
- to change the definition of
getVirtualVariable
forEntireAllocationMemoryLocation
so that it always returnsthis
even if the allocation escapes.
The latter of those two will then result in def.getVirtualVariable() = use.getVirtualVariable()
implying that they have the same allocation (which means the change you are suggesting here will be enforced automatically).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Cc @dbartol on ^
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm no longer convinced that we assume that two different EntireAllocationMemoryLocation
s are never part of the same virtual variable. I believe that we try to treat the extra "variable" we introduce for pointer parameters like any other local variable: If its address never escapes, then it gets its own virtual variable. If its address does escape, it gets grouped into the "all aliased memory" virtual variable.
So, two different EntireAllocationMemoryLocation
s should never overlap, even if they are in the same virtual variable. I think this just means changing:
use instanceof EntireMemoryAllocation and
result instanceof MustExactlyOverlap
to
use = def and
result instanceof MustExactlyOverlap
I'm having trouble coming up with a test case that would show the difference in behavior, though.
Sorry! I missed the very last part of your sentence so I totally missed that you were waiting on me for this. Yeah, this change is clearly behavior changing, and I think we should find the "right" instead of introducing faulty logic. As I wrote in my comment above, I think there is already something fishy going on. I will take a look and get back to you! |
I just had a look for the same problem on a couple more projects we've had performance troubles with lately ( Honestly I'm looking to finish with this work very soon. It's taken up way more of my brain space over the last few weeks than it deserves, and it has no relation to any of my actual priorities at the moment. I'm willing to:
What are your thoughts? |
@dbartol are you able to reply to the last point in the thread? One way or another, I am going to be done with this issue next week. |
I've just switched this to @dbartol's suggestion:
If it looks good I'll do a fresh DCA run. |
@geoffw0 can you point me to an example of this? |
Look at the -| ssa.cpp:13:23:13:23 | *p | ssa.cpp:13:23:13:23 | ?*p | MustTotallyOverlap |
| ssa.cpp:13:23:13:23 | *p[0..4)<int> | ssa.cpp:13:23:13:23 | *p | MayPartiallyOverlap |
| ssa.cpp:13:23:13:23 | *p[0..4)<int> | ssa.cpp:13:23:13:23 | *p[0..4)<int> | MustExactlyOverlap |
| ssa.cpp:13:23:13:23 | *p[0..4)<int> | ssa.cpp:13:23:13:23 | ?*p | MayPartiallyOverlap |
| ssa.cpp:13:23:13:23 | *p[4..8)<int> | ssa.cpp:13:23:13:23 | *p | MayPartiallyOverlap |
| ssa.cpp:13:23:13:23 | *p[4..8)<int> | ssa.cpp:13:23:13:23 | *p[4..8)<int> | MustExactlyOverlap |
| ssa.cpp:13:23:13:23 | *p[4..8)<int> | ssa.cpp:13:23:13:23 | ?*p | MayPartiallyOverlap |
-| ssa.cpp:13:23:13:23 | ?*p | ssa.cpp:13:23:13:23 | *p | MayPartiallyOverlap | corresponds with this C++ code: int ChiPhiNode(Point* p, bool which1, bool which2) { // line 13
if (which1) {
p->x++;
}
else {
p->y++;
}
if (which2) {
p->x++;
}
else {
p->y++;
}
return p->x + p->y;
} |
@dbartol how do you feel about the (at this point I'd be happy to just undo the changes if we're not comfortable with them - leaving just the new test cases and an issue in a backlog somewhere) |
Merged in recent Update: DCA run successful, no alert changes, analysis time about the same overall. |
@dbartol any thoughts on my answer to your question above? |
I've added your commit. I'm happy with it. ✨
@jketema perhaps we could have an opinion from someone currently on the CPP team, then we can merge this? |
I don't really have an opinion, and forming one at this point would likely take an excessive amount of time, as this is code I have never looked at or touched. If DCA is happy with @dbartol 's last changes and if @MathiasVP thinks this looks ok, I'm happy for us to merge this. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm happy with this as well!
Thanks for the review. Merged in the latest |
DCA looks good, merging. |
Improve AliasedSSA performance, in particular on projects where it's been found to perform poorly (such as
AcademySoftwareFoundation/openexr
).Draft PR. The change I propose here definitely speeds up analysis on the target project, but is (in theory) behaviour changing and I'm not at all sure it's correct. @MathiasVP I think you're easily the most knowledgeable person to discuss this area of code. Once we're settled on a change, there may be some other improvements to follow up, and it will need wider testing.