Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

C++: Improve AliasedSSA performance #17225

Merged
merged 11 commits into from
Oct 21, 2024
Merged

C++: Improve AliasedSSA performance #17225

merged 11 commits into from
Oct 21, 2024

Conversation

geoffw0
Copy link
Contributor

@geoffw0 geoffw0 commented Aug 14, 2024

Improve AliasedSSA performance, in particular on projects where it's been found to perform poorly (such as AcademySoftwareFoundation/openexr).

Draft PR. The change I propose here definitely speeds up analysis on the target project, but is (in theory) behaviour changing and I'm not at all sure it's correct. @MathiasVP I think you're easily the most knowledgeable person to discuss this area of code. Once we're settled on a change, there may be some other improvements to follow up, and it will need wider testing.

@geoffw0 geoffw0 added the C++ label Aug 14, 2024
@geoffw0 geoffw0 requested a review from a team as a code owner August 14, 2024 14:33
@geoffw0 geoffw0 marked this pull request as draft August 14, 2024 14:40
(
def.getAnAllocation() = use.getAnAllocation() or
not exists(def.getAnAllocation()) or
not exists(def.getAnAllocation())
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess you meant for this to be use.getAnAllocation()?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, I think x instanceof EntireAllocationMemoryLocation implies exists(x.getAnAllocation()). So those two last cases aren't really doing anything anyway.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And finally, if both def and use are EntireAllocationMemoryLocations and they have the same virtual variable (as they do in this case), I think def.getAnAllocation() = use.getAnAllocation() always holds. So if this speeds up anything I'm pretty sure it's simply because you're permuting some joins?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess you meant for this to be use.getAnAllocation()?

Yes. I wasn't 100% sure these lines were required, but based on nearby cases I thought they might be. You seem to think not, so I'll delete them.

I think def.getAnAllocation() = use.getAnAllocation() always holds.

I'll check again, but I'm fairly sure I had found a case where it didn't (and that fact exploded performance). That might imply there's a deeper bug to be fixed and this change would only patch it. I'll investigate further...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And finally, if both def and use are EntireAllocationMemoryLocations and they have the same virtual variable (as they do in this case), I think def.getAnAllocation() = use.getAnAllocation() always holds.

This has been a bit of a pain to investigate, but it's definitely not the case on the AcademySoftwareFoundation/openexr snapshot or the MRVA top 10. There are thousands of exceptions (possibly many more, I had to arbitrarily constrain the query so it would execute in reasonable time). Many times the issue seems to occur when a [template?] function takes multiple pointer arguments, between the memory locations of those arguments. Other cases seem to relate to dynamic allocations.

The question I'm trying to figure out next is why these cases occur - and whether they should be fixed at the source, or if we in fact should not be assuming def.getAnAllocation() = use.getAnAllocation() holds here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I still have a limited understanding of how this works, but it seems to me that if two or more allocations in the same function escape, they end up in the same virtual variable (TAllAliasedMemory for the function) but with different allocations (EntireAllocationMemoryLocation being smaller than TAllAliasedMemory). So from the point of view of getExtentOverlap, I think it's right that we have to check that def.getAnAllocation() = use.getAnAllocation() for EntireAllocationMemoryLocations.

So I tentatively think my change here is correct. And I've removed the two lines you pointed out are redundant now. I think the next step is to try to come up with a test case where the difference matters.

Copy link
Contributor

@MathiasVP MathiasVP Aug 30, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I still have a limited understanding of how this works, but it seems to me that if two or more allocations in the same function escape, they end up in the same virtual variable (TAllAliasedMemory for the function) but with different allocations (EntireAllocationMemoryLocation being smaller than TAllAliasedMemory).

Yes, that is all correct (I refer you to the slides I did at the Oxford offsite. I discusses the concept of virtual variables). A virtual variable is a memory location that represents a set of memory locations. AllAliasedMemory is a virtual variable that groups together all memory locations that escape in a given function. Those memory locations may have different allocations. For example, something like:

void test() {
  int* p = new int;
  int* q = new int;
  escape(&p);
  escape(&q);
  // now both p and q escape, but they have different allocations
}

EntireAllocationMemoryLocation is the memory location that represents a parameter indirection or the memory that is pointed to by a dynamic allocation.

I do think that this PR makes the IR alias analysis more incorrect, but I haven't been able to think of how we may show this in a testcase. Granted, I also haven't given it a lot of thought yet.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, so we're happy that def.getAnAllocation() = use.getAnAllocation() should affect results in theory, it's more than just a performance hint.

The code you wrote in the comment immediately above is very similar to some of the real world cases I've looked at. I've not been able to find an existing test that exposes this behaviour though (perhaps I need to write a new one?)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@MathiasVP do you have any idea how best to expose the behaviour of this predicate in tests? Short of writing a direct test for getExtentOverlap itself, which I suppose is an option.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A value numbering test is probably the easiest way to get hard proofs of the change in behavior since:

  • That's a library that's directly used in queries, and
  • That's a library that makes full use of alias analysis

@geoffw0
Copy link
Contributor Author

geoffw0 commented Aug 30, 2024

... I've tried and so far failed to write a test exposing the behaviour of getExtentOverlap in this case.

I have confirmed how it performs locally with a small range of projects and dataflow queries:

AcademySoftwareFoundation/openexr, cpp/path-injection, before: 84 results, 3,081 seconds
AcademySoftwareFoundation/openexr, cpp/path-injection, after: 84 results, 77 seconds
Sphereserver/Source-X, cpp/sql-injection, before: 0 results, 37 seconds
Sphereserver/Source-X, cpp/sql-injection, after: 0 results, 38 seconds
WebKit-JavaScriptCore, cpp/type-confusion, before: 443 results, 516 seconds
WebKit-JavaScriptCore, cpp/type-confusion, after: 443 results, 512 seconds

I've also started a DCA run as further verification.

@MathiasVP
Copy link
Contributor

I have confirmed how it performs locally with a small range of projects and dataflow queries:

Dataflow doesn't use the IR ssa analysis. So I'm not surprised you're not seeing any differences 😂

If you want to find a query thst exercises the changes here, you can pick a query that uses the MustFlow library

@geoffw0
Copy link
Contributor Author

geoffw0 commented Sep 2, 2024

If you want to find a query thst exercises the changes here, you can pick a query that uses the MustFlow library

That can't be right, there's a large reproducible speedup of cpp/path-injection on AcademySoftwareFoundation/openexr. Here it is again:

AcademySoftwareFoundation/openexr, cpp/path-injection, after: 84 results, 76 seconds
AcademySoftwareFoundation/openexr, cpp/path-injection, before: 84 results, 3266 seconds

(this time running the 'after' version first; no meaningful change on the two other projects I tried it on though)


The DCA run showed nothing of interest.

@MathiasVP
Copy link
Contributor

That's because the alias analysis/the IR-based ssa is cached as part of IR construction. So it's running as one big stage. But that doesn't mean that those predicates have any influence on dataflow/taint-tracking 🙂

Dataflow and taint-tracking totally ignore all operands and instruction that are considered by alis analysis.

@geoffw0
Copy link
Contributor Author

geoffw0 commented Sep 3, 2024

I've just pushed two commits adding a (direct) test and then showing the difference this change makes. I'm much less experienced with the alias analysis logic than you are, but I would say that the MustExactlyOverlap results we lose were wrong - just because something escapes just not mean it must overlap something else that escapes. On the flip side, the MayPartiallyOverlap results were probably correct (and I'm not exactly sure why we lose them yet). Do you agree?

@geoffw0
Copy link
Contributor Author

geoffw0 commented Sep 3, 2024

On the flip side, the MayPartiallyOverlap results were probably correct (and I'm not exactly sure why we lose them yet).

It's caused by the additional logic in getOverlap, transforming certain outputs of getExtentOverlap from MustTotallyOverlap / MustExactlyOverlap to MayPartiallyOverlap. I can repair it (and make all of them MayPartiallyOverlap), but I'm fairly sure that will reintroduce the performance issue (and that may be worse than the missing data). I'll wait for your advice now.

@geoffw0
Copy link
Contributor Author

geoffw0 commented Sep 9, 2024

@MathiasVP I'm waiting for your opinion about the test I added in the last two commits. I'm keen to get this performance issue solved but I'm struggling with lack of knowledge of the internals of this library.

| ssa.cpp:446:34:446:34 | *a | ssa.cpp:446:34:446:34 | *a[0..?)<unknown> | MustTotallyOverlap |
| ssa.cpp:446:34:446:34 | *a | ssa.cpp:446:34:446:34 | ?*a | MustTotallyOverlap |
| ssa.cpp:446:34:446:34 | *a | ssa.cpp:446:34:446:34 | ?*a[0..?)<unknown> | MustTotallyOverlap |
| ssa.cpp:446:34:446:34 | *a | ssa.cpp:446:43:446:43 | *b | MustExactlyOverlap |
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am confused by this. Why are *a and *b exactly overlapping? That means that a write *a will be fully read by a read of *b, but looking at the test you added that is clearly not the case. So I think something is wrong here. Although, I dont think your fix is the right one. I will look a bit into this today!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that these results come from the commit with my fix removed (i.e. as main), so it's an existing problem. I would argue that my change improves things by removing these rows, but I wouldn't argue that removing them is necessarily the best fix possible.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I agree this is not the fault of your changes. It has been around for a very long time. I managed to dig up an old comment from Jonas on one of my first PRs here: #2772 (comment) 😄

I do agree that your changes makes this slightly better in this specific case, indeed

Copy link
Contributor

@MathiasVP MathiasVP Sep 9, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If it turns out that escaped EntireAllocationMemoryLocation always exactly overlap as specified here then I think I don't fully understand EntireAllocationMemoryLocation. As written in #2696 it's supposed to represent "access to the entire contiguous allocation, without knowing exactly how big that allocation is", and I don't see why two such accesses should exactly overlap. It may be worth discussing this with Dave since he was the one that introduced these memory locations

That may also point us in the direction of the "right solution".

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@dbartol we're looking at the "// EntireAllocationMemoryLocation exactly overlaps itself." case in AliasedSSA.qll. I really don't have the language nailed down - what exactly is a EntireAllocationMemoryLocation, wha's the difference between the location and the allocation for that matter, what is MustExactlyOverlap and ... should they exactly overlap?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, sorry, missed that. That's a bug.

The obvious fix would be to check that the allocation also matches before saying that the two EntireAllocationMemoryLocations overlap. That should be safe, but I think there's an underlying problem: I believe the current code assumes that no two EntireAllocationMemoryLocations share the same virtual variable. By design, we unsoundly assume that the memory pointed to by two different pointer parameters doesn't overlap, so we create a separate virtual variable for each indirect parameter.

In the suspicious test case, are the two memory accesses accessing the same indirect parameter? Or are we conflating the EntireAllocationMemoryLocations from two different indirect parameters? If it's the latter, I think that means we've managed to assign at least one of the EntireMemoryAllocation objects to multiple virtual variables.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I still need some more help on this, I've been thrown in at the deep end here. I think @dbartol is saying that my change is correct, but that there might be other places where a similar change is required to distinguish EntireAllocationMemoryLocations within the same virtual variable? Do you have any idea where? I guess I can search all uses of EntireAllocationMemoryLocation if this is what we're looking for?

Copy link
Contributor

@MathiasVP MathiasVP Sep 13, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, Dave is saying that the change should be safe, but that the change shouldn't be necessary to do here since the current code assumes that two EntireAllocationMemoryLocations never have the same virtual variable.

I'm not sure that's true, though: An EntireAllocationMemoryLocation is an AllocationMemoryLocation (see here), and the virtual variable of an AllocationMemoryLocation is AllAliasedMemory when the allocation escapes (see here). So the assumption that two EntireAllocationMemoryLocation never share the same virtual variable is definitely false.

So either we need:

  • @dbartol to explain to us which code makes this assumption (and fix that code), or
  • to change the definition of getVirtualVariable for EntireAllocationMemoryLocation so that it always returns this even if the allocation escapes.

The latter of those two will then result in def.getVirtualVariable() = use.getVirtualVariable() implying that they have the same allocation (which means the change you are suggesting here will be enforced automatically).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cc @dbartol on ^

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm no longer convinced that we assume that two different EntireAllocationMemoryLocations are never part of the same virtual variable. I believe that we try to treat the extra "variable" we introduce for pointer parameters like any other local variable: If its address never escapes, then it gets its own virtual variable. If its address does escape, it gets grouped into the "all aliased memory" virtual variable.

So, two different EntireAllocationMemoryLocations should never overlap, even if they are in the same virtual variable. I think this just means changing:

use instanceof EntireMemoryAllocation and
result instanceof MustExactlyOverlap

to

use = def and
result instanceof MustExactlyOverlap

I'm having trouble coming up with a test case that would show the difference in behavior, though.

@MathiasVP
Copy link
Contributor

Sorry! I missed the very last part of your sentence so I totally missed that you were waiting on me for this. Yeah, this change is clearly behavior changing, and I think we should find the "right" instead of introducing faulty logic.

As I wrote in my comment above, I think there is already something fishy going on. I will take a look and get back to you!

@geoffw0
Copy link
Contributor Author

geoffw0 commented Sep 24, 2024

I just had a look for the same problem on a couple more projects we've had performance troubles with lately (Miseryset/Android-ImageMagick7 and Sphereserver/Source-X) that we suspected might be related - but the change makes no difference (and there are signs those projects may have been improved by other changes). So there is in fact only one project known to be affected by this issue - AcademySoftwareFoundation/openexr.

Honestly I'm looking to finish with this work very soon. It's taken up way more of my brain space over the last few weeks than it deserves, and it has no relation to any of my actual priorities at the moment. I'm willing to:

  • (a) call this an imperfect improvement, merge as-is.
  • (b) abandon and close it (in practice, probably merge the new test, but not the alias analysis code change).
  • (c) work on a better solution, but only with a clear understanding of what this would look like.

What are your thoughts?

@geoffw0
Copy link
Contributor Author

geoffw0 commented Oct 4, 2024

@dbartol are you able to reply to the last point in the thread?

One way or another, I am going to be done with this issue next week.

@geoffw0
Copy link
Contributor Author

geoffw0 commented Oct 8, 2024

I've just switched this to @dbartol's suggestion:

use = def and
result instanceof MustExactlyOverlap
  • performance is still very much improved (though a newer database doesn't match the numbers I posted earlier; AcademySoftwareFoundation/openexr, cpp/path-injection, before: I aborted after 10m; after: 48s)
  • the test does in fact reveal a bit about what's going on here; we lose the a / b conflation on test lines 184 and 446 (as with my solution), we also lose overlaps between things like *p and ?*p in quite a lot of places, i.e. between may and non-may accesses. I've no idea what the consequences of this are.

If it looks good I'll do a fresh DCA run.

@dbartol
Copy link
Contributor

dbartol commented Oct 8, 2024

we also lose overlaps between things like *p and ?*p in quite a lot of places

@geoffw0 can you point me to an example of this?

@geoffw0
Copy link
Contributor Author

geoffw0 commented Oct 8, 2024

we also lose overlaps between things like *p and ?*p in quite a lot of places

@geoffw0 can you point me to an example of this?

Look at the aliased_ssa_overlap.expected changes in 4a131dd . The query is just reporting the results of getOverlap. For example the first change, on 13:

-| ssa.cpp:13:23:13:23 | *p | ssa.cpp:13:23:13:23 | ?*p | MustTotallyOverlap |
 | ssa.cpp:13:23:13:23 | *p[0..4)<int> | ssa.cpp:13:23:13:23 | *p | MayPartiallyOverlap |
 | ssa.cpp:13:23:13:23 | *p[0..4)<int> | ssa.cpp:13:23:13:23 | *p[0..4)<int> | MustExactlyOverlap |
 | ssa.cpp:13:23:13:23 | *p[0..4)<int> | ssa.cpp:13:23:13:23 | ?*p | MayPartiallyOverlap |
 | ssa.cpp:13:23:13:23 | *p[4..8)<int> | ssa.cpp:13:23:13:23 | *p | MayPartiallyOverlap |
 | ssa.cpp:13:23:13:23 | *p[4..8)<int> | ssa.cpp:13:23:13:23 | *p[4..8)<int> | MustExactlyOverlap |
 | ssa.cpp:13:23:13:23 | *p[4..8)<int> | ssa.cpp:13:23:13:23 | ?*p | MayPartiallyOverlap |
-| ssa.cpp:13:23:13:23 | ?*p | ssa.cpp:13:23:13:23 | *p | MayPartiallyOverlap |

corresponds with this C++ code:

int ChiPhiNode(Point* p, bool which1, bool which2) { // line 13
  if (which1) {
    p->x++;
  }
  else {
    p->y++;
  }

  if (which2) {
    p->x++;
  }
  else {
    p->y++;
  }

  return p->x + p->y;
}

@geoffw0
Copy link
Contributor Author

geoffw0 commented Oct 10, 2024

@dbartol how do you feel about the *p and ?*p overlap changes revealed by the test?

(at this point I'd be happy to just undo the changes if we're not comfortable with them - leaving just the new test cases and an issue in a backlog somewhere)

@geoffw0
Copy link
Contributor Author

geoffw0 commented Oct 10, 2024

Merged in recent main, fixed formatting, started new DCA run.

Update: DCA run successful, no alert changes, analysis time about the same overall.

@geoffw0
Copy link
Contributor Author

geoffw0 commented Oct 15, 2024

@dbartol any thoughts on my answer to your question above?

@dbartol
Copy link
Contributor

dbartol commented Oct 17, 2024

@geoffw0 Ah, I forgot that there will be two EntireAllocationMemoryLocations for a given allocation: a "may" access and a "must" access. Here's a branch that adds the correct fix and restores the test expectations.

@geoffw0
Copy link
Contributor Author

geoffw0 commented Oct 17, 2024

I've added your commit. I'm happy with it. ✨

cpp/path-injection on AcademySoftwareFoundation/openexr now takes 224s 👍

@jketema perhaps we could have an opinion from someone currently on the CPP team, then we can merge this?

@jketema
Copy link
Contributor

jketema commented Oct 18, 2024

@jketema perhaps we could have an opinion from someone currently on the CPP team, then we can merge this?

I don't really have an opinion, and forming one at this point would likely take an excessive amount of time, as this is code I have never looked at or touched. If DCA is happy with @dbartol 's last changes and if @MathiasVP thinks this looks ok, I'm happy for us to merge this.

Copy link
Contributor

@dbartol dbartol left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Copy link
Contributor

@MathiasVP MathiasVP left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm happy with this as well!

@jketema
Copy link
Contributor

jketema commented Oct 21, 2024

Thanks for the review. Merged in the latest main and running some DCA on this now.

@jketema jketema marked this pull request as ready for review October 21, 2024 18:19
@jketema jketema added the no-change-note-required This PR does not need a change note label Oct 21, 2024
@jketema
Copy link
Contributor

jketema commented Oct 21, 2024

DCA looks good, merging.

@jketema jketema merged commit 9ef1a9c into github:main Oct 21, 2024
15 of 16 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
C++ no-change-note-required This PR does not need a change note
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants