-
Notifications
You must be signed in to change notification settings - Fork 1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Valgrind errors with struct assignment on ARM and PPC64LE #776
Comments
Not beautiful but what about just suppressing theses errors https://www.valgrind.org/docs/manual/mc-manual.html#mc-manual.suppfiles, https://www.valgrind.org/docs/manual/manual-core.html#manual-core.suppress ? |
I think that won't work well: they can show up in 100 different places, they'll move around, etc. Obviously we're not going to be maintaining a separate file with every struct assignment in the codebase. :P I think suppressions might be workable for our CI purposes where we're targeting a small set of compilers and if we move something we can up just fix it on the spot. We also should be concerned about callers: I don't want to know how many security critical programs went years without being tested under valgrind just because they linked openssl and it returned ub tainted randomness. For something like the ct timer tester I think we could make it just totally suppress the memcpy arguments check-- that's not what we're looking for with that test. |
The syntax is pretty clever, we could for example suppress it in
I was thinking more into these directions. That the user can't use valgrind reliably is sad but not our fault and don't see what we could do about it except document it. And maybe report a valgrind bug. The only related bug I could find is https://bugs.kde.org/show_bug.cgi?id=148543. |
Yeah, we should talk to the valgrind authors. This isn't technically a valgrind bug and GCC has previously fixed the behaviour. But LLVM/Clang indicate that they will not change the behaviour and instead require that they be used with a libc that can self memcpy safely, and this isn't a crazy position. The memcpy-to-self seems like something that is pretty unlikely to be an actual bug, so perhaps at a minimum valgrind could be convinced to classify that as a different kind of error from general overlap. |
Do we know anyone that is involved in the C standard? ... perhaps they could be convinced to at least elevate self-memcpy to implementation defined. :) I normally don't consider the testing of something even started until a compiler bug or two is reported... getting the language changed would be a new bar to meet in the future. :) |
I think I might know someone on twitter, I'll post a tweet and start tagging :) |
Some more references: I'm not very motivated to talk to Valgrind after this thread but yeah, it's old, hm.
I don't believe that this is a language issue. I tend to agree with the clang devs here. |
Clang devs don't have any control over libc. But if they think must-support-memcpy-self is a reasonable requirement for memcpy, then isn't that a case that it's a reasonable implementation defined behaviour for memcpy? ::shrugs:: That is in fact defacto what they're relying on now-- that they're only used with a memcpy that has an implementation defined behaviour for self copies. You shouldn't let that thread throw you on valgrind-- the person with the argument for not changing valgrind doesn't appear to listed as a valgrind author, could just be an internet hothead. I'm not sure what change I'd ask from valgrind other than making it so you can separately suppress memcpy-self from overlaps (if that isn't already the case). I wouldn't want it to stop returning errors for actual bugs in source code. [Especially now that I see that thread pointed out a plausible memcpy construction that would break self assignment.] |
I think this would miss the point. Clang and libc together form a C implementation, and things like "implementation-defined" apply to C programs calling
Fair. Do you want to give a try? Otherwise I can do it next week or so, I guess I don't have the time right now. |
Yeah. That's also my point. They're perfectly right that the "implementation" (complier + libc) can do whatever it wants, I agree. But relying on undocumented libc behaviour is not wise, esp because glibc has changed before -- memcpy on overlapping regions used to work.
I'm going to seek some friendly input for a few days first, but sure. |
I can no longer reproduce this, not with clang or with gcc, I'm not sure if it is because of a code change or new compilers or valgrind changed
|
@elichai I'm not sure if we ever had this issue on x86_64. The matrix in the initial comment is on ARM8. |
Oops I forgot that part. you're right. |
This seems to happen with clang 7 also on -O2 builds: |
@gmaxwell Did you reach out to them? |
So we see this now with GCC too: There's an age-old open GCC bug: https://gcc.gnu.org/bugzilla//show_bug.cgi?id=32667 So far we've seen this only in Line 84 in c6b6b8f
As long as this the case, I believe we should just mitigate this in our code. Here are a few things we could do:
The first four look all reasonable to me and I'd probably ACK any of those. If you ask me, we should just do the first. In the interest of avoiding bikeshedding, I would open the PR but it'll be easier if someone with an ARM system can do it because I can't test (without compiling for my phone, which is a PITA). |
or split into inplace and neg, and neg branches then calls inplace. Most usage would just use inplace directly. |
I'd really like to avoid pointer comparison, AFAIU comparing 2 pointers that point to different objects(ala not in the same array) is UB in C. |
No, it's defined. The |
With default configuration flags, ARMv8 and GCC 10.2.0, I get the |
@jonasnick Ok, damn. Is this the only instance? I'm not sure how to move forward then. I'm tempted to say "if it's only in two places, then let's fix it" but I worry that we'll end up replacing every struct assignment in the code base. At least we've seen this only with We should still benchmark if it makes a difference. Or directly split the function into two variants. If the compiler inserts |
This also occurs with on ppcel64. |
Right! This was in fact noted in the opening comment here but I mis-renamed the issue... |
I've looked at the output of I think the scope of the issue is still small but this is more than the two code locations we've seen before. :/ I suggest to add All of this is acceptable, I think. But it's an ugly hack. |
Fwiw, if we do this hack we should also test in CI that we actually use these methods (preferably by running the tests in valgrind on the affected architectures). |
I agree though it's hard to run Valgrind on foreign architectures. Valgrind and qemu-user don't go together... We could pay for some EC2 ARM instances and hand them to Cirrus CI (that is supported). But even this won't cover PowerPC. No matter what, this will be a project on its own. For now, I'd prefer some clever grep on the assembly. |
Well ok, the issue with this is that this needs to be something which is statically checkable. This leaves us with only two options:
In both cases, we'd hope the compiler won't turn it into a And in both cases, it's probably better to ensure that all of the field members are initialized to avoid UB (or any discussion on whether this is UB or not, which may be worse than the actual UB :P). So we think we should first merge #791 . |
I've run into this again in #1433, see https://cirrus-ci.com/build/5276023932059648. There's at least one new case
(Note:
No useful debug info here, but I tracked this down to the known |
LLVM people are working on a draft DR: https://reviews.llvm.org/D86993#4525896 |
For some time there has been a clang issue where clang emits a memcpy of a struct to itself and valgrind flags it as undefined behaviour. Clang developers gave the response that when the compiler does it, nothing is undefined behaviour. This is response is mostly true, but memcpy is a libc defined function and can change its behaviour out from under the compiler (and has in the past, it used to work okay with overlapping memory) -- in fact, memcpy can be replaced by dynamic linker at runtime (at the direction of the caller).
In any case wisdom of clang's behaviour aside, valgrind can't tell if it was the code or the compiler (which is also part of why valgrind is useful for finding compiler induced non-constant time behaviour).
For this library it is a practical problem both because valgrind errors are rightfully frightening to developers who would call the library but also because some people want to make running valgrind part of "make check" which would potentially elevate these useless errors to build failures (and teach users to ignore these sorts of reports).
It's also the case that where these issues are triggered is unstable and highly environment specific.
E.g. PPC64LE Fedora 32 Clang 10.0.0-2 -Os fails but other optimization levels do not.
Checkout this matrix of configurations on ARM8:
So out of 24 configurations, 4 fail.
Each of the failing cases look like:
(The clang build with only one error gets only the ecmult_const_impl.h:260 one, because :266 is endo only)
So there are failures with both GCC and Clang Os only. GCC does it for 32 and 64 only with endo. Clang does it only with 64 bit but doesn't care about endo.
GCC previously had a bug a long time ago on this which was closed as resolved: https://gcc.gnu.org/bugzilla//show_bug.cgi?id=19410
Bug in LLVM tracker: https://bugs.llvm.org/show_bug.cgi?id=11763
I think this is a serious impedement to using valgrind except as a development/review tool, because the location of these issues is unpredictable, depends on random build flags, depends on the compiler version, and could come and go due to unrelated changes to the codebase. There are a large number of struct assignments in the code too, so a large number of places where one could crop up.
The text was updated successfully, but these errors were encountered: