sum.c is erroneous in the presence of compiler optimization #2

comex · 2015-10-25T05:43:28Z

There are two problems with sum.c as written: since the sum, s, is not used, any optimizing compiler will remove the loop entirely; and even if it is used, at least GCC and Clang are able to determine that the loop will result in s == NUMBER and remove it anyway.

It's hard to come up with a simple alternate solution because compilers are pretty tricky with these kinds of simple loops. One possibility is to make s volatile, but this will somewhat artificially constrain performance by making the CPU store each iteration to memory rather than keeping it in a register. Another is to add

asm("" :: "r"(s));

in the loop, to force the compiler to put the sum in some (single) register before executing an empty block of assembly.

The text was updated successfully, but these errors were encountered:

auscompgeek · 2015-10-25T12:55:31Z

I see two possible clean solutions:

Move the summation into a separate function. This may or may not work, depending on whether gcc and clang are intelligent enough to determine whether a function is deterministic. I haven't investigated this.
Print out the end value of s. This is done in fill_array.c and fill_array_out_of_order.c as well, so the same should probably be done here.

jvns · 2015-10-25T13:52:12Z

Thanks for reporting this! This was originally tested with gcc -O2 on OS X and it didn't optimize out the loop, but it certainly doesn't work on my Linux machine.

radarek · 2015-11-02T13:57:21Z

On my machine (OSX 10.11) -O2 actually optimized it away:

$ gcc -O2 sum.c
$ time ./a.out 500000000

real    0m0.004s
user    0m0.001s
sys 0m0.002s

-O1 gives the same result. Only -O0 version isn't optimized in this case:

$ gcc -O0 sum.c
$ time ./a.out 500000000

real    0m1.079s
user    0m1.054s
sys 0m0.008s

$ gcc -v
Configured with: --prefix=/Applications/Xcode.app/Contents/Developer/usr --with-gxx-include-dir=/usr/include/c++/4.2.1
Apple LLVM version 7.0.0 (clang-700.1.76)
Target: x86_64-apple-darwin15.0.0
Thread model: posix

I would suggest to use -O0 optimization level. We measure how fast is real loop, not optimized out one (which means no code at all).

edgar-bonet · 2019-02-10T11:33:51Z

radarek wrote:

I would suggest to use -O0 optimization level.

I disagree. At -O0, gcc tends to generate hugely inefficient code. Benchmarking that tells you nothing about the speed you can expect from a C program.

I suggest the following:

#include <stdlib.h>
#include <stdio.h>

// Number to guess: How many iterations of
// this loop can we go through in a second?

int main(int argc, char **argv) {
    unsigned int NUMBER, i, s;
    NUMBER = atoi(argv[1]);

    for (s = i = 0; i < NUMBER; ++i) {
        s += i;
    }
    printf("%u\n", s);

    return 0;
}

Notice three changes:

The value of s is printed in order to ensure it will be computed
s += 1 is replaced by s += i because with the former gcc is able to figure out the result at compile time, which it doesn't with the later
the numbers are unsigned in order to avoid undefined behavior when s overflows.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

sum.c is erroneous in the presence of compiler optimization #2

sum.c is erroneous in the presence of compiler optimization #2

comex commented Oct 25, 2015

auscompgeek commented Oct 25, 2015

jvns commented Oct 25, 2015

radarek commented Nov 2, 2015

edgar-bonet commented Feb 10, 2019

sum.c is erroneous in the presence of compiler optimization #2

sum.c is erroneous in the presence of compiler optimization #2

Comments

comex commented Oct 25, 2015

auscompgeek commented Oct 25, 2015

jvns commented Oct 25, 2015

radarek commented Nov 2, 2015

edgar-bonet commented Feb 10, 2019