Minor fixes #592

dorjoy03 · 2023-10-15T16:50:47Z

Just a few smaller things I noticed.

gperciva · 2023-10-15T18:17:48Z

libcperciva/events/events_network.c

@@ -25,7 +25,7 @@
 * the value INT_MAX + 1 (in case every possible file descriptor is in use
 * and being polled for).
 */
-CTASSERT((nfds_t)(-1) <= (size_t)(-1));
+CTASSERT((nfds_t)(-1) <= (size_t)(SIZE_MAX));


Thanks for your interest, @dorjoy03!

As far as I'm aware, (size_t)-1 is identical to (size_t)SIZE_MAX; so this is a stylistic change.

In this case,

CTASSERT((nfds_t)(-1) <= (size_t)(-1));

looks more clear to me: the only difference between the left-hand side and right-hand side is the type, so we immediately see that it's comparing the types.

Could you please elaborate on why you think SIZE_MAX is more appropriate here?

My main motivation for this change here was that SIZE_MAX is more explicit as it's provided in the stdint.h header i.e., the maximum value a size_t type can have and by (size_t)(-1) we do mean exactly that but it is more implicit than SIZE_MAX. I guess for me (size_t)(-1) would make more sense if there was no explicit max value provided in any header.

But I can revert this change here if the previous version is more clear to you. Let me know. I can do a force push together with other changes needed (if any).

I think that (size_t)(-1) is actually more idiomatic in these cases, so unfortunately the first commit isn't necessary. In system C programming, there's a lot of history behind (int)(-1) indicating a "failure" or "invalid value"; (size_t)(-1) is doing the same thing here.

For example, take a look at some of the results for
https://github.com/search?q=repo%3Afreebsd%2Ffreebsd-src+%22%28size_t%29-1%22&type=code
in the FreeBSD source code. (As a side note, they seem to prefer (size_t)-1, i.e. without the parentheses around the -1, which isn't how I'd write it, but hey, it's not my code!)

As another example, consider the mbrlen() and mbrtowc() multibyte functions in C99. They return (size_t)(-1) or even (size_t)(-2) to indicate that something funky is happening.

Other than the CTASSERT, we're always using (size_t)(-1) to indicate an invalid value; so I think it's more appropriate to keep them as such, rather than replacing them with SIZE_MAX.

Got it. Makes sense. I will remove the first commit. Thanks!

gperciva · 2023-10-15T18:19:59Z

libcperciva/events/events_network.c

@@ -25,7 +25,7 @@
 * the value INT_MAX + 1 (in case every possible file descriptor is in use


Note to self: there's a missing word in this comment; I'm fixing that in the libcperciva repository.

gperciva · 2023-10-15T18:52:18Z

libcperciva/datastruct/elasticarray.c

@@ -161,8 +161,7 @@ elasticarray_append(struct elasticarray * EA,
 	size_t nsize;

 	/* Check for overflow. */
-	if ((nrec > SIZE_MAX / reclen) ||
-	    (nrec * reclen > SIZE_MAX - EA->size)) {
+	if (nrec > ((SIZE_MAX - EA->size) / reclen)) {


For the record, these changes to the overflow checks look great.

I was curious, so I dug into the archives. Those overflow checks were part of the earliest instance of elasticarray.c I could find, which was spiped 0.9.9.1 on 2011-07-02. I'm guessing that Colin copied the overflow check from elasticarray_resize() to _append() and _shrink(), then added a function-specific check to each.

I don't know what I was thinking 12 years ago, but that idiom is often used when reclen is a constant since it allows the division to be performed at compile-time. Of course in this case reclen isn't a constant, but maybe we could move that part into the macro in elasticarray.h?

I don't know if the performance difference would be meaningful though; want to code up a performance test for elasticarray?

I think one reason of doing it the division way i.e., nrec > SIZE_MAX / reclen instead of the multiplication way i.e., nrec * reclen > SIZE_MAX is so that the nrec * reclen doesn't overflow, right?

My initial guess is that we're not going to see any different in performance in any real use. I mean, as soon as malloc() or network traffic gets involved, the time it takes arithmetic is going to be a total non-issue.

But sure, I'll write up some perftests. It would still be interesting to see if it would be faster in a pathological case.

@dorjoy03: yes, doing that check with a division is necessary to avoid overflow.

Colin's idea is to move the

if ((nrec > SIZE_MAX / reclen)

check into the header file, in the definition of prefix##_append(). Since prefix##_append() knows the value of reclen at compile-time, the compiler can do the division, so that check would be a simple if (nrec > SOMETHING). If that overflow check doesn't fail, then we'd call elasticarray_append() and do the new check.

The question now is "is having a check in the header file worth it?". I'm leaning towards "no", because I'd expect that overflow check to trigger approximately 0% of the time. But I need to investigate more, because there might be some cases where it does arise. And even if it's not worth it for _append(), it might be worth it for _shrink().

I'm not working tomorrow, but I'll revisit this on Tuesday.

We're cautious about changing code. One tech company uses the motto "move fast and break things"; we aim to do the reverse: "move slow and don't break anything". :)

The reason for removing the first condition expression is that if the first condition evaluates to true, the second condition will also evaluate to true. So doing only the second condition check without overflow should handle all the cases that were handled before.
But of course it's always good to be safer. Let me know if I can be of any help.

dorjoy03 · 2023-10-16T14:04:21Z

I have now removed the SIZE_MAX changes

dorjoy03 · 2023-10-16T17:49:10Z

Noticed that in the resize function, there were expressions like EA->alloc * 2 and nsize * 4 which could overflow. Although not sure if it can ever happen in practice or the implications of it. I did some changes in e99ebbf to fix that.

gperciva · 2023-10-17T18:31:34Z

libcperciva/datastruct/elasticarray.c

 		/* We need to enlarge the buffer. */
-		nalloc = EA->alloc * 2;


I think the problem here is that the comments don't make it clear what's happening. (It took me a while to figure it out!)

Let's look at the previous code, imagining that size_t can only hold one decimal digit, with the example of:

SIZE_MAX = 9; EA->alloc = 6; nsize = 8;

We then get nalloc = 6 * 2 = 12 mod 10 = 2, due to our "simplified example" size_t only going up to 9.

The next lines (of the original code) is:

if (nalloc < nsize) nalloc = nsize;

Since we have 2 < 8, we set nalloc = 8. That's an entirely suitable value.

In other words, the if statement has two benefits. It sets nalloc to a good value:

if doubling EA->alloc is not enough to hold nsize

if doubling EA->alloc "overflows"

I put "overflows" in quotation marks, because according to the C99 specification, it's not technically "overflowing":

A computation involving unsigned operands can never overﬂow, because a result that cannot be represented by the resulting unsigned integer type is reduced modulo the number that is one greater than the largest value that can be represented by the resulting type.

C99 6.2.5 Types, paragraph 9

If you can think of a counter-example, I'm all ears! But unless there's a specific example where this code produces a bad value, I think the original code is fine.

It would certainly benefit from a comment, though! I'm happy to write it, but I'm also happy if you'd like to write one. I think it needs a comment immediately before the if (nalloc < nsize) line, explaining the two cases where it's useful.

PS: I forgot to add that size_t is definitely unsigned:

The types are ... size_t which is the unsigned integer type of the result of the sizeof operator

C99 7.17 Common definitions <stddef.h>

Right. In this change the old code was probably doing the right thing but would have been clearer with a comment. I noticed that too but thought it could be clearer to handle the 'overflow' / 'wrap around' in code more explicitly.
But if you look at the nsize * 4 below, that one could overflow in a problematic way, right? so I changed both.

Hmm. That's information that would be great to put into the commit message, so that it's clear to everybody that the commit includes both fixes an "overflow" and rewrites other code for clarity. For tarsnap, we prefer not to mix multiple things in a single commit. So I would have made one commit for the nsize * 4 fix, and another for rewriting.

Anyway. Now that I understand what's happening with the (EA->alloc < size), I don't think the rewriting is necessary, particularly because it adds an extra if statement. I think the best solution is to have a commit which adds a comment as such:

/* We need to enlarge the buffer. */ nalloc = EA->alloc * 2; /* * Safely handle if nalloc: 1) is not large enough to hold * nsize, or 2) was reduced modulo SIZE_MAX+1. */ if (nalloc < nsize) nalloc = nsize;

PS: I'm not 100% sold on the exact wording of that new comment. In particular, the use of "SIZE_MAX+1" is questionable: I'm not certain that SIZE_MAX must be a power of 2.

(I'm still confident that the right solution is a comment above if (nalloc < nsize), it's just that I'm searching for the best wording of that comment.)

Could we instead add more comments above the nalloc = EA->alloc * 2 statement and keep everything else like before. The comment could be something like:

/* * We need to enlarge the buffer. * Even if the below expression could do modulo wrapping, the nalloc < nsize check underneath should take care of it. */ nalloc = EA->alloc * 2;

Two quibbles: 1) "should" or "will"? If we write "should", then that sounds like we're hoping it'll work. But as Yoda said, "do or do not, there is no ~~try~~ hope".
2) line wrapping; there shouldn't be a newline after "buffer.". If we really need to have a manual line-break, the comment should start with /*-. (That's a somewhat obscure detail in our STYLE.)

Hmm. What about

In the case of modulo wrapping, the if statement will ensure a sensible default.

I'm not sold on the wording, though. I still think it's more clear to say:

We need to enlarge the buffer.

Handle special cases. (and give details)

but I could be convinced otherwise!

Added code comment in e7500c0.

gperciva · 2023-10-17T18:52:57Z

libcperciva/datastruct/elasticarray.c

 		if (nalloc < nsize)
 			nalloc = nsize;
-	} else if (EA->alloc > nsize * 4) {


I'm much more suspicious about this code. Unless I've made a mistake in my investigation, there's multiple cases where this produces bad values.

(all examples with SIZE_MAX = 9)

For example, if we have:

EA->alloc = 5 nsize = 5

Then we end up with 5 * 4 mod 10 = 0, so we take this case, but then calculate nalloc = nsize * 2 = 5 * 2 mod 10 = 0, which is not at all what we want!

Maybe that's a silly example; why would we be resizing to the same size?

Consider another one:

EA->alloc = 7 nsize = 6

We're checking nsize * 4 = 6 * 4 mod 10 = 4, so we take this case; then set nalloc = 6 * 2 mod 10 = 2.

I'm going to break for lunch, then double-check this.

(I normally wouldn't post a half-finished investigation, but I thought it might be useful to see what I meant by "finding a counter-example" in my response to the other code in this function.)

Agree. As I said in the reply to the previous comment, the 'if' block is not buggy but this 'else if' block definitely looks buggy which is why I changed this to division instead. But I definitely agree that the 'if' block's changes is not really needed as it was not buggy but the main motivation was to be more explicit as I needed to change this 'else if' block anyway.

Let me know your final thoughts. Good stuff!

Changing it to

if (EA->alloc / 4 > nsize)

would succeed in making it safe, but division is slower than multiplication.

In this case, how likely is this going to be a problem?

on a 64-bit system, this would mean requesting over 4.6e18 bytes (or 4.6 million TB). That's never going to happen.

on the other hand, SIZE_MAX is only required to be at least 65535. That would mean we'd run into this problem with only 16 kB, which is an entirely reasonable size for an elastic array.

A more plausible "old or low-powered system" would be 32 bits, which would put the "problematic request" at 1 GB. That's unlikely, but not impossible.

Right now, I'm leaning towards keeping (EA->alloc > nsize * 4) and doing the nalloc = nsize * 2 arithmetic. But then adding

/* If nsize is huge, the above calculations might be wrong. */ if (nsize > SIZE_MAX / 4) nalloc = nsize;

The compiler will calculate SIZE_MAX / 4, so it's a simple if (nsize > WHATEVER) check. And hopefully the branch prediction will figure out that nsize is not likely to be huge; in which case, the performance hit would be negligible.

Good discussion. Is the division going to be problematic in practice perf wise? Although I did not think it through, if so, could we do right shift instead of division like: else if ((EA->alloc >> 2) > nsize)? What do you think?

Also it might make sense to think about the original essence of the existing algorithm. I think if we assume that there is no overflow in the 'else if' case, then the original algorithm meant if EA->alloc is equal or greater than nsize i.e., the if condition is not taken, and if it really was greater than nsize * 4, then down the line a realloc would happen of size nsize * 2. But now that the else if block could be taken (according to existing code) even though EA->alloc is not really greater than nsize * 4 and if we use the new if(nsize > SIZE_MAX / 4) condition as you suggested, a realloc call might happen down below. But it really meant to go to the last 'else' block which doesn't cause any realloc system call.

Let me know what you think.

Oops, my bad on the "division" thing. Since we're dealing with unsigned integers, I think it's safe to assume that the compiler would implement it as EA->alloc >> 2, so speed isn't a concern.

With that in mind, I'm actually leaning towards EA->alloc / 4 after all.

...

I kind-of feel like an undergraduate physics student doing thought experiments about the speed of light. "Suppose that we are in a space ship travelling at 0.9c, but now we want to decelerate to 0.26c. Should we resize the array?"

... yeah, the closest match to the original algorithm is to do nothing. So EA->alloc / 4 would be the best thing.

So I now think that it's best to go with one commit which does

- } else if (EA->alloc > nsize * 4) { + } else if (EA->alloc / 4 > nsize) {

and explains why in the commit message.

Great stuff! Separate commit fixing the bug in 6b2e3cb.

gperciva · 2023-10-19T18:27:50Z

libcperciva/datastruct/elasticarray.c

@@ -28,6 +28,12 @@ resize(struct elasticarray * EA, size_t nsize)
 	if (EA->alloc < nsize) {
 		/* We need to enlarge the buffer. */
 		nalloc = EA->alloc * 2;
+		/*
+		 * Handle if nalloc is not large enough to hold nsize which can


Missing comma: hold nsize, which can

gperciva · 2023-10-19T18:28:22Z

libcperciva/datastruct/elasticarray.c

+		 * Handle if nalloc is not large enough to hold nsize which can
+		 * happen when: 1) EA->alloc is small enough that EA->alloc * 2
+		 * is still smaller than nsize, or 2) EA->alloc is large enough
+		 * that EA->alloc * 2 wraps around becoming smaller than nsize.


Missing comma: wraps around, becoming smaller

gperciva · 2023-10-19T18:37:19Z

libcperciva/datastruct/elasticarray.c

@@ -28,6 +28,12 @@ resize(struct elasticarray * EA, size_t nsize)
 	if (EA->alloc < nsize) {
 		/* We need to enlarge the buffer. */
 		nalloc = EA->alloc * 2;
+		/*


Almost there! Just a few nitpicks left.

In addition to two missing commas in the comment, there's a few things in the commit message. I'm going to copy the original one, so that it's easier to see in the github web interface:

This commit adds a code comment describing the cases handled by the if (nalloc < nsize) block. One such case is the potential wrap around of EA->alloc * 2. Without the code comment, it was not quite apparent upon seeing the code that the wrap around is properly handled.

Please use two spaces after a period. (Yes, it's a somewhat niche style of English writing, but that's what it says in STYLE.)

I would delete "code" in "code comment" (this happens twice)

I would replace "quite apparent upon seeing the code" with "clear".

So that part of the sentence would be "...it was not clear that the wrap..."

gperciva · 2023-10-19T18:41:44Z

libcperciva/datastruct/elasticarray.c

@@ -30,7 +30,7 @@ resize(struct elasticarray * EA, size_t nsize)
 		nalloc = EA->alloc * 2;
 		if (nalloc < nsize)
 			nalloc = nsize;
-	} else if (EA->alloc > nsize * 4) {
+	} else if (EA->alloc / 4 > nsize) {


Agreed on the code! Just a few nitpicks in the commit message. Again, the original one:

In the case of modulo wrapping of nsize * 4, the else if block could be
taken even though EA->alloc is not really greater than nsize * 4. This
can lead to nalloc becoming smaller than nsize which will result in the
buffer to be reallocated to be smaller than the desired nsize.

two spaces after a period, please

missing comma: "smaller than nsize, which will"

gperciva · 2023-10-19T19:42:07Z

Looking good!

I'm still trying to wrap (no pun intended) my mind around the proposed changes to _append and _shrink, and whether anything should go into the header file, etc.

Could you please put the 2 commits for resize() into a new PR? (after modifying them)

That way, those 2 commits can move on to the next stage of review, without all the extra discussion in this PR. I'll continue to work on the perftests.

dorjoy03 · 2023-10-20T14:55:42Z

Thanks @gperciva

Could you please put the 2 commits for resize() into a new PR? (after modifying them)

I submitted a new PR #595. I have gone through all the nitpicks there.

Should I reset this branch to the first commit and force push, now that the later 2 commits have been made into a separate PR?

gperciva · 2023-10-20T19:18:26Z

Thanks! When it's convenient, sure you could omit those commits from this PR. But it's not a priority right now, since there's still the perftests to go.

dorjoy03 · 2023-10-20T19:31:27Z

Okay. I will wait for the pertests then.

gperciva · 2023-10-21T21:47:04Z

When convenient, please omit those 2 commits, rebase on top of master, and do a force push.

Now that #597 is merged, I'm hoping that I'll have the ability to manually trigger the Actions.

dorjoy03 · 2023-10-22T08:24:45Z

When convenient, please omit those 2 commits, rebase on top of master, and do a force push.

Done.

gperciva · 2023-10-25T23:19:04Z

Hmm, unless I've done something terribly wrong in the perftest Tarsnap/libcperciva#502, it looks like the changes are significantly slower. I'm surprised that there's this much of a difference: a factor of 2 slower for repeated _append() or _shrink() operations?

If you can see any mistakes in the perftest, I'm more than happy to fix them and run the tests again!

dorjoy03 · 2023-10-26T17:45:37Z

I believe there is nothing else to do other than to close this PR as per Tarsnap/libcperciva#502 (comment)

gperciva reviewed Oct 15, 2023

View reviewed changes

dorjoy03 force-pushed the minor-fixes branch from 756f05c to cf0de1f Compare October 16, 2023 14:02

gperciva reviewed Oct 17, 2023

View reviewed changes

dorjoy03 force-pushed the minor-fixes branch from e99ebbf to e7500c0 Compare October 19, 2023 16:27

dorjoy03 requested a review from gperciva October 19, 2023 16:30

gperciva reviewed Oct 19, 2023

View reviewed changes

dorjoy03 mentioned this pull request Oct 20, 2023

Elastic array wrap around bug #595

Closed

elasticarray: remove redundant checks

a0af45b

dorjoy03 force-pushed the minor-fixes branch from e7500c0 to a0af45b Compare October 22, 2023 08:21

gperciva mentioned this pull request Oct 25, 2023

Elasticarray perftest Tarsnap/libcperciva#502

Merged

dorjoy03 closed this Oct 26, 2023

		@@ -25,7 +25,7 @@
		* the value INT_MAX + 1 (in case every possible file descriptor is in use

Minor fixes #592

Minor fixes #592

Conversation

dorjoy03 commented Oct 15, 2023

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

dorjoy03 commented Oct 16, 2023

dorjoy03 commented Oct 16, 2023

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

dorjoy03 Oct 18, 2023 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

gperciva Oct 17, 2023 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

dorjoy03 Oct 18, 2023 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

gperciva commented Oct 19, 2023

dorjoy03 commented Oct 20, 2023

gperciva commented Oct 20, 2023

dorjoy03 commented Oct 20, 2023

gperciva commented Oct 21, 2023

dorjoy03 commented Oct 22, 2023

gperciva commented Oct 25, 2023

dorjoy03 commented Oct 26, 2023

dorjoy03 Oct 18, 2023 •

edited

Loading

gperciva Oct 17, 2023 •

edited

Loading

dorjoy03 Oct 18, 2023 •

edited

Loading