Roaring64 doesn't handle out of memory #630

madscientist · 2024-06-02T18:46:17Z

In the 32bit version of croaring the returned value of roaring_malloc() is checked and if it's NULL (the runtime cannot allocate more memory) then NULL is returned and we don't try to dereference that pointer.

In the 64bit version of croaring, the return value of roaring_malloc() is not checked: the code assumes that it can never return NULL. If it does, we'll definitely get a SIGSEGV due to dereferencing a NULL pointer.

What is the correct memory model we should be using here?

lemire · 2024-06-03T01:21:30Z

We try to catch null pointers. Would you propose a pull request?

madscientist · 2024-06-03T03:20:09Z

I can do so later this week. I'm looking at the C++ 64bit wrapper right now.

madscientist · 2024-06-03T19:50:50Z

I took a quick look at this. Actually, the 32bit version doesn't handle this very well either. For example we see things like:

roaring_bitmap_t *roaring_bitmap_from_range(uint64_t min, uint64_t max,
                                            uint32_t step) {
    if (max >= UINT64_C(0x100000000)) {
        max = UINT64_C(0x100000000);
    }
    if (step == 0) return NULL;
    if (max <= min) return NULL;
    roaring_bitmap_t *answer = roaring_bitmap_create();
    if (step >= (1 << 16)) {
        for (uint32_t value = (uint32_t)min; value < max; value += step) {
            roaring_bitmap_add(answer, value);
        }
        return answer;
    }

Note that (a) the return code of NULL is already used by this method to denote "invalid input" so overloading it to also mean "out of memory" seems problematic, and (b) we do not check the return of roaring_bitmap_create() to see if it's NULL, before we (c) pass it to roaring_bitmap_add(), which (d) will dereference the pointer without checking for NULL.

And that's not all, of course, there are many places which internally allocate memory and could return NULL where they aren't checked; see below for examples.

It seems if we really want to get serious about handling this we'll need a lot of extra checks throughout all the functions we have, where if we get a NULL back we have to clean up whatever allocations we may have made, then return the NULL (if we're returning a pointer) or an error code.

I wonder if this is worthwhile for something that we hardly ever expect to happen and if it DOES happen would be almost impossible for our caller to cleanly recover from, anyway. Are there really programs out there which attempt to recover from this? All the software I've been involved with (even production software) more or less gives up and exits if malloc() returns NULL.

Another option to consider would be have roaring_*alloc() check for a return of NULL and "do something". For example we could add a new entry to the roaring_memory_t struct, something like .fatal or .out_of_memory or whatever which would contain a function that the roaring allocator would invoke when .malloc returns NULL. The important thing about this function is that it would to be marked noreturn (or at least behave that way). So, the default implementation maybe would generate some output to stderr then call _exit() or something, I don't know. If people are using C++ it could throw an exception.

Of course people can already install such a roaring_memory_t implementation for the alloc functions, but the key difference is that roaring would require this behavior and thus would not check any allocations for failure and this would be part of the documentation.

It's not great, but the alternative is to make extensive changes to both the 32bit and 64bit implementations to fully plumb through checks for NULL. Just as an example: many places we call container_add(), which invokes get_writable_copy_if_shared(), which could return NULL on OOM. Then we turn around and hand that out to places like bitset_container_set() which indirects through the pointer without checking. Making all this code support handling OOM errors will be an enormous amount of work, adding a lot of extra error handling conditions throughout the code, with no realistic possibility of testing them (at least not systematically).

If we don't want to commit to that then maybe it's better to just admit this up-front and make it part of the interface, rather than whistling past the graveyard.

madscientist · 2024-06-03T19:56:43Z

There are some memory cleanups that we certainly should do: for example I notice that roaring_bitmap_free() behaves correctly if you pass it NULL but roaring64_bitmap_free() will dump core: modern free() implementations (and C++ delete) all treat free of NULL as a no-op and don't crash.

lemire · 2024-06-03T20:38:39Z

@madscientist You are correct, hence my wording... We try to catch null pointers. We definitively do not succeed.

madscientist · 2024-06-03T20:49:26Z

Well, I'm not sure where to go from here. Does it make sense to change roaring64 to be "more or less as good" as roaring and check in the same places? It seems to me like we really only reliably check in places which are pretty unlikely to cause problems; for example allocating a roaring_bitmap_t (40 bytes or so) is unlikely to fail compared to expanding internal structures which are likely to be a lot larger.

Here's another idea: we could create a new method in roaring_memory_t for handling out of memory and leave it set to NULL by default. If it's NULL you get the behavior we have now, which is sort of "well, we try, in some places, to return NULL cleanly on OOM but there's a better than average chance we'll SEGV". If it's not NULL then we call that function when we run out of memory and that function would either not return (handle the problem, probably either _exit() or throw an exception), or else if it returns you get the same behavior as if it's NULL.

Then someone who cares can do something holistically, and there's some kind of story to tell in the docs.

Thoughts?

madscientist · 2024-06-03T20:51:10Z

I guess, users can already have this "out of memory" facility if they want it by just providing replacement alloc functions.

lemire · 2024-06-03T21:32:51Z

To be honest, I’m was hoping that static analysis could help.

lemire · 2024-06-03T22:38:13Z

Note that this is rather theoretical. In practice, under many operating systems, you will get back a non-nul value, and the failure occurs where you try to page in the memory.

lemire mentioned this issue Jun 4, 2024

fix: add null checks to roaring64 #633

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Roaring64 doesn't handle out of memory #630

Roaring64 doesn't handle out of memory #630

madscientist commented Jun 2, 2024

lemire commented Jun 3, 2024

madscientist commented Jun 3, 2024

madscientist commented Jun 3, 2024 •

edited

Loading

madscientist commented Jun 3, 2024

lemire commented Jun 3, 2024

madscientist commented Jun 3, 2024

madscientist commented Jun 3, 2024

lemire commented Jun 3, 2024

lemire commented Jun 3, 2024

Roaring64 doesn't handle out of memory #630

Roaring64 doesn't handle out of memory #630

Comments

madscientist commented Jun 2, 2024

lemire commented Jun 3, 2024

madscientist commented Jun 3, 2024

madscientist commented Jun 3, 2024 • edited Loading

madscientist commented Jun 3, 2024

lemire commented Jun 3, 2024

madscientist commented Jun 3, 2024

madscientist commented Jun 3, 2024

lemire commented Jun 3, 2024

lemire commented Jun 3, 2024

madscientist commented Jun 3, 2024 •

edited

Loading