Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Rewrite the C++ Roaring64Map implementation to use 64-bit croaring #629

Open
madscientist opened this issue Jun 2, 2024 · 4 comments
Open

Comments

@madscientist
Copy link
Contributor

Now that we have a native implementation of 64-bit croaring in C (yay!!) the C++ interface to CRoaring should be reworked to use this rather than using a std::map of 32bit croaring structures.

I know this has been mentioned in the past I'm leaving this issue here as a reminder.

@madscientist
Copy link
Contributor Author

So, I did a very quick and dirty first pass which involved (a) copying roaring.hh to roaring64.hh and (b) query-replacing lots of "roaring" to "roaring64", just to see what's there and what's missing.

The very first thing I realize is that while roaring_bitmap_t is a defined type in header files for 32bit maps, roaring64_bitmap_t is an undefined type in headers: the definition is only present in the source file. This means that unlike the Roaring C++ class which contains an embedded roaring_bitmap_t, we will have to have Roaring64 contain a pointer (probably a unique_ptr) and perform a new operation in the constructor.

Of course this will work but I kind of don't like this from an efficiency etc. point of view. What is the thinking on creating some kind of internal header containing the definition of roaring64_bitmap_t? Obviously we cannot prevent it from being public because it would have to appear in the published headers.

Or is there some alternative that people would prefer? Just to continue poking at this I'll go with the pointer solution for now: what I'm really trying to understand is what parts of the C interface are missing in order to provide an equivalent C++ interface as for the 32bit croaring.

@SLieve
Copy link
Contributor

SLieve commented Jun 2, 2024

An internal-only header sounds fine to me. The biggest gaps I see in terms of feature parity are portableDeserializeFrozen and the associated scaffolding required, as well as extensive tests to ensure the implementations are exactly the same.

@madscientist
Copy link
Contributor Author

When looking at the actual implementation I'm not really sure that, contrary to my previous assertion, it's a good idea to break the abstraction. There's a lot of simplification to be had by making the C++ class just a wrapper around a pointer, and the performance shouldn't be any different than the C interface. By not inlining the implementation we don't need to worry about the 32bit methods like roaring_bitmap_overwrite and other similar things.

Yes, there are a few things I've had to comment out so far and I don't have it all compiling yet :) but we'll see. I'm still playing with it. I reworked the iterator as well, I kind of don't like the 32bit C++ iterator implementation. It would be nice of we could create an actual forward iterator, or even a complete random-access iterator: it seems like it should be possible.

@madscientist
Copy link
Contributor Author

I have an implementation of roaring64.hh which is basically hacked together but everything compiles. It needs a bunch more tests.

The things that have an interface in roaring.hh (32bit) that I commented out of the 64bit interface due to lack of support:

  • api::roaring64_bitmap_range_uint64_array()
  • api::roaring64_bitmap_remove_run_compression()
  • api::roaring64_bitmap_shrink_to_fit()
  • api::roaring64_bitmap_rank_many()
  • The non-portable serialize/deserialize functions
  • The bitmap_frozen functions
  • copy-on-write support
  • api::roaring64_bitmap_printf()
  • api::roaring64_bitmap_or_many()

Additionally there are features of the existing Roaring64Map which aren't supported:

  • Using sets of uint32_t values, as well as uint64_t values, to work with Roaring64 (e.g., creating a Roaring64 from a list of uint32_t)
  • Myabe some others... I didn't carefully check this.

The interface between Roaring and Roaring64map is already somewhat different, with different capabilities. I think the API for C++ roaring could be improved in some ways. The iterator situation is lacking, IMO, for example. I implemented the std::bitset methods as well so that Roaring64 could be used as a more-or-less drop-in replacemant for std::bitset (obviously bitset has a templated maximum size which roaring doesn't need so not everything makes sense).

I implemented the equivalent of the std::bitset::reference class so that we could return lvalues from operator[] for example. I expect this could also be used to allow output iterators in which case we could probably create bidirectional iterators. I'm not sure it's worth it though. We could definitely use const and reverse iterators though.

I implemented an iterator where end() was a nullptr for the iterator object so we didn't need the tricksy static variable. If we don't use bidirectional iterators this will be fine and is very simple. I also didn't implement post-increment operator++ as it's very expensive and not useful IMO. Ditto post-decrement operator--..

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants