Workload E (95% range scan, 5% insert) performance #2
Thanks for your question. I hadn't investigated it before, but I have now. Here's my understanding:

TL;DR: The reason is subtly stated in another paper.

Longer version: One way to think intuitively about why ART requires more memory accesses is that user keys are stored only in the leaf nodes, whereas in red-black trees every node stores a key. That means ART needs additional supporting links/nodes to get to those leaf nodes, so any traversal that has to pass through a lot of next-user-keys also has to pass through those supporting links/nodes.

I ran some experiments with a dataset of 100,000 random 64-bit integers. The red-black tree has 100,000 key-bearing nodes with two child links each, i.e. about 200,000 links, whereas ART has 216,196 links in total, about 16k more. More links mean more load instructions and likely more cache misses. On the other hand, the height of ART is 5 whereas the red-black tree's height is 20. Point queries only traverse the height, hence ART is faster for them; range scans also have to walk through the supporting links between consecutive keys, hence ART is slower there. [Plots of perf events and, for a better understanding of the traversal, a raw dump of the moves taken to get to the next key for a single YCSB scan request of length 50 in both ART and TreeMap were attached.]

It'd be nice to somehow derive a theoretical worst-case bound on the number of links ART visits per scan. Hope that explains why ART is slower for range scans as compared to the red-black tree. Let me know if you have further questions.
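For concreteness, here is a minimal, simplified sketch of the shape of that experiment (not the actual YCSB harness). TreeMap stands in for the red-black tree; the ART class name and it being a NavigableMap are assumptions made purely for illustration:

```java
import java.util.Map;
import java.util.NavigableMap;
import java.util.Random;
import java.util.TreeMap;

// Sketch: 100,000 random 64-bit keys, a point get vs. a YCSB-style scan of
// length 50. Swapping in a NavigableMap-compatible ART class (name assumed)
// would run the same loops against the trie.
public class RangeScanSketch {
    public static void main(String[] args) {
        NavigableMap<Long, Long> map = new TreeMap<>(); // or a hypothetical ART NavigableMap
        Random rnd = new Random(42);
        long[] keys = new long[100_000];
        for (int i = 0; i < keys.length; i++) {
            keys[i] = rnd.nextLong();
            map.put(keys[i], keys[i]);
        }

        // Point query: cost proportional to the height
        // (about 20 for the red-black tree, about 5 for ART on this data set).
        long point = map.get(keys[1234]);

        // Scan of length 50: every step to the next key walks parent/child
        // links (red-black tree) or the supporting inner nodes (ART).
        long sum = 0;
        int taken = 0;
        for (Map.Entry<Long, Long> e : map.tailMap(rnd.nextLong(), true).entrySet()) {
            sum += e.getValue();
            if (++taken == 50) break;
        }
        System.out.println(point + " " + sum);
    }
}
```

Profiling the two maps under this kind of loop (e.g. with perf) is what produced the load-instruction and cache-miss numbers referenced above.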
Hm, I think you're right. However, in binary trees you sometimes also have to go back up even more links during range scans. But compared to a binary tree, a cache-oblivious B+ tree might be far better still because of its locality and the CPU caches. That said, I don't know how a cache-oblivious indirect binary tree compares (just one array, without direct pointers, built using the van Emde Boas layout).

BTW: Any plans to kind of "advance" your ART to a HOT? https://dbis.uibk.ac.at/sites/default/files/2018-04/hot-height-optimized-author-version.pdf
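To make the "one array, no direct pointers" idea concrete, here is a rough sketch of a van Emde Boas layout for a complete binary search tree over sorted keys. All names are made up, and for brevity the search uses a precomputed position table instead of computing node positions arithmetically, which a real cache-oblivious implementation would do:

```java
// Sketch of a complete BST stored in a single array in van Emde Boas
// (cache-oblivious) order. Illustrative only; not from the repository.
public class VebTreeSketch {
    final long[] slots;  // node values, stored in vEB order
    final int[] pos;     // BFS index -> slot index (helper table, sketch only)
    final int height;    // number of levels; assumes a complete tree of 2^height - 1 keys

    VebTreeSketch(long[] sortedKeys) {
        int n = sortedKeys.length;
        height = 32 - Integer.numberOfLeadingZeros(n); // assumes n == 2^h - 1
        slots = new long[n];
        pos = new int[n];
        layout(0, height, new int[]{0});
        fillInOrder(0, sortedKeys, new int[]{0});
    }

    // Recursive vEB layout: place the top half of the levels as one contiguous
    // block, then each bottom subtree as its own contiguous block.
    private void layout(int bfs, int h, int[] next) {
        if (h == 1) { pos[bfs] = next[0]++; return; }
        int top = h / 2, bottom = h - top;
        layout(bfs, top, next);
        int firstBottomRoot = (bfs << top) + (1 << top) - 1; // leftmost bottom-subtree root (BFS index)
        for (int i = 0; i < (1 << top); i++) {
            layout(firstBottomRoot + i, bottom, next);
        }
    }

    // In-order traversal assigns the sorted keys so the implicit tree is a BST.
    private void fillInOrder(int bfs, long[] sorted, int[] next) {
        if (bfs >= pos.length) return;
        fillInOrder(2 * bfs + 1, sorted, next);
        slots[pos[bfs]] = sorted[next[0]++];
        fillInOrder(2 * bfs + 2, sorted, next);
    }

    boolean contains(long key) {
        int bfs = 0;
        while (bfs < pos.length) {
            long v = slots[pos[bfs]];
            if (v == key) return true;
            bfs = key < v ? 2 * bfs + 1 : 2 * bfs + 2;
        }
        return false;
    }

    public static void main(String[] args) {
        long[] keys = new long[(1 << 17) - 1];      // 2^17 - 1 keys, complete tree
        for (int i = 0; i < keys.length; i++) keys[i] = 3L * i;
        VebTreeSketch t = new VebTreeSketch(keys);
        System.out.println(t.contains(300) + " " + t.contains(301)); // true false
    }
}
```

The point of the layout is that every recursive block is contiguous in memory, so a root-to-leaf walk touches O(log_B n) cache lines for any cache-line size B, without the structure knowing B.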
On another note, do you have any recommendations regarding JVM implementations and hardware support (for instance SIMD instructions)? It seems in-memory performance is now all about using cache lines efficiently, but I don't know what's possible with JVM-based languages nowadays. I'm also not sure whether it makes sense to use Azul Zing or GraalVM (the latter makes sense to me, for instance because I often use lambdas, and small intermediate objects seem not to be that costly).
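On the SIMD question: one thing that is possible on the JVM nowadays is the incubating Vector API (jdk.incubator.vector, JDK 16+). As a purely illustrative sketch (class and method names made up), this is roughly how the 16-way partial-key comparison of an ART Node16-style lookup could be expressed with it:

```java
import jdk.incubator.vector.ByteVector;
import jdk.incubator.vector.VectorMask;
import jdk.incubator.vector.VectorOperators;
import jdk.incubator.vector.VectorSpecies;

// Compile and run with: --add-modules jdk.incubator.vector (JDK 16 or later).
public class Node16LookupSketch {
    private static final VectorSpecies<Byte> SPECIES = ByteVector.SPECIES_128; // 16 byte lanes

    // Compares the key byte against all 16 stored partial keys at once and
    // returns the matching child slot, or -1 if there is no live match.
    static int findChild(byte[] keys, int count, byte keyByte) {
        ByteVector lanes = ByteVector.fromArray(SPECIES, keys, 0);
        VectorMask<Byte> eq = lanes.compare(VectorOperators.EQ, keyByte);
        int lane = eq.firstTrue();            // 16 if no lane matched
        return lane < count ? lane : -1;
    }

    public static void main(String[] args) {
        byte[] keys = new byte[16];
        for (int i = 0; i < 10; i++) keys[i] = (byte) (i * 3);
        System.out.println(findChild(keys, 10, (byte) 9));  // 3
        System.out.println(findChild(keys, 10, (byte) 10)); // -1
    }
}
```

Whether the JIT actually lowers this to a single SSE/AVX compare depends on the JVM and the hardware, so it would need benchmarking rather than being taken for granted.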
Hi,
do you have any idea why range scans over 64-bit integers are faster in red-black trees (TreeMap)?

BTW: Regarding the height, they improved their trie with HOT, which specifically addresses reducing the height. I think they basically now use tries within the trie.
Kind regards
Johannes