segment_map: Also segment the bucket array #1

StephanDollberg · 2024-03-15T15:15:40Z

Merge into our fork while we sort out the upstreaming part.

In segmented mode we only applied the segmenting to the values array but not the bucket array. As a result there the pattern of there still being a deallocation followed by an increased allocation when resizing the hash map continues to exist. Further, in environments where the max allocation size is limited because of fragmentation issues this can lead to problems. To avoid both of these issues this patch makes the bucket array use the same datastructure as the values array, i.e.: a `std::vector` when linear and `segmented_vector` when segmented (or the passed datastructure if specified). This extra indirection does add some overhead in the segmented case. Looking at the quick benchmarks we see: Before: ``` | ns/op | op/s | err% | ins/op | cyc/op | IPC | bra/op | miss% | total | benchmarking |--------------------:|--------------------:|--------:|----------------:|----------------:|-------:|---------------:|--------:|----------:|:------------- | 8,912,995.09 | 112.20 | 0.1% | 225,712,537.08 | 26,628,198.00 | 8.476 | 25,133,812.23 | 0.1% | 1.15 | `ankerl::unordered_dense::map<uint64_t, size_t> segmented_vector iterate while adding then removing` | 65,440,597.50 | 15.28 | 0.1% | 496,971,523.50 | 195,721,929.00 | 2.539 | 64,749,156.50 | 11.2% | 1.44 | `ankerl::unordered_dense::map<uint64_t, size_t> segmented_vector random insert erase` | 63,254,162.50 | 15.81 | 0.1% | 540,753,642.50 | 188,790,381.00 | 2.864 | 101,168,500.00 | 6.3% | 1.39 | `ankerl::unordered_dense::map<uint64_t, size_t> segmented_vector 50% probability to find` | 9,777,270.50 | 102.28 | 0.2% | 281,149,360.00 | 28,833,467.00 | 9.751 | 25,968,567.75 | 0.1% | 1.19 | `ankerl::unordered_dense::map<std::string, size_t> segmented_vector iterate while adding then removing` | 220,368,952.00 | 4.54 | 0.2% |2,707,978,150.00 | 659,198,358.00 | 4.108 | 347,649,399.00 | 3.8% | 2.43 | `ankerl::unordered_dense::map<std::string, size_t> segmented_vector random insert erase` | 156,887,435.00 | 6.37 | 0.1% |2,166,844,490.00 | 464,728,290.00 | 4.663 | 266,835,027.00 | 2.5% | 1.73 | `ankerl::unordered_dense::map<std::string, size_t> segmented_vector 50% probability to find` ``` After: ``` | ns/op | op/s | err% | ins/op | cyc/op | IPC | bra/op | miss% | total | benchmarking |--------------------:|--------------------:|--------:|----------------:|----------------:|-------:|---------------:|--------:|----------:|:------------- | 8,921,748.31 | 112.09 | 0.1% | 226,313,644.69 | 26,684,106.00 | 8.481 | 25,174,702.92 | 0.1% | 1.18 | `ankerl::unordered_dense::map<uint64_t, size_t> segmented_vector iterate while adding then removing` | 75,578,500.00 | 13.23 | 0.1% | 597,036,791.50 | 226,059,912.00 | 2.641 | 64,865,689.00 | 11.3% | 1.14 | `ankerl::unordered_dense::map<uint64_t, size_t> segmented_vector random insert erase` | 74,928,542.00 | 13.35 | 0.1% | 677,557,943.00 | 223,726,152.00 | 3.029 | 91,606,575.00 | 7.0% | 1.13 | `ankerl::unordered_dense::map<uint64_t, size_t> segmented_vector 50% probability to find` | 10,079,993.00 | 99.21 | 0.4% | 293,716,069.73 | 29,697,236.40 | 9.890 | 25,980,823.83 | 0.1% | 1.20 | `ankerl::unordered_dense::map<std::string, size_t> segmented_vector iterate while adding then removing` | 220,081,085.00 | 4.54 | 0.1% |2,721,992,469.00 | 658,245,042.00 | 4.135 | 345,686,575.00 | 3.8% | 2.42 | `ankerl::unordered_dense::map<std::string, size_t> segmented_vector random insert erase` | 158,126,693.00 | 6.32 | 0.1% |2,191,768,626.00 | 468,710,736.00 | 4.676 | 267,938,632.00 | 2.5% | 1.74 | `ankerl::unordered_dense::map<std::string, size_t> segmented_vector 50% probability to find` ``` If we think this is not unconditionally acceptable then we could possibly add another template parameter (or make IsSegmented an enum) to decide which parts are supposed to be segmented. Fixes martinus#94

CLAassistant · 2024-03-15T15:15:47Z

All committers have signed the CLA.

travisdowns · 2024-03-18T19:02:34Z

include/ankerl/unordered_dense.h

-            std::memcpy(m_buckets, other.m_buckets, sizeof(Bucket) * bucket_count());
+            if constexpr (IsSegmented) {
+                for (auto i = 0UL; i < bucket_count(); ++i) {
+                    at(m_buckets, i) = at(other.m_buckets, i);


Perhaps in the future this could still do the memcpy for each segment, right?

Yes, though we'd need some form of fragment iterator API.

travisdowns

LGTM

dotnwat

nice. is this the 3rd-party container that was referenced in Travis' issue filed in upstream Seastar regarding a fragmented data structure for use in metrics?

StephanDollberg · 2024-03-19T09:19:32Z

nice. is this the 3rd-party container that was referenced in Travis' issue filed in upstream Seastar regarding a fragmented data structure for use in metrics?

It would be one of the candidates I guess yes.

StephanDollberg requested review from dotnwat and travisdowns March 15, 2024 15:15

travisdowns reviewed Mar 18, 2024

View reviewed changes

travisdowns approved these changes Mar 18, 2024

View reviewed changes

dotnwat reviewed Mar 18, 2024

View reviewed changes

StephanDollberg merged commit e3ebdd2 into redpanda-data:redpanda Mar 19, 2024
1 check passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

segment_map: Also segment the bucket array #1

segment_map: Also segment the bucket array #1

StephanDollberg commented Mar 15, 2024

CLAassistant commented Mar 15, 2024 •

edited

Loading

travisdowns Mar 18, 2024

StephanDollberg Mar 19, 2024

travisdowns left a comment

dotnwat left a comment

StephanDollberg commented Mar 19, 2024

segment_map: Also segment the bucket array #1

segment_map: Also segment the bucket array #1

Conversation

StephanDollberg commented Mar 15, 2024

CLAassistant commented Mar 15, 2024 • edited Loading

travisdowns Mar 18, 2024

Choose a reason for hiding this comment

StephanDollberg Mar 19, 2024

Choose a reason for hiding this comment

travisdowns left a comment

Choose a reason for hiding this comment

dotnwat left a comment

Choose a reason for hiding this comment

StephanDollberg commented Mar 19, 2024

CLAassistant commented Mar 15, 2024 •

edited

Loading