Partial Map-Reduce #491

Upgrade-tests use git to clone the current repository and check out the necessary versions. The cloned repository was always saved in vshard/test/var, even when the actual --var argument for test-run.py was /tmp/var. Let's store the copied code under the path given in --var as well. NO_DOC=internal

It makes unknown variables be treated as errors, not as nils. Otherwise it is too easy to use a wrong variable name somewhere and get nil, and the tests would even pass, but would not test what they are supposed to. NO_DOC=internal
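A minimal sketch of the idea in plain Lua, assuming a metatable-based check. The helper name `make_strict` and the `env` table are hypothetical and are not taken from the commit, which may use a different mechanism:

```lua
-- Hypothetical helper: make reads of undeclared keys in `env` raise an
-- error instead of silently returning nil.
local function make_strict(env)
    return setmetatable(env, {
        __index = function(_, name)
            error(("variable '%s' is not declared"):format(tostring(name)), 2)
        end,
    })
end

local env = make_strict({bucket_count = 100})
-- A correctly spelled name reads normally.
assert(env.bucket_count == 100)
-- A typo is reported instead of yielding nil and letting the test pass.
local ok, err = pcall(function() return env.bucket_cnt end)
assert(not ok)
assert(tostring(err):find('bucket_cnt', 1, true))
```

The same metatable could be installed on the global table to catch misspelled globals, at the cost of having to declare every global up front.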

The only map-reduce function is router.map_callrw(). At some point there was a task to introduce a new mode of Map-Reduce - partial, by bucket IDs, not on the whole cluster. For that task a new function router.map_part_callrw() was introduced, which had the same Map-Reduce and error handling stages. Only the Ref stage was different. The new helpers in this commit were supposed to share some code between those two map-call functions. Later it was decided to leave just one map-call function and add a new option to it. But these new helpers still look useful to have as separate functions. They make the map-call function really small and simple. NO_DOC=internal NO_TEST=refactoring

It ensures the ref is still in place. It is a read-only operation. It is going to be used in the future commits about partial map-reduce. The router will potentially visit some storages more than once, and each time it will ensure the ref is still in place. NO_DOC=internal

The storage-side of the Partial Map-Reduce feature. NO_DOC=later

The previous commit was failing some tests. Let's patch them up. That commit isn't amended so as to keep its original shape out of respect to the external contributor. NO_DOC=bugfix

Introduce a partial ref-map-reduce API for vshard. It guarantees that in case of success the function is executed exactly once on the storages that contain the given list of buckets. NO_DOC=later
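One step behind that guarantee can be sketched as follows: group the requested bucket IDs by their owner, so that each storage is addressed once no matter how many of the buckets it holds. This is only an illustration. The `route_map` table and the `group_by_replicaset` helper are hypothetical stand-ins, not vshard internals:

```lua
-- route_map: hypothetical map bucket_id -> replicaset ID, standing in for
-- whatever the router knows about bucket locations.
local function group_by_replicaset(bucket_ids, route_map)
    local groups, unknown = {}, {}
    for _, id in ipairs(bucket_ids) do
        local rs = route_map[id]
        if rs == nil then
            -- The location is not known, it has to be discovered first.
            table.insert(unknown, id)
        else
            groups[rs] = groups[rs] or {}
            table.insert(groups[rs], id)
        end
    end
    return groups, unknown
end

local groups, unknown = group_by_replicaset(
    {1, 2, 3, 4},
    {[1] = 'UUID_A', [2] = 'UUID_A', [3] = 'UUID_B'})
-- Two replicasets are involved, so at most two Ref and two Map requests.
assert(#groups.UUID_A == 2)
assert(#groups.UUID_B == 1)
assert(#unknown == 1 and unknown[1] == 4)
```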

The previous commit was failing some tests. Let's patch them up. That commit isn't amended so as to keep its original shape out of respect to the external contributor. NO_DOC=bugfix

This is useful for RW map-reduce requests which need to send multiple network requests in parallel to multiple masters. In-parallel means using the is_async netbox feature. But it only works if the connection is already established, which means that the connection establishment ideally must also be parallel. NO_DOC=internal NO_TEST=already covered

There were a number of minor issues with the previous several commits, like the tests running way too long, some cases not being covered, or the code being non-critically suboptimal. Let's fix them all. The original commits aren't amended so as to keep their original shape out of respect to the external contributor. NO_DOC=bugfix

There are 2 Ref-Map-Reduce functions right now - map_callrw() and map_part_callrw(). Their only difference is that the former refs the whole cluster, while the latter refs only a subset of storages. The rest is the same. The idea is to merge these functions into one and make the bucket IDs an option. The commit extracts the Ref stages of both functions into separate helpers, which will keep this future single function very short and simple. NO_DOC=internal NO_TEST=refactoring

The behavior is regulated with the new bucket_ids option.

@TarantoolBot document
Title: vshard: `bucket_ids` option for `router.map_callrw()`

The option is an array of numeric bucket IDs. When specified, the Ref-Map-Reduce is only performed on the masters having at least one of these buckets. By default all the stages are done on all masters in the cluster.

Example:
```Lua
-- Assume buckets 1, 2, 3 cover replicasets UUID_A and UUID_B.
res, err = vshard.router.map_callrw(func, args, {bucket_ids = {1, 2, 3}})
-- Each result is wrapped into a table. Tables compare by reference in Lua,
-- so check the wrapped value instead of comparing with a new table.
assert(res[UUID_A][1] == func_result_from_A)
assert(res[UUID_B][1] == func_result_from_B)
```

Let's merge all map_callrw() tests into a single file. NO_DOC=internal

When there were only 2 replicasets, all cases would cover either a single replicaset or "all" of them. Let's make them 3, so that some tests actually cover a part of a cluster which is not just a single replicaset. NO_DOC=internal

The 'moved_buckets' function would treat as "moved" all the buckets which are not strictly ACTIVE. But that isn't optimal. Also, 'moved_buckets' would assume that the buckets stay unchanged from the start of the ref creation to its end. That isn't true. Thirdly, the moved buckets could contain the destination they moved to. Returning it to the router would make the re-discovery faster. Fourthly, PINNED buckets were not considered ACTIVE.

The commit fixes all these issues.

Firstly, when a bucket is SENDING, returning an error right away isn't good. The router would just keep retrying then, without any progress. The bucket is actually here, it is not moved yet. Better let the storage try to take a ref. Then one of 2 results is possible:

- It waits without useless active retries. Then SENDING fails and becomes ACTIVE. Ref is taken, all good.
- It waits without useless active retries. SENDING turns into SENT. Ref is taken for the other buckets, and this one is reported as moved.

Similar logic applies to RECEIVING.

Secondly, after a ref is taken, the not-moved buckets could become moved. They need to be re-checked before returning the ref. Luckily, the storage can use bucket_generation to avoid this double-check when nothing changed in _bucket.

NO_DOC=bugfix
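The logic described in this commit message can be sketched roughly as below. This is a simplified model under stated assumptions, not the vshard code: the `storage` table, the `take_ref` callback, and both function names are hypothetical, and `_bucket` is modeled as a plain Lua table.

```lua
-- Statuses which mean the bucket is here, and statuses which mean it is in
-- transit and its fate is not known yet.
local PRESENT = {ACTIVE = true, PINNED = true}
local TRANSIENT = {SENDING = true, RECEIVING = true}

-- Collect the buckets which are definitely not here, together with their
-- destination when it is known. Transient buckets are not reported.
local function find_moved(bucket_ids, buckets)
    local moved = {}
    for _, id in ipairs(bucket_ids) do
        local b = buckets[id]
        if b == nil then
            table.insert(moved, {id = id})
        elseif not PRESENT[b.status] and not TRANSIENT[b.status] then
            table.insert(moved, {id = id, dst = b.destination})
        end
    end
    return moved
end

-- take_ref: hypothetical callback which waits until the ref is taken,
-- that is until the transient buckets settle one way or another.
local function ref_and_check(bucket_ids, storage, take_ref)
    local moved = find_moved(bucket_ids, storage.buckets)
    local generation = storage.generation
    take_ref()
    -- Re-check only if something changed in the bucket table meanwhile.
    if storage.generation ~= generation then
        moved = find_moved(bucket_ids, storage.buckets)
    end
    return moved
end

local storage = {
    generation = 1,
    buckets = {
        [1] = {status = 'ACTIVE'},
        [2] = {status = 'SENDING', destination = 'replicaset_b'},
        [3] = {status = 'SENT', destination = 'replicaset_b'},
    },
}
local moved = ref_and_check({1, 2, 3, 4}, storage, function()
    -- Simulate the transfer of bucket 2 finishing while the ref is awaited.
    storage.buckets[2].status = 'SENT'
    storage.generation = storage.generation + 1
end)
-- Buckets 2 and 3 are moved with a known destination, bucket 4 is unknown.
assert(#moved == 3)
assert(moved[1].id == 2 and moved[1].dst == 'replicaset_b')
```

In the sketch the generation counter plays the role of bucket_generation: when it is unchanged after the ref is taken, the first scan is still valid and the second one is skipped.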

During the partial Map-Reduce the router might visit some storages more than once. It happens when after a ref on storage-A another storage-B reports A as having taken some buckets. Then the router would come back to A to confirm that. The storage still must hold its previously created ref in order for such checks to make any sense. Otherwise any of the previously confirmed buckets could have escaped by now. Without the ref-checking the router could reach the Map stage and send some Map requests even though it could have detected earlier that not all storages would succeed. This wasn't strictly speaking a bug, but it was clearly suboptimal behaviour, leading to the requests being executed not on all the needed storages while the others would report errors. NO_DOC=internal

It tests not only partial Map-Reduce. It covers a bit of the full one as well. NO_DOC=internal

Commits on Sep 23, 2024

Commits on Oct 10, 2024