Partial Map-Reduce #491

Merged
merged 17 commits into from
Oct 10, 2024

Commits on Sep 23, 2024

  1. test: use vardir for repository copying

    The upgrade tests use git to clone the current repository and
    check out the necessary versions.
    
    The cloned repository was always saved in vshard/test/var, even
    when the actual --var argument for test-run.py was /tmp/var.
    
    Let's store the copied code under the path given in --var as
    well.
    
    NO_DOC=internal
    Gerold103 committed Sep 23, 2024
    cea68f3

Commits on Oct 10, 2024

  1. test: enable strict mode in Lua

    It makes accesses to unknown variables raise errors instead of
    silently evaluating to nil. Otherwise it is too easy to mistype a
    variable name somewhere and get nil: the tests would still pass,
    but would not test what they are supposed to.
    
    NO_DOC=internal
    Gerold103 committed Oct 10, 2024
    c437f69
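
    For illustration, a minimal plain-Lua sketch of the technique strict modes are usually built on: a metatable whose `__index` and `__newindex` raise errors for undeclared names. It is applied to a sandbox table rather than to `_G` so that the example stays isolated. The helper `enable_strict` and its explicit list of declared names are hypothetical simplifications, not the mechanism the vshard tests actually use.

    ```Lua
    -- Hypothetical helper: make `env` strict. Reading a name that has no
    -- value, or assigning a name not listed in `declared`, raises an error.
    local function enable_strict(env, declared)
        local allowed = {}
        for _, name in ipairs(declared or {}) do
            allowed[name] = true
        end
        return setmetatable(env, {
            -- Called only for keys absent from `env`.
            __index = function(_, name)
                error("variable '" .. tostring(name) .. "' is not declared", 2)
            end,
            -- Called only when assigning a key absent from `env`.
            __newindex = function(t, name, value)
                if not allowed[name] then
                    error("assignment to undeclared variable '" ..
                          tostring(name) .. "'", 2)
                end
                rawset(t, name, value)
            end,
        })
    end

    -- Usage on a sandbox table; only `result` is declared.
    local env = enable_strict({}, {'result'})
    env.result = 42
    -- A typo in the name now fails loudly instead of yielding nil.
    local ok, err = pcall(function() return env.reslut end)
    print(ok, err)
    ```

    Set on `_G` instead, the same metatable would make a mistyped variable name fail the test immediately rather than evaluate to nil.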
  2. router: extract a couple of map-reduce helpers

    The only map-reduce function is router.map_callrw(). At some point
    there was a task to introduce a new Map-Reduce mode: partial, by
    bucket IDs, rather than on the whole cluster. For that task, a new
    function router.map_part_callrw() was introduced. It had the same
    Map-Reduce and error-handling stages; only the Ref stage was
    different.
    
    The new helpers in this commit were meant to share some code
    between those two map-call functions.
    
    Later it was decided to keep just one map-call function and add a
    new option to it. But these helpers still look useful to have as
    separate functions: they make the map-call function really small
    and simple.
    
    NO_DOC=internal
    NO_TEST=refactoring
    Gerold103 committed Oct 10, 2024
    5e3ebb4
  3. storage: introduce ref.check()

    It ensures that the ref is still in place. It is a read-only
    operation. It is going to be used in the upcoming commits about
    partial map-reduce: the router might visit some storages more than
    once, and every time it needs to ensure that the ref is still in
    place.
    
    NO_DOC=internal
    Gerold103 committed Oct 10, 2024
    165dc18
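
    To illustrate what a read-only check adds, here is a toy model of a storage's ref registry in plain Lua. The names `new_ref_registry`, `add`, `check` and `del`, the session/ref-ID keys, and the deadline-based expiry are assumptions for illustration only; vshard's real storage refs are implemented differently. The point is that `check()` reports whether a ref still exists without creating, extending, or deleting anything.

    ```Lua
    -- Toy ref registry (hypothetical; not vshard's implementation).
    -- refs[sid][rid] holds the deadline until which the ref is valid.
    local function new_ref_registry(clock)
        local refs = {}
        local reg = {}

        function reg.add(sid, rid, timeout)
            refs[sid] = refs[sid] or {}
            refs[sid][rid] = clock() + timeout
        end

        -- Read-only: reports whether the ref is still in place and never
        -- creates, extends or deletes anything.
        function reg.check(sid, rid)
            local session = refs[sid]
            local deadline = session and session[rid]
            if deadline == nil then
                return nil, 'ref not found'
            end
            if clock() > deadline then
                return nil, 'ref expired'
            end
            return true
        end

        function reg.del(sid, rid)
            if refs[sid] ~= nil then
                refs[sid][rid] = nil
            end
        end

        return reg
    end

    -- Usage with a fake clock, so that the example is deterministic.
    local now = 0
    local reg = new_ref_registry(function() return now end)
    reg.add(1, 100, 10)
    local alive = reg.check(1, 100)
    now = 11
    local expired, expired_err = reg.check(1, 100)
    ```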
  4. storage: implement partial map-reduce API

    The storage side of the Partial Map-Reduce feature.
    
    NO_DOC=later
    darthunix authored and Gerold103 committed Oct 10, 2024
    b743601
  5. Make the tests pass from prev commit

    The previous commit was failing some tests. Let's patch them up.
    That commit isn't amended, so as to keep its original shape out of
    respect for the external contributor.
    
    NO_DOC=bugfix
    Gerold103 committed Oct 10, 2024
    87aefc7
  6. router: implement partial map-reduce API

    Introduce a partial ref-map-reduce API for vshard. It guarantees
    that, in case of success, the function is executed exactly once on
    each storage that contains any of the given buckets.
    
    NO_DOC=later
    darthunix authored and Gerold103 committed Oct 10, 2024
    5ab5b4c
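
    A key part of the router side is deciding which storages to visit. A minimal plain-Lua sketch, assuming the router has a bucket-to-replicaset routing table (here a plain table named `routes`, a hypothetical stand-in for the router's real bucket cache): group the requested bucket IDs by replicaset, so that each replicaset appears exactly once no matter how many of its buckets were requested. Buckets with no known location are returned separately, since they would need discovery first.

    ```Lua
    -- Group requested bucket IDs by the replicaset that owns them.
    -- `routes` maps bucket_id -> replicaset ID (hypothetical routing table).
    local function group_buckets(bucket_ids, routes)
        local grouped = {}   -- replicaset ID -> array of bucket IDs
        local unknown = {}   -- buckets without a known location
        local seen = {}      -- skip duplicate bucket IDs
        for _, bid in ipairs(bucket_ids) do
            if not seen[bid] then
                seen[bid] = true
                local rs_id = routes[bid]
                if rs_id == nil then
                    table.insert(unknown, bid)
                else
                    grouped[rs_id] = grouped[rs_id] or {}
                    table.insert(grouped[rs_id], bid)
                end
            end
        end
        return grouped, unknown
    end

    local routes = {[1] = 'rs_a', [2] = 'rs_a', [3] = 'rs_b', [4] = 'rs_c'}
    local grouped, unknown = group_buckets({1, 2, 3, 2, 5}, routes)
    -- grouped = {rs_a = {1, 2}, rs_b = {3}}, unknown = {5}; rs_c is not visited.
    ```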
  7. Make the tests pass from prev commit

    The previous commit was failing some tests. Let's patch them up.
    That commit isn't amended, so as to keep its original shape out of
    respect for the external contributor.
    
    NO_DOC=bugfix
    Gerold103 committed Oct 10, 2024
    8c95293
  8. router: improve master connection parallelism in map-reduce

    This is useful for RW map-reduce requests, which need to send
    multiple network requests to multiple masters in parallel.
    
    Sending in parallel means using the is_async net.box feature. But
    it only works if the connection is already established, which
    means that ideally the connection establishment must be parallel
    too.
    
    NO_DOC=internal
    NO_TEST=already covered
    darthunix authored and Gerold103 committed Oct 10, 2024
    ff8e8a0
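
    The benefit of connecting in parallel can be shown with a toy model in plain Lua, where coroutines stand in for fibers and each hypothetical master needs a given number of scheduler ticks to connect. This is not net.box code: it only contrasts waiting for each connection before starting the next one with starting all of them first and then waiting for all. Sequentially, the total time is the sum of the delays; in parallel, it is the maximum.

    ```Lua
    -- Toy model: a "connection" is a coroutine that finishes on its
    -- `delay`-th resume. One resume corresponds to one scheduler tick.
    local function make_conn(delay)
        return coroutine.create(function()
            for _ = 1, delay - 1 do
                coroutine.yield()
            end
        end)
    end

    -- Connect one master at a time: wait for each before starting the next.
    local function connect_sequential(delays)
        local ticks = 0
        for _, delay in ipairs(delays) do
            local co = make_conn(delay)
            repeat
                coroutine.resume(co)
                ticks = ticks + 1
            until coroutine.status(co) == 'dead'
        end
        return ticks
    end

    -- Start connecting to all masters, then wait until all are connected.
    -- Each tick advances every still-pending connection once.
    local function connect_parallel(delays)
        local conns = {}
        for i, delay in ipairs(delays) do
            conns[i] = make_conn(delay)
        end
        local ticks, pending = 0, #conns
        while pending > 0 do
            ticks = ticks + 1
            pending = 0
            for _, co in ipairs(conns) do
                if coroutine.status(co) ~= 'dead' then
                    coroutine.resume(co)
                    if coroutine.status(co) ~= 'dead' then
                        pending = pending + 1
                    end
                end
            end
        end
        return ticks
    end

    local delays = {3, 5, 2}
    local seq = connect_sequential(delays)  -- 3 + 5 + 2 = 10 ticks
    local par = connect_parallel(delays)    -- max(3, 5, 2) = 5 ticks
    ```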
  9. Review fixes for Partial Map-Reduce

    There were a number of minor issues with the previous several
    commits: tests running way too long, some cases not being covered,
    and the code being suboptimal (though not critically).
    
    Let's fix them all. The original commits aren't amended, so as to
    keep their original shape out of respect for the external
    contributor.
    
    NO_DOC=bugfix
    Gerold103 committed Oct 10, 2024
    7ff139a
  10. router: move Ref stage of Map-Reduce into new func

    There are 2 Ref-Map-Reduce functions right now: map_callrw() and
    map_part_callrw(). Their only difference is that the former refs
    the whole cluster, while the latter refs only a subset of
    storages. The rest is the same.
    
    The idea is to merge these functions into one and make the bucket
    IDs an option.
    
    This commit extracts the Ref stages of both functions into
    separate helpers, which will keep this future single function
    short and simple.
    
    NO_DOC=internal
    NO_TEST=refactoring
    Gerold103 committed Oct 10, 2024
    b2c3c6e
  11. router: merge map_callrw and map_part_callrw

    The behavior is controlled by the new bucket_ids option.
    
    @TarantoolBot document
    Title: vshard: `bucket_ids` option for `router.map_callrw()`
    
    The option is an array of numeric bucket IDs. When specified, the
    Ref-Map-Reduce is performed only on the masters that have at least
    one of these buckets. By default, all the stages are done on all
    masters in the cluster.
    
    Example:
    ```Lua
    -- Assume buckets 1, 2, 3 cover replicasets UUID_A and UUID_B.
    local res, err = vshard.router.map_callrw(func, args, {bucket_ids = {1, 2, 3}})
    assert(err == nil)
    -- res maps each replicaset UUID to an array of func's return values.
    assert(res[UUID_A][1] == func_result_from_A)
    assert(res[UUID_B][1] == func_result_from_B)
    ```
    Gerold103 committed Oct 10, 2024
    c72abb9
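
    How a single function can switch between the full and the partial Ref stage while sharing the Map and Reduce stages can be sketched as below. This is a hypothetical, heavily simplified model in plain Lua, not vshard's router code: replicasets are plain tables with a `buckets` set and a `call` function, the Ref stage only selects targets (no real refs, timeouts, or error handling), and the helper names `ref_all` and `ref_part` are illustrative.

    ```Lua
    -- Hypothetical Ref-stage helpers. Each returns the replicasets to visit.
    local function ref_all(replicasets)
        local targets = {}
        for id, rs in pairs(replicasets) do
            targets[id] = rs
        end
        return targets
    end

    local function ref_part(replicasets, bucket_ids)
        local targets = {}
        for _, bid in ipairs(bucket_ids) do
            for id, rs in pairs(replicasets) do
                if rs.buckets[bid] then
                    targets[id] = rs
                end
            end
        end
        return targets
    end

    -- Toy map_callrw: pick the Ref helper by opts.bucket_ids, then run the
    -- shared Map/Reduce stages, collecting results keyed by replicaset ID.
    local function map_callrw(replicasets, func_name, args, opts)
        opts = opts or {}
        local targets
        if opts.bucket_ids ~= nil then
            targets = ref_part(replicasets, opts.bucket_ids)
        else
            targets = ref_all(replicasets)
        end
        local res = {}
        for id, rs in pairs(targets) do
            res[id] = {rs.call(func_name, args)}
        end
        return res
    end

    local function make_rs(name, bucket_list)
        local buckets = {}
        for _, bid in ipairs(bucket_list) do buckets[bid] = true end
        return {buckets = buckets, call = function(func_name, args)
            return name .. ':' .. func_name .. '(' .. table.concat(args, ',') .. ')'
        end}
    end

    local cluster = {
        A = make_rs('A', {1, 2}),
        B = make_rs('B', {3}),
        C = make_rs('C', {4, 5}),
    }
    local part = map_callrw(cluster, 'f', {10}, {bucket_ids = {1, 2, 3}})
    local full = map_callrw(cluster, 'f', {10})
    ```

    With `bucket_ids` set, replicaset C holds none of the requested buckets and is skipped; without it, every replicaset is visited.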
  12. test: move test_map_callrw_raw() to another file

    Let's merge all map_callrw() tests into a single file.
    
    NO_DOC=internal
    Gerold103 committed Oct 10, 2024
    59deb28
  13. test: +1 replicaset to map_callrw() tests

    With only 2 replicasets, every case would cover either a single
    replicaset or "all" of them. Let's make it 3, so that some tests
    actually cover a part of the cluster which is more than a single
    replicaset.
    
    NO_DOC=internal
    Gerold103 committed Oct 10, 2024
    ca39cac
  14. storage: fix moved buckets check

    Firstly, the 'moved_buckets' function treated as "moved" all the
    buckets which were not strictly ACTIVE. That isn't optimal.
    
    Secondly, the 'moved_buckets' function assumed that the buckets
    stay unchanged from the start of the ref creation until its end.
    That isn't true.
    
    Thirdly, the moved buckets could include the destination they
    moved to. Returning it to the router would make the re-discovery
    faster.
    
    Fourthly, PINNED buckets were not considered ACTIVE.
    
    This commit fixes all these issues.
    
    Firstly, when a bucket is SENDING, returning an error right away
    isn't good: the router would just keep retrying without any
    progress. The bucket is actually still here; it has not moved yet.
    
    It is better to let the storage try to take a ref. Then one of 2
    outcomes is possible:
    - It waits without useless active retries. Then SENDING fails and
        the bucket becomes ACTIVE. The ref is taken, all good.
    - It waits without useless active retries. SENDING turns into
        SENT. The ref is taken for the other buckets, and this one is
        reported as moved.
    
    Similar logic applies to RECEIVING.
    
    Secondly, after a ref is taken, the buckets that were not moved
    could have moved in the meantime. They need to be re-checked before
    returning the ref. Luckily, the storage can use bucket_generation
    to skip this double-check when nothing has changed in _bucket.
    
    NO_DOC=bugfix
    Gerold103 committed Oct 10, 2024
    ead3770
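
    The decision logic above can be sketched as follows. This is a hypothetical model in plain Lua, not vshard's storage code: `storage.buckets` is a plain table mapping bucket ID to `{status = ..., destination = ...}`, `storage.generation` is a counter standing in for bucket_generation, and `storage.take_ref()` is a callback standing in for taking a ref that waits for SENDING/RECEIVING buckets to settle. ACTIVE and PINNED buckets count as present, transitional ones are not reported, and everything else (SENT, GARBAGE, or missing) is reported as moved together with its destination when known. The re-check runs only if the generation changed while the ref was being taken.

    ```Lua
    -- Statuses where the bucket is definitely stored here.
    local PRESENT = {ACTIVE = true, PINNED = true}
    -- Transitional statuses: do not fail, let the ref wait for them to settle.
    local TRANSIENT = {SENDING = true, RECEIVING = true}

    -- Returns an array of {id = ..., destination = ...} for buckets which
    -- are neither present nor in transition (e.g. SENT, GARBAGE, missing).
    local function find_moved(buckets, bucket_ids)
        local moved = {}
        for _, bid in ipairs(bucket_ids) do
            local b = buckets[bid]
            local status = b and b.status
            if not PRESENT[status] and not TRANSIENT[status] then
                table.insert(moved, {id = bid, destination = b and b.destination})
            end
        end
        return moved
    end

    -- Hypothetical flow: check, take the ref (which may wait for transient
    -- buckets), then re-check only if the bucket generation has changed.
    local function ref_with_moved_check(storage, bucket_ids)
        local moved = find_moved(storage.buckets, bucket_ids)
        local generation = storage.generation
        storage.take_ref()
        if storage.generation ~= generation then
            moved = find_moved(storage.buckets, bucket_ids)
        end
        return moved
    end

    local storage = {
        generation = 1,
        buckets = {
            [1] = {status = 'ACTIVE'},
            [2] = {status = 'PINNED'},
            [3] = {status = 'SENDING', destination = 'rs_b'},
            [4] = {status = 'SENT', destination = 'rs_c'},
        },
    }
    -- Simulate the SENDING bucket finishing its move while the ref waits.
    storage.take_ref = function()
        storage.buckets[3].status = 'SENT'
        storage.generation = storage.generation + 1
    end

    local moved = ref_with_moved_check(storage, {1, 2, 3, 4, 5})
    ```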
  15. storage: fix moved buckets ref check

    During the partial Map-Reduce, the router might visit some
    storages more than once. This happens when, after a ref on
    storage-A, another storage-B reports that A has taken some buckets.
    
    Then the router comes back to A to confirm that. The storage must
    still hold its previously created ref for such checks to make any
    sense. Otherwise any of the previously confirmed buckets could
    have escaped by now.
    
    Without the ref check, the router could reach the Map stage and
    send some Map requests, even though it could have detected earlier
    that not all storages would succeed.
    
    Strictly speaking, this wasn't a bug, but it was clearly
    suboptimal behaviour: the request would be executed on only some
    of the needed storages, while the others would report errors.
    
    NO_DOC=internal
    Gerold103 committed Oct 10, 2024
    f316ad1
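
    The revisit flow can be sketched as a small loop. In this hypothetical plain-Lua model (not vshard's router code), each storage is a table with a `ref(bucket_ids)` function, which refs the buckets (or, on a revisit, re-confirms them) and returns a map of bucket ID to the destination it moved to, and a `check_ref()` function telling whether the earlier ref is still alive. Destinations are assumed to always be known storages. When a bucket turns out to have moved to an already refed storage, the router goes back there and first verifies the ref, failing before the Map stage if it is gone.

    ```Lua
    -- Hypothetical partial Ref stage with revisits (toy model).
    -- plan: storage ID -> array of bucket IDs expected on that storage.
    local function ref_stage(storages, plan, max_steps)
        local ids = {}
        for sid in pairs(plan) do table.insert(ids, sid) end
        table.sort(ids)  -- deterministic visiting order
        local queue = {}
        for _, sid in ipairs(ids) do
            table.insert(queue, {sid = sid, bids = plan[sid]})
        end
        local refed, steps = {}, 0
        while #queue > 0 do
            steps = steps + 1
            if steps > (max_steps or 100) then
                return nil, 'too many steps'
            end
            local item = table.remove(queue, 1)
            local storage = storages[item.sid]
            -- A revisit is pointless if the earlier ref is gone: fail before Map.
            if refed[item.sid] and not storage.check_ref() then
                return nil, 'ref is lost on ' .. item.sid
            end
            local moved = storage.ref(item.bids)
            refed[item.sid] = true
            for bid, dest in pairs(moved) do
                table.insert(queue, {sid = dest, bids = {bid}})
            end
        end
        return refed
    end

    local function make_storage(moved, ref_alive)
        return {
            ref = function() return moved end,
            check_ref = function() return ref_alive end,
        }
    end

    -- Storage B reports that bucket 3 has moved to storage A.
    local ok_storages = {A = make_storage({}, true), B = make_storage({[3] = 'A'}, true)}
    local refed = ref_stage(ok_storages, {A = {1, 2}, B = {3}})

    -- Same, but A has lost its ref by the time the router comes back.
    local bad_storages = {A = make_storage({}, false), B = make_storage({[3] = 'A'}, true)}
    local res, err = ref_stage(bad_storages, {A = {1, 2}, B = {3}})
    ```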
  16. test: rename map_part_test to map_callrw_test

    It tests not only the partial Map-Reduce; it covers a bit of the
    full one as well.
    
    NO_DOC=internal
    Gerold103 committed Oct 10, 2024
    08aa380