Feat: implement partial map-reduce #442

Closed
wants to merge 3 commits into from

Contributor

@darthunix darthunix commented Nov 7, 2023

Extend ref-map-reduce API with a new partial execution method:

vshard.router.partial_callrw(bucket_ids, func, args, opts, callback)

It allows users to call a function exactly once on all masters that contain buckets from the list.

This PR also improves connection establishment for all ref-map-reduce methods.
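To illustrate the "exactly once on each master" idea, here is a minimal pure-Lua sketch of grouping bucket ids by owner. It is not vshard code: `route_map`, `group_by_replicaset`, and `call_once_per_master` are hypothetical names, and the router's bucket cache is modeled as a plain table from bucket id to replicaset name.

```lua
-- Hypothetical stand-in for the router's bucket cache:
-- bucket id -> name of the replicaset that owns the bucket.
local route_map = {[1] = 'rs1', [2] = 'rs1', [3] = 'rs2', [4] = 'rs2'}

-- Group the requested bucket ids by the replicaset that owns them.
local function group_by_replicaset(bucket_ids, routes)
    local groups = {}
    for _, bid in ipairs(bucket_ids) do
        local rs = routes[bid]
        if rs == nil then
            return nil, 'unknown bucket ' .. tostring(bid)
        end
        groups[rs] = groups[rs] or {}
        table.insert(groups[rs], bid)
    end
    return groups
end

-- Call func once per replicaset, no matter how many of the
-- requested buckets that replicaset holds.
local function call_once_per_master(bucket_ids, routes, func)
    local groups, err = group_by_replicaset(bucket_ids, routes)
    if groups == nil then
        return nil, err
    end
    local results = {}
    for rs, bids in pairs(groups) do
        results[rs] = func(rs, bids)
    end
    return results
end

-- Three buckets spread over two replicasets lead to two calls.
local calls = 0
local results = call_once_per_master({1, 2, 3}, route_map, function(rs, bids)
    calls = calls + 1
    return #bids
end)
```

In this sketch the callback receives the group of buckets for its replicaset, which loosely mirrors the bucket-list replacement the PR describes for the map stage.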

@darthunix darthunix marked this pull request as ready for review November 7, 2023 14:58
@Gerold103 Gerold103 self-requested a review November 9, 2023 20:18
Collaborator

@Gerold103 Gerold103 left a comment

Thanks for the patch! Good stuff, but we need to work on it a bit.

@darthunix darthunix marked this pull request as draft November 25, 2023 07:35
@darthunix darthunix force-pushed the multicall branch 4 times, most recently from 4950115 to 0995a72 Compare November 28, 2023 06:19
@darthunix darthunix marked this pull request as ready for review November 28, 2023 06:19
@darthunix darthunix marked this pull request as draft November 29, 2023 06:14
@darthunix darthunix marked this pull request as ready for review November 29, 2023 06:40
@darthunix darthunix marked this pull request as draft November 29, 2023 10:08
@darthunix darthunix marked this pull request as ready for review November 30, 2023 09:32
@darthunix darthunix changed the title Feat: implement multicall Feat: implement partial map-reduce Nov 30, 2023
Collaborator

@Gerold103 Gerold103 left a comment

Thanks for the patch!

Collaborator

@Gerold103 Gerold103 left a comment

Thanks for the fixes! We are on the finish line already.

Comment on lines 205 to 267
local bid_extra = 3001
local timeout = 0.1
local rid = 42
local res, err = ivshard.storage._call(
    'storage_ref_with_lookup',
    rid,
    timeout,
    {bid1, bid_extra}
)
ilt.assert_equals(err, nil)
ilt.assert_equals(res, {bid_extra})
ivshard.storage._call('storage_unref', rid)
Collaborator

Please, cover:

  • An error. And ensure that a ref wasn't made then.
  • More than one bucket missing. "One" is a corner case which might be working by luck.

Collaborator

> More than one bucket missing

This one is still not covered.

Contributor Author

Take a look at the "multiple buckets" section of test_ref_with_lookup().

Collaborator

My point is that the moved result of storage_ref_with_lookup() is always either empty or a single bucket. I can't see a case where moved contains more than one bucket. This is important to test: someone might break the storage code, by accident now or some time later, so that it exits on the first loop iteration, and your tests would still pass.
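The risk described here can be shown with a small pure-Lua model. The two lookup functions below are hypothetical stand-ins, not the real storage_ref_with_lookup(): one collects every missing bucket, the other stops at the first missing one (a variant of the early-exit bug). A test with a single missing bucket cannot tell them apart, while a test with two missing buckets can.

```lua
-- Buckets present on this hypothetical storage.
local local_buckets = {[1] = true, [2] = true}

-- Correct version: collect every bucket that is not stored locally.
local function lookup_moved(bucket_ids)
    local moved = {}
    for _, bid in ipairs(bucket_ids) do
        if not local_buckets[bid] then
            table.insert(moved, bid)
        end
    end
    return moved
end

-- Buggy version: leaves the loop after the first missing bucket.
local function lookup_moved_buggy(bucket_ids)
    local moved = {}
    for _, bid in ipairs(bucket_ids) do
        if not local_buckets[bid] then
            table.insert(moved, bid)
            break
        end
    end
    return moved
end

-- One missing bucket: both versions agree, so the bug stays hidden.
local one_ok = lookup_moved({1, 3001})
local one_bug = lookup_moved_buggy({1, 3001})

-- Two missing buckets: only the correct version reports both.
local two_ok = lookup_moved({1, 3001, 3002})
local two_bug = lookup_moved_buggy({1, 3001, 3002})
```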

@darthunix darthunix force-pushed the multicall branch 2 times, most recently from 5951459 to b5abf70 Compare January 29, 2024 10:15
Collaborator

@Gerold103 Gerold103 left a comment

Thanks for the fixes!

Comment on lines 369 to 460
res = g.router:exec(function(bid1, bid2)
    local val, err, err_uuid = ivshard.router.map_part_callrw(
        {bid2, bid1}, 'do_map', {3}, {timeout = iwait_timeout})
    return {
        val = val,
        err = err,
        err_uuid = err_uuid,
    }
end, {bid1, bid2})
t.assert(res.val == nil)
t.assert_covers(res.err, {
    code = box.error.PROC_LUA,
    type = 'ClientError',
    message = 'map_err'
})
t.assert_equals(res.err_uuid, rs2_uuid)
Collaborator

Please add a check after this that no dangling refs are left on the storages. For that, you can check vshard.storage.ref.count == 0 on both instances.

Contributor Author

I have checked handling unref error

Collaborator

I can't find it anywhere. Can you please point at a specific location?

Contributor Author

I really did lose this code during rebases. Fixed.

Collaborator

Nope, still no check for absence of references.

Collaborator

@Gerold103 Gerold103 left a comment
@darthunix darthunix force-pushed the multicall branch 2 times, most recently from d261e12 to d0074ff Compare March 6, 2024 07:02
Introduce a partial ref-map-reduce API for vshard. It guarantees
that, on success, the function is executed exactly once on
the storages that contain the given list of buckets.

Algorithm.

Router:
1. Group buckets by replicaset based on the router's cache.
2. Ref stage. For each storage:
   a. Asynchronously send the ref id, the timeout, and the group of
      corresponding buckets to the storage. The aim is to reference this
      storage and check which buckets are absent.
Storage:
3. Reference the session with the ref id and timeout passed from the router.
4. Look up the passed buckets. If any of them are not found on the storage,
   return these buckets to the router in the response.
Router:
5. Await and collect the responses. If the timeout has expired, set an error
   for this response.
6. If any of the responses contains an error, send unref to the refed storages
   and return the error to the user.
7. If the collected results contain moved buckets, search for them and update
   the router's cache. Decrease the timeout and go to step 1.
8. Map stage. For each storage:
   a. Replace the bucket list with the group of buckets refed on the target
      storage.
   b. Asynchronously send the map function with the modified arguments and the
      ref id to the storage.
Storage:
9. Execute storage_map: if the ref id has expired, return an error. Otherwise,
   ref.use -> execute -> ref.del from storage_map(). Return the results.
Router:
10. Reduce stage. Await the results (and optionally apply a callback to each
    result): if the timeout has expired, return an error to the user.
    Otherwise, return the result.
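The control flow of the algorithm above can be sketched as a synchronous, in-memory simulation in pure Lua. This is an illustration under several assumptions, not the PR's implementation: `storages`, `cache`, `discover`, `group`, and `partial_map` are hypothetical names, timeouts and asynchronous requests are omitted, only the moved buckets are retried, and taking a ref twice with the same id is treated as harmless.

```lua
-- Hypothetical model: which buckets each storage really holds, plus its refs.
local storages = {
    rs1 = {buckets = {[1] = true}, refs = {}},
    rs2 = {buckets = {[2] = true, [3] = true}, refs = {}},
}
-- Stale router cache: bucket 2 is believed to live on rs1.
local cache = {[1] = 'rs1', [2] = 'rs1', [3] = 'rs2'}

-- Storage side (steps 3-4): take a ref, report buckets not found locally.
local function storage_ref_with_lookup(name, rid, bucket_ids)
    local st = storages[name]
    st.refs[rid] = true
    local moved = {}
    for _, bid in ipairs(bucket_ids) do
        if not st.buckets[bid] then
            table.insert(moved, bid)
        end
    end
    return moved
end

-- Stand-in for bucket discovery: find the real owner of a bucket.
local function discover(bid)
    for name, st in pairs(storages) do
        if st.buckets[bid] then
            return name
        end
    end
end

-- Step 1: group buckets by replicaset, filling cache misses via discovery.
local function group(bucket_ids)
    local groups = {}
    for _, bid in ipairs(bucket_ids) do
        local name = cache[bid] or discover(bid)
        if name == nil then
            return nil, 'bucket ' .. bid .. ' not found'
        end
        cache[bid] = name
        groups[name] = groups[name] or {}
        table.insert(groups[name], bid)
    end
    return groups
end

local function partial_map(bucket_ids, rid, func)
    local confirmed = {} -- storage name -> buckets confirmed to be there
    local pending = bucket_ids
    while #pending > 0 do
        local groups, err = group(pending)
        if groups == nil then
            -- Step 6: on error, release every ref taken so far.
            for name in pairs(confirmed) do
                storages[name].refs[rid] = nil
            end
            return nil, err
        end
        local moved = {}
        for name, bids in pairs(groups) do
            local missing = {}
            for _, bid in ipairs(storage_ref_with_lookup(name, rid, bids)) do
                missing[bid] = true
                table.insert(moved, bid)
            end
            confirmed[name] = confirmed[name] or {}
            for _, bid in ipairs(bids) do
                if not missing[bid] then
                    table.insert(confirmed[name], bid)
                end
            end
        end
        -- Step 7: forget stale routes so the next pass rediscovers them.
        for _, bid in ipairs(moved) do
            cache[bid] = nil
        end
        pending = moved
    end
    -- Steps 8-10: map over the refed storages, drop each ref, collect results.
    local results = {}
    for name, bids in pairs(confirmed) do
        results[name] = func(name, bids)
        storages[name].refs[rid] = nil
    end
    return results
end

-- Bucket 2 is found to be moved on the first pass and re-routed to rs2.
local results = partial_map({1, 2, 3}, 42, function(name, bids)
    return #bids
end)
```

The sketch keeps one property from the description: whether the call succeeds or fails, no ref with the given id remains on any storage afterwards.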
Collaborator

@Gerold103 Gerold103 left a comment

Thanks for the fixes! Hm, some of the old comments still remain ignored. Let's recite them all here again (old and new ones together):

Also have a look at CI, please. It is completely red.

Comment on lines +235 to +238
-- Call a partial map one more time to make sure there are no references left.
res = g.router:exec(function(bid1, bid2)
    local val, err, err_uuid = ivshard.router.map_part_callrw(
        {bid1, bid2}, 'do_map', {42}, {timeout = iwait_timeout})
Collaborator

How does calling map_part_callrw() a second time ensure that there are no references left? It will succeed even if some references remain.

Comment on lines +282 to +286
-- Check that there is no dangling references after the error.
init = map_part_init()
res = g.router:exec(function(bid1, bid2)
    local val, err, err_uuid = ivshard.router.map_part_callrw(
        {bid1, bid2}, 'do_map', {3}, {timeout = iwait_timeout})
Collaborator

Ditto. The only two ways to check that no references are left are to 1) try to send a bucket, or 2) manually check vshard.storage.ref.count == 0. The second is the simplest.
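To see why a second successful call proves nothing, consider this pure-Lua toy model. It is not vshard's ref module: `storage`, `leaky_call`, and the `ref_count` field are hypothetical stand-ins, and the leak is introduced on purpose. In a real test, the check suggested above is vshard.storage.ref.count == 0 on each storage.

```lua
-- Toy storage with a ref counter; every call takes a fresh ref.
local storage = {ref_count = 0}

-- A call that deliberately leaks its ref when the user function fails.
local function leaky_call(func)
    storage.ref_count = storage.ref_count + 1
    local ok, res = pcall(func)
    if not ok then
        -- Bug on purpose: the ref is never released on this path.
        return nil, res
    end
    storage.ref_count = storage.ref_count - 1
    return res
end

-- The first call fails and leaves a dangling ref behind.
local res1, err1 = leaky_call(function() error('map_err') end)
-- The second call still succeeds, so it cannot reveal the leak.
local res2 = leaky_call(function() return 'ok' end)
-- Only inspecting the counter exposes the problem.
local leaked = storage.ref_count
```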

@@ -13,6 +13,7 @@ _G.ivconst = require('vshard.consts')
_G.ivutil = require('vshard.util')
_G.iverror = require('vshard.error')
_G.ivtest = require('test.luatest_helpers.vtest')
_G.itable_new = require('table.new')
Collaborator

You don't need it, table.new() is already available in the global namespace.

@Gerold103 Gerold103 mentioned this pull request Sep 23, 2024
@Gerold103
Collaborator

Finished in #491.

@Gerold103 Gerold103 closed this Oct 10, 2024