-
Notifications
You must be signed in to change notification settings - Fork 30
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Feat: implement partial map-reduce #442
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the patch! Good stuff, but we need to work on it a bit.
4950115
to
0995a72
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the patch!
dccd824
to
519dadc
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the fixes! We are on the finish line already.
local bid_extra = 3001 | ||
local timeout = 0.1 | ||
local rid = 42 | ||
local res, err = ivshard.storage._call( | ||
'storage_ref_with_lookup', | ||
rid, | ||
timeout, | ||
{bid1, bid_extra} | ||
) | ||
ilt.assert_equals(err, nil) | ||
ilt.assert_equals(res, {bid_extra}) | ||
ivshard.storage._call('storage_unref', rid) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please, cover:
- An error. And ensure that a ref wasn't made then.
- More than one bucket missing. "One" is a corner case which might be working by luck.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
More than one bucket missing
This one is still not covered.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Take a look at the "multiple buckets" section of the test_ref_with_lookup()
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The moved
result of storage_ref_with_lookup()
is always empty or just one bucket, this is my point. I can't see a case when moved
has more than a single bucket in it. I think this is important to test because by accident now or some time afterwards somebody might break the storage code so it would exit on the first loop iteration, and your tests would still pass.
5951459
to
b5abf70
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the fixes!
test/router-luatest/router_test.lua
Outdated
res = g.router:exec(function(bid1, bid2) | ||
local val, err, err_uuid = ivshard.router.map_part_callrw({bid2, bid1}, 'do_map', {3}, | ||
{timeout = iwait_timeout}) | ||
return { | ||
val = val, | ||
err = err, | ||
err_uuid = err_uuid, | ||
} | ||
end, {bid1, bid2}) | ||
t.assert(res.val == nil) | ||
t.assert_covers(res.err, { | ||
code = box.error.PROC_LUA, | ||
type = 'ClientError', | ||
message = 'map_err' | ||
}) | ||
t.assert_equals(res.err_uuid, rs2_uuid) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please, add a check after this, that no dangling refs are left on the storages. For that you can check vshard.storage.ref.count == 0
on both instances.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have checked handling unref error
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can't find it anywhere. Can you please point at a specific location?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have really lost this code on rebases. Fixed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nope, still no check for absence of references.
995ba20
to
b80d17c
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the fixes! We still seem to be stuck with some comments. Perhaps you don't see them all. Here are the links:
d261e12
to
d0074ff
Compare
Introduce a partial ref-map-reduce API for vshard. It guarantees that in case of success the function is executed exactly once on the storages, that contain the given list of buckets. Algorithm. Router: 1. Group buckets by replicasets based on the router's cache. 2. Ref stage. For each storage: a. Async send ref id, timeout and a group of the corresponding buckets to the storage. The aim is to reference this storage and check what buckets are absent. Storage: 3. Refer session with the ref id and timeout, passed from the router. 4. Lookup for the passed buckets. If any of them were not found on the storage, return these buckets back in response to the router. Router: 5. Await and collect returned responses. If timeout has expired, set the error for this response. 6. If any of responses contains error,send unref to the refed storages and return the error to the user. 7. If the collected results contain moved buckets, search for them and update the router's cache. Decrease the timeout and goto 1. 8. Map stage. For each storage: a. Replace a bucket list with a group of buckets refed on the target storage. b. Async send a map function with modified arguments and a ref id to the storage. Storage: 9. Execute storage_map: if the ref id has expired, return error. Otherwise, ref.use -> execute -> ref.del from storage_map(). Return results. Router: 10. Reduce stage. Await results (and optionally apply a callback to each result): if timeout expired, return error to the user. Otherwise, return result.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the fixes! Hm, still some of the old comments remain ignored. Lets recite them all here again (old and new ones together):
- Feat: implement partial map-reduce #442 (comment) - missing a test case.
- Feat: implement partial map-reduce #442 (comment) - export of an unnecessary function.
- Feat: implement partial map-reduce #442 (comment) - no test for more than one moved bucket (see my latest response).
- Feat: implement partial map-reduce #442 (comment) - no test for absence of dangling references (the current test will pass even if there are references).
Also have a look at CI, please. It is completely red.
-- Call a partial map one more time to make sure there are no references left. | ||
res = g.router:exec(function(bid1, bid2) | ||
local val, err, err_uuid = ivshard.router.map_part_callrw( | ||
{bid1, bid2}, 'do_map', {42}, {timeout = iwait_timeout}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How does calling map_part_callrw()
second time ensures that there are no references left? It will succeed even if there are references anyway.
-- Check that there is no dangling references after the error. | ||
init = map_part_init() | ||
res = g.router:exec(function(bid1, bid2) | ||
local val, err, err_uuid = ivshard.router.map_part_callrw({bid1, bid2}, 'do_map', {3}, | ||
{timeout = iwait_timeout}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ditto. The only 2 ways to check if there are no references left is to 1) try to send a bucket, 2) manually check vshard.storage.ref.count == 0
. The second is the simplest way.
@@ -13,6 +13,7 @@ _G.ivconst = require('vshard.consts') | |||
_G.ivutil = require('vshard.util') | |||
_G.iverror = require('vshard.error') | |||
_G.ivtest = require('test.luatest_helpers.vtest') | |||
_G.itable_new = require('table.new') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You don't need it, table.new()
is already available in the global namespace.
Finished in #491. |
Extend ref-map-reduce API with a new partial execution method:
It allows users to call a function exactly once on all masters that contain buckets from the list.
Also current PR improves the connection establishment for all ref-map-reduce methods.