clear target in the group server when worker fails to connect to peer and reverts back to targeting #282
base: master
Conversation
… and reverts back to targeting
I'm not sure the relcast behavior is right. Do we use the target information for anything? I'm not 100% clear on the issue this is meant to resolve.
@@ -213,6 +213,18 @@ handle_cast({request_target, Index, WorkerPid, _WorkerRef}, State=#state{tid=TID
                     {keys, State#state.group_keys}]}},
    libp2p_group_worker:assign_target(WorkerPid, {Target, ClientSpec}),
    {noreply, NewState};
handle_cast({clear_target, _Kind, _WorkerPid, Ref}, State=#state{workers = Workers}) ->
But the target for each worker in the relcast server is always the same. Maybe we should just drop the cast here?
Yeah, you are right, relcast doesn't need this, but maybe it's better to handle the clear_target cast as a noop instead? That would avoid any future confusion if someone is in this flow and wondering why it's not handled, without having the full context.
Yeah, that's what I meant, sorry. I think a noop clause here with a good comment is the right thing.
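To illustrate the suggestion, a minimal sketch of what the no-op clause in the relcast group server could look like (the surrounding clauses and exact placement are assumed):

%% In the relcast group server every worker shares the same target,
%% so there is nothing per-worker to clear. Handle the cast explicitly
%% as a no-op so readers tracing this flow aren't left wondering why
%% the message appears unhandled.
handle_cast({clear_target, _Kind, _WorkerPid, _Ref}, State) ->
    {noreply, State};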
    lager:debug("clearing target for worker ~p ", [_WorkerPid]),
    %% the ref is stable across restarts, so use that as the lookup key
    case lookup_worker(Ref, #worker.ref, State) of
        Worker=#worker{} ->
In theory the WorkerPid here should match the one passed in the message?
Yes, the pid in the #worker record should equal the WorkerPid from the message. As per the comment, the ref is used for the lookup as it is consistent across restarts of the worker.
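For reference, a sketch of what a ref-keyed lookup could look like, assuming the server keeps its workers as a list of #worker{} records in its state (the real lookup_worker may differ):

%% hypothetical sketch: find a worker by an arbitrary record field,
%% here keyed on #worker.ref since the ref survives worker restarts
lookup_worker(Key, KeyPos, #state{workers = Workers}) ->
    case lists:keyfind(Key, KeyPos, Workers) of
        false -> false;
        Worker = #worker{} -> Worker
    end.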
    %% the ref is stable across restarts, so use that as the lookup key
    case lookup_worker(Ref, #worker.ref, State) of
        Worker=#worker{} ->
            %% TODO - should the pid be set to undefined along with target ?
I don't think the pid would be set to undefined here; the worker is still running.
Removed the TODO.
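Putting the excerpt together, a hedged sketch of how the complete clause might read once the TODO is dropped; the target field name, the keyreplace-based update, and the false branch are assumptions for illustration:

handle_cast({clear_target, _Kind, _WorkerPid, Ref}, State=#state{workers = Workers}) ->
    lager:debug("clearing target for worker ~p ", [_WorkerPid]),
    %% the ref is stable across restarts, so use that as the lookup key
    case lookup_worker(Ref, #worker.ref, State) of
        Worker=#worker{} ->
            %% keep the pid: the worker is still running; only the target
            %% assignment is cleared so the peer can be handed out again
            NewWorkers = lists:keyreplace(Ref, #worker.ref, Workers,
                                          Worker#worker{target = undefined}),
            {noreply, State#state{workers = NewWorkers}};
        false ->
            {noreply, State}
    end;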
Related PR (both issues affect the same flow): #281
When a group worker is trying to connect to a remote peer/target and fails to do so, it will end up reverting back to targeting once the max connect retry threshold is breached.
At this point targeting will request a new target from the group server. However, the previously assigned target peer will never be reassigned, as the group server has not been told that the original assign_target failed; as far as it is concerned, the target is still assigned to a worker. (The group server records a target as assigned to a worker as part of its assign_target call, and does so before the group worker receives the target; there is no handshake to indicate assigned versus received/connected.)
The net effect is that the previously assigned peer is never handed back out to a new group worker.
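On the worker side, the flow described above might look roughly like the following sketch; ?MAX_CONNECT_RETRIES, the state fields, and the revert_to_targeting/retry_connect helpers are hypothetical names for illustration:

%% hypothetical worker-side flow: once the retry limit is hit, notify
%% the group server so it can clear the assigned target, then revert
%% to targeting and request a fresh target
connect_failed(State = #state{connect_retries = Retries, kind = Kind,
                              ref = Ref, server = Server})
  when Retries >= ?MAX_CONNECT_RETRIES ->
    gen_server:cast(Server, {clear_target, Kind, self(), Ref}),
    revert_to_targeting(State);
connect_failed(State = #state{connect_retries = Retries}) ->
    retry_connect(State#state{connect_retries = Retries + 1}).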