This repository has been archived by the owner on Apr 22, 2024. It is now read-only.

clear target in the group server when worker fails to connect to peer and reverts back to targeting #282

Open
andymck wants to merge 3 commits into master

Conversation

@andymck (Contributor) commented May 18, 2020

Related PR (both issues affect the same flow): #281

When a group worker is trying to connect to a remote peer/target and fails to do so, it ends up reverting back to targeting once the max connect retry threshold is breached.

At this point targeting will request a new target from the group server. However, the group server has not been told that the original assign_target failed, so as far as it is concerned the target is still assigned to a worker (the group server records a target as assigned to a worker as part of its assign_target call, before the group worker has even received the target; there is no handshake to distinguish assigned from received/connected).

When this happens, the previously assigned peer will never be reassigned to a new group worker.
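
In other words, once the retry threshold is breached the worker needs to tell the group server to forget the previously assigned target before it asks for a new one. A minimal, hypothetical sketch of that notification (ServerPid, Kind, and Ref are stand-ins for the group worker's copy of the server pid, its kind, and its stable worker ref; the actual call in this PR may differ):

    %% hypothetical helper, for illustration only: sends the clear_target cast
    %% whose handling is added to the group server in this PR
    notify_clear_target(ServerPid, Kind, Ref) ->
        gen_server:cast(ServerPid, {clear_target, Kind, self(), Ref}).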

@evanmcc (Contributor) left a comment


I'm not sure the relcast behavior is right. Do we use the target information for anything? I'm not 100% clear on the issue that this is meant to resolve.

@@ -213,6 +213,18 @@ handle_cast({request_target, Index, WorkerPid, _WorkerRef}, State=#state{tid=TID
{keys, State#state.group_keys}]}},
libp2p_group_worker:assign_target(WorkerPid, {Target, ClientSpec}),
{noreply, NewState};
handle_cast({clear_target, _Kind, _WorkerPid, Ref}, State=#state{workers = Workers}) ->
Contributor

But the target for each worker in the relcast server is always the same. Maybe we should just drop the cast here?

Contributor Author

Yeah, you are right, relcast doesn't need this, but maybe it's better to handle the clear_target cast as a no-op instead? That would avoid potential future confusion if someone is in this flow, wondering why it's not handled, and doesn't have full context.

Contributor

Yeah, that's what I meant, sorry. I think a no-op clause here with a good comment is the right thing.
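
For reference, such a no-op clause in the relcast group server might look roughly like this. It is a sketch, not the PR's diff, and assumes the relcast server receives the same {clear_target, Kind, WorkerPid, Ref} message as the group server shown in the diff above:

    %% relcast workers all share the same, fixed target, so there is nothing to
    %% clear; handle the cast explicitly as a no-op so future readers know it is
    %% intentional rather than unhandled
    handle_cast({clear_target, _Kind, _WorkerPid, _Ref}, State) ->
        {noreply, State};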

lager:debug("clearing target for worker ~p ", [_WorkerPid]),
%% the ref is stable across restarts, so use that as the lookup key
case lookup_worker(Ref, #worker.ref, State) of
Worker=#worker{} ->
Contributor

In theory the WorkerPid here should match the one passed in the message?

Contributor Author

Yes, Worker#worker.pid should equal the WorkerPid from the message. As per the comment, the ref is used for the lookup because it is stable across restarts of the worker.

%% the ref is stable across restarts, so use that as the lookup key
case lookup_worker(Ref, #worker.ref, State) of
Worker=#worker{} ->
%% TODO - should the pid be set to undefined along with target ?
Contributor

I don't think the pid should be set to undefined here; the worker is still running.

Contributor Author

Removed the TODO.
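
Putting the threads above together, the resulting clear_target clause might look roughly like the sketch below. This is not the merged diff: the update_worker/2 helper and the fall-through case clause are assumptions standing in for however the server actually stores the modified #worker{} record and handles a missed lookup.

    handle_cast({clear_target, _Kind, _WorkerPid, Ref}, State) ->
        lager:debug("clearing target for worker ~p ", [_WorkerPid]),
        %% the ref is stable across restarts, so use that as the lookup key
        case lookup_worker(Ref, #worker.ref, State) of
            Worker=#worker{} ->
                %% per the review: only the target is cleared; the worker process
                %% is still running, so its pid is left untouched
                {noreply, update_worker(Worker#worker{target=undefined}, State)};
            _ ->
                %% no matching worker; nothing to clear
                {noreply, State}
        end;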
