clear target in the group server when worker fails to connect to peer and reverts back to targeting #282
base: master
Conversation
… and reverts back to targeting
I'm not sure the relcast behavior is right. Do we use the target information for anything? I'm not 100% clear on the issue this is meant to resolve.
@@ -213,6 +213,18 @@ handle_cast({request_target, Index, WorkerPid, _WorkerRef}, State=#state{tid=TID
                     {keys, State#state.group_keys}]}},
    libp2p_group_worker:assign_target(WorkerPid, {Target, ClientSpec}),
    {noreply, NewState};
handle_cast({clear_target, _Kind, _WorkerPid, Ref}, State=#state{workers = Workers}) ->
But the target for each worker in the relcast server is always the same. Maybe we should just drop the cast here?
Yeah, you are right, relcast doesn't need this, but maybe it's better to handle the clear_target cast as a noop instead? That would avoid any future confusion if someone is in this flow and wondering why it's not handled, without having the full context.
Yeah, that's what I meant, sorry. I think a noop clause here with a good comment is the right thing.
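To illustrate the suggestion, a minimal sketch of what the no-op clause in the relcast group server could look like (the surrounding clauses and exact placement are assumed):

%% In the relcast group server every worker shares the same target,
%% so there is nothing per-worker to clear. Handle the cast explicitly
%% as a no-op so readers tracing this flow aren't left wondering why
%% the message appears unhandled.
handle_cast({clear_target, _Kind, _WorkerPid, _Ref}, State) ->
    {noreply, State};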
    lager:debug("clearing target for worker ~p ", [_WorkerPid]),
    %% the ref is stable across restarts, so use that as the lookup key
    case lookup_worker(Ref, #worker.ref, State) of
        Worker=#worker{} ->
In theory the WorkerPid here should match the one passed in the message?
Yes, the pid in the #worker record should equal the WorkerPid from the message. As per the comment, the ref is used for the lookup as it is consistent across restarts of the worker.
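For reference, a sketch of what a ref-keyed lookup could look like, assuming the server keeps its workers as a list of #worker{} records in its state (the real lookup_worker may differ):

%% hypothetical sketch: find a worker by an arbitrary record field,
%% here keyed on #worker.ref since the ref survives worker restarts
lookup_worker(Key, KeyPos, #state{workers = Workers}) ->
    case lists:keyfind(Key, KeyPos, Workers) of
        false -> false;
        Worker = #worker{} -> Worker
    end.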
    %% the ref is stable across restarts, so use that as the lookup key
    case lookup_worker(Ref, #worker.ref, State) of
        Worker=#worker{} ->
            %% TODO - should the pid be set to undefined along with target ?
I don't think the pid would be set to undefined here; the worker is still running.
Removed the TODO.
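Putting the excerpt together, a hedged sketch of how the complete clause might read once the TODO is dropped; the target field name, the keyreplace-based update, and the false branch are assumptions for illustration:

handle_cast({clear_target, _Kind, _WorkerPid, Ref}, State=#state{workers = Workers}) ->
    lager:debug("clearing target for worker ~p ", [_WorkerPid]),
    %% the ref is stable across restarts, so use that as the lookup key
    case lookup_worker(Ref, #worker.ref, State) of
        Worker=#worker{} ->
            %% keep the pid: the worker is still running; only the target
            %% assignment is cleared so the peer can be handed out again
            NewWorkers = lists:keyreplace(Ref, #worker.ref, Workers,
                                          Worker#worker{target = undefined}),
            {noreply, State#state{workers = NewWorkers}};
        false ->
            {noreply, State}
    end;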
Related PR (both issues affect the same flow): #281
When a group worker is trying to connect to a remote peer/target and fails to do so, it will end up reverting back to targeting once the max connect retry threshold is breached.
At this point targeting will request a new target from the group server. However, the previously assigned target peer will never be reassigned, as the group server has not been told that the original assign_target failed; as far as it is concerned, the target is still assigned to a worker. (The group server records a target as assigned to a worker as part of its assign_target call, and does so before the group worker receives the target; there is no handshake to indicate assigned versus received/connected.)
The net effect is that the previously assigned peer is never handed back out to a new group worker.
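On the worker side, the flow described above might look roughly like the following sketch; ?MAX_CONNECT_RETRIES, the state fields, and the revert_to_targeting/retry_connect helpers are hypothetical names for illustration:

%% hypothetical worker-side flow: once the retry limit is hit, notify
%% the group server so it can clear the assigned target, then revert
%% to targeting and request a fresh target
connect_failed(State = #state{connect_retries = Retries, kind = Kind,
                              ref = Ref, server = Server})
  when Retries >= ?MAX_CONNECT_RETRIES ->
    gen_server:cast(Server, {clear_target, Kind, self(), Ref}),
    revert_to_targeting(State);
connect_failed(State = #state{connect_retries = Retries}) ->
    retry_connect(State#state{connect_retries = Retries + 1}).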