Allow decentralized users to join late and catch up #775
huge work, thank you! finally got rid of the mess in the server with promises 🤩
some potential improvement but nothing serious
…nRoundEndCommunication
…rotected abstract _add
…ndParticipants and stop resetting the nodes list at the end of round
…with forEach instead of map
…rs" before waiting for participants if needed
…tract to enforce subclass override
…o sendPeersForRoundIfNeeded
…dEnoughParticipantsMsgIfNeeded
…f the constructor and get rid of the static async server constructor
… added after routers are initialized)
…regation' event to get the aggregated weights
59bf5a6 to d75338a
superb, thanks for the hard work!
Closes #718
Decentralized issues
Currently, we can only train decentralized with the exact number of peers specified in `minNbOfParticipants`.
- Only `minNbOfParticipants` peers can join a given round, because the server sends the list of peers as soon as the threshold is reached. The (`minNbOfParticipants` + 1)th peer to join doesn't get the current round's peer list but is included in the next round's.
- Essentially, once the peer list has been sent, joining and leaving are no longer possible.
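To make the limitation concrete, here is a minimal sketch of the behavior described above. It is a simplified model, not the project's server code: the class `EagerPeerListServer` and its `join` method are hypothetical names introduced for illustration, and only `minNbOfParticipants` comes from the description.

```typescript
type PeerId = string

// Hypothetical, simplified model of the behavior before this PR.
class EagerPeerListServer {
  private waiting: PeerId[] = []

  constructor(private readonly minNbOfParticipants: number) {}

  // Returns the round's peer list if this join reaches the threshold,
  // otherwise undefined (the peer keeps waiting).
  join(peer: PeerId): PeerId[] | undefined {
    this.waiting.push(peer)
    if (this.waiting.length < this.minNbOfParticipants) return undefined
    // Threshold reached: the list is sent right away, so anyone joining
    // after this point can only be part of the next round's list.
    const roundPeers = this.waiting
    this.waiting = []
    return roundPeers
  }
}

const eagerServer = new EagerPeerListServer(2)
const firstJoin = eagerServer.join('a') // undefined: below the threshold
const secondJoin = eagerServer.join('b') // ['a', 'b']: list sent immediately
const lateJoin = eagerServer.join('c') // undefined: 'c' missed the current round
```

In this model the third peer never sees the list that was sent to `a` and `b`, which is the situation the PR sets out to fix.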
Solution Implemented
1. In `onRoundBeginCommunication`, peers signal their interest in joining the current round by sending a `PeerJoinsRound` message to the server. The server keeps a list of peers wanting to join (but doesn't reply with a peer list as it currently does).
2. Peers then send `PeerIsReady`, i.e. they are ready to exchange weight updates. The server waits until all the peers that sent a `PeerJoinsRound` have sent their `PeerIsReady`, and then sends the round's peer list.
   i. This allows some time for peers to join the round. To prevent new peers from continually joining and making the others wait for them to be ready, the server can stop including peers in this round as soon as one peer is ready (and include them in the next).
   ii. Peers can leave and notify the server beforehand. As long as the peer list hasn't been sent, peers can join and leave without causing problems.
3. Below the `minNbOfParticipants` threshold, the peers wait for more participants.
4. More than `minNbOfParticipants` peers can participate in the same round (because the `PeerJoinsRound` step allows some time for participants to join, instead of starting the weight update as soon as `minNbOfParticipants` peers have joined).

Refactoring
- `aggregator.add` returns a promise for the aggregated weights (no more need for the perpetual promise loop in the server controller), plus new aggregator tests
- `EventEmitter`
- Remove `trainingInformation.decentralizedSecure` and add `aggregationStrategy` ('mean' or 'secure')
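The first refactoring item, `aggregator.add` returning a promise for the aggregated weights, could look roughly like the sketch below. This is an illustration under assumptions rather than the project's aggregator: `MeanAggregatorSketch`, its `expected` parameter, and the use of plain `number[]` for weights are all hypothetical, and only the 'mean' strategy is modeled.

```typescript
// Hypothetical mean aggregator; not the project's actual class or signature.
class MeanAggregatorSketch {
  private contributions: number[][] = []
  private resolvers: Array<(weights: number[]) => void> = []

  // `expected` is an assumed parameter: how many contributions complete a round.
  constructor(private readonly expected: number) {}

  // Registers a contribution and returns a promise that resolves with the
  // aggregated weights once `expected` contributions have arrived.
  add(weights: number[]): Promise<number[]> {
    const result = new Promise<number[]>((resolve) => {
      this.resolvers.push(resolve)
    })
    this.contributions.push(weights)
    if (this.contributions.length === this.expected) this.aggregate()
    return result
  }

  private aggregate(): void {
    const first = this.contributions[0]
    if (first === undefined) return
    const n = this.contributions.length
    // Element-wise mean over all contributions of this round.
    const mean = first.map((_, i) =>
      this.contributions.reduce((sum, w) => sum + (w[i] ?? 0), 0) / n
    )
    const pending = this.resolvers
    this.contributions = []
    this.resolvers = []
    pending.forEach((resolve) => resolve(mean))
  }
}

const aggregatorSketch = new MeanAggregatorSketch(2)
// Both callers get the same aggregated weights, [2, 3], without polling.
const aggregated = Promise.all([
  aggregatorSketch.add([1, 2]),
  aggregatorSketch.add([3, 4]),
])
```

Because each caller simply awaits the promise returned by `add`, nothing has to loop waiting for the next aggregation result.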
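The join/ready handshake described under "Solution Implemented" can be sketched as a small server-side state tracker. This is a minimal sketch, not the server's implementation: `RoundCoordinator` and its method names are hypothetical, and only the message names `PeerJoinsRound` and `PeerIsReady` and the `minNbOfParticipants` threshold come from the description. One assumption goes beyond the text: newcomers are deferred to the next round only when a peer is already ready and the threshold is already met, so that a round below the threshold can still fill up.

```typescript
type PeerId = string

// Hypothetical bookkeeping for one round; assumes minNbOfParticipants >= 1.
class RoundCoordinator {
  private joined = new Set<PeerId>()
  private ready = new Set<PeerId>()
  private nextRound = new Set<PeerId>()

  constructor(private readonly minNbOfParticipants: number) {}

  // PeerJoinsRound: record the peer's interest; no peer list is sent back yet.
  onPeerJoinsRound(peer: PeerId): void {
    // Assumption: defer newcomers once someone is ready and the threshold is met.
    const roundIsClosing =
      this.ready.size > 0 && this.joined.size >= this.minNbOfParticipants
    if (roundIsClosing) this.nextRound.add(peer)
    else this.joined.add(peer)
  }

  // Leaving is harmless as long as the peer list has not been sent.
  onPeerLeaves(peer: PeerId): PeerId[] | undefined {
    this.joined.delete(peer)
    this.ready.delete(peer)
    this.nextRound.delete(peer)
    return this.peersIfRoundCanStart()
  }

  // PeerIsReady: returns the round's peer list once every joined peer is ready.
  onPeerIsReady(peer: PeerId): PeerId[] | undefined {
    if (!this.joined.has(peer)) return undefined
    this.ready.add(peer)
    return this.peersIfRoundCanStart()
  }

  private peersIfRoundCanStart(): PeerId[] | undefined {
    const enough = this.joined.size >= this.minNbOfParticipants
    const allReady = this.ready.size === this.joined.size
    if (!enough || !allReady) return undefined
    const peers = Array.from(this.joined)
    // Deferred peers become the joiners of the next round.
    this.joined = this.nextRound
    this.ready = new Set()
    this.nextRound = new Set()
    return peers
  }
}

const round = new RoundCoordinator(2)
round.onPeerJoinsRound('a')
round.onPeerJoinsRound('b')
round.onPeerJoinsRound('c') // more than minNbOfParticipants may join
const afterA = round.onPeerIsReady('a') // undefined: 'b' and 'c' are not ready
round.onPeerJoinsRound('d') // deferred to the next round
const afterB = round.onPeerIsReady('b') // undefined: still waiting for 'c'
const roundPeers = round.onPeerIsReady('c') // ['a', 'b', 'c']
const afterD = round.onPeerIsReady('d') // undefined: next round is below threshold
```

Returning the peer list from the handler that completes the round keeps the sketch synchronous; a real server would instead send the list to each peer over its connection.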