Unreachable Providers (server): Provider-Record Prioritization #9982
Comments
I'm not sure that sorting/prioritizing the provider records will help that much. It would only help if the DHT server holds multiple provider records for the requested CID. If it holds only one, it has to return that one, which is the current situation anyway. We could gather metrics on the average number of provider records that servers hold for each CID. Even if servers respond with their "most viable" provider records, the client still has to pick from the aggregated responses, so the slight skew towards more viable records may not be that noticeable. It would be better if the DHT protocol could return the age of each provider record, which would let the client order by recency, the assumption being that peers that have recently reprovided the CID are more likely to be online.
IMO it would be more efficient to add a … See probe-lab/network-measurements#49 (comment). The two suggested solutions sound a bit like quick fixes that we may want to remove once we have the cleaner and more efficient protocol change. IDK if it's a good time investment.
I think we've looked at this in the CID Hoarder study that @cortze did. I haven't searched for it right now, but will have a look. IIRC, we found that they remain among the 20 closest peers.
@yiannisbot, the study from August 2022 showed a stable in-degree ratio over 80 hours (link to the section in the study).
@cortze, could you remind me again how "in-degree" is defined exactly?
Yep, it refers to the original provider-record (PR) holders for a CID that are present in the set of 20 closest peers.
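For reference, the in-degree as defined above can be computed as follows. This is a minimal Go sketch; `inDegree` and `inDegreeRatio` are hypothetical names, and normalizing by the number of original holders to obtain a ratio is an assumption.

```go
package main

import "fmt"

// inDegree counts how many of the original provider-record holders are
// present in the current set of closest peers to the CID.
func inDegree(originalHolders, currentClosest []string) int {
	current := make(map[string]struct{}, len(currentClosest))
	for _, p := range currentClosest {
		current[p] = struct{}{}
	}
	n := 0
	for _, p := range originalHolders {
		if _, ok := current[p]; ok {
			n++
		}
	}
	return n
}

// inDegreeRatio normalizes the in-degree by the number of original holders
// (assumed normalization; 20 holders with the default replication factor).
func inDegreeRatio(originalHolders, currentClosest []string) float64 {
	if len(originalHolders) == 0 {
		return 0
	}
	return float64(inDegree(originalHolders, currentClosest)) / float64(len(originalHolders))
}

func main() {
	original := []string{"p1", "p2", "p3", "p4"}
	closestNow := []string{"p2", "p3", "p4", "p9"}
	fmt.Println(inDegree(original, closestNow), inDegreeRatio(original, closestNow)) // 3 0.75
}
```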
Paris discussion from 2023-07-19: We think this should be part of the DHT codebase and plan to pick it up as part of the refactoring work. If someone wants to lend a hand with this, let us know 👍
Description
Context
Our recent measurements show that IPFS has trouble fulfilling its baseline practical use case of hosting static websites.
The reasons are at least twofold. On the one hand, large content providers still have trouble announcing all their CIDs to the DHT, and on the other hand, the existing provider records point to unreachable peers. This issue addresses the second reason.
As an example, the graph below shows the unique providing peers, identified by distinct PeerIDs, discovered throughout a specific day in the IPFS DHT for ipld.io (source). The graph shows that >70% of provider records point to peers that are not reachable. The remaining peers are mostly reachable only via a relaying peer. This either increases latency, if the traffic is relayed, or increases the time to connect, if the relay is used to facilitate a hole punch.
This trend is not unique to ipld.io but a general theme among all our measured websites: https://probelab.io/websites/.
Problem
When a peer tries to access a website over IPFS, it looks up the provider records in the DHT and tries to connect to the returned peers with a concurrency factor of 10. Because so many provider records point to peers that are long gone, the peer is likely to receive such records and will time out while trying to connect to them. This significantly lengthens the resolution process, possibly to the extent that the whole operation times out (that's a hypothesis).
Proposal
We identified two ways forward to alleviate the above issue.
This GH issue is for proposal 1).
The corresponding issue for 2) is #9984.
Provider Record Prioritization
There was a conversation on 2023-06-19 between ProbeLab and some of the Kubo maintainers where the following strategy was proposed:
We tally consecutive reprovides for each (PeerID, CID) tuple. When another peer looks up a certain CID, it gets served the peers with the highest number of consecutive reprovides. We use this number as a proxy for the uptime of a peer. Another factor we could account for is the type of multiaddresses a peer has announced. We should prioritize un-relayed peers over relayed ones.
There are a few things to consider (non-exhaustive list):
A potential mitigation could be to define an upper limit that a counter can reach. In case of a tie, the servers could return the peers in random order.
Concrete Proposal
I think a concrete proposal fosters efficient discussion. Here's my take:
Count the number of consecutive reprovides. The counter can be 3 at most. Return the provider records in an order that prioritizes a high counter value. In case of a tie, prioritize un-relayed peers. If there's still a tie, randomize the order with each response. A counter increase is only allowed every ReprovideInterval/2.
Measurements
TBD: How can we substantiate the proposal with numbers? some ideas