bitswap/server: WANT_HAVE requests should use Blockstore.Has #657
Comments
This is relatively urgent for us and if there is no capacity from your side to implement this, we could submit a patch as well |
@Wondertan Preliminary investigation does not indicate that there is any specific reason that |
Triage notes:
|
I would also take a look at how cached blockstores (like bloom) would behave with this change. |
It is true that it also risks having to do both the I think there are other improvements that can probably have a more significant performance improvement. Specifically the use of worker pools and communication between these goroutines can be streamlined. Additionally, high frequency metrics updates may also contribute non-trivial overhead. Improvements will be supplied as more profiling is done. |
@gammazero, is it possible for both requests to come for the same CID from a single peer? Or do you mean they are deduplicated across multiple peers? |
Gentle ping @gammazero. We would like to understand the reasoning behind the risk a bit more. |
Last ping @gammazero |
@Wondertan This is only across different peers, AFAIK. Any single peer should only issue a WantHave or a WantBlock for a CID. Some things to consider... A bitswap message may have a number of CID entries. Some are Using Getting block sizes is done as a concurrent batch, which is generally faster for more than a very small number of blocks, although it may also depend on backend store, cached values, etc. This means that we will want to separate the At a later time the peer client may decide to send a message to retrieve the blocks for which I coded a prototype corresponding to the above description, and did not see a significant performance improvement. However, I may not have been testing at sufficient scale and/or was using a similar storage backend to what you are using. Given specific storage backends, it may make a very significant difference in performance and/or cost to avoid reading block sizes. I propose two changes:
|
@Wondertan Here are the prototype and kubo configuration: |
@gammazero, that's a great solution and fixes our concern! Thank you ❤️ Should we start testing the prototype now or after your PR gets ready for review? |
@Wondertan You can start testing it now, and any feedback will be helpful. We would like to review and test it in our cluster soon, but have not done so yet, so we cannot make any guarantees about how it works. |
Currently, WANT_HAVE requests rely on GetSize to check for the presence of the requested block, and the learnt size gets discarded. Basically, the Blockstore has to look up a data size that goes unused. Depending on the Blockstore implementation, this might have different consequences. Intuitively, looking up data size should be more expensive than checking data existence. In practice, this is what happens in our case.

Hence the proposal is to add a new method to the blockstore manager that performs only the Has check, and to refactor ReceivedMessage to decouple the processing of WANT_HAVE and WANT_BLOCK messages.