This repository has been archived by the owner on Feb 1, 2023. It is now read-only.
Improve/fix request splitting based on duplication
Currently, the session request splitter tries to reduce duplicate blocks: when bitswap is receiving lots of duplicates, it splits requests so that each block request goes to fewer peers.
There is at least one case where this is not a good approach: #120
Specifically, if we fail to get a response to a request, we fall back to broadcast (see sessions.handleTick). Broadcasting tends to produce more duplicates, which tends to cause us to split more. When we are already failing to fetch blocks, that is the opposite of what we want.
My recommendation is to reset the split to a low value whenever we miss a block (i.e. in handleTick), reset dup tracking as well, and not start tracking again until the blocks we broadcast for are received. Alternatively, we could track whether a want was targeted or broadcast, and not count any broadcasts in dup tracking, since they always produce dupes.
Another improvement here: currently the splitting is very binary and goes from one extreme to the other, because the code for adjusting it is so simple. It could be made less sensitive by making adjustments less frequent the longer a session is running.
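One way to make the adjustment less sensitive over time, sketched under assumptions: only re-evaluate the split after a window of received blocks, move it one step at a time, and double the window after each evaluation. The `dampedSplitter` type, its fields, and the two ratio thresholds are all hypothetical values chosen for illustration, not taken from the bitswap code.

```go
package main

import "fmt"

// Assumed thresholds on the share of duplicates within one window.
const (
	highDupRatio = 0.4
	lowDupRatio  = 0.2
)

// dampedSplitter is a hypothetical splitter whose evaluation window
// grows the longer the session has been running.
type dampedSplitter struct {
	split, minSplit, maxSplit int
	interval, maxInterval     int // blocks per evaluation window
	uniques, dups             int
}

// observe records one received block and, once the window is full,
// moves the split by at most one step and doubles the window.
func (s *dampedSplitter) observe(duplicate bool) {
	if duplicate {
		s.dups++
	} else {
		s.uniques++
	}
	total := s.uniques + s.dups
	if total < s.interval {
		return
	}
	ratio := float64(s.dups) / float64(total)
	switch {
	case ratio > highDupRatio && s.split < s.maxSplit:
		s.split++
	case ratio < lowDupRatio && s.split > s.minSplit:
		s.split--
	}
	s.uniques, s.dups = 0, 0
	s.interval *= 2
	if s.interval > s.maxInterval {
		s.interval = s.maxInterval
	}
}

func main() {
	s := &dampedSplitter{split: 2, minSplit: 1, maxSplit: 16, interval: 4, maxInterval: 64}
	for i := 0; i < 12; i++ {
		s.observe(true) // twelve duplicates in a row
	}
	fmt.Println(s.split, s.interval) // prints "4 16"
}
```

Twelve straight duplicates only move the split from 2 to 4 here, because the second evaluation waits for a window twice as long as the first.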
Increase Wantlist size
There's no really good reason to limit session wants to 32 at this point -- it oughta at least be a slightly bigger number. Low hanging, but needs to be tested.
Returning blocks
Increasing TaskWorker Concurrency
ipfs/boxo#116 -- probably can't hurt, is low hanging, but would need to be experimentally tested to see if it helps.
Making Bitswap less error prone
It's hard to tell how much this affects things these days, but one potential slowdown for bitswap is that the protocol has no error correction, meaning that frames can get dropped without anyone noticing. This can result in lost want requests. Right now the only fix is to periodically rebroadcast the whole wantlist (see rebroadcastTimer in messageQueue.go). Another fix I proposed was to extend the protocol: ipfs/specs#201
Improve decision logic
The previous bitswap maintainer @whyrusleeping stated "The peer request queue is a priority queue that sorts available tasks by some metric, currently, that metric is very simple and aims to fairly address the tasks of each other peer. More advanced decision logic will be implemented in the future." I've yet to touch this in my time on bitswap, but I wonder what could be accomplished here, since most of my work has been around the requesting of blocks rather than the sending of them.
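As a rough illustration of what "a priority queue that sorts available tasks by some metric" can look like, here is a sketch built on the standard `container/heap` package. The metric used (serve the peer that has been handed the fewest tasks so far) is an assumption picked for the example, not necessarily the one bitswap uses, and `peerTasks`, `peerQueue`, `addPeer`, and `nextTask` are hypothetical names.

```go
package main

import (
	"container/heap"
	"fmt"
)

// peerTasks holds the pending block requests of one peer.
type peerTasks struct {
	peer    string
	served  int      // tasks already handed out for this peer
	pending []string // block keys still waiting to be sent
	index   int      // position in the heap, kept up to date by Swap
}

// peerQueue orders peers by fewest tasks served, then by name.
type peerQueue []*peerTasks

func (q peerQueue) Len() int { return len(q) }
func (q peerQueue) Less(i, j int) bool {
	if q[i].served != q[j].served {
		return q[i].served < q[j].served
	}
	return q[i].peer < q[j].peer
}
func (q peerQueue) Swap(i, j int) {
	q[i], q[j] = q[j], q[i]
	q[i].index, q[j].index = i, j
}
func (q *peerQueue) Push(x interface{}) {
	p := x.(*peerTasks)
	p.index = len(*q)
	*q = append(*q, p)
}
func (q *peerQueue) Pop() interface{} {
	old := *q
	n := len(old)
	p := old[n-1]
	old[n-1] = nil
	*q = old[:n-1]
	return p
}

// addPeer queues a peer together with its pending block keys.
func addPeer(q *peerQueue, peer string, pending ...string) {
	heap.Push(q, &peerTasks{peer: peer, pending: pending})
}

// nextTask hands out one task for the least-served peer.
func nextTask(q *peerQueue) (peer, block string, ok bool) {
	if q.Len() == 0 {
		return "", "", false
	}
	p := (*q)[0]
	block = p.pending[0]
	p.pending = p.pending[1:]
	p.served++
	if len(p.pending) == 0 {
		heap.Remove(q, p.index) // nothing left for this peer
	} else {
		heap.Fix(q, p.index) // re-sort after the metric changed
	}
	return p.peer, block, true
}

func main() {
	q := &peerQueue{}
	addPeer(q, "alice", "a1", "a2", "a3")
	addPeer(q, "bob", "b1")
	for {
		peer, block, ok := nextTask(q)
		if !ok {
			break
		}
		fmt.Println(peer, block) // alice a1, bob b1, alice a2, alice a3
	}
}
```

Changing the decision logic then amounts to changing `Less`, for example to weigh bytes sent or how much a peer has contributed.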
Tracking Performance
These wouldn't necessarily improve performance themselves but might help identify bottlenecks more effectively:
Better Simulated Networks
There is a PR that would add simulated DHT queries to Benchmarks -- #136 -- I'm not sure that's the best direction
Honestly, I think realistic testbeds at a higher level are the best option here
Better tracing/logging
If we get a test bed that supports Jaeger, I think tracing would be the key tool to actually tracking performance. That would be able to give us actual time elapsed in a request in different parts of the code, and to see how calls actually get made in real world scenarios.
Fair warning: logging and tracing may end up touching go-log, which has its own rat's nest of improvement tasks that have lingered for some time.
--
I'll add more ideas if I think of them -- @hannahhoward
As @dirkmc takes over on a lot of Bitswap maintenance, just want to summarize what I see as possible gotchas and directions for bitswap improvement:
Sessions (requesting blocks):
See existing PR: #157