possible bitswap stall issue #5183
Comments
Actually, the wantlist output I pasted was from after I had restarted the node and the problem was solved. I don't know how it was before, or how many nodes were in the wantlist. |
I have the same story with a number of hashes. If using ipfs-go (with Chrome's ipfs-companion), the hash does not download. If I disable ipfs-companion (thus enabling ipfs-js), the download works fast. |
Also, try to reproduce with the latest release candidate running on both machines: https://dist.ipfs.io/go-ipfs/v0.4.16-rc1 |
Tested on 0.4.16-rc1 (amd64), same problem. |
Ah... So, js-ipfs doesn't use the DHT to find or announce content (last time I checked). I'm guessing:
Looking at the debug info, I don't see any obvious deadlocks/issues. Without knowing which node has the hash, it's a bit difficult to tell where the issue is. |
Something strange going on. Here is another interesting hash: ipfs-js finds it pretty quickly, but ipfs-go has problems with it. Somehow, my public node QmTtggHgG1tjAHrHfBDBLPmUvn5BwNRpZY4qMJRXnQ7bQj (0.4.16-rc1) managed to download it in the past (e.g. "ipfs get" works), but getting it from another node (also 0.4.16-rc1) does not:
What is going on? How could it be that ipfs-go and ipfs-js have different (incompatible?) routing services? |
That's a new bug, fixed in #5200. Could very well have caused the issue. To work around it, you can disable IPNS over pubsub. However, that also wouldn't (as far as I know) be responsible for the original bug. |
One potential cause is a peer restart. That is, if one of the peers restarts but the other sees the new connection before seeing the old connection close, it won't re-send the wantlist. We can fix this by either:
|
Hello, is there a fix for this? We have a similar problem running ipfs in a private network. At some point ipfs just stops downloading. Result from "ipfs bitswap wantlist" and "ipfs swarm peers --streams". Debug logs. Is it possible that the cause is the use of the QUIC protocol? Thanks in advance for the support. |
QUIC shouldn't affect this. We're going to ship a new release ASAP, probably by the end of the week, with a completely refactored bitswap, so let's see what that does for this. |
New information: @mattober has run into this issue. He has two nodes: a gateway and a "host" (storing the data). The gateway shows two connections to the host, one IPv4, one IPv6. The IPv4 connection has an open DHT stream and the IPv6 connection has an open DHT stream (!?) and an open bitswap stream. The host shows one connection to the gateway (IPv4). This connection has an open DHT stream and an open relay stream (!?) and no bitswap stream. |
Related: ipfs/go-bitswap#99, ipfs/go-bitswap#99 (comment). |
I feel like there is no specific actionable information on a current version of bitswap to work with here, given that it has been almost completely rewritten since 0.4.15 and the only current potential problem referenced is peers not resending wantlists on a reconnect. My belief is that we've addressed this as best we can with the periodic wantlist rebroadcast. Beyond that, there's really no further improvement to make short of error correction in the protocol. So I am inclined to close this issue, with the understanding for anyone following it that we are still pursuing avenues to address potential stalls on an ongoing basis as we identify potential issues in current code. |
In IRC, @fiatjaf reported that one of his ipfs nodes running 0.4.15 on a VPS was stalling trying to list out a particular directory. I confirmed that all the data was accessible, and even fetched it all to my local node. He then connected that VPS peer to my node, and it still couldn't fetch the data. The peer's wantlist showed a single hash in it, one that my node definitely had. If the nodes were actually connected successfully, then this implies a possible bug in bitswap.
Further questions I have here are around whether or not the fetch was using sessions. Getting a stack dump of any node in this position would be nice too.