bug: Filter subscriptions are not stable and missing messages #2139

Closed
weboko opened this issue Sep 17, 2024 · 4 comments · Fixed by #2137

@weboko
Collaborator

weboko commented Sep 17, 2024

This is a bug report

Problem

  • Initial Observations:

    • Stream resets over Mplex causing incoming pipe failures.
    • Compatibility issues prevent upgrading to Yamux, which is better suited for large data transfers.
    • Errors observed: stream reset and buffer overflow.
  • Recent Updates:

    • Upgrading to libp2p v2 has been initiated.
    • Mutex locks for protocol peer management are helpful but do not fully resolve issues with Filter and LightPush.
    • Unreliable nodes in the network affect performance, highlighting a need for better node selection and reconnection mechanisms.
  • Specific Observations:

    • When it works, Filter and LightPush function correctly, but issues arise intermittently.
    • Two browser instances showed different behaviors: one worked continuously while the other failed after a few sequences.
    • Possible issues with node health and js-waku's ability to find and reconnect to better nodes.

Full report: #2139 (comment)

Proposed Solutions

Notes

@weboko weboko added the bug Something isn't working label Sep 17, 2024
@weboko weboko added this to Waku Sep 17, 2024
@weboko weboko moved this to Triage in Waku Sep 17, 2024
@weboko
Collaborator Author

weboko commented Sep 17, 2024

@danisharora099 to update description and share info about the problem and findings

@danisharora099
Collaborator

danisharora099 commented Sep 18, 2024

Further investigation led to the following observation:

waku:error:filter:v2 Error with receiving pipe +4s CodeError: stream reset
    at MplexStream.reset (index.js:25309:21)
    at MplexStreamMuxer._handleIncoming (index.js:25717:28)
    at MplexStreamMuxer.sink (index.js:25624:36)
    at async Promise.all (index 0)

This may be caused by Mplex not being able to handle the muxing. For comparison, more about Yamux:

Yamux natively supports flow control, it is better suited for applications that require the transfer of large amounts of data.
Until recently, the reason mplex was still supported was compatibility with js-libp2p, which didn’t have yamux support. Now that js-libp2p has gained yamux support, mplex should only be used to provide backward-compatibility with legacy nodes.

Mplex does not have backpressure. This means that if you send more data than the other peer is able to receive (on one stream; across streams there is still TCP backpressure), the stream will reset itself due to a buffer overflow.

This would make sense: in the beginning, Filter works well. Only after it has received some data does it start to error, and that is when we observe the stream reset errors.
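To illustrate the failure mode described above, here is a minimal sketch of a stream with a bounded receive buffer and no backpressure. It is a toy model, not Mplex's actual implementation: `BoundedStream`, `simulate`, and all parameters are hypothetical names chosen for illustration.

```typescript
// Toy model (an assumption, not Mplex code): the sender is never told to slow
// down, so a slow reader eventually overflows the buffer and the stream resets.
class BoundedStream {
  private buffer: Uint8Array[] = [];
  private buffered = 0;
  private isReset = false;
  private readonly maxBufferBytes: number;

  constructor(maxBufferBytes: number) {
    this.maxBufferBytes = maxBufferBytes;
  }

  // Called for every chunk the remote peer sends.
  receive(chunk: Uint8Array): void {
    if (this.isReset || this.buffered + chunk.length > this.maxBufferBytes) {
      this.isReset = true;
      this.buffer = [];
      this.buffered = 0;
      throw new Error("stream reset");
    }
    this.buffer.push(chunk);
    this.buffered += chunk.length;
  }

  // Called by the local consumer when it has time to process a chunk.
  read(): Uint8Array | undefined {
    const chunk = this.buffer.shift();
    if (chunk) this.buffered -= chunk.length;
    return chunk;
  }
}

// Sends `chunks` chunks while the consumer reads one chunk every `readEvery`
// sends. Returns the 1-based index of the chunk that triggered the reset,
// or -1 if the stream survived.
function simulate(
  chunks: number,
  chunkSize: number,
  readEvery: number,
  maxBufferBytes: number
): number {
  const stream = new BoundedStream(maxBufferBytes);
  for (let i = 1; i <= chunks; i++) {
    try {
      stream.receive(new Uint8Array(chunkSize));
    } catch {
      return i;
    }
    if (i % readEvery === 0) stream.read();
  }
  return -1;
}
```

With a consumer that reads only every second chunk, the stream works at first and then resets once enough data has piled up, which matches the pattern of Filter working initially and failing later.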

@danisharora099 danisharora099 changed the title bug: Fitler subscriptions are not stable and missing messages bug: Filter subscriptions are not stable and missing messages Sep 25, 2024
@danisharora099
Collaborator

danisharora099 commented Oct 8, 2024

Updates:

When it works, it works. There is definitely room for improvement in certain aspects, but the weird part is the times when it doesn't work at all.
Re-summarising my findings from the last few days:

It might be fixed by #2137
Unfortunately, that is not the case. I have been extensively testing the RC with mutex locks, and failures with LightPush and Filter still occur quite often. It may be better, but it is still a problem.


Some interesting observations:
I opened two browser instances simultaneously, each pushing sequences of messages with LightPush (LP) and receiving with Filter, with a bunch of nodes in their local peer cache. The first window kept going: Filter worked and LightPush worked (tested up to 3 sequences of 10 messages, and still running), although it did miss a few messages.
The other tab started to fail with LightPush on the 2nd sequence and never recovered from it. Filter was still receiving messages sent by the other node (not its own, as it wasn't able to push any).
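A test like the one above can quantify "missed a few messages" by numbering each message in a sequence and checking which indices never arrived over Filter. The helper below is a minimal sketch under that assumption; `findMissing` is a hypothetical name and not part of js-waku.

```typescript
// Hypothetical helper: given how many messages were pushed in a sequence
// (numbered 0..sent-1) and the indices actually received, list the gaps.
function findMissing(sent: number, receivedIds: Iterable<number>): number[] {
  const seen = new Set<number>(receivedIds);
  const missing: number[] = [];
  for (let i = 0; i < sent; i++) {
    if (!seen.has(i)) missing.push(i);
  }
  return missing;
}
```

For a sequence of 10 messages where indices 3 and 6 never arrive, `findMissing(10, [0, 1, 2, 4, 5, 7, 8, 9])` returns `[3, 6]`.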
Seems like two problems at hand, or maybe a fusion into one:

  • there are unhealthy/unreliable nodes in the network being found
  • js-waku should do a better job at finding and reconnecting to better nodes
    • it remains to be understood whether this is js-waku's fault for not doing a good job at finding better nodes, or whether there are simply no good nodes available
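One possible direction for the node selection problem is to track per-peer outcomes and prefer peers that have not failed recently. The sketch below is only an illustration of that idea: `PeerHealthTracker`, its methods, and the scoring rule are hypothetical and do not reflect js-waku's actual peer management.

```typescript
type PeerId = string;

interface PeerStats {
  successes: number;
  consecutiveFailures: number;
}

// Hypothetical tracker: a peer is considered unhealthy once it reaches
// `maxConsecutiveFailures` failures in a row; a success clears the streak.
class PeerHealthTracker {
  private readonly stats = new Map<PeerId, PeerStats>();
  private readonly maxConsecutiveFailures: number;

  constructor(maxConsecutiveFailures: number) {
    this.maxConsecutiveFailures = maxConsecutiveFailures;
  }

  // Record the outcome of one request (e.g. a LightPush send or a Filter ping).
  record(peer: PeerId, ok: boolean): void {
    const s = this.stats.get(peer) ?? { successes: 0, consecutiveFailures: 0 };
    if (ok) {
      s.successes += 1;
      s.consecutiveFailures = 0;
    } else {
      s.consecutiveFailures += 1;
    }
    this.stats.set(peer, s);
  }

  // Unknown peers are treated as healthy so that new nodes get a chance.
  isHealthy(peer: PeerId): boolean {
    const s = this.stats.get(peer);
    return s === undefined || s.consecutiveFailures < this.maxConsecutiveFailures;
  }

  // Pick the healthy candidate with the best score, or undefined if none remain.
  pickBest(candidates: PeerId[]): PeerId | undefined {
    const healthy = candidates.filter((p) => this.isHealthy(p));
    healthy.sort((a, b) => this.score(b) - this.score(a));
    return healthy[0];
  }

  private score(peer: PeerId): number {
    const s = this.stats.get(peer);
    return s ? s.successes - s.consecutiveFailures : 0;
  }
}
```

When `pickBest` returns `undefined`, every known peer is unhealthy, which would help distinguish "js-waku picked badly" from "no good nodes were available".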


A few minutes later:

  • Instance 2 could not recover at all with either LP or Filter
  • Instance 1 gave up on Filter but was still going strong with LP


@weboko
Collaborator Author

weboko commented Oct 11, 2024

Reopening since the issue is still present, but it might be fixed by #2158

@weboko weboko moved this from Done to Code Review / QA in Waku Oct 11, 2024
@weboko weboko closed this as completed Oct 20, 2024