Resolve some flaky tests and improve CI times #2401
base: master
Conversation
…ow manual in future commit
@@ -49,7 +49,7 @@ pub async fn transfer_back(ctx: &TestContext) -> Result<(), Failed> {
     }
     // wait until alice sees the transaction
     timeout(
-        ctx.config.sync_timeout(),
+        ctx.config.sync_timeout().checked_mul(2).unwrap(),
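The change above doubles the sync timeout with `checked_mul`, which returns `None` on overflow instead of silently wrapping. A minimal sketch of the idiom, with a hypothetical base timeout standing in for `ctx.config.sync_timeout()`:

```rust
use std::time::Duration;

fn main() {
    // Hypothetical base value; in the test it comes from ctx.config.sync_timeout().
    let sync_timeout = Duration::from_secs(30);

    // Duration::checked_mul returns Option<Duration>, so the unwrap/expect
    // only fires in the (practically impossible) overflow case.
    let doubled = sync_timeout
        .checked_mul(2)
        .expect("sync timeout overflowed");

    assert_eq!(doubled, Duration::from_secs(60));
    println!("{}", doubled.as_secs());
}
```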
The same here: Which issue does it solve?
Sometimes in CI the query can take a very long time to resolve. Increasing the timeout doesn't change the behaviour under test and improves stability.
I think it is better to increase the default wallet_sync_timeout instead.
Done
@@ -369,7 +368,7 @@ mod tests {
             .add_address(&peer_id, unroutable_peer_addr.clone());
     }
     SwarmEvent::ConnectionClosed { peer_id, .. } => {
-        panic!("PeerId {peer_id:?} disconnected");
+        dbg!(peer_id);
Maybe we need to insert the peer back into the left_to_discover table after it was disconnected.
Done
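The suggestion above (re-adding a disconnected peer to left_to_discover so a later reconnection still counts as a discovery) can be sketched roughly as follows. The types here are hypothetical stand-ins, not the test's actual ones:

```rust
use std::collections::HashSet;

// Hypothetical peer identifier and discovery set; in the real test these
// would be libp2p PeerIds and the test's left_to_discover table.
fn on_connection_closed(peer_id: u64, left_to_discover: &mut HashSet<u64>) {
    // Instead of panicking on a disconnect, put the peer back into the
    // set so a later reconnection is treated as a fresh discovery.
    left_to_discover.insert(peer_id);
}

fn main() {
    let mut left_to_discover: HashSet<u64> = HashSet::new();
    on_connection_closed(42, &mut left_to_discover);
    assert!(left_to_discover.contains(&42));
    println!("peers left to discover: {}", left_to_discover.len());
}
```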
Sometimes this seems to make the test time out: https://github.com/FuelLabs/fuel-core/actions/runs/11813227788/job/32909924335#step:8:2439
Should I remove this?
Hmm, I think we should rewrite the discovery_works test. And maybe we should use the service itself instead of the p2p service. You can leave the previous implementation and create a new tech-debt issue to rewrite this test.
Done #2434
LGTM. I think this will make all of our lives better.
I'm going to wait until Green's concerns are resolved, but I'd be ready to approve.
For clarity, this PR will not solve all the problems, but it addresses a big chunk of them and is a big step forward. We could do a round 2 with everything we catch in the next days.
// When
for _ in 0..1000 {
    let result = executor.validate(&block).map(|_| ());

    if start.elapsed().as_secs() > 60 {
Suggested change:
-    if start.elapsed().as_secs() > 60 {
+    if start.elapsed().as_secs() > 1 {
It doesn't get reset between loop iterations, and before this change the timeout was 60 seconds for the whole test, no?
Oh, yeah, that is true. Maybe it is better to create start before each executor.validate and check that it is less than a second.
Done
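The agreed fix (restarting the clock before each validate call, so the one-second bound applies per iteration rather than to the whole loop) can be sketched like this. The `validate` function is a hypothetical stand-in for `executor.validate(&block)`:

```rust
use std::time::Instant;

// Stand-in for executor.validate(&block); the real call validates a block.
fn validate() -> Result<(), ()> {
    Ok(())
}

fn main() {
    for _ in 0..1000 {
        // Restart the timer inside the loop: each call is bounded
        // individually instead of sharing one 60-second budget.
        let start = Instant::now();
        let result = validate();
        assert!(result.is_ok());
        assert!(
            start.elapsed().as_secs() < 1,
            "a single validation took longer than a second"
        );
    }
    println!("all iterations within bound");
}
```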
Linked Issues/PRs
Fix #2408
Fix #2407
Fix #2406
Fix #2351
Fix #2393
Fix #2394
Fix #2395
Description
This PR fixes an issue in the P2P heartbeat. The problem was that the P2P heartbeat was updated only when new blocks were received or produced. This means that if we start a node from an existing database, but it doesn't produce blocks and isn't connected to anyone, it will report block height 0 to the peers that connect to it. We believe this fix resolves #2408, #2407, #2406 and #2351.
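The heartbeat bug described above can be illustrated with a minimal sketch. These types are hypothetical, not fuel-core's actual API; the point is that seeding the advertised height from the database on startup avoids advertising 0 until a new block arrives:

```rust
// Hypothetical heartbeat state; fuel-core's real implementation differs.
struct Heartbeat {
    block_height: u64,
}

impl Heartbeat {
    // Before the fix: the reported height effectively started at 0 and was
    // only updated by on_new_block. Seeding it from the database height on
    // startup means a restarted node advertises its real height immediately.
    fn new(db_height: u64) -> Self {
        Self { block_height: db_height }
    }

    fn on_new_block(&mut self, height: u64) {
        self.block_height = height;
    }
}

fn main() {
    // Node restarted from an existing database at height 1234,
    // producing no blocks and connected to no one.
    let mut hb = Heartbeat::new(1234);
    assert_eq!(hb.block_height, 1234); // not 0

    hb.on_new_block(1235);
    assert_eq!(hb.block_height, 1235);
}
```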
For #2394 we just increased the timeouts.
For #2393 we removed the panic in the test and just let p2p reconnect.
For #2395 we launch this test using the multi-threaded mode of Tokio, to follow the convention of all the other tests that launch a node using FuelCoreDriver. We also added a kill of the driver to shut the node down more gracefully in all of these tests, which should fix a lot of flakiness.
This PR also changes the CI workflow by removing all Docker-related jobs and the codecov job. These two sets of jobs have been moved to separate workflows that are not triggered automatically but can be triggered manually from the "Actions" tab of this repository (after this PR is merged).
The tests launched by the CI job now use nextest, which allows us to add a timeout for each test and provides more detailed output. The timeout is currently 5 minutes (and 8 for two really big tests) because we have tests that take a long time, but we should lower it in the future. The steps in the matrix are no longer cancelled when one fails, to allow the other steps to succeed and cache their results for a relaunch of the tests.
There is still more to improve in our tests, especially around timeouts and execution speed, but this should improve our workflow a lot.
Checklist
Before requesting review