Resolve some flaky tests and improve CI times #2401
base: master
Conversation
…ow manual in future commit
@@ -49,7 +49,7 @@ pub async fn transfer_back(ctx: &TestContext) -> Result<(), Failed> {
     }
     // wait until alice sees the transaction
     timeout(
-        ctx.config.sync_timeout(),
+        ctx.config.sync_timeout().checked_mul(2).unwrap(),
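The change above doubles the sync timeout with `checked_mul`, which returns `None` on overflow instead of silently wrapping. A minimal sketch of the idiom, with a hypothetical base timeout standing in for `ctx.config.sync_timeout()`:

```rust
use std::time::Duration;

fn main() {
    // Hypothetical base value; in the test it comes from ctx.config.sync_timeout().
    let sync_timeout = Duration::from_secs(30);

    // Duration::checked_mul returns Option<Duration>, so the unwrap/expect
    // only fires in the (practically impossible) overflow case.
    let doubled = sync_timeout
        .checked_mul(2)
        .expect("sync timeout overflowed");

    assert_eq!(doubled, Duration::from_secs(60));
    println!("{}", doubled.as_secs());
}
```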
The same here: Which issue does it solve?
Sometimes in CI the query can take a very long time to resolve. Increasing the timeout doesn't change the behaviour under test and improves stability.
I think it is better to increase the default wallet_sync_timeout instead.
Done
@@ -369,7 +368,7 @@ mod tests {
             .add_address(&peer_id, unroutable_peer_addr.clone());
     }
     SwarmEvent::ConnectionClosed { peer_id, .. } => {
-        panic!("PeerId {peer_id:?} disconnected");
+        dbg!(peer_id);
Maybe we need to insert the peer back into the left_to_discover table after it was disconnected.
Done
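The suggestion above (re-adding a disconnected peer to left_to_discover so a later reconnection still counts as a discovery) can be sketched roughly as follows. The types here are hypothetical stand-ins, not the test's actual ones:

```rust
use std::collections::HashSet;

// Hypothetical peer identifier and discovery set; in the real test these
// would be libp2p PeerIds and the test's left_to_discover table.
fn on_connection_closed(peer_id: u64, left_to_discover: &mut HashSet<u64>) {
    // Instead of panicking on a disconnect, put the peer back into the
    // set so a later reconnection is treated as a fresh discovery.
    left_to_discover.insert(peer_id);
}

fn main() {
    let mut left_to_discover: HashSet<u64> = HashSet::new();
    on_connection_closed(42, &mut left_to_discover);
    assert!(left_to_discover.contains(&42));
    println!("peers left to discover: {}", left_to_discover.len());
}
```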
Sometimes this seems to make the test time out: https://github.com/FuelLabs/fuel-core/actions/runs/11813227788/job/32909924335#step:8:2439
Should I remove this?
Hmm, I think we should rewrite the discovery_works test. And maybe we should use the service itself instead of the p2p service. You can leave the previous implementation and create a new tech-debt issue to rewrite this test.
Done #2434
LGTM. I think this will make all of our lives better.
I'm going to wait until Green's concerns are resolved, but I'd be ready to approve.
For clarity, this PR will not solve all the problems, but it addresses a big chunk of them and is a big step forward. We could do a round 2 with everything we catch in the next days.
// When
for _ in 0..1000 {
    let result = executor.validate(&block).map(|_| ());

    if start.elapsed().as_secs() > 60 {
Suggested change:
-    if start.elapsed().as_secs() > 60 {
+    if start.elapsed().as_secs() > 1 {
It doesn't get reset between loop iterations, and before this change the timeout was 60 seconds for the whole test, no?
Oh, yeah, that is true. Maybe it is better to create start before each executor.validate and check that it is less than a second.
Done
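The agreed fix (restarting the clock before each validate call, so the one-second bound applies per iteration rather than to the whole loop) can be sketched like this. The `validate` function is a hypothetical stand-in for `executor.validate(&block)`:

```rust
use std::time::Instant;

// Stand-in for executor.validate(&block); the real call validates a block.
fn validate() -> Result<(), ()> {
    Ok(())
}

fn main() {
    for _ in 0..1000 {
        // Restart the timer inside the loop: each call is bounded
        // individually instead of sharing one 60-second budget.
        let start = Instant::now();
        let result = validate();
        assert!(result.is_ok());
        assert!(
            start.elapsed().as_secs() < 1,
            "a single validation took longer than a second"
        );
    }
    println!("all iterations within bound");
}
```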
Linked Issues/PRs
Fix #2408
Fix #2407
Fix #2406
Fix #2351
Fix #2393
Fix #2394
Fix #2395
Description
This PR fixes an issue in the P2P heartbeat. The problem was that the P2P heartbeat was updated only when new blocks were received or produced. This means that if we start a node from an existing database, but it doesn't produce blocks and isn't connected to anyone, it will report block height 0 to the peers that connect to it. We believe this fix resolves #2408, #2407, #2406 and #2351.
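The heartbeat bug described above can be illustrated with a minimal sketch. These types are hypothetical, not fuel-core's actual API; the point is that seeding the advertised height from the database on startup avoids advertising 0 until a new block arrives:

```rust
// Hypothetical heartbeat state; fuel-core's real implementation differs.
struct Heartbeat {
    block_height: u64,
}

impl Heartbeat {
    // Before the fix: the reported height effectively started at 0 and was
    // only updated by on_new_block. Seeding it from the database height on
    // startup means a restarted node advertises its real height immediately.
    fn new(db_height: u64) -> Self {
        Self { block_height: db_height }
    }

    fn on_new_block(&mut self, height: u64) {
        self.block_height = height;
    }
}

fn main() {
    // Node restarted from an existing database at height 1234,
    // producing no blocks and connected to no one.
    let mut hb = Heartbeat::new(1234);
    assert_eq!(hb.block_height, 1234); // not 0

    hb.on_new_block(1235);
    assert_eq!(hb.block_height, 1235);
}
```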
For #2394 we just increased the timeouts.
For #2393 we removed the panic in the test and just let p2p reconnect.
For #2395 we launch this test using the multi-threaded mode of Tokio, to follow the convention of all the other tests that launch a node using FuelCoreDriver. We also added a kill of the driver to shut the node down more gracefully in all of these tests, which should fix a lot of flakiness.
This PR also changes the CI workflow by removing all Docker-related jobs and the codecov job. These two sets of jobs have been moved to separate workflows that are not triggered automatically but can be triggered manually from the "Actions" tab of this repository (after this PR is merged).
The tests launched by the CI job now use nextest, which allows us to add a timeout for each test and provides more detailed output. The timeout is currently 5 minutes (and 8 for two really big tests) because we have tests that take a long time, but we should lower it in the future. The steps in the matrix are no longer cancelled when one fails, to allow the other steps to succeed and cache their results for a relaunch of the tests.
There is still more to improve in our tests, especially around timeouts and execution speed, but this should improve our workflow a lot.
Checklist
Before requesting review