Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Resolve some falky tests and improve CI times #2401

Open
wants to merge 45 commits into
base: master
Choose a base branch
from

Conversation

AurelienFT
Copy link
Contributor

@AurelienFT AurelienFT commented Oct 28, 2024

Linked Issues/PRs

Fix #2408
Fix #2407
Fix #2406
Fix #2351

Fix #2393
Fix #2394
Fix #2395

Description

This PR fix an issue in P2P heartbeat. The problem was that P2P heartbeat was updated only if new blocks were received or produced. This means that if we start the node from an existing db but doesn't produce blocks and not connect it to anyone it will send block height 0 to the peers that connects to him. We believe that this fix, resolves #2408 #2407 #2406 and #2351.

For #2394 we just increased the timeouts.
For #2393 we removed the panic in the test and just let p2p reconnect
For #2395 we launch this test using multi-threads mode of Tokio to follow the convention of all the others tests that launch a node using FuelCoreDriver. Also we added a kill of the driver to try to kill the node in a more graceful way in all of the test, it should fix a lot of flakyness in these tests

This PR also change the CI workflow by removing all docker related jobs and codecov job. These two set of jobs has been moved to separated workflow that are not triggered automatically but can be triggered manually on the "Actions" tab of this repository (after the merge of this PR).

The tests launched by the CI job now use nextest that allow us to add timeout for each test and provide more detailed output. The timeout is currently 5 min (and 8 for two really big tests) because we have tests that take a long time but we should lower it in the future.
The steps on the matrix are not cancelled anymore when one failed to allow possible other success and cache their success for a relaunch of the tests.

There is still more improve to do on our tests especially on timeout and rapid execution but this should improve a lot our workflow.

Checklist

  • Breaking changes are clearly marked as such in the PR description and changelog
  • New behavior is reflected in tests
  • The specification matches the implemented behavior (link update PR if changes are needed)

Before requesting review

  • I have reviewed the code myself
  • I have created follow-up issues caused by this PR and linked them here

@AurelienFT AurelienFT added the no changelog Skip the CI check of the changelog modification label Oct 28, 2024
@AurelienFT AurelienFT self-assigned this Oct 28, 2024
@AurelienFT AurelienFT changed the title Test adding nextest Try resolve some falky tests and reduce timeout Oct 28, 2024
@AurelienFT AurelienFT changed the title Try resolve some falky tests and reduce timeout Resolve some falky tests and improve CI times Nov 12, 2024
@AurelienFT AurelienFT marked this pull request as draft November 12, 2024 16:40
@AurelienFT AurelienFT marked this pull request as ready for review November 12, 2024 16:44
@AurelienFT AurelienFT requested a review from a team November 12, 2024 16:48
.github/workflows/ci.yml Show resolved Hide resolved
bin/e2e-test-client/src/tests/transfers.rs Outdated Show resolved Hide resolved
crates/services/p2p/src/discovery.rs Outdated Show resolved Hide resolved
.config/nextest.toml Show resolved Hide resolved
rymnc
rymnc previously approved these changes Nov 12, 2024
Copy link
Member

@MitchTurner MitchTurner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. I think this will make all of our lives better.

I'm going to wait until Green's concerns are resolved, but I'd be ready to approve.

@AurelienFT
Copy link
Contributor Author

For clarity, this PR will not solve all the problems but at least a big chunks of it and is a big step forward we could do a round 2 with everything we catch in the next days.

rymnc
rymnc previously approved these changes Nov 13, 2024
bin/e2e-test-client/src/tests/transfers.rs Outdated Show resolved Hide resolved
// When
for _ in 0..1000 {
let result = executor.validate(&block).map(|_| ());

if start.elapsed().as_secs() > 60 {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if start.elapsed().as_secs() > 60 {
if start.elapsed().as_secs() > 1 {

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It doesn't get reset between each loop and before the timeout was 60 sec for the whole test, no ?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, yeah, it is true. Maybe it is better to create start before each executor.validate and check that it less than a second

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

// When
for _ in 0..1000 {
let result = executor.validate(&block).map(|_| ());

if start.elapsed().as_secs() > 60 {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if start.elapsed().as_secs() > 60 {
if start.elapsed().as_secs() > 1 {

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
no changelog Skip the CI check of the changelog modification
Projects
None yet
4 participants