Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[test] baseline failures #15632

Closed
wants to merge 10 commits into from
Closed

Conversation

danielxiangzl
Copy link
Contributor

Description

How Has This Been Tested?

Key Areas to Review

Type of Change

  • New feature
  • Bug fix
  • Breaking change
  • Performance improvement
  • Refactoring
  • Dependency update
  • Documentation update
  • Tests

Which Components or Systems Does This Change Impact?

  • Validator Node
  • Full Node (API, Indexer, etc.)
  • Move/Aptos Virtual Machine
  • Aptos Framework
  • Aptos CLI/SDK
  • Developer Infrastructure
  • Move Compiler
  • Other (specify)

Checklist

  • I have read and followed the CONTRIBUTING doc
  • I have performed a self-review of my own code
  • I have commented my code, particularly in hard-to-understand areas
  • I identified and added all stakeholders and component owners affected by this change as reviewers
  • I tested both happy and unhappy path of the functionality
  • I have made corresponding changes to the documentation

@danielxiangzl danielxiangzl added the CICD:build-failpoints-images Build failpoints docker image label Dec 18, 2024
Copy link

trunk-io bot commented Dec 18, 2024

⏱️ 3h 18m total CI duration on this PR
Slowest 15 Jobs Cumulative Duration Recent Runs
adhoc-forge-test / forge 31m 🟥
test-target-determinator 28m 🟩🟩🟩🟩🟩 (+3 more)
rust-move-tests 14m 🟥🟩
rust-cargo-deny 14m 🟩🟩🟩🟩🟩 (+3 more)
rust-move-tests 13m 🟩
rust-move-tests 13m 🟩
rust-move-tests 13m 🟩
rust-move-tests 13m 🟩
rust-move-tests 13m 🟩
rust-move-tests 13m 🟩
rust-move-tests 12m 🟩
check-dynamic-deps 9m 🟩🟩🟩🟩🟩 (+4 more)
semgrep/ci 4m 🟩🟩🟩🟩🟩 (+4 more)
general-lints 4m 🟩🟩🟩🟩🟩 (+3 more)
file_change_determinator 2m 🟩🟩🟩🟩🟩 (+3 more)

settingsfeedbackdocs ⋅ learn more about trunk.io

)
.await
.unwrap();
panic!("test_fault_tolerance_of_leader_equivocation");
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This panic! statement causes the test to fail unconditionally, regardless of whether the test logic succeeds or fails. Since the test appears to be validating leader equivocation fault tolerance, the test should be allowed to complete normally and verify the expected behavior through its assertions.

Spotted by Graphite Reviewer

Is this helpful? React 👍 or 👎 to let us know.

Comment on lines +20 to +48
#[async_trait]
impl NetworkLoadTest for PerformanceBenchmark {
async fn test(
&self,
swarm: Arc<tokio::sync::RwLock<Box<dyn Swarm>>>,
_report: &mut TestReport,
duration: Duration,
) -> Result<()> {
let validators = { swarm.read().await.get_validator_clients_with_names() };
// 10 vals, test 1,2,3 failures
let num_bad_leaders = 1;
for (name, validator) in validators[..num_bad_leaders].iter() {
validator
.set_failpoint(
"consensus::leader_equivocation".to_string(),
"return".to_string(),
)
.await
.map_err(|e| {
anyhow!(
"set_failpoint to set consensus leader equivocation on {} failed, {:?}",
name,
e
)
})?;
};
Ok(())
}
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The test should honor the duration parameter by adding tokio::time::sleep(duration).await after setting the failpoints. This ensures the test runs for the expected duration and gives the failpoints time to take effect. Without this sleep, the test may complete prematurely before the fault injection has meaningful impact.

Spotted by Graphite Reviewer

Is this helpful? React 👍 or 👎 to let us know.

@danielxiangzl danielxiangzl force-pushed the daniel-baseline-failures branch from dde8216 to 328f7e8 Compare December 18, 2024 21:56
Comment on lines 22 to 48
async fn test(
&self,
swarm: Arc<tokio::sync::RwLock<Box<dyn Swarm>>>,
_report: &mut TestReport,
duration: Duration,
) -> Result<()> {
let validators = { swarm.read().await.get_validator_clients_with_names() };
// 10 vals, test 1,2,3 failures
let num_bad_leaders = 3;
for (name, validator) in validators[..num_bad_leaders].iter() {
validator
.set_failpoint(
"consensus::leader_equivocation".to_string(),
"return".to_string(),
)
.await
.map_err(|e| {
anyhow!(
"set_failpoint to set consensus leader equivocation on {} failed, {:?}",
name,
e
)
})?;
};
Ok(())
}
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The test currently returns immediately after setting the failpoints, without waiting for the specified duration. This means the test may complete before the injected failures have time to manifest and be observed. Adding tokio::time::sleep(duration).await before returning would ensure the test runs for the intended duration while the failpoints are active.

Spotted by Graphite Reviewer

Is this helpful? React 👍 or 👎 to let us know.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
CICD:build-failpoints-images Build failpoints docker image
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant