Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Partially fix flaky tests #308

Merged
merged 7 commits into from
Oct 17, 2023
Merged

Partially fix flaky tests #308

merged 7 commits into from
Oct 17, 2023

Conversation

cyqsimon
Copy link
Collaborator

@cyqsimon cyqsimon commented Oct 17, 2023

This is a partial fix, and hence does not close (!! no GitHub, don't close it) #303. A full refactor of main is necessary to get rid of the race condition entirely.

Problems found and fixed

  • Timestamps cause non-deterministic output. Disabled for tests.
  • Got rid of the terrible hack introduced in 07c734a
  • Race condition between input and output worker threads on program exit.

@cyqsimon
Copy link
Collaborator Author

I'll git commit --amend && git push -f a few times to make it fail again. I think there are still problems with Windows.

- This makes snapshots more human-inspectable
- `pause_by_space`
- `rearranged_by_tab`
@cyqsimon
Copy link
Collaborator Author

cyqsimon commented Oct 17, 2023

Okay, found something else in pause_by_space. I think this is the root cause of Windows-specific failures. For details see the extended commit message of 464f7ec.

I realised the problem is not with any single test, but with the actual program termination logic, which explains why the random failures are all over the place.

TLDR, race condition. Bandwhich has one worker thread to handle inputs (a.k.a. terminal events), and one to print to terminal. Currently they are unsynchronized on program exit, which in tests can cause fewer/more draw events than expected, and in deployment can cause garbage to be printed to terminal (which I've personally seen a few times).

Difficult to diagnose, but should be an easy fix.

@cyqsimon
Copy link
Collaborator Author

Actually I'll just go manually trigger the tests. See https://github.com/imsnif/bandwhich/actions/workflows/ci.yaml?query=branch%3Aflaky-test-fix.

@cyqsimon
Copy link
Collaborator Author

Aright. After thinking more about the current implementation, I think the best way forward is to rewrite main using message passing. Currently it's actually the case that the input thread draws to the terminal as well, which is just a disaster waiting to happen:

bandwhich/src/main.rs

Lines 182 to 194 in 5d2ee96

Event::Resize(_x, _y) if !raw_mode => {
let paused = paused.load(Ordering::SeqCst);
ui.draw(
paused,
dns_shown,
elapsed_time(
*last_start_time.read().unwrap(),
*cumulative_time.read().unwrap(),
paused,
),
ui_offset.load(Ordering::SeqCst),
);
}

bandwhich/src/main.rs

Lines 240 to 256 in 5d2ee96

Event::Key(KeyEvent {
modifiers: KeyModifiers::NONE,
code: KeyCode::Tab,
kind: KeyEventKind::Press,
..
}) => {
let paused = paused.load(Ordering::SeqCst);
let elapsed_time = elapsed_time(
*last_start_time.read().unwrap(),
*cumulative_time.read().unwrap(),
paused,
);
let table_count = ui.get_table_count();
let new = ui_offset.load(Ordering::SeqCst) + 1 % table_count;
ui_offset.store(new, Ordering::SeqCst);
ui.draw(paused, dns_shown, elapsed_time, new);
}

I will merge this PR now without closing #303. This work warrants a separate PR.

@cyqsimon cyqsimon marked this pull request as ready for review October 17, 2023 08:28
@cyqsimon cyqsimon changed the title Fix flaky tests Partially fix flaky tests Oct 17, 2023
@cyqsimon cyqsimon merged commit d9cc84b into main Oct 17, 2023
@cyqsimon cyqsimon deleted the flaky-test-fix branch October 17, 2023 08:30
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant