test_informer.rb fails sporadically #584

Open
cben opened this issue Nov 24, 2022 · 8 comments

Comments

@cben
Collaborator

cben commented Nov 24, 2022

I'm seeing various sporadic failures on test_informer.rb, or sometimes it gets stuck until timeout kills it. Examples:

@grosser would you like to investigate?
I haven't looked inside, no idea if just flaky test, or actual bug...

@grosser
Contributor

grosser commented Nov 24, 2022

not using this library, so no thanks :D

@grosser
Contributor

grosser commented Nov 24, 2022

oh no
image

@grosser
Contributor

grosser commented Nov 24, 2022

ahh it's the actual kubeclient ... did this repo get renamed ?
... I thought this was something else 🤦
I'll take a look ...

@grosser
Contributor

grosser commented Nov 24, 2022

thx for the nice writeup, good to have the actual backtraces and to know it's not a single issue but multiple places

  • ran it 100 times on 3.0 locally and no failure
  • ran it 100 times on 2.7.6 and no failure

the "expected 1 got 2 watch" error would mean that the watch crashed and restarted :/

#586

... maybe you get it to fail again locally 🤞

@cben
Collaborator Author

cben commented Nov 27, 2022

(Yes, repo was moved under manageiq org when Alissa @abonas was leaving Red Hat so we don't depend on her for future maintainer handoffs, and in hope Adam @agrare would join or at least be backup maintainer as I'm having less and less time for it.)

@cben
Collaborator Author

cben commented Nov 27, 2022

BTW I notice with_worker does this: `sleep(0.03) # wait for worker to watch`, which (at least in theory) is not guaranteed. And one test does `sleep(0.02) # wait for watch to finish`. Generally all uses of sleep() are suspect.
But I haven't dug into logic to say if any sleep race conditions are plausible explanations for any actual failure modes...
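
For illustration, here is a rough sketch of what replacing a fixed sleep with an explicit signal plus a bounded poll could look like. It is not taken from test_informer.rb: the `wait_until` helper, the `started` queue, and the simulated worker are all hypothetical stand-ins, and whether the real informer exposes a suitable hook to signal from is an open question.

```ruby
require 'timeout'

# Hypothetical helper (not part of kubeclient): poll until the block returns
# a truthy value, or raise once the deadline has passed.
def wait_until(timeout: 2, interval: 0.005)
  deadline = Process.clock_gettime(Process::CLOCK_MONOTONIC) + timeout
  loop do
    return true if yield
    if Process.clock_gettime(Process::CLOCK_MONOTONIC) > deadline
      raise Timeout::Error, "condition not met within #{timeout}s"
    end
    sleep(interval)
  end
end

# Stand-in for a worker thread that starts "watching" after an
# unpredictable delay (this simulates scheduler jitter, nothing more).
started = Queue.new
events = []
worker = Thread.new do
  sleep(rand * 0.05)
  started << :watching # explicit signal that the watch is established
  events << :event
end

# Instead of sleep(0.03): block until the worker says it is ready,
# with an upper bound so a stuck worker fails fast instead of hanging.
Timeout.timeout(2) { started.pop }

# Instead of sleep(0.02): wait for the observable effect itself.
wait_until { events.size == 1 }
worker.join
```

The point of the sketch is that the test waits on the condition it actually cares about, so a slow CI machine only makes it slower, not flaky, and a genuine hang turns into a Timeout::Error rather than a stuck job.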

@grosser
Contributor

grosser commented Nov 27, 2022

maybe #587 fixes this ...

@DocX
Contributor

DocX commented Jan 5, 2023

Also, a few race conditions that could cause the flakiness will be fixed in #597
