tests: Add retry if promote member fails. #19272

Merged

Contributor

@jmao-dd jmao-dd commented Jan 25, 2025

To handle the case where a learner is not yet ready when it is promoted, add a retry with a short delay to improve the stability of the e2e tests.

This is for #19216

Please read https://github.com/etcd-io/etcd/blob/main/CONTRIBUTING.md#contribution-flow.

@k8s-ci-robot

Hi @jmao-dd. Thanks for your PR.

I'm waiting for an etcd-io member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

codecov bot commented Jan 25, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 68.86%. Comparing base (83771ff) to head (692947a).
Report is 38 commits behind head on main.

Additional details and impacted files

see 29 files with indirect coverage changes

@@            Coverage Diff             @@
##             main   #19272      +/-   ##
==========================================
+ Coverage   68.81%   68.86%   +0.04%     
==========================================
  Files         420      420              
  Lines       35649    35680      +31     
==========================================
+ Hits        24533    24572      +39     
+ Misses       9691     9684       -7     
+ Partials     1425     1424       -1     


"go.etcd.io/etcd/server/v3/etcdserver"

Member

Suggested change

Comment on lines 181 to 191

    _, err = epc.Etcdctl(e2e.WithEndpoints(endpoints)).MemberPromote(ctx, id)
    attempt := 1
    for err != nil && (attempt < 4) && strings.Contains(err.Error(), "can only promote a learner member which is in sync with leader") {
        t.Logf("Learner is not ready yet, retry for the %v time", attempt)
        time.Sleep(100 * time.Duration(attempt) * time.Millisecond)
        _, err = epc.Etcdctl(e2e.WithEndpoints(endpoints)).MemberPromote(ctx, id)
        attempt++
    }
Member

Suggested change

- _, err = epc.Etcdctl(e2e.WithEndpoints(endpoints)).MemberPromote(ctx, id)
- attempt := 1
- for err != nil && (attempt < 4) && strings.Contains(err.Error(), "can only promote a learner member which is in sync with leader") {
-     t.Logf("Learner is not ready yet, retry for the %v time", attempt)
-     time.Sleep(100 * time.Duration(attempt) * time.Millisecond)
-     _, err = epc.Etcdctl(e2e.WithEndpoints(endpoints)).MemberPromote(ctx, id)
-     attempt++
- }
+ attempt := 0
+ for attempt < 3 {
+     _, err = epc.Etcdctl(e2e.WithEndpoints(endpoints)).MemberPromote(ctx, id)
+     if err == nil || !strings.Contains(err.Error(), "can only promote a learner member which is in sync with leader") {
+         break
+     }
+     time.Sleep(100 * time.Millisecond)
+     attempt++
+ }

Member

You did not resolve this comment correctly. We only need to call MemberPromote inside the loop. Calling it twice causes a regression; see #19279.
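
The thread does not spell out how the regression shows up. One plausible reading, assumed here, is that a promote call that succeeds before the loop is followed by a second call inside the loop, which fails because the member is no longer a learner. The sketch below illustrates that reading with a hypothetical `fakeMember` type; the second error message is invented for the example and is not etcd's actual text, and the sleep is omitted for brevity.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

const notInSync = "can only promote a learner member which is in sync with leader"

// fakeMember is a hypothetical stand-in for a cluster member; it is not etcd code.
type fakeMember struct{ learner bool }

// promote succeeds once and rejects every later call.
// The error text below is invented for this sketch.
func (m *fakeMember) promote() error {
	if !m.learner {
		return errors.New("member is no longer a learner (illustrative error)")
	}
	m.learner = false
	return nil
}

// retry follows the suggested loop: promote is called only inside the loop.
func retry(promote func() error) error {
	var err error
	for attempt := 0; attempt < 3; attempt++ {
		err = promote()
		if err == nil || !strings.Contains(err.Error(), notInSync) {
			break
		}
	}
	return err
}

func main() {
	// Double call: one promote before the loop, then another inside it.
	m := &fakeMember{learner: true}
	_ = m.promote()                               // first call succeeds
	fmt.Println("double call:", retry(m.promote)) // second call fails

	// Single call: promote only inside the loop.
	m = &fakeMember{learner: true}
	fmt.Println("single call:", retry(m.promote))
}
```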

Contributor Author

Oops, sorry, I didn't notice the double call to MemberPromote.

@ahrtr
Member

ahrtr commented Jan 25, 2025

Please squash the commit.

To handle the case of learner not being ready when being promoted, add
retry with some delay to improve the stability of e2e tests.

Signed-off-by: Jiayin Mao <[email protected]>
@jmao-dd jmao-dd force-pushed the jmao/19216-flakey-promote-learner-test branch from 5a868a8 to 692947a on January 26, 2025 00:56
@k8s-ci-robot

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: jmao-dd, serathius

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@serathius serathius merged commit 532c601 into etcd-io:main Jan 26, 2025
18 checks passed
4 participants