Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CLOUDP-272039: Fail when no tests are run #1809

Merged
merged 6 commits into from
Sep 12, 2024
Merged

CLOUDP-272039: Fail when no tests are run #1809

merged 6 commits into from
Sep 12, 2024

Conversation

josvazg
Copy link
Collaborator

@josvazg josvazg commented Sep 6, 2024

CLOUDP-272039

Detect and fail when a test suite did not run any tests. Then:

  • deletion-protection test was not running because the test suite was removed at CLOUDP-229283: disable ownership detection #1371
  • helm-update was not running because of a mistake comparing the major versions.
  • helm-update was failing because we had to pass the pull secret name at upgrade time.

All Submissions:

  • Have you signed our CLA?

@josvazg josvazg requested review from helderjs, s-urbaniak and cveticm and removed request for helderjs September 6, 2024 15:31
@josvazg josvazg added the cloud-tests Run expensive Cloud Tests: Integration & E2E label Sep 6, 2024
Copy link
Contributor

github-actions bot commented Sep 6, 2024

AKO_E2E_TEST=1 ginkgo --label-filter="atlas-gov" --timeout 120m --nodes=10 --flake-attempts=1 --randomize-all --race --cover --v --trace --show-node-events --output-interceptor-mode=none --coverpkg=github.com/mongodb/mongodb-atlas-kubernetes/v2/pkg/... | tee .e2e.log
if grep -q "0 Passed" < .e2e.log ; then
echo "No tests were run!"
exit 1
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am a bit concerned with the repetition of this line across many scripts below.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The CI should be running the same code as the local dev, but it does not. That is why this fix needs so much repetition.

Fixing the duplicity is a bigger issue than the focus of this PR. It has already been flagged before but not prioritised.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@josvazg I took a quick look on Ginkgo and we can reduce duplication by adding the following code to the Ginkgo suites we have.

var _ = ReportAfterSuite("Ensure test suite was not empty", func(r Report) {
	Expect(r.SuiteSucceeded && r.PreRunStats.SpecsThatWillRun > 0).To(BeTrue(), "Suite must run at least 1 test in order to be valid")
})

What do you think?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Uhm, yes I think that would be better.

There is a problem, when running locally we are calling the int tests and int/clusterwide somewhat separately, so we might run a label that only runs on one of them and the other should not complain. There is probably another solution for it, but need to figure that out.

In the meantime, I need to understand the helm-update test failure, which only happens in the CI for some reason.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Applied this suggestion already.

@@ -130,6 +131,10 @@ func WaitTestApplication(data *model.TestDataProvider, ns, labelKey, labelValue
// temp
isAppRunning := func() func() bool {
return func() bool {
status, err := k8s.GetPodStatus(data.Context, data.K8SClient, ns)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are these debugging bits? if not, could you clarify how it supports the second status check (line 138)?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, I am debugging, please ignore these for the time being.

I will mark as draft and iterate in the helm-update error first, then will try to move to check the issues within ginkgo instead of from a script, as you suggested above.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed the traces, thinking about a different way to debug

@josvazg josvazg marked this pull request as draft September 10, 2024 12:03
@josvazg
Copy link
Collaborator Author

josvazg commented Sep 10, 2024

Debugging helm-update e2e test issue.

@josvazg
Copy link
Collaborator Author

josvazg commented Sep 11, 2024

After adding traces:
https://github.com/mongodb/mongodb-atlas-kubernetes/actions/runs/10808820530/job/29983049730?pr=1809

  failed to get pods: failed to get deployment: deployments.apps "mongodb-atlas-operator" not found
  failed to get stream: container "manager" in pod "mongodb-atlas-operator-6f76cd65d9-8kbkh" is waiting to start: image can't be pulled

@josvazg
Copy link
Collaborator Author

josvazg commented Sep 11, 2024

@josvazg josvazg removed the cloud-tests Run expensive Cloud Tests: Integration & E2E label Sep 11, 2024
@josvazg josvazg marked this pull request as ready for review September 11, 2024 12:56
Signed-off-by: jose.vazquez <[email protected]>
@josvazg
Copy link
Collaborator Author

josvazg commented Sep 12, 2024

Easier review guide:

ab3c717 Forces failure if tests are skipped. It also fixed the int local script so it invoked clusterwide when using a cluster wide label

2ad192f Removes a non existing test tag deletion-protection that was removed some time ago

9b8d614 Fixes an issue comparing major versions that was skipping the helm-update e2e test.

0504f1f Ensures e2e test helm-update uses the correct pull secret when upgrading the helm chart.

de2f156 Improves the logging for helm-update which was needed to actually debug and understand the failure.

747f5ce Makes the helm upgrade an atomic operation which makes it more reliable and also would dump the error in line if it fails.

Copy link
Collaborator

@roothorp roothorp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Walked through in zoom together

@josvazg josvazg merged commit 8cfd626 into main Sep 12, 2024
56 checks passed
@roothorp roothorp deleted the fail-no-tests-run branch September 26, 2024 10:04
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants