-
Notifications
You must be signed in to change notification settings - Fork 59
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Assistance to broader Tag releases #395
Comments
Thanks for reaching out, we absolutely want to get back to general releases. As you rightly call out, test coverage across providers is one reason why we have not done that yet and any additional testing is welcome. Are you saying you have nightly tests running with the OFI NCCL plugin and the CXI provider already? If so, are you testing with the master branch? FWIW, we currently track unit tests, functional tests using just the plugin interfaces, NCCL-level tests using the nccl-tests benchmark suite, and application tests to gain confidence in a release. This wiki page has some more information about what gets tested with each Pull Request: https://github.com/aws/aws-ofi-nccl/wiki/Testing-and-Benchmarking. In addition, we have a more exhaustive nightly test suite that tests changes with real applications. |
Actually, these nightly tests I mentioned are tests for our internal software stack, and we are using nccl-tests as part of these tests. I apologize for the bad wording. |
Hey @rajachan, just checking in... |
@caio-davi Yes! One thing we can do in the near term is extending the PR presubmit checks to include your tests with the cxi provider, to ensure commits that go into the master branch build and work as expected . We can setup any Github hooks needed for your automation to see new PRs, run the tests on your systems, and report back test results. Feel free to reach out to us at |
Is there any update on this? One unfortunate aspect of the I'd go as far as to suggest that even if we can't validate nccl-net-ofi with a provider other than efa in CI, we should just put a disclaimer in the readme and still go back to doing general releases. |
@caio-davi and I talked and we are going to meet next week to find a path forward here. |
Hello team,
I noticed you have been updating most of the last tagged releases with the
-aws
suffix. I think I can assist if you need help testing it on other libfabric providers.We have some nightly tests running on slingshot fabric, and I could add your test bed to it. This way the releases would have been tested in at least two providers.
Please, let me know if it is something you are interested!
The text was updated successfully, but these errors were encountered: