
Update NVIDIA gdrcopy to v2.4.4 #9581

Open: wants to merge 1 commit into base: IB/CMSSW_15_0_X/master


fwyzard (Contributor) commented Dec 16, 2024

Bug fixes:

  • fix a use-after-free bug of mr objects in gdrdrv_vma_close();
  • fix a resource leak in gdrdrv_release().

fwyzard (Contributor, Author) commented Dec 16, 2024

enable gpu

fwyzard (Contributor, Author) commented Dec 16, 2024

please test

cmsbuild (Contributor)

A new Pull Request was created by @fwyzard for branch IB/CMSSW_15_0_X/master.

@iarspider, @smuzaffar can you please review it and eventually sign? Thanks.
@antoniovilela, @mandrenguyen, @rappoccio, @sextonkennedy you are the release managers for this.
cms-bot commands are listed here

cmsbuild (Contributor) commented Dec 16, 2024

cms-bot internal usage

fwyzard (Contributor, Author) commented Dec 16, 2024

type bugfix

cmsbuild (Contributor)

-1

Failed Tests: GpuUnitTests
Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-205f2f/43481/summary.html
COMMIT: b2d9117
CMSSW: CMSSW_15_0_X_2024-12-16-1100/el8_amd64_gcc12
Additional Tests: GPU
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week0/cms-sw/cmsdist/9581/43481/install.sh to create a dev area with all the needed externals and cmssw changes.

GPU Unit Tests

I found 3 errors in the following unit tests:

---> test testCudaDeviceAdditionWrapper had ERRORS
---> test testCudaDeviceAdditionKernel had ERRORS
---> test testTorchSimpleDnnCUDA had ERRORS

Comparison Summary

Summary:

  • You potentially added 4 lines to the logs
  • Reco comparison results: 18661 differences found in the comparisons
  • DQMHistoTests: Total files compared: 46
  • DQMHistoTests: Total histograms compared: 3510017
  • DQMHistoTests: Total failures: 20192
  • DQMHistoTests: Total nulls: 7
  • DQMHistoTests: Total successes: 3489798
  • DQMHistoTests: Total skipped: 20
  • DQMHistoTests: Total Missing objects: 0
  • DQMHistoSizes: Histogram memory added: -0.008 KiB (45 files compared)
  • DQMHistoSizes: changed (2024.000001): 0.008 KiB JetMET/SUSYDQM
  • DQMHistoSizes: changed (2024.303001): -0.016 KiB JetMET/SUSYDQM
  • Checked 202 log files, 172 edm output root files, 46 DQM output files
  • TriggerResults: found differences in 1 / 44 workflows

GPU Comparison Summary

Summary:

  • No significant changes to the logs found
  • Reco comparison results: 24 differences found in the comparisons
  • DQMHistoTests: Total files compared: 7
  • DQMHistoTests: Total histograms compared: 53058
  • DQMHistoTests: Total failures: 863
  • DQMHistoTests: Total nulls: 0
  • DQMHistoTests: Total successes: 52195
  • DQMHistoTests: Total skipped: 0
  • DQMHistoTests: Total Missing objects: 0
  • DQMHistoSizes: Histogram memory added: 0.0 KiB (6 files compared)
  • Checked 24 log files, 30 edm output root files, 7 DQM output files
  • TriggerResults: no differences found

smuzaffar (Contributor)

please test

cmsbuild (Contributor)

-1

Failed Tests: GpuUnitTests
Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-205f2f/43494/summary.html
COMMIT: b2d9117
CMSSW: CMSSW_15_0_X_2024-12-16-2300/el8_amd64_gcc12
Additional Tests: GPU
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week0/cms-sw/cmsdist/9581/43494/install.sh to create a dev area with all the needed externals and cmssw changes.

The following merge commits were also included on top of IB + this PR after doing git cms-merge-topic:

You can see more details here:
https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-205f2f/43494/git-recent-commits.json
https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-205f2f/43494/git-merge-result

GPU Unit Tests

I found 3 errors in the following unit tests:

---> test testCudaDeviceAdditionWrapper had ERRORS
---> test testCudaDeviceAdditionKernel had ERRORS
---> test testTorchSimpleDnnCUDA had ERRORS

Comparison Summary

Summary:

  • You potentially added 15 lines to the logs
  • Reco comparison results: 8 differences found in the comparisons
  • DQMHistoTests: Total files compared: 46
  • DQMHistoTests: Total histograms compared: 3510017
  • DQMHistoTests: Total failures: 466
  • DQMHistoTests: Total nulls: 0
  • DQMHistoTests: Total successes: 3509531
  • DQMHistoTests: Total skipped: 20
  • DQMHistoTests: Total Missing objects: 0
  • DQMHistoSizes: Histogram memory added: 0.0 KiB (45 files compared)
  • Checked 202 log files, 172 edm output root files, 46 DQM output files
  • TriggerResults: no differences found

GPU Comparison Summary

Summary:

  • No significant changes to the logs found
  • Reco comparison results: 0 differences found in the comparisons
  • DQMHistoTests: Total files compared: 7
  • DQMHistoTests: Total histograms compared: 53058
  • DQMHistoTests: Total failures: 32
  • DQMHistoTests: Total nulls: 0
  • DQMHistoTests: Total successes: 53026
  • DQMHistoTests: Total skipped: 0
  • DQMHistoTests: Total Missing objects: 0
  • DQMHistoSizes: Histogram memory added: 0.0 KiB (6 files compared)
  • Checked 24 log files, 30 edm output root files, 7 DQM output files
  • TriggerResults: no differences found
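The DQMHistoTests counters in the four comparison summaries of this PR can be cross-checked with simple arithmetic. The Python sketch below assumes (cms-bot does not state this) that every compared histogram is counted exactly once as a failure, a null, a success, or a skip. The DqmHistoTests class and the run labels are hypothetical; only the numbers are copied from the reports. Under that assumption all four reports balance, e.g. 20192 + 7 + 3489798 + 20 = 3510017 for the first CPU comparison.

```python
from dataclasses import dataclass


# Hypothetical container for one "DQMHistoTests" block of a cms-bot summary.
# Assumption: each compared histogram falls into exactly one of the four
# outcome counters (failure, null, success, skipped).
@dataclass
class DqmHistoTests:
    compared: int
    failures: int
    nulls: int
    successes: int
    skipped: int

    def accounted(self) -> int:
        """Sum of the per-outcome counters."""
        return self.failures + self.nulls + self.successes + self.skipped

    def is_consistent(self) -> bool:
        """True if the outcome counters add up to the number compared."""
        return self.accounted() == self.compared

    def failure_rate(self) -> float:
        """Fraction of compared histograms reported as failures."""
        return self.failures / self.compared


# Numbers copied from the two test runs (43481 and 43494) reported above.
runs = {
    "43481 (CPU)": DqmHistoTests(3510017, 20192, 7, 3489798, 20),
    "43481 (GPU)": DqmHistoTests(53058, 863, 0, 52195, 0),
    "43494 (CPU)": DqmHistoTests(3510017, 466, 0, 3509531, 20),
    "43494 (GPU)": DqmHistoTests(53058, 32, 0, 53026, 0),
}

for name, r in runs.items():
    print(f"{name}: consistent={r.is_consistent()} "
          f"failure rate={r.failure_rate():.4%}")
```

Laid side by side, the two runs show the drop in DQM failures between them (20192 to 466 on CPU, 863 to 32 on GPU), while the same three GPU unit tests fail in both.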
