Exclude PluginManager cache from MaxMemoryPreload monitoring #46542

Open: makortel wants to merge 2 commits into master

@makortel makortel commented Oct 29, 2024

PR description:

As described in #46359, developer areas with many checked-out packages result in a larger "max memory" reported by the MaxMemoryPreload AllocMonitor. The additional memory is used by the PluginManager cache.

As a simple (albeit somewhat hacky) workaround, this PR adds hooks to MaxMemoryPreload that allow its data collection to be paused, and uses those hooks in PluginManager to pause the data collection while the cache file is being read.

Resolves #46359
Resolves cms-sw/framework-team#1074
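
For readers unfamiliar with the idea, the pause hooks described above could be sketched roughly as follows. This is a minimal, single-threaded illustration and not the code in this PR: `MaxMemoryTracker`, `PauseGuard`, and `exampleMax` are hypothetical names invented for the example, and the real MaxMemoryPreload and PluginManager interfaces may look quite different.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for a "max memory" monitor (NOT the MaxMemoryPreload API).
class MaxMemoryTracker {
public:
  // Called for every allocation; ignored while data collection is paused.
  void onAlloc(std::size_t bytes) {
    if (pauseDepth_ > 0)
      return;
    current_ += bytes;
    max_ = std::max(max_, current_);
  }
  // Called for every deallocation; ignored while data collection is paused.
  void onFree(std::size_t bytes) {
    if (pauseDepth_ > 0)
      return;
    // Clamp, because memory allocated while paused was never counted.
    current_ -= std::min(bytes, current_);
  }
  // The two hooks: pausing nests, so each pause() needs a matching unpause().
  void pause() { ++pauseDepth_; }
  void unpause() {
    assert(pauseDepth_ > 0);
    --pauseDepth_;
  }
  std::size_t maxBytes() const { return max_; }

private:
  std::size_t current_ = 0;
  std::size_t max_ = 0;
  unsigned int pauseDepth_ = 0;
};

// RAII helper so that collection resumes even on early return or exception.
class PauseGuard {
public:
  explicit PauseGuard(MaxMemoryTracker& tracker) : tracker_(tracker) { tracker_.pause(); }
  ~PauseGuard() { tracker_.unpause(); }
  PauseGuard(PauseGuard const&) = delete;
  PauseGuard& operator=(PauseGuard const&) = delete;

private:
  MaxMemoryTracker& tracker_;
};

// Simulates a job that allocates 100 bytes, reads a 5000-byte "cache",
// and then allocates another 50 bytes. Returns the reported maximum.
std::size_t exampleMax(bool pauseDuringCacheRead) {
  MaxMemoryTracker tracker;
  tracker.onAlloc(100);
  if (pauseDuringCacheRead) {
    PauseGuard guard(tracker);  // collection paused inside this scope
    tracker.onAlloc(5000);      // stands in for reading the cache file
  } else {
    tracker.onAlloc(5000);
  }
  tracker.onAlloc(50);
  return tracker.maxBytes();
}
```

In this sketch `exampleMax(false)` reports 5150 bytes while `exampleMax(true)` reports 150 bytes, which mirrors the intent of excluding the cache from the reported maximum. A real allocation monitor would also need to deal with multiple threads, which this sketch ignores.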

PR validation:

Tested the impact with two developer areas of CMSSW_14_2_0_pre2: one with many packages checked out, and one with only two packages (FWCore/PluginManager and PerfTools/MaxMemoryPreload). The remaining difference in "max memory" was about 1 MB, or 0.05 %.

@cmsbuild

cmsbuild commented Oct 29, 2024

cms-bot internal usage

@cmsbuild

A new Pull Request was created by @makortel for master.

It involves the following packages:

  • FWCore/PluginManager (core)
  • PerfTools/MaxMemoryPreload (core)

@Dr15Jones, @cmsbuild, @makortel, @smuzaffar can you please review it and eventually sign? Thanks.
@felicepantaleo, @missirol, @wddgit this is something you requested to watch as well.
@antoniovilela, @mandrenguyen, @rappoccio, @sextonkennedy you are the release manager for this.

cms-bot commands are listed here

@makortel

@cmsbuild, please test

@makortel

@Dr15Jones please review

@cmsbuild

-1

Failed Tests: RelVals-INPUT
Size: This PR adds an extra 20KB to repository
Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-e2b65c/42447/summary.html
COMMIT: 45876c8
CMSSW: CMSSW_14_2_X_2024-10-29-1100/el8_amd64_gcc12
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week1/cms-sw/cmssw/46542/42447/install.sh to create a dev area with all the needed externals and cmssw changes.

  • DAS Queries: The DAS query tests failed, see the summary page for details.

RelVals-INPUT

  • 2024.000001: 2024.000001_RunJetMET02024D_10k/step1_dasquery.log
  • 2024.001001: 2024.001001_RunZeroBias2024D_10k/step1_dasquery.log
  • 2024.100001: 2024.100001_RunJetMET02024C_10k/step1_dasquery.log
  • 2024.101001
  • 2024.000001
  • 2024.001001
  • 2024.100001
  • 2024.101001

Comparison Summary

Summary:

  • You potentially added 1 lines to the logs
  • Reco comparison results: 10 differences found in the comparisons
  • DQMHistoTests: Total files compared: 46
  • DQMHistoTests: Total histograms compared: 3569372
  • DQMHistoTests: Total failures: 416
  • DQMHistoTests: Total nulls: 0
  • DQMHistoTests: Total successes: 3568936
  • DQMHistoTests: Total skipped: 20
  • DQMHistoTests: Total Missing objects: 0
  • DQMHistoSizes: Histogram memory added: 0.0 KiB( 45 files compared)
  • Checked 201 log files, 171 edm output root files, 46 DQM output files
  • TriggerResults: no differences found

@makortel

@cmsbuild, please test

@cmsbuild

Pull request #46542 was updated. @Dr15Jones, @makortel, @smuzaffar can you please check and sign again.

@makortel

makortel commented Nov 6, 2024

@cmsbuild, please abort

@makortel

makortel commented Nov 6, 2024

@cmsbuild, please test

@cmsbuild

cmsbuild commented Nov 6, 2024

-1

Failed Tests: RelVals-INPUT
Size: This PR adds an extra 12KB to repository
Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-e2b65c/42621/summary.html
COMMIT: b121ddf
CMSSW: CMSSW_14_2_X_2024-11-06-1100/el8_amd64_gcc12
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week0/cms-sw/cmssw/46542/42621/install.sh to create a dev area with all the needed externals and cmssw changes.

RelVals-INPUT

  • 2024.303001: 2024.303001_RunDisplacedJet2024E_10k/step1_dasquery.log
  • 2024.303001: DAS Error

Comparison Summary

Summary:

  • You potentially added 1 lines to the logs
  • Reco comparison results: 10 differences found in the comparisons
  • DQMHistoTests: Total files compared: 46
  • DQMHistoTests: Total histograms compared: 3343138
  • DQMHistoTests: Total failures: 408
  • DQMHistoTests: Total nulls: 0
  • DQMHistoTests: Total successes: 3342710
  • DQMHistoTests: Total skipped: 20
  • DQMHistoTests: Total Missing objects: 0
  • DQMHistoSizes: Histogram memory added: 0.0 KiB( 45 files compared)
  • Checked 195 log files, 172 edm output root files, 46 DQM output files
  • TriggerResults: no differences found

@makortel

makortel commented Nov 6, 2024

Comparison differences are related to #46416

@makortel

makortel commented Nov 6, 2024

To document the level of reduction in max memory
image

@cmsbuild

cmsbuild commented Nov 6, 2024

This pull request is fully signed and it will be integrated in one of the next master IBs (but tests are reportedly failing). This pull request will now be reviewed by the release team before it's merged. @sextonkennedy, @mandrenguyen, @antoniovilela, @rappoccio (and backports should be raised in the release meeting by the corresponding L2)

@makortel

makortel commented Nov 6, 2024

+core

@cmsbuild

cmsbuild commented Nov 6, 2024

This pull request is fully signed and it will be integrated in one of the next master IBs (but tests are reportedly failing). This pull request will now be reviewed by the release team before it's merged. @antoniovilela, @sextonkennedy, @rappoccio, @mandrenguyen (and backports should be raised in the release meeting by the corresponding L2)

@makortel

makortel commented Nov 6, 2024

ignore tests-rejected with external-failure
