Remove redundant data members from InputFileCatalog to reduce memory use #47013
base: master
+code-checks Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-47013/43114 |
A new Pull Request was created by @makortel for master. It involves the following packages:
@Dr15Jones, @cmsbuild, @makortel, @smuzaffar can you please review it and eventually sign? Thanks. cms-bot commands are listed here |
@cmsbuild, please test |
@Dr15Jones please test |
It seems to me we don't have good tests for |
+1 Size: This PR adds an extra 24KB to repository
The fileNames_ member was used only in init(), and before being modified in init() it was a direct copy of logicalFileNames_. There doesn't seem to be any real need to store it as a data member, and removing it saves ~100 MB of memory per stream in an example MC production (premixing) DIGI job whose configuration specified ~500k pileup files.
The logicalFileNames() getter was not used, so removing the logicalFileNames_ member avoids storing one copy of the LFNs.
The constructor makes a copy of the fileNames anyway. For the cases where InputFileCatalog is constructed from a temporary or moved vector<string>, this change avoids copying that vector. The Sources tend to pass in a temporary vector<string> directly from the ParameterSet, so there could be a visible reduction in memory churn.
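The idea of letting a temporary or moved vector<string> reach the catalog without a copy can be illustrated with a by-value "sink" parameter. This is only a minimal sketch under assumptions: CatalogSketch, makeFileNames() and bufferIsReused() are hypothetical names for illustration, not the CMSSW classes or the actual signature changed in this PR.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a catalog class; NOT the real InputFileCatalog.
class CatalogSketch {
public:
  // Sink parameter: taken by value, then moved into place.
  // A temporary or std::move()'d argument is moved all the way through;
  // an lvalue argument is copied exactly once (into the parameter).
  explicit CatalogSketch(std::vector<std::string> fileNames) : names_(std::move(fileNames)) {}

  std::vector<std::string> const& names() const { return names_; }

private:
  std::vector<std::string> names_;
};

// Hypothetical stand-in for obtaining a vector<string> from a configuration.
std::vector<std::string> makeFileNames() { return {"file:a.root", "file:b.root"}; }

// Returns true if the vector's heap buffer was handed over rather than copied.
bool bufferIsReused() {
  std::vector<std::string> names = makeFileNames();
  assert(!names.empty());
  std::string const* before = names.data();
  CatalogSketch catalog(std::move(names));
  return catalog.names().data() == before;
}
```

With a const-reference parameter the same call would have to copy every string, which is the kind of memory churn the commit message refers to.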
Branch updated: ae9d552 to 6777337
Added a unit test for |
+code-checks Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-47013/43118 |
Pull request #47013 was updated. @Dr15Jones, @cmsbuild, @makortel, @smuzaffar can you please check and sign again. |
@cmsbuild, please test |
+1 Size: This PR adds an extra 28KB to repository
PR description:
In #46975 (comment) I discovered the InputFileCatalog took ~488 MB memory per stream in a production DIGI job overlaying premixed pileup, where the job was configured with nearly 500k pileup files. A quick look in InputFileCatalog showed the input file names are more or less stored three times:
- fileNames_ to communicate a copy of the input file names from constructor to init()
- logicalFileNames_ to partly communicate input file names from constructor to init(), and partly to allow cheap logicalFileNames() getter to them
  - logicalFileNames() function is not really used, so in order to avoid storing the file names in member data, I decided to remove the member function and the logicalFileNames_ member in the third commit
- FileCatalogItem::lfn_ stored in fileCatalogItems_ member
  - lfn_ is being used
An alternative to the second commit could be to keep the logicalFileNames_, and store the FileCatalogItem::lfn_ as string_view, but I felt that to be a tiny bit more complex.
The fourth commit avoids one copy of the std::vector<std::string> of the file names when the InputFileCatalog is constructed from a temporary vector, which is the case with all Sources that use InputFileCatalog.
The first commit adds a unit test for InputFileCatalog
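For comparison, the string_view alternative mentioned above could look roughly like the following. This is a sketch under assumptions, not code from this PR: ItemSketch, ViewCatalogSketch and viewsShareStorage() are hypothetical names, and the real FileCatalogItem holds more than an LFN. It also hints at the extra complexity: the views are only valid while the owning vector stays in place, so copying has to be disabled or handled explicitly.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>
#include <utility>
#include <vector>

// Hypothetical item that only refers to an LFN instead of owning a copy.
struct ItemSketch {
  std::string_view lfn;
};

// Hypothetical catalog: owns the LFN strings once, items hold views into them.
class ViewCatalogSketch {
public:
  explicit ViewCatalogSketch(std::vector<std::string> lfns) : logicalFileNames_(std::move(lfns)) {
    items_.reserve(logicalFileNames_.size());
    for (auto const& lfn : logicalFileNames_) {
      items_.push_back(ItemSketch{lfn});  // view into the owned string, no copy
    }
  }

  // A default copy would leave the new object's views pointing into the
  // source object's strings, so copying is disabled in this sketch.
  ViewCatalogSketch(ViewCatalogSketch const&) = delete;
  ViewCatalogSketch& operator=(ViewCatalogSketch const&) = delete;

  std::vector<std::string> const& logicalFileNames() const { return logicalFileNames_; }
  std::vector<ItemSketch> const& items() const { return items_; }

private:
  std::vector<std::string> logicalFileNames_;  // declared first: initialized before items_
  std::vector<ItemSketch> items_;
};

// Returns true if every item view points at the storage of the owned string.
bool viewsShareStorage() {
  ViewCatalogSketch catalog(std::vector<std::string>{"/store/a.root", "/store/b.root"});
  assert(catalog.items().size() == catalog.logicalFileNames().size());
  for (std::size_t i = 0; i < catalog.items().size(); ++i) {
    if (catalog.items()[i].lfn.data() != catalog.logicalFileNames()[i].data()) {
      return false;
    }
  }
  return true;
}
```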
Resolves cms-sw/framework-team#1113
PR validation:
Unit tests passed in CMSSW_14_0_18. With example job #46975 (comment) MaxMemoryPreload showed 197 MB reduction in peak allocated memory on 1 thread.
If this PR is a backport please specify the original PR and why you need to backport that PR. If this PR will be backported please specify to which release cycle the backport is meant for:
To be backported to 14_1_X and 14_0_X.