Add MSUnmerged initStandalone && Read AllUnmerged from file #11916
base: master
Conversation
@amaltaro Take a look at these two functions from the current PR: they are basically identical to the respective methods of the MSUnmerged class:
and
The only difference is that they do all the work through I/O operations on disk instead of in memory. For huge lists on the order of GBs this improves the service's performance and memory consumption tremendously. I was actually surprised to see that the running speed did not degrade as much as I expected. I am considering adding this functionality as optional parameters to the two already existing methods. What do you think?
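A minimal sketch of the idea (the file name and format are hypothetical, not the PR's actual code): instead of loading the full LFN list into memory, the dump is consumed lazily, line by line, so memory use stays flat regardless of file size:

```python
def iter_lfns(dump_path):
    """Yield one LFN per line from a dump file on disk,
    skipping blank lines, without loading the file into memory."""
    with open(dump_path) as fd:
        for line in fd:
            lfn = line.strip()
            if lfn:
                yield lfn
```

Any consumer (counting, filtering, deletion) then iterates the generator instead of a pre-built in-memory list.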
With my latest commit, I finally got the direct deletions with the os-based library right. Together with the LFN deletions I now also update the RSE counters, so that upon the RSE cleanup we will have everything tracked just as it was with
I have tested everything with 50 directories. So far so good. Tomorrow morning I'll shoot for cleaning the whole unmerged area at T2_CH_CERN. FYI @amaltaro
@amaltaro No need for a detailed review. Just try to grasp the idea of the two big improvements this code provides:
We may consider adding these functionalities to the main code as well. Refactoring them into proper MSUnmerged methods is not too much work.
The
NOTE: Please notice the mistaken field here: initStandalone-T2_CH_CERN-2024-03-01T11:05+01:00-RealRun.log. Here #11904 (comment) is the report about the amount of space freed. [1]
Thank you, Todor, for the smaller memory footprint improvements that you provided in here. I understand that the bulk of that implementation is in the getUnmergedfromFile and filterUnmergedFromFile methods, right?
In plain English, my understanding of the logic for parsing the files from the consistency monitoring is:
- for each file from the ConMon dump, shorten it to a given path length and add it to a set of LFN paths (the dirs.allUnmerged field)
- remove LFN paths from dirs.allUnmerged if they are also present in the protected list
- then iterate over each LFN path in dirs.allUnmerged and:
a) open the ConMon dump, and scan each file/line for the LFN path, yielding those lines
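The steps above can be sketched as follows (function names, the path depth, and the prefix match are my assumptions, not the PR's actual code):

```python
def truncate_path(lfn, depth=3):
    """Shorten an LFN to its first `depth` directory levels
    (the depth value here is a hypothetical example)."""
    return "/".join(lfn.split("/")[:depth + 1])

def dirs_to_clean(conmon_lfns, protected, depth=3):
    """Build the set of top-level directories, dropping protected ones."""
    all_unmerged = {truncate_path(lfn, depth) for lfn in conmon_lfns}
    return all_unmerged - set(protected)

def scan_dump_for_dir(dump_path, dirname):
    """Re-scan the whole dump, yielding only LFNs under `dirname`;
    this is the per-directory pass discussed above."""
    with open(dump_path) as fd:
        for line in fd:
            lfn = line.strip()
            if lfn.startswith(dirname + "/"):
                yield lfn
```

The per-directory re-scan in `scan_dump_for_dir` is exactly what makes the cost proportional to (number of directories) x (dump size).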
Which means, if we have 10k unique directories, we would open that file 10k times and scan it as a whole 10k times. Even though that is doable, I think we can come up with other options.
One of those could be:
- either make a feature request to the Rucio ConMon to provide a sorted list of files, or open the ConMon dump and sort files by name (well, by LFN path)
- slice the ConMon dump by X files/lines (e.g. 50k entries)
- execute the MSUnmerged logic on each slice separately
What do you think?
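A rough sketch of the slicing idea (the 50k slice size and the function name are placeholders):

```python
from itertools import islice

def iter_slices(dump_path, slice_size=50000):
    """Yield lists of at most `slice_size` LFNs from the dump,
    so the MSUnmerged logic can run on one bounded chunk at a time."""
    with open(dump_path) as fd:
        while True:
            chunk = [line.strip() for line in islice(fd, slice_size)]
            if not chunk:
                break
            yield chunk
```

Each yielded chunk would then go through the existing MSUnmerged pipeline independently, keeping peak memory bounded by the slice size rather than the full dump.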
Hi @amaltaro
Correct.
Yes, but this is not new; this is how things work even right now.
No, this is not correct. The lists in
Again, no. If you refer to this: https://github.com/dmwm/WMCore/pull/11916/files#diff-d829adc30e84e637ed37c8770f8a0f84eb357772a5f210b79c8ba0f385091dfdR353-R360
These are simple closures which are used as file-list generators (indeed, by opening and parsing the main file holding the full list of LFNs at the site), but none of them is actually executed at this point. They are only function definitions recorded in the dictionary keyed by top-level directory paths: the key is the top-level directory itself, and the value is a reference to the function object in memory. No file is opened while the dictionary records are being filled, and no file descriptors are created at this stage. We call the actual function, opening the main file and iterating through it (in order to filter the lists of LFNs), only for those deletions where we fail to delete the topmost path due to errors of the sort:
No, we are not doing that. I explained the process above.
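The closure scheme described above can be sketched like this (names are illustrative; the real code lives in the PR diff linked above):

```python
def make_lfn_filter(dump_path, dirname):
    """Return a generator function bound to (dump_path, dirname).
    Defining and storing it opens no file; the dump is read only
    when the returned callable is invoked and its generator iterated."""
    def lfn_gen():
        with open(dump_path) as fd:
            for line in fd:
                lfn = line.strip()
                if lfn.startswith(dirname + "/"):
                    yield lfn
    return lfn_gen

def build_filters(dump_path, dirs):
    """Dictionary: top-level directory -> unexecuted generator function.
    No file descriptors are created while filling these records."""
    return {d: make_lfn_filter(dump_path, d) for d in dirs}
```

Only when a top-level deletion fails would `filters[d]()` be called and iterated, which is the first time the dump file is opened for that directory.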
That may help, and we could use many search-optimization algorithms, but indeed starting with a sorted list is much faster.
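For illustration, with a pre-sorted LFN list all entries under a directory form a contiguous range, so each per-directory lookup drops from a full scan to a binary search (a sketch, not the PR's code; it assumes ASCII-only LFNs):

```python
import bisect

def lfns_under(sorted_lfns, dirname):
    """All LFNs under `dirname` occupy a contiguous slice of a
    sorted list; find its bounds with binary search in O(log n).
    The '\xff' sentinel assumes LFNs contain only ASCII characters."""
    prefix = dirname + "/"
    lo = bisect.bisect_left(sorted_lfns, prefix)
    hi = bisect.bisect_left(sorted_lfns, prefix + "\xff")
    return sorted_lfns[lo:hi]
```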
This was my exact approach at the beginning, and I ran into complications with properly doing the bookkeeping in the database. Those were not that difficult to sort out, but they would require additional code. In the end I went boldly for working with the whole (the largest seen so far) list of LFNs for CERN, and it did not take more than a few hundred megabytes in memory, even though the file itself was on the order of GBs.
Can one of the admins verify this patch?
Fixes #11904
Status
Description
With the current PR I provide an initialization script for running the MSUnmerged service standalone. It requires the service config files in order to run. An additional feature is the ability to read all unmerged files from disk; this way we avoid loading huge lists on the order of GBs into memory.
Is it backward compatible (if not, which system it affects?)
YES
Related PRs
None
External dependencies / deployment changes
service_configs
WMCore virtual environment: