Illegal file names in DBS? #656
Comments
I tried to check if Lexicon would have spotted this, and much to my surprise it accepts the file name with
In any case, it does not seem right that DBS does not allow looking up information using, as the search string, a file name that is stored in it.
Probably should be
i.e. anchor the beginning and end of the string with
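To illustrate why missing anchors matter, here is a minimal sketch in Python. The pattern below is a simplified, hypothetical LFN-like regex made up for this example; it is not the actual WMCore Lexicon expression. It only shows that an unanchored check can accept a name with a trailing underscore, while a whole-string match rejects it.

```python
import re

# Hypothetical, simplified LFN-like pattern (an assumption for illustration only,
# NOT the real WMCore Lexicon regex).
PATTERN = r"/store/[A-Za-z0-9/_.-]*[A-Za-z0-9]\.root"


def check_unanchored(name):
    # re.search succeeds if the pattern matches anywhere inside the string,
    # so trailing garbage after ".root" goes unnoticed.
    return re.search(PATTERN, name) is not None


def check_anchored(name):
    # re.fullmatch requires the entire string to match the pattern,
    # which is the effect of anchoring both the beginning and the end.
    return re.fullmatch(PATTERN, name) is not None


good = "/store/user/someone/file_1.root"
bad = "/store/user/someone/file_1.root_"  # trailing underscore

print(check_unanchored(bad), check_anchored(bad))
print(check_unanchored(good), check_anchored(good))
```

With this toy pattern, the name ending in an underscore passes the unanchored check but fails the anchored one, which matches the behaviour discussed in this thread.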
Sounds right, @dan131riley. I doubt the lack of anchors was intentional. What do you think, @amaltaro: is there any danger in making Lexicon really do what it was meant to do?
I totally agree that we should add Regarding read operations, I did not know that regex checks were enforced there as well. Perhaps it's required to accept some "wildcards" in the user read calls? Otherwise, maybe this is something that we could discuss for the future generation and see whether it can indeed be relaxed.
I was quite busy today and have a lot of unread emails. Off the top of my head, I don't remember the DBS reader and writer using different checks. The only one we use is the Lexicon shared by DMWM. I will check the code.
I checked the DBS code. The reader has a much more relaxed check than the writer, because we introduced the common Lexicon much later than when the CMS data were recorded, and the DBS reader has to be able to read the old data. Here is what we have in the reader LFN check:
The CMS LFN should have the format defined here: https://github.com/dmwm/WMCore/blob/bb573b442a53717057c169b05ae4fae98f31063b/src/python/WMCore/Lexicon.py#L347
Thanks, @yuyiguo. I remembered that the rules for reading had to be more relaxed, which is why I was surprised that some LFNs could be present in DBS yet not usable for reading. I suspect there is no solution for the files with bad names, so I suggest leaving them as they are. I have now changed the CRAB Publisher code so that it skips parent files which can't be found in DBS. And if a user complains that parents have not been recorded, we'll know what to say.
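As a rough sketch of the "skip parents that can't be found" approach, the snippet below filters a list of parent LFNs using a lookup callable. Everything here is hypothetical: `filter_known_parents`, `fake_lookup`, and the file names are invented for illustration and do not reflect the real CRAB Publisher code or the DBS client API.

```python
def filter_known_parents(parent_lfns, lookup):
    """Split parent LFNs into those the lookup can find and those to skip.

    `lookup` is a hypothetical stand-in for a DBS file query: it returns a
    (possibly empty) list of matches and may raise if the name is rejected.
    """
    found, skipped = [], []
    for lfn in parent_lfns:
        try:
            result = lookup(lfn)
        except Exception:
            # The service refused the name (e.g. failed input validation).
            skipped.append(lfn)
            continue
        if result:
            found.append(lfn)
        else:
            # Valid name, but no such file is recorded.
            skipped.append(lfn)
    return found, skipped


def fake_lookup(lfn):
    # Toy replacement for a real query, used only to exercise the function.
    if lfn.endswith("_"):
        raise ValueError("invalid LFN: %s" % lfn)
    known = {"/store/user/a/parent_1.root"}
    return [lfn] if lfn in known else []


parents = [
    "/store/user/a/parent_1.root",
    "/store/user/a/parent_2.root_",
    "/store/user/a/missing.root",
]
found, skipped = filter_known_parents(parents, fake_lookup)
print(found)
print(skipped)
```

Keeping the skipped names around (rather than dropping them silently) makes it easier to explain later why a given parent was not recorded.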
Ok, @belforte.
Hi @yuyiguo
I have found something odd while chasing a CRAB Publisher issue.
There are jobs which processed this dataset
/EmbeddingRun2017E/ElMuFinalState-inputDoubleMu_94X_miniAOD-v2/USER
from phys03
When inserting the outputs in DBS, I get an exception from DBS when looking up the jobs' parent file names, because those LFNs end with the underscore character _, i.e. calls to the DBS API return an error with that: while this works (of course this file is not present, so it returns an empty list)
So, if the underscore at the end is illegal, how could those file names enter DBS to begin with?
see:
https://cmsweb.cern.ch/das/request?instance=prod/phys03&input=file+dataset%3D%2FEmbeddingRun2017E%2FElMuFinalState-inputDoubleMu_94X_miniAOD-v2%2FUSER
Is this a shortcoming in WMCore's Lexicon, or some stricter checking in the DBS list API?
In CRAB we always validate user LFNs with Lexicon before attempting to insert them in DBS, but I cannot find clear confirmation that this dataset was put in phys03 by CRAB; e.g. there is a single block with 15K files, and CRAB always has a limit of 100 files per block.