
Publisher fails on some tasks when listing parents #6771

Closed
belforte opened this issue Sep 16, 2021 · 5 comments

Comments

@belforte
Member

following up on https://cms-logbook.cern.ch/elog/Analysis+Operations/3550
I find that some tasks are not progressing because TaskPublish fails in a new way:

Exception: Failed to execute command: python /data/srv/TaskManager/py2only.3/slc7_amd64_gcc630/cms/crabtaskworker/py2only.3/lib/python2.7/site-packages/Publisher/TaskPublish.pyc  --configFile=/data/srv/Publisher/PublisherConfig.py --taskname=210914_075558:pellicci_crab_ElMu_NANOAOD_2017E_949V1.
 StdErr: Traceback (most recent call last):
  File "/build/cmsbld/jenkins/workspace/CRABServer_BuildOnRelease/w/BUILD/slc7_amd64_gcc630/cms/crabtaskworker/py2only.3/CRABServer-py2only.3/build/lib/Publisher/TaskPublish.py", line 982, in <module>
    main()
  File "/build/cmsbld/jenkins/workspace/CRABServer_BuildOnRelease/w/BUILD/slc7_amd64_gcc630/cms/crabtaskworker/py2only.3/CRABServer-py2only.3/build/lib/Publisher/TaskPublish.py", line 978, in main
    result = publishInDBS3(config, taskname, verbose)
  File "/build/cmsbld/jenkins/workspace/CRABServer_BuildOnRelease/w/BUILD/slc7_amd64_gcc630/cms/crabtaskworker/py2only.3/CRABServer-py2only.3/build/lib/Publisher/TaskPublish.py", line 727, in publishInDBS3
    blocksDict = destReadApi.listBlocks(logical_file_name=parentFile)
  File "/data/srv/TaskManager/py2only.3/slc7_amd64_gcc630/cms/dbs3-client/3.16.0-comp/lib/python2.7/site-packages/dbs/apis/dbsClient.py", line 655, in listBlocks
    return self.__callServer("blocks", params=kwargs)
  File "/data/srv/TaskManager/py2only.3/slc7_amd64_gcc630/cms/dbs3-client/3.16.0-comp/lib/python2.7/site-packages/dbs/apis/dbsClient.py", line 203, in __callServer
    self.__parseForException(http_error)
  File "/data/srv/TaskManager/py2only.3/slc7_amd64_gcc630/cms/dbs3-client/3.16.0-comp/lib/python2.7/site-packages/dbs/apis/dbsClient.py", line 230, in __parseForException
    raise HTTPError(http_error.url, data['exception'], data['message'], http_error.header, http_error.body)
RestClient.ErrorHandling.RestClientExceptions.HTTPError: HTTP Error 400: Invalid Input Data /store/use...: Not Match Required Format

@belforte
Member Author

The DBS error

RestClient.ErrorHandling.RestClientExceptions.HTTPError: HTTP Error 400: Invalid Input Data /store/use...: Not Match Required Format

complains about parent file

/store/user/sbrommer/gc_storage/ElMu_data_2017_CMSSW944/TauEmbedding_ElMu_data_2017_CMSSW944_Run2017E/55/merged_5454.root_

The _ at the end is odd and unusual. I do not know whether it is illegal, though, since file names look like that in the input DBS dataset
https://cmsweb.cern.ch/das/request?instance=prod/phys03&input=file+dataset%3D%2FEmbeddingRun2017E%2FElMuFinalState-inputDoubleMu_94X_miniAOD-v2%2FUSER

This is task 210914_075558:pellicci_crab_ElMu_NANOAOD_2017E_949V1
https://cmsweb.cern.ch/crabserver/ui/task/210914_075558%3Apellicci_crab_ElMu_NANOAOD_2017E_949V1

which used this config

from WMCore.Configuration import Configuration
config = Configuration()
config.section_('General')
config.General.transferOutputs = True
config.General.workArea = 'crab_projects/Embedded_NANOAOD_2017'
config.General.requestName = 'ElMu_NANOAOD_2017E_949V1'
config.section_('JobType')
config.JobType.psetName = 'LFVAnalysis_13TeV_NANOAOD_Embedded_2017_cfg.py'
config.JobType.pluginName = 'Analysis'
config.JobType.allowUndistributedCMSSW = True
config.section_('Data')
config.Data.inputDataset = '/EmbeddingRun2017E/ElMuFinalState-inputDoubleMu_94X_miniAOD-v2/USER'
config.Data.outputDatasetTag = 'EmbeddedElMu_NANOAOD_10222V1'
config.Data.publication = True
config.Data.unitsPerJob = 25
config.Data.inputDBS = 'phys03'
config.Data.splitting = 'FileBased'
config.section_('Site')
config.Site.storageSite = 'T2_IT_Bari'

I am puzzled

@belforte
Member Author

belforte commented Sep 16, 2021

indeed DBS does not like the '_' at the end

this is OK
(Pdb) destReadApi.listBlocks(logical_file_name='/store/user/belfo/a/b/c/d.root')
[]
but this fails :
(Pdb) destReadApi.listBlocks(logical_file_name='/store/user/belfo/a/b/c/d.root_')
*** HTTPError: HTTP Error 400: Invalid Input Data /store/use...: Not Match Required Format
(Pdb) 

in this case destReadApi points to phys03, but the global instance also refuses the LFN with _ at the end.
I can tweak TaskPublish.py to skip those parents, but I am quite confused about how such file names made it into DBS in the first place: the check on a parent name should not be stricter than the check applied when inserting new files!
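
The pdb test above suggests the server rejects anything that follows the .root extension. A minimal sketch of a client-side pre-check, assuming (not confirmed by DBS) that an acceptable LFN starts with /store/ and ends exactly in .root. The pattern and the looks_like_valid_lfn helper are hypothetical and are not the actual DBS validation rule:

```python
import re

# Hypothetical, simplified pattern: NOT the real DBS validation rule.
# Assumes an LFN starts with /store/ and ends exactly with ".root".
_LFN_PATTERN = re.compile(r'^/store/[A-Za-z0-9_./-]+\.root$')


def looks_like_valid_lfn(lfn):
    """Return True if lfn matches the simplified pattern above."""
    return _LFN_PATTERN.match(lfn) is not None


# The two LFNs tried in the pdb session
print(looks_like_valid_lfn('/store/user/belfo/a/b/c/d.root'))
print(looks_like_valid_lfn('/store/user/belfo/a/b/c/d.root_'))
```

Something like this would only help to spot suspicious parents before calling DBS; the server remains the authority on what is accepted.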

@belforte
Member Author

The original input dataset in phys03 does not look like it was produced by CRAB. Anyhow, I have asked Yuyi about the file format in dmwm/DBS#656

@belforte
Member Author

Looks like I can make TaskPublish run by changing

blocksDict = destReadApi.listBlocks(logical_file_name=parentFile)

to

try:
  blocksDict = destReadApi.listBlocks(logical_file_name=parentFile)
except:
  parentsToSkip.add(parentFile)
  continue

belforte added a commit to belforte/CRABServer that referenced this issue Sep 16, 2021
belforte added a commit to belforte/CRABServer that referenced this issue Sep 16, 2021
@belforte belforte reopened this Sep 17, 2021
@belforte
Member Author

belforte commented Sep 17, 2021

the above fix is now deployed and works, but it would be better to make it less generic.
E.g. ignore the error and go on only in the case of the now known Invalid Input Data /store/use...: Not Match Required Format, but keep treating other exceptions as fatal and abort there, so that they get investigated as happened this time.
Moved to #6774
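
A minimal sketch of what the less generic handling could look like, assuming the text "Not Match Required Format" is a stable marker of this error. FakeHTTPError, FakeDbsApi and find_parents_to_skip are stand-ins invented here so that the example runs without a DBS server; the real code would catch the DBS client's HTTPError and call destReadApi instead:

```python
KNOWN_BAD_LFN_MARKER = 'Not Match Required Format'


class FakeHTTPError(Exception):
    """Stand-in for the DBS client's HTTPError (hypothetical)."""


class FakeDbsApi(object):
    """Stand-in for destReadApi: rejects LFNs that do not end in .root."""

    def listBlocks(self, logical_file_name):
        if not logical_file_name.endswith('.root'):
            raise FakeHTTPError('HTTP Error 400: Invalid Input Data: '
                                + KNOWN_BAD_LFN_MARKER)
        return []


def find_parents_to_skip(api, parent_files):
    """Collect parents rejected for the known reason; re-raise anything else."""
    parents_to_skip = set()
    for parent_file in parent_files:
        try:
            api.listBlocks(logical_file_name=parent_file)
        except FakeHTTPError as ex:
            if KNOWN_BAD_LFN_MARKER not in str(ex):
                raise  # unknown failure: stay fatal so it gets investigated
            parents_to_skip.add(parent_file)
    return parents_to_skip


skipped = find_parents_to_skip(
    FakeDbsApi(),
    ['/store/user/belfo/a/b/c/d.root', '/store/user/belfo/a/b/c/d.root_'],
)
print(sorted(skipped))
```

Matching on the message text is fragile if the server wording changes, so this is only one possible way to tell the known error apart from the others.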
