
MSOutput: set output datasets to VALID in DBS3 before announcing a standard workflow #9837

Open

amaltaro opened this issue Jul 13, 2020 · 8 comments · May be fixed by dmwm/deployment#1044 or #10394

@amaltaro (Contributor)

Impact of the new feature
ReqMgr2MS (MSOutput)

Is your feature request related to a problem? Please describe.
This has been discussed with Todor and Sharad, and it looks like we could accommodate a few more actions in the MSOutput, such that these functionalities can be deprecated in Unified.

Describe the solution you'd like
In addition to the output data placement performed by MSOutput (for wfs in closed-out and announced), we should also mark all those datasets as VALID in DBS3.
The DBS client API is:

```python
from dbs.apis.dbsClient import DbsApi

dbsapi = DbsApi(url=url3)
dbsapi.updateDatasetType(dataset=DATASET_NAME, dataset_access_type='VALID')
```

but if we decide to run requests concurrently, then we could use the equivalent REST API (ask Yuyi for details).
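If we do go concurrent, the same client call can also be fanned out with a thread pool instead of the raw REST API. A minimal sketch, assuming a list of output dataset names; `FakeDbsApi` is a stand-in defined here only so the snippet is self-contained (the real client is `dbs.apis.dbsClient.DbsApi` with the same `updateDatasetType` signature):

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

class FakeDbsApi:
    """Stand-in for dbs.apis.dbsClient.DbsApi so this sketch is runnable;
    it only records the calls it receives."""
    def __init__(self):
        self.updated = []

    def updateDatasetType(self, dataset, dataset_access_type):
        self.updated.append((dataset, dataset_access_type))

def set_valid_concurrently(dbsapi, datasets, max_workers=4):
    """Mark every dataset VALID, issuing the client calls in parallel."""
    results = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(dbsapi.updateDatasetType, dataset=ds,
                               dataset_access_type='VALID'): ds
                   for ds in datasets}
        for fut in as_completed(futures):
            ds = futures[fut]
            try:
                fut.result()
                results[ds] = True
            except Exception:
                results[ds] = False
    return results

dbsapi = FakeDbsApi()
outcome = set_valid_concurrently(dbsapi, ["/PrimDs/ProcDs-v1/AODSIM",
                                          "/PrimDs/ProcDs-v1/MINIAODSIM"])
```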

The best order of actions, in my opinion, would be:

  1. mark output datasets as VALID
  2. if previous step was successful, then perform the output data placement

NOTE: RelVal workflows are not announced in standalone mode, but in batches. In order to ease this transition, we should only mark output datasets as VALID for standard workflows; Unified will keep taking care of RelVals for a little longer.
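The proposed ordering, including the RelVal exclusion, can be sketched as below. This is only an illustration: the `SubRequestType` check is an assumption about the request document, and `set_valid` / `place_output` are hypothetical callables, not the actual MSOutput code.

```python
def announce_outputs(workflow, set_valid, place_output):
    """Apply the proposed ordering to a single workflow:
    1. mark every output dataset VALID in DBS;
    2. only if step 1 fully succeeded, perform the data placement.
    RelVals are skipped and left to Unified (the SubRequestType
    check is an assumption about the request document)."""
    if workflow.get("SubRequestType") == "RelVal":
        return "skipped-relval"
    if not all(set_valid(ds) for ds in workflow["OutputDatasets"]):
        return "dbs-update-failed"  # do not place data after a DBS failure
    for ds in workflow["OutputDatasets"]:
        place_output(ds)
    return "announced"

wf = {"RequestName": "test_wf", "OutputDatasets": ["/A/B-v1/AODSIM"]}
placed = []
status = announce_outputs(wf, set_valid=lambda ds: True,
                          place_output=placed.append)
```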

Describe alternatives you've considered
Keep workflow announcement in Unified

Additional context
Here we have a more complete description of the workflow announcement process:
#8921
but some of those have already been implemented, and others might be slightly modified.

@haozturk

Hi @amaltaro, I was thinking about this issue. I did not fully get why you suggested dealing with RelVals specially. As far as I understand from the Unified code, RelVals are created in batches, but they are announced in standalone mode. The relevant Unified module sets output datasets to VALID regardless of the workflow type: [1]
[1] https://github.com/CMSCompOps/WmAgentScripts/blob/master/Unified/closor.py#L494

@amaltaro (Contributor, Author)

RelVal announcement only happens in batch - when all workflows within the batch have completed - and the scenario I wanted to avoid was to have a batch getting rejected, or something like that, while some of its workflows have their output datasets marked as VALID and others are still marked as PRODUCTION.

This DBS dataset status change would likely happen when workflows are set to closed-out. If you consider this not to be a real issue, then we can just deal with them in a standard way.

@haozturk

I understand your point. Let's try to think about cases where a problem can occur if we set RelVal outputs as VALID in a standalone mode in MSOutput:

Let's say, in a batch some RelVal outputs are completely produced and set as VALID while others are still in PRODUCTION. If the batch is rejected at this moment, then all the outputs (both VALID and PRODUCTION ones) are going to be invalidated, so no problem. Can you think of a problem here?

IMO, the only critical item here is that when Unified is going to announce the batch, all outputs should be VALID. (Otherwise, we tell people that they can use these datasets when they can't, right?) So, somehow, MSOutput should be quicker than Unified. However, this argument applies to output data placement as well, i.e. Unified announces the workflows without making sure that the Rucio rules are created. Since we haven't seen any issue with output data placement, I believe we will not see an issue with setting the outputs as VALID, either.

Please let me know if I am missing something.

@haozturk

@amaltaro @todor-ivanov Besides my statement in the previous comment, I have another point to discuss: Is there a specific reason for having the following order of actions?

  1. Set the output datasets as VALID on DBS.
  2. If previous step is successful, do the output data placement.

What if the 1st step is successful and the 2nd one is not? Then we're telling users that they can use this dataset while we haven't put any protection in place for that data. The other way around makes more sense to me. What do you think?

@amaltaro (Contributor, Author)

Let's say, in a batch some RelVal outputs are completely produced and set as VALID while others are still in PRODUCTION. If the batch is rejected at this moment, then all the outputs (both VALID and PRODUCTION ones) are going to be invalidated, so no problem. Can you think of a problem here?

If all outputs are invalidated, then indeed it should not be a problem.

So, somehow, MSOutput should be quicker than Unified.

This is something we cannot guarantee! Those services are asynchronous. For instance, if Unified moves the last workflow from a batch to closed-out, then in the next minute it starts moving all the workflows in the batch to announced, it's very likely that some workflows will still not have their final output data placement made by MSOutput.

What if the 1st step is successful and 2nd one is not. Then we're saying the users that you can use this dataset while we don't put a protection for that data. The other way around makes more sense to me. What do you think?

This is a good question. The reason we should have the DBS status change first is that nothing changes if we (try to) set a dataset that is already VALID to VALID. Unlike a Rucio rule, the Tape destination is evaluated within the polling cycle, so we would risk creating multiple output data placements if we ordered them as you suggested.
Well, unless we always persist the output of the Rucio output data placement, even if other actions failed... if I'm not wrong, this is how the service is implemented at the moment.

In short, it could be that changing the order would have no negative effect :-D
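To make the retry argument above concrete, here is a toy sketch (all names hypothetical, not MSOutput code): repeating the DBS update is harmless because it is idempotent, while repeating the placement step creates another rule unless the first outcome was persisted and checked first.

```python
# Toy state illustrating the retry argument; names are hypothetical.
dbs_status = {"/A/B-v1/AODSIM": "PRODUCTION"}
placement_rules = []

def set_valid(dataset):
    # Idempotent: setting an already-VALID dataset to VALID changes nothing.
    dbs_status[dataset] = "VALID"

def place_output(dataset):
    # Not idempotent: each call creates another rule unless the first
    # outcome was persisted and checked before retrying.
    placement_rules.append(dataset)

# Simulate a retry after a partial failure:
set_valid("/A/B-v1/AODSIM")
set_valid("/A/B-v1/AODSIM")
place_output("/A/B-v1/AODSIM")
place_output("/A/B-v1/AODSIM")
```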

@haozturk

I am afraid I did not get your point. How could we create multiple output data placement rules if we use the order that I am suggesting?

@amaltaro (Contributor, Author)

Right. The order you suggest is:

  1. Run the output data placement.
  2. If successful, then set the output datasets as VALID on DBS.

Correct?

In this order, there is more code to be executed before we can persist the MongoDB document changes, which I believe happens in this method:

```python
def docUploader(self, msOutDoc, update=False, keys=None, stride=None):
```

whereas performing the data placement as the final stage (the order I suggested) would be the step right before persisting the Mongo changes.

However, from a quick look at the code, I believe the correct thing will be to carry out those actions sequentially, regardless of the status of the previous step. This is how I understand the concept of pipeline processing which Todor introduced in MSOutput. Of course, we need to keep the final state of each step, such that at the end of the pipeline we know the final outcome for a given workflow.
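The "run every step, record every outcome" idea described above could look roughly like this. It is only a sketch of the concept; step names and the document layout are hypothetical, not the actual MSOutput implementation.

```python
def run_pipeline(doc, steps):
    """Run every step in order regardless of earlier failures, and record
    each step's outcome so the final document shows the full picture."""
    outcomes = {}
    for name, func in steps:
        try:
            func(doc)
            outcomes[name] = "ok"
        except Exception as exc:
            outcomes[name] = "failed: %s" % exc
    doc["stepOutcomes"] = outcomes
    return doc

def broken_placement(doc):
    raise RuntimeError("rucio unavailable")

doc = run_pipeline({"workflow": "test_wf"}, [
    ("setDatasetsValid", lambda d: None),
    ("outputDataPlacement", broken_placement),
    ("docUploader", lambda d: None),
])
```

Note that `docUploader` still runs and persists the document even though the placement step failed, which matches the behaviour discussed above.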

@amaltaro (Contributor, Author)

@haozturk Hi Hasan, given the lack of activity on this issue and the fact that we actually provide a mechanism at the workflow spec level to decide which status the DBS data should be injected with, see:
#11236

I wonder whether we can now consider this issue and the initial developments no longer relevant. Please let us know your thoughts in the coming week.
