-
Notifications
You must be signed in to change notification settings - Fork 4
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add Cat-B calibration to OSA #292
base: main
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good overall, I left a few first set of comments. Thanks also for adapting the provenance tracking!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me. Some final comments inline
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks good to me. In the current implementation, you have to make sure to change the DL1b subdirectory prod id in another cfg (e.g. CatB_tailcut84
) to differentiate it from the original DL1b files (produced only assuming Cat A calibration).
Alternatively, you could implement it in a way that if you activate the application of Cat B, the DL1b subdirectory is automatically changed to include the CatB prefix.
Nonetheless, from my side, this is good to go already.
The failing test is not related, it is just due to the datetime reported in the sequencer table title.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have one concern about adding the creation of CatB and application to DL1a. See comment inline
if run_str[-4:] != "0000": | ||
log.debug(f"{run_str} is not the first subrun of the run, so the script " | ||
"onsite_create_cat_B_calibration_file will not be launched for this subrun.") | ||
|
||
catB_calibration_file = get_catB_calibration_filename(int(run_str[:5])) | ||
n = 0 | ||
n_max = 10 | ||
while not catB_calibration_file.exists() and n<=n_max: | ||
time.sleep(120) | ||
n += 1 | ||
return 0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here, if the subrun is not the file 0000 the processing will be kept sleeping until the file is available, right? It is not very convenient, but I see it as the most straightforward way in the current OSA scheme in which all analysis steps run within a single batch job for a given subrun.
Setting slurm dependencies would require changing the way OSA submits jobs to something more modular: one job per subrun per analysis step, hence dependencies can be set. In this way, you could also decouple the different time/memory requirements of each step. And maybe everything would be more efficient
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
so you suggest to change OSA to submit one job per subrun?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The way you have implemented it, it can probably work, but I think is prone to problems during processing.
The current scheme has the drawback of not being flexible for mixing processing at subrun and run levels. Normal processing goes subrun-wise, then cat B coefficients calculation is done on a run-wise basis. The same applies to e.g. DL3 production, once you have the DL2 merged run-wise, you would like to run it that way. That's why datasequence
has this limitation
r0_dl1 (subrun-wise) -> calculate_catB (run-wise) -> dl1ab (subrun-wise) -> merge dl1 (run-wise) -> dl1_dl2 (run-wise) -> dl2_dl3 (run-wise)
Each step would be a job with dependency on the previous stage.
However, this would be a major change in OSA I think. I do not know where to go from this point
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
probably keep the current implementation and test it for some time
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We discussed it a week ago,, I had the same objections you do have. I suggested implementing dependencies in the individual slurm jobs of an array job, but as Maria pointed out, this means waiting for the full completion of the 0000 subrun (DL1a, DL1ab). It is clearly a bottle neck. Is catB calculated only with a subrun or does it use the information all subruns?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the bottleneck is there anyway with both schemes, you'd have to wait.
Cat B calibration coefficient is calculated runwise, I'd say using the information from all interleaved events across the entire run. If these coefficients are to be applied I see no other way than waiting until the calibration B file is produced. The thing with dependencies is that they automatically link steps, without having to do the sleep until it finishes. Maybe it can be decoupled into two things: first the production of CatB calib files and second the possible application to the data (this is to be checked with Crab data).
I think it is fine as it is. Give it a try.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For the moment, we can work this way, but later on, I also prefer your proposal. Consider this just another step in the sequencer, not in the data sequence, another calibration step. Build a script that just produces the cat_B calibration for a run. It will depend on the ped_cal calibration, and the data sequence jobs of a given run will depend on the cta_b jobs for this run.
No description provided.