No info in DL2 files on RFs used? #1340

moralejo · 2025-01-23T17:50:11Z

@vuillaut @rlopezcoto
There seems to be no record inside DL2 files of which RandomForest models were used to obtain the reconstructed quantities.

For the automation of the DL2 and DL3 production (with 9x10 combinations of declination line and NSB level, each with its own set of RFs) it would be convenient to keep at least the path to the RF models that were used (the path contains both the NSB level and the declination). That way we can later easily find which IRFs to use for creating the DL3.

Any suggestion on the best way to do this?

vuillaut · 2025-01-23T20:02:55Z

Hi
That should be stored as metadata of the files, along with other metadata.
ctapipe provenance and tools should make this quite easy to implement... looking into it 👀

vuillaut · 2025-01-24T11:42:12Z

After playing a bit with ctapipe Provenance and exchanging with @maxnoe, here is a summary and some thoughts:

ctapipe Tool automatically tracks the provenance and writes it to an external ToolName.log file
there is an ongoing discussion in ctapipe about writing this provenance directly into the output file (e.g. the HDF5 file)
still, I think we can implement something for lstchain. I believe we want something well structured and machine-readable to automate the DL3 creation.

Requirements:

provenance information in the HDF5 files
machine-readable provenance information (e.g. in JSON)
unambiguous location to store provenance for each stage

Optional:

keep provenance information from the input file (e.g. when creating a DL2 file from a DL1 file, the DL1 provenance is copied the same way the other datasets are copied)

Solution (proposal):

create a new group, provenance, at the root of the file and have each tool write a new dataset in this group, e.g. dl2.root.provenance['dl1_to_dl2']

Note: writing the provenance in the attrs of a dataset (e.g. dl2.root.dl2.event.telescope.parameters.LST_LSTCam.attrs) is not ok as several tools can write into the same dataset.
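
To make the proposal concrete, here is a minimal sketch of writing one JSON record per stage into a root-level provenance group and reading it back. It is only an illustration under assumptions: it uses h5py for brevity (the dl2.root notation above is PyTables-style), the helper names write_provenance and read_provenance are hypothetical, and the record content and model path are placeholders.

```python
import json

import h5py

PROVENANCE_GROUP = "provenance"  # group name taken from the proposal above


def write_provenance(h5file, stage, record):
    """Store `record` (a JSON-serializable dict) as /provenance/<stage>."""
    group = h5file.require_group(PROVENANCE_GROUP)
    if stage in group:
        # One dataset per stage: refuse to silently overwrite an earlier record.
        raise ValueError(f"provenance for stage {stage!r} already exists")
    group.create_dataset(stage, data=json.dumps(record), dtype=h5py.string_dtype())


def read_provenance(h5file, stage):
    """Load the provenance record written for `stage` back into a dict."""
    raw = h5file[PROVENANCE_GROUP][stage][()]
    if isinstance(raw, bytes):  # recent h5py versions return stored strings as bytes
        raw = raw.decode("utf-8")
    return json.loads(raw)


# Usage with an in-memory file (nothing is written to disk).
# The record and the model path below are placeholders, not real lstchain output.
with h5py.File("dl2_example.h5", "w", driver="core", backing_store=False) as dl2:
    record = {"config": {"DL1ToDL2Tool": {"path_models": "/path/to/rf_models"}}}
    write_provenance(dl2, "dl1_to_dl2", record)
    prov = read_provenance(dl2, "dl1_to_dl2")
    print(prov["config"]["DL1ToDL2Tool"]["path_models"])
```

Keeping each stage in its own dataset means a later tool never has to modify a record written by an earlier one, which is the problem with attrs noted above.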

The provenance would look like this:

In: prov.keys()
Out: dict_keys(['activity_name', 'activity_uuid', 'start', 'stop', 'system', 'input', 'output', 'config'])

In: prov['config']['DL1ToDL2Tool']
Out: 
{'config_file': '/Users/thomas/Work/Projets/CTA/dev/cta-observatory/cta-lstchain/lstchain/data/lstchain_standard_config.json',
 'config_files': [],
 'input_files': ['/Users/thomas/Work/Projets/CTA/dev/cta-observatory/cta-lstchain/test_data/mc/dl1_simtel_theta_20_az_180_gdiffuse_10evts.h5'],
 'log_config': {},
 'log_datefmt': '%Y-%m-%d %H:%M:%S',
 'log_file': None,
 'log_file_level': 'INFO',
 'log_format': '[%(name)s]%(highlevel)s %(message)s',
 'log_level': 30,
 'logging_config': {},
 'output_dir': '/Users/thomas/Work/Projets/CTA/dev/cta-observatory/cta-lstchain/test_data/mc',
 'overwrite': False,
 'path_models': '/private/var/folders/cx/42c0v6k52kl38vcm9f5w_6980000gp/T/pytest-of-thomas/pytest-0/test_lstchain_mc_rfperformance0',
 'provenance_log': '/Users/thomas/Work/Projets/CTA/dev/cta-observatory/cta-lstchain/DL1 to DL2 Tool.provenance.log',
 'quiet': False,
 'show_config': False,
 'show_config_json': False}

The path to the RF models can then be retrieved as prov['config']['DL1ToDL2Tool']['path_models']
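
For the DL3 automation, such records could be used to group DL2 files by the models that produced them, so that each group can be matched to its IRFs. A rough sketch, assuming the provenance of each file has already been loaded into a dict with the structure shown above; the function names, file names and model paths are all hypothetical:

```python
from collections import defaultdict


def models_path(prov, tool="DL1ToDL2Tool"):
    """Return the RF models path recorded in one provenance record."""
    return prov["config"][tool]["path_models"]


def group_by_models(provenance_by_file):
    """Map each RF models path to the list of DL2 files produced with it."""
    groups = defaultdict(list)
    for filename, prov in provenance_by_file.items():
        groups[models_path(prov)].append(filename)
    return dict(groups)


# Placeholder records: only the keys needed here are filled in.
provenance_by_file = {
    "dl2_run_a.h5": {"config": {"DL1ToDL2Tool": {"path_models": "/models/set_1"}}},
    "dl2_run_b.h5": {"config": {"DL1ToDL2Tool": {"path_models": "/models/set_2"}}},
    "dl2_run_c.h5": {"config": {"DL1ToDL2Tool": {"path_models": "/models/set_1"}}},
}

for path, files in group_by_models(provenance_by_file).items():
    print(path, files)
```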

Thoughts / comments ?

moralejo · 2025-01-24T13:41:52Z

Looks good, but about the "config": would it be just the path to the config file that is stored, or the full config (as a "stringified dictionary") like in the attrs of the events table in dl1 files?

vuillaut · 2025-01-24T16:39:15Z

The config here is exactly what I shared: it is provided directly by the Provenance tool, which logs the arguments of the tool.
The lstchain config would still be saved in the parameters tables (dl1 and dl2) attributes.

maxnoe · 2025-01-24T17:21:23Z

> and write it in a ToolName.log external file

That's actually configurable (--provenance-log). For the grid processing, for example, we use the same name as the output file with .provlog appended and put it next to the output file.
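
As an illustration of that naming convention, the log path could be derived from the output path as below and then passed to --provenance-log. The helper name and the example path are hypothetical:

```python
from pathlib import Path


def provenance_log_path(output_file):
    """Return the output file's path with '.provlog' appended to its name."""
    output_file = Path(output_file)
    # Append to the full file name instead of replacing the existing suffix.
    return output_file.with_name(output_file.name + ".provlog")


log_path = provenance_log_path("/data/dl2/example_run.dl2.h5")
print(log_path)
```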
