No info in DL2 files on RFs used? #1340

Open
moralejo opened this issue Jan 23, 2025 · 5 comments

Comments

@moralejo
Collaborator

@vuillaut @rlopezcoto
There seems to be no record inside DL2 files of which Random Forests (RFs) were used to obtain the reconstructed quantities.

For the automation of the DL2 and DL3 production (with 9x10 combinations of declination line and NSB level, each with its own set of RFs), it would be convenient to keep at least the path to the RF models used (the path encodes both the NSB level and the declination). This way we can later easily find which IRFs to use for creating the DL3 files.
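Once the RF model path is recorded in each DL2 file, the bookkeeping step could be sketched as grouping DL2 files by that path, so that each (declination, NSB) group is later processed with one matching IRF set. This is a minimal sketch, assuming the path has already been read from each file's metadata; the helper name group_by_rf_models and the example paths are hypothetical, and the actual directory layout of the RF models is not specified here.

```python
from collections import defaultdict
from pathlib import PurePosixPath


def group_by_rf_models(dl2_to_models):
    """Group DL2 file names by the RF model path recorded in them.

    dl2_to_models: mapping {dl2_file: path_models}.
    Hypothetical helper: reading path_models from each file is assumed done.
    """
    groups = defaultdict(list)
    for dl2_file, path_models in dl2_to_models.items():
        # Normalise the path so "a/b/" and "a/b" end up in the same group
        key = str(PurePosixPath(path_models))
        groups[key].append(dl2_file)
    # Sort each file list so the result is deterministic
    return {key: sorted(files) for key, files in groups.items()}
```

Each key then corresponds to one declination/NSB combination, i.e. one set of RFs, so the IRFs only need to be looked up once per group rather than once per file.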

Any suggestion on the best way to do this?

@vuillaut
Member

Hi,
This should be stored as file metadata, along with the other metadata.
ctapipe's provenance system and Tools should make this quite easy to implement... looking into it 👀

@vuillaut
Member

After playing a bit with ctapipe Provenance and exchanging with @maxnoe , here is a summary and some thoughts:

  1. ctapipe Tool automatically tracks the provenance and write it in a ToolName.log external file
  2. there is an ongoing discussion in ctapipe about writing this provenance directly into the output file (e.g. the HDF5 file)
  3. still, I think we can implement something in lstchain. I believe we want something well-formatted and machine-readable to automate the DL3 creation.

Requirements:

  • provenance information in the HDF5 files
  • machine-readable provenance information (e.g. in JSON)
  • unambiguous location to store provenance for each stage

Optional:

  • keep provenance information from the input file (e.g. when creating a DL2 file from a DL1 file, the DL1 provenance is copied the same way the other datasets are copied)

Solution (proposal):

  • create a new provenance group at the root of the file and have each tool write its own dataset in this group, e.g. dl2.root.provenance['dl1_to_dl2']

Note: writing the provenance in the attrs of a dataset (e.g. dl2.root.dl2.event.telescope.parameters.LST_LSTCam.attrs) is not suitable, as several tools can write into the same dataset.
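The proposed layout (one dataset per tool under a root provenance group, plus the optional copy of the input file's records) could be sketched as follows. This is a minimal sketch, not the lstchain implementation: it assumes h5py and stores each record as a JSON string, whereas the proposal uses PyTables-style notation (dl2.root.provenance), which could serve equally well. Apart from dl1_to_dl2, the stage names and all function names are hypothetical.

```python
import json

import h5py


def write_provenance(h5_path, stage, record):
    """Store one tool's provenance record as JSON in /provenance/<stage>."""
    with h5py.File(h5_path, "a") as f:
        group = f.require_group("provenance")
        if stage in group:
            # Assumption: a rerun of the same stage replaces the old record
            del group[stage]
        # default=str turns non-JSON types (e.g. datetimes) into strings
        group.create_dataset(stage, data=json.dumps(record, default=str))


def copy_input_provenance(input_path, output_path):
    """Copy the input file's provenance records into the output file."""
    with h5py.File(input_path, "r") as src, h5py.File(output_path, "a") as dst:
        if "provenance" not in src:
            return
        out_group = dst.require_group("provenance")
        for stage in src["provenance"]:
            # Records already written for the output file are kept as-is
            if stage not in out_group:
                src.copy(src["provenance"][stage], out_group, name=stage)


def read_provenance(h5_path, stage):
    """Return the provenance record of one stage as a dict."""
    with h5py.File(h5_path, "r") as f:
        # h5py 3 returns bytes for string datasets; json.loads accepts bytes
        return json.loads(f["provenance"][stage][()])
```

Storing JSON strings keeps each record machine-readable without committing to a fixed HDF5 schema, at the cost of not being able to query individual fields directly with HDF5 tools.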

The provenance would look like this:

```
In: prov.keys()
Out: dict_keys(['activity_name', 'activity_uuid', 'start', 'stop', 'system', 'input', 'output', 'config'])

In: prov['config']['DL1ToDL2Tool']
Out:
{'config_file': '/Users/thomas/Work/Projets/CTA/dev/cta-observatory/cta-lstchain/lstchain/data/lstchain_standard_config.json',
 'config_files': [],
 'input_files': ['/Users/thomas/Work/Projets/CTA/dev/cta-observatory/cta-lstchain/test_data/mc/dl1_simtel_theta_20_az_180_gdiffuse_10evts.h5'],
 'log_config': {},
 'log_datefmt': '%Y-%m-%d %H:%M:%S',
 'log_file': None,
 'log_file_level': 'INFO',
 'log_format': '[%(name)s]%(highlevel)s %(message)s',
 'log_level': 30,
 'logging_config': {},
 'output_dir': '/Users/thomas/Work/Projets/CTA/dev/cta-observatory/cta-lstchain/test_data/mc',
 'overwrite': False,
 'path_models': '/private/var/folders/cx/42c0v6k52kl38vcm9f5w_6980000gp/T/pytest-of-thomas/pytest-0/test_lstchain_mc_rfperformance0',
 'provenance_log': '/Users/thomas/Work/Projets/CTA/dev/cta-observatory/cta-lstchain/DL1 to DL2 Tool.provenance.log',
 'quiet': False,
 'show_config': False,
 'show_config_json': False}
```

From this, one can retrieve prov['config']['DL1ToDL2Tool']['path_models'].
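On the DL3 automation side, reading the RF path back out of such a record could look like the sketch below. It assumes the record has already been loaded as a dict with the top-level keys and the DL1ToDL2Tool entry shown in the example output; those names come from this prototype and may change in the final implementation, and the function name get_path_models is hypothetical.

```python
# Top-level keys as listed in the example output above
EXPECTED_KEYS = {
    "activity_name", "activity_uuid", "start", "stop",
    "system", "input", "output", "config",
}


def get_path_models(prov, tool_name="DL1ToDL2Tool"):
    """Return the RF model path stored in a provenance record."""
    # Fail early on records that do not follow the expected structure
    missing = EXPECTED_KEYS - set(prov)
    if missing:
        raise KeyError(f"provenance record lacks keys: {sorted(missing)}")
    try:
        return prov["config"][tool_name]["path_models"]
    except KeyError as err:
        raise KeyError(f"no path_models recorded for {tool_name!r}") from err
```

Raising on an incomplete record, rather than returning None, keeps a DL2 file with missing provenance from being silently matched to the wrong IRFs.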

Thoughts / comments ?

@moralejo
Collaborator Author

Looks good, but about the "config": would it be just the path to the config file that is stored, or the full config (as a "stringified dictionary"), like in the attrs of the events table in DL1 files?

@vuillaut
Member

The config here is exactly what I shared: it is provided directly by the Provenance tool, which logs the arguments of the tool.
The lstchain config would still be saved in the parameters tables (dl1 and dl2) attributes.

@maxnoe
Member

maxnoe commented Jan 24, 2025

and write it in a ToolName.log external file

That's actually configurable (--provenance-log), and e.g. for the grid processing we use the same name as the output file with .provlog appended and put it next to the output file.
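The naming convention described for the grid processing (output file name plus .provlog, in the same directory) can be expressed in a couple of lines. This is a sketch under that assumption only: the helper names and the example file name are hypothetical, and the --provenance-log option is the one mentioned above, passed here in the --option=value form.

```python
from pathlib import Path


def provlog_path(output_file):
    """Provenance log path placed next to the output file."""
    output_file = Path(output_file)
    # Append ".provlog" to the full file name, keeping the ".h5" suffix
    return output_file.with_name(output_file.name + ".provlog")


def provenance_log_arg(output_file):
    """Command-line option pointing the tool's provenance log at that path."""
    return f"--provenance-log={provlog_path(output_file)}"
```

For an output file /data/dl2/dl2_run01234.h5 this gives /data/dl2/dl2_run01234.h5.provlog, so the log and the data file always travel together.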
