Added additional MC header information #921

LukasNickel · 2019-01-10T12:20:44Z

This is basically a continuation of #895 and #655.

MCRunHeader information from the simtel files gets stored in the MCHeaderContainer.
The Container should now contain every information of the MCRunHeader stored in the simtel files.

Item description is mainly taken from the hessio documentation at:
https://www.mpi-hd.mpg.de/hfm/~bernlohr/sim_telarray/Documentation/hessio_refman.pdf (page 80-81)
Range arrays are stored as max and min values (e.g. shower azimuth range).

Unfortunately I do not really know how corsika works and the hessio documentation does not mention
anything about the meaning of the corsika_* parameters. Those Descriptions should probably be updated
with something more meaningful.

- added missing information to the MCHeaderContainer - filled new entries in SimTelEventSource - added some of the new entries to the unit test - MCHeaderContainer should now contain every information provided in the MCRunHeader of the simtel files

kosack · 2019-01-10T12:47:57Z

One question I have here is whether or not we need a Container for this mc header info at all, or if we can just put it into the metadata of a container (e.g. event.mc.meta), which is a dictionary of headers. We could just copy the entire dict there in one line, and do no mapping of name to item. Certainly since it doesn't change per event, we will eventually move it outside of the event data structure anyhow.

However, i may be missing a use case.

The way the output files work when using a ctapipe.io.TableWriter is that anything in the some_container.meta dictionary automatically gets translated into header keywords (not a table row) in the output file associated with the Container's table. In the HDF5TableWriter, these are attributes of the dataset, and in a FITSTableWriter, it would be header cards in the HDU. If the data is in the Container proper, it is intended to be written as a row in a table. I'm happy either way, but perhaps it's simpler to just dump all headers, unless they are used in the code somewhere?

codecov · 2019-01-10T12:48:03Z

Codecov Report

Merging #921 into master will decrease coverage by 3.4%.
The diff coverage is 100%.

@@            Coverage Diff             @@
##           master     #921      +/-   ##
==========================================
- Coverage   81.83%   78.42%   -3.41%     
==========================================
  Files         184      191       +7     
  Lines       10526    10792     +266     
==========================================
- Hits         8614     8464     -150     
- Misses       1912     2328     +416

Impacted Files	Coverage Δ
ctapipe/io/simteleventsource.py	`100% <100%> (ø)`	⬆️
ctapipe/io/tests/test_simteleventsource.py	`100% <100%> (ø)`	⬆️
ctapipe/io/containers.py	`100% <100%> (ø)`	⬆️
ctapipe/io/tests/test_targetio_event_source.py	`6.5% <0%> (-93.5%)`	⬇️
ctapipe/io/tests/test_lsteventsource.py	`13.79% <0%> (-86.21%)`	⬇️
ctapipe/io/tests/test_sst1meventsource.py	`14.28% <0%> (-85.72%)`	⬇️
ctapipe/io/tests/test_nectarcameventsource.py	`16.66% <0%> (-83.34%)`	⬇️
ctapipe/io/targetioeventsource.py	`14.72% <0%> (-80.63%)`	⬇️
ctapipe/io/nectarcameventsource.py	`18.34% <0%> (-77.88%)`	⬇️
ctapipe/io/lsteventsource.py	`17.83% <0%> (-76.33%)`	⬇️
... and 44 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 5993173...c57e7e8. Read the comment docs.

kosack · 2019-01-10T12:49:22Z

I approved these changes just so that we can perhaps discuss simplifying it as header info later (it doesn't hurt to keep the current implementation for now)

maxnoe · 2019-01-10T12:49:32Z

However, i may be missing a use case.

Yes, writing multiple input files to the same output file

kosack · 2019-01-10T12:52:49Z

see #922

kosack · 2019-01-10T12:54:08Z

Yes, writing multiple input files to the same output file

Yes that could be useful. Ok, let's keep it as a Container, and it will perhaps move elsewhere. That means that we should also not blindly use the same field names as simtelarray, they should somehow be "general" to any simulation code, so we can consider renaming any that are not clear, or expanding some abbreviations at the same time.

maxnoe

I think we should add the run id (or obsid), however we want to call it, to this container.

maxnoe · 2019-01-10T12:56:15Z

ctapipe/io/containers.py

+    corsika_low_E_model = Field(0.0, "Detector MC information")
+    corsika_high_E_model = Field(0.0, "Detector MC information")
+    corsika_bunchsize = Field(0.0, "Detector MC information")
+    corsika_wlen_min = Field(0.0, "Detector MC information")


minimum wavelength of cherenkov light

shouldn't a unit be there as well?

I think we should add the run id (or obsid), however we want to call it, to this container.

That can be in a separate container that is written at the same time as this one - otherwise we'd have to reproduce it for every container we want to have a run_id, etc in. However, having the corsika run id might be useful, since so far the "obs_id" is filled as the simtelarray run id, but we lose the original corsika id (which can be different).

maxnoe · 2019-01-10T13:02:42Z

ctapipe/io/containers.py

+    corsika_iact_options = Field(0.0, "Detector MC information")
+    corsika_low_E_model = Field(0.0, "Detector MC information")
+    corsika_high_E_model = Field(0.0, "Detector MC information")
+    corsika_bunchsize = Field(0.0, "Detector MC information")


number of photons per bunch (photons are not simulated individually but as bunches of n photons)

kosack

I think this is almost ready to merge, other than a small comment:

In some of the fields like num_showers, corsika_low_E_model, etc, the default value is a float (0.0), but those are integer quantities. Right now the way the output works is that it takes the type from the default value, or the currently filled value (which ever is present when the data are written), so it's best to use the correct type (e.g. 0 for integers).

Alternately, we could set the default for all of these fields in the container to np.NaN, which would ensure that if an EventSource doesn't fill in that info, it's clear that it is unknown. Otherwise the default will be used, which may not be correct (e.g. the low-energy model 0 is perhaps not what was used). In the future, it would of course be good to map some of these arcane integer numbers to strings so it's more clear, but that can happen later.

LukasNickel · 2019-01-26T12:35:47Z

Agreed, using 0.0 everywhere is probably not what we want.
Personally I prefer the np.Nan approach. I guess its less prone to errors in later usage.

- switched to np.nan as default values for the MCHeaderContainer (instead of 0.0) - (kept "None" strings)

LukasNickel · 2019-01-28T09:47:18Z

Am i missing something or is the problem with the Travis Build not connected to this PR? I dont see how my changes would affect ctapipe/io/tests/test_hdf5.py

dneise · 2019-01-28T09:52:23Z

The master is currently broken. Not your fault at all. We are working on it.

LukasNickel · 2019-01-28T09:54:48Z

The master is currently broken. Not your fault at all. We are working on it.

Ah, i thought that was fixed already. Thanks

dneise · 2019-01-28T09:55:24Z

We broke it again last friday

LukasNickel · 2019-01-28T10:00:15Z

We broke it again last friday

Oh
Could we try to get this merged than? I heard there is some interest from the machine learning group in getting all the MCHeader related PR's merged.

dneise · 2019-01-28T10:28:44Z

Apart from the master, which is currently broken, so the tests fail. This PR needs to be reviewed and approved by two ctapipe developers. I think this has not been done yet?

kosack · 2019-02-01T12:42:43Z

I think this is ready right? @maxnoe do you still have a problem?

maxnoe · 2019-02-01T14:48:09Z

No

kosack previously approved these changes Jan 10, 2019

View reviewed changes

kosack mentioned this pull request Jan 10, 2019

move event.mcheader and/or replace it with a metadata dict #922

Closed

maxnoe requested changes Jan 10, 2019

View reviewed changes

Better Description for corsika_bunchsize and corsika_wlen

2b9b101

LukasNickel dismissed kosack’s stale review via 2b9b101 January 10, 2019 13:12

kosack requested changes Jan 21, 2019

View reviewed changes

LukasNickel added 4 commits January 26, 2019 13:45

- added unit to corsika_wlen_min/max entries

c8a3243

- switched to np.nan as default values for the MCHeaderContainer (instead of 0.0) - (kept "None" strings)

fixed typo

bcfeace

fixed unit

89feba9

fix unit test

c57e7e8

LukasNickel requested review from maxnoe and kosack January 28, 2019 10:00

thomasgas approved these changes Jan 28, 2019

View reviewed changes

kosack approved these changes Jan 28, 2019

View reviewed changes

maxnoe approved these changes Feb 1, 2019

View reviewed changes

kosack merged commit 37f70ce into cta-observatory:master Feb 1, 2019

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Added additional MC header information #921

Added additional MC header information #921

LukasNickel commented Jan 10, 2019

kosack commented Jan 10, 2019

codecov bot commented Jan 10, 2019 •

edited

Loading

kosack commented Jan 10, 2019

maxnoe commented Jan 10, 2019

kosack commented Jan 10, 2019

kosack commented Jan 10, 2019

maxnoe left a comment

maxnoe Jan 10, 2019

kosack Jan 11, 2019

kosack Jan 11, 2019

maxnoe Jan 10, 2019

kosack left a comment

LukasNickel commented Jan 26, 2019

LukasNickel commented Jan 28, 2019

dneise commented Jan 28, 2019

LukasNickel commented Jan 28, 2019

dneise commented Jan 28, 2019

LukasNickel commented Jan 28, 2019

dneise commented Jan 28, 2019

kosack commented Feb 1, 2019

maxnoe commented Feb 1, 2019

Added additional MC header information #921

Added additional MC header information #921

Conversation

LukasNickel commented Jan 10, 2019

kosack commented Jan 10, 2019

codecov bot commented Jan 10, 2019 • edited Loading

Codecov Report

kosack commented Jan 10, 2019

maxnoe commented Jan 10, 2019

kosack commented Jan 10, 2019

kosack commented Jan 10, 2019

maxnoe left a comment

Choose a reason for hiding this comment

maxnoe Jan 10, 2019

Choose a reason for hiding this comment

kosack Jan 11, 2019

Choose a reason for hiding this comment

kosack Jan 11, 2019

Choose a reason for hiding this comment

maxnoe Jan 10, 2019

Choose a reason for hiding this comment

kosack left a comment

Choose a reason for hiding this comment

LukasNickel commented Jan 26, 2019

LukasNickel commented Jan 28, 2019

dneise commented Jan 28, 2019

LukasNickel commented Jan 28, 2019

dneise commented Jan 28, 2019

LukasNickel commented Jan 28, 2019

dneise commented Jan 28, 2019

kosack commented Feb 1, 2019

maxnoe commented Feb 1, 2019

codecov bot commented Jan 10, 2019 •

edited

Loading