Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Added additional MC header information #921

Merged
merged 6 commits into from
Feb 1, 2019

Conversation

LukasNickel
Copy link
Member

This is basically a continuation of #895 and #655.

MCRunHeader information from the simtel files gets stored in the MCHeaderContainer.
The Container should now contain every information of the MCRunHeader stored in the simtel files.

Item description is mainly taken from the hessio documentation at:
https://www.mpi-hd.mpg.de/hfm/~bernlohr/sim_telarray/Documentation/hessio_refman.pdf (page 80-81)
Range arrays are stored as max and min values (e.g. shower azimuth range).

Unfortunately I do not really know how corsika works and the hessio documentation does not mention
anything about the meaning of the corsika_* parameters. Those Descriptions should probably be updated
with something more meaningful.

- added missing information to the MCHeaderContainer
- filled new entries in SimTelEventSource
- added some of the new entries to the unit test
- MCHeaderContainer should now contain every information provided in the MCRunHeader of the simtel files
@kosack
Copy link
Contributor

kosack commented Jan 10, 2019

One question I have here is whether or not we need a Container for this mc header info at all, or if we can just put it into the metadata of a container (e.g. event.mc.meta), which is a dictionary of headers. We could just copy the entire dict there in one line, and do no mapping of name to item. Certainly since it doesn't change per event, we will eventually move it outside of the event data structure anyhow.

However, i may be missing a use case.

The way the output files work when using a ctapipe.io.TableWriter is that anything in the some_container.meta dictionary automatically gets translated into header keywords (not a table row) in the output file associated with the Container's table. In the HDF5TableWriter, these are attributes of the dataset, and in a FITSTableWriter, it would be header cards in the HDU. If the data is in the Container proper, it is intended to be written as a row in a table. I'm happy either way, but perhaps it's simpler to just dump all headers, unless they are used in the code somewhere?

@codecov
Copy link

codecov bot commented Jan 10, 2019

Codecov Report

Merging #921 into master will decrease coverage by 3.4%.
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #921      +/-   ##
==========================================
- Coverage   81.83%   78.42%   -3.41%     
==========================================
  Files         184      191       +7     
  Lines       10526    10792     +266     
==========================================
- Hits         8614     8464     -150     
- Misses       1912     2328     +416
Impacted Files Coverage Δ
ctapipe/io/simteleventsource.py 100% <100%> (ø) ⬆️
ctapipe/io/tests/test_simteleventsource.py 100% <100%> (ø) ⬆️
ctapipe/io/containers.py 100% <100%> (ø) ⬆️
ctapipe/io/tests/test_targetio_event_source.py 6.5% <0%> (-93.5%) ⬇️
ctapipe/io/tests/test_lsteventsource.py 13.79% <0%> (-86.21%) ⬇️
ctapipe/io/tests/test_sst1meventsource.py 14.28% <0%> (-85.72%) ⬇️
ctapipe/io/tests/test_nectarcameventsource.py 16.66% <0%> (-83.34%) ⬇️
ctapipe/io/targetioeventsource.py 14.72% <0%> (-80.63%) ⬇️
ctapipe/io/nectarcameventsource.py 18.34% <0%> (-77.88%) ⬇️
ctapipe/io/lsteventsource.py 17.83% <0%> (-76.33%) ⬇️
... and 44 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 5993173...c57e7e8. Read the comment docs.

kosack
kosack previously approved these changes Jan 10, 2019
@kosack
Copy link
Contributor

kosack commented Jan 10, 2019

I approved these changes just so that we can perhaps discuss simplifying it as header info later (it doesn't hurt to keep the current implementation for now)

@maxnoe
Copy link
Member

maxnoe commented Jan 10, 2019

However, i may be missing a use case.

Yes, writing multiple input files to the same output file

@kosack
Copy link
Contributor

kosack commented Jan 10, 2019

see #922

@kosack
Copy link
Contributor

kosack commented Jan 10, 2019

Yes, writing multiple input files to the same output file

Yes that could be useful. Ok, let's keep it as a Container, and it will perhaps move elsewhere. That means that we should also not blindly use the same field names as simtelarray, they should somehow be "general" to any simulation code, so we can consider renaming any that are not clear, or expanding some abbreviations at the same time.

Copy link
Member

@maxnoe maxnoe left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should add the run id (or obsid), however we want to call it, to this container.

corsika_low_E_model = Field(0.0, "Detector MC information")
corsika_high_E_model = Field(0.0, "Detector MC information")
corsika_bunchsize = Field(0.0, "Detector MC information")
corsika_wlen_min = Field(0.0, "Detector MC information")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

minimum wavelength of cherenkov light

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

shouldn't a unit be there as well?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should add the run id (or obsid), however we want to call it, to this container.

That can be in a separate container that is written at the same time as this one - otherwise we'd have to reproduce it for every container we want to have a run_id, etc in. However, having the corsika run id might be useful, since so far the "obs_id" is filled as the simtelarray run id, but we lose the original corsika id (which can be different).

corsika_iact_options = Field(0.0, "Detector MC information")
corsika_low_E_model = Field(0.0, "Detector MC information")
corsika_high_E_model = Field(0.0, "Detector MC information")
corsika_bunchsize = Field(0.0, "Detector MC information")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

number of photons per bunch (photons are not simulated individually but as bunches of n photons)

Copy link
Contributor

@kosack kosack left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is almost ready to merge, other than a small comment:

In some of the fields like num_showers, corsika_low_E_model, etc, the default value is a float (0.0), but those are integer quantities. Right now the way the output works is that it takes the type from the default value, or the currently filled value (which ever is present when the data are written), so it's best to use the correct type (e.g. 0 for integers).

Alternately, we could set the default for all of these fields in the container to np.NaN, which would ensure that if an EventSource doesn't fill in that info, it's clear that it is unknown. Otherwise the default will be used, which may not be correct (e.g. the low-energy model 0 is perhaps not what was used). In the future, it would of course be good to map some of these arcane integer numbers to strings so it's more clear, but that can happen later.

@LukasNickel
Copy link
Member Author

Agreed, using 0.0 everywhere is probably not what we want.
Personally I prefer the np.Nan approach. I guess its less prone to errors in later usage.

- switched to np.nan as default values for the MCHeaderContainer (instead of 0.0)
- (kept "None" strings)
@LukasNickel
Copy link
Member Author

Am i missing something or is the problem with the Travis Build not connected to this PR? I dont see how my changes would affect ctapipe/io/tests/test_hdf5.py

@dneise
Copy link
Member

dneise commented Jan 28, 2019

The master is currently broken. Not your fault at all. We are working on it.

@LukasNickel
Copy link
Member Author

The master is currently broken. Not your fault at all. We are working on it.

Ah, i thought that was fixed already. Thanks

@dneise
Copy link
Member

dneise commented Jan 28, 2019

We broke it again last friday

@LukasNickel
Copy link
Member Author

We broke it again last friday

Oh
Could we try to get this merged than? I heard there is some interest from the machine learning group in getting all the MCHeader related PR's merged.

@dneise
Copy link
Member

dneise commented Jan 28, 2019

Apart from the master, which is currently broken, so the tests fail. This PR needs to be reviewed and approved by two ctapipe developers. I think this has not been done yet?

@kosack
Copy link
Contributor

kosack commented Feb 1, 2019

I think this is ready right? @maxnoe do you still have a problem?

@maxnoe
Copy link
Member

maxnoe commented Feb 1, 2019

No

@kosack kosack merged commit 37f70ce into cta-observatory:master Feb 1, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants