
added the IO functions removed from PR #141 #150

Open: jsitarek wants to merge 8 commits into main
Conversation

jsitarek (Contributor):

Since #141 is already merged, as announced there, I moved the IO functions into this separate PR.
The summary of the discussion with @maxnoe in the previous PR: Max was not in favor of having functions that read arrays of IRFs, while I think this is very useful. Reading is not trivial, since some transformations have to be done (separate lo/hi bins vs. a single list of bin edges, a change of the axis order), and they are common to all the IRFs, so not implementing it this way would result in quite some code repetition. There are two generic functions in this PR, read_irf_grid and read_fits_bins_lo_hi, which read the actual IRFs (of whatever type, including the transformations) and the bin values (including consistency checks), plus two wrapper functions for the effective area and the migration matrix.

Please let me know about any modifications to these functions that you would like.

@jsitarek jsitarek added the input/output Format and file extensions of the input/output data. label Apr 26, 2021
Review comment on pyirf/io/gadf.py (outdated):

```python
for this_file in files:
    # [0] because the IRFs are written as a single row of the table
    irfs_all.append(QTable.read(this_file, hdu=extname)[field_name][0])
```
maxnoe (Member):

The transpose is missing here; this is why you have to do non-trivial transposes in edisp.

This goes back to my point that I really don't understand why we need this read_irf_grid function. Simple functions that read a table once and return the arrays would suffice, and stacking can then just be edisp = np.stack(edisps).
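For concreteness, the stacking suggested here with np.stack; the array shapes and sizes are made up for illustration:

```python
import numpy as np

# one energy-dispersion array per input file;
# the axis order and sizes here are illustrative
edisps = [np.full((3, 4, 2), i, dtype=float) for i in range(5)]

# np.stack adds a new leading axis, so the result
# has shape (n_files, 3, 4, 2)
edisp = np.stack(edisps)
assert edisp.shape == (5, 3, 4, 2)
```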

maxnoe (Member):

Also, keeping the table around is good, since it contains the metadata. So this function is just not as general as you might think it is.

jsitarek (Contributor, Author):

I switched to np.stack.
Please note @maxnoe that the function also performs the required transposition, which is needed for all the IRFs, and extracts the 0-th row: all the operations needed to convert between the FITS format and the one used inside pyirf. And it works fine on either individual files or lists of files, so I think it is very useful and makes the custom functions that read individual IRFs simpler.

I agree that the table includes additional important information via metadata, but I think it would be more useful to extract this metadata and return the concrete information. Nevertheless, first we would need to agree (in #126) on what information should be put there, under what names, etc.

maxnoe (Member) commented Apr 28, 2021:

@jsitarek

> Nevertheless, first we would need to agree (in #126) on what information should be put there, under what names, etc.

Actually no. Since pyirf will at no point use this metadata, users of pyirf can fill and read any metadata they like and use it for whatever purpose.

That's also the problem with this read_irf_grid function. You would have to open all the files again to also read the metadata.

So, please, write functions that read each IRF type, validating that it is the correct type by looking at the HDUCLASX header keywords, and returning the metadata as well.

These functions can then easily be called for multiple files and the results stacked or compared.
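A minimal sketch of the kind of header validation being requested here; the helper name and the expected keyword values are my own illustration, not code from the PR. It accepts any mapping, e.g. an astropy.io.fits header:

```python
def validate_hduclas(header, expected):
    """Check HDUCLAS* keywords against expected values.

    `header` can be any mapping, e.g. an astropy.io.fits.Header.
    Raises ValueError on the first mismatch.
    """
    for key, value in expected.items():
        found = header.get(key)
        if found != value:
            raise ValueError(f"{key}: expected {value!r}, found {found!r}")

# example: an effective-area HDU (keyword values are illustrative)
header = {"HDUCLASS": "GADF", "HDUCLAS1": "RESPONSE", "HDUCLAS2": "EFF_AREA"}
validate_hduclas(header, {"HDUCLAS1": "RESPONSE", "HDUCLAS2": "EFF_AREA"})
```

Because the check is a plain loop over a mapping, the same helper can be shared by the per-IRF reading functions without any shared reader machinery.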

I think the advantages of this outweigh the little code repetition you avoid with this read_irf_grid function.

jsitarek (Contributor, Author):

> Actually no. Since pyirf will at no point use this metadata, users of pyirf can fill and read any metadata they like and use it for whatever purpose.

Well, I would not be so sure about this. If we really have standardized metadata, we can use it inside pyirf, e.g. to do standardized interpolation over those values.

> That's also the problem with this read_irf_grid function. You would have to open all the files again to also read the metadata.

As I explained above, the metadata processing can also be added to this function. Please note that the metadata should likely be the same for all types of IRFs (no matter whether you read the effective area or the energy migration, you still want to know at which zenith angle it was produced), so it makes perfect sense to read it in a standardized way.

> So, please, write functions that read each IRF type, validating that it is the correct type by looking at the HDUCLASX header keywords, and returning the metadata as well.

I just implemented checking of all the HDUCLASX headers.
Again, thanks to using one universal function this was pretty easy, especially since many of those fields are the same for different IRFs.

> These functions can then easily be called for multiple files and the results stacked or compared.
>
> I think the advantages of this outweigh the little code repetition you avoid with this read_irf_grid function.

With everything we have discussed in this thread, the amount of code repetition in the approach you suggest would grow larger and larger. Already now we have three issues that are repeated for all IRFs:

  • swapping of axis sequence
  • checks of HDUCLASX headers
  • combining IRFs produced for different parameters

and we have two IRFs already implemented, plus a few more that can be added easily. So in the end I think we are avoiding a lot of code repetition.
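To make the first item concrete: the axis swap between the on-disk FITS layout and the in-memory convention is the same transpose for every IRF type. The axis names and orders below are assumptions for illustration, not the actual GADF/pyirf conventions:

```python
import numpy as np

# as stored in the FITS table row, e.g. (n_theta, n_energy)
irf_fits = np.arange(8).reshape(2, 4)

# in-memory order with energy first, e.g. (n_energy, n_theta)
irf = irf_fits.T
assert irf.shape == (4, 2)
```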

codecov bot commented Apr 27, 2021:

Codecov Report

Merging #150 (d34f960) into master (0e7fcd2) will increase coverage by 0.68%.
The diff coverage is 97.76%.


```diff
@@            Coverage Diff             @@
##           master     #150      +/-   ##
==========================================
+ Coverage   89.04%   89.72%   +0.68%
==========================================
  Files          42       42
  Lines        1579     1713     +134
==========================================
+ Hits         1406     1537     +131
- Misses        173      176       +3
```

Impacted Files | Coverage Δ
pyirf/io/gadf.py | 98.14% <95.89%> (-1.86%) ⬇️
pyirf/io/tests/test_gadf.py | 100.00% <100.00%> (ø)

Δ = absolute <relative> (impact), ø = not affected, ? = missing data. Last update 0e7fcd2...d34f960.

jsitarek (Contributor, Author):

Hi @maxnoe,

Coming back to this old story of the I/O classes: I implemented your comments back in May. We still have a difference of opinion about code repetition vs. generic methods; please check my latest replies from May, and let's see if we can converge and merge the I/O classes.

maxnoe (Member) commented Jun 21, 2021:

@jsitarek I think we should revisit this in a more global refactoring, now that Gammapy has been chosen as the science tool for CTA.

I think it makes sense to use the Gammapy data structures inside pyirf; that will make many things a lot easier.

As for the IO parts, LST is already using Gammapy, so you should also just use Gammapy to load the IRFs.

jsitarek (Contributor, Author):

Hi @maxnoe, OK, so what is the proposal? Do you want to close the IO PR, or leave it open but without updates for the time being?
Merging after the refactoring would be messy, and if we agree that pyirf should have some IO classes inside, I think it would be good to have them in place during the refactoring, because all the tests of the individual writing functions are already implemented here, so when you refactor you would immediately get consistency checks.

As we discussed some months ago, all those IRF developments were urgent in LST, hence the IO was implemented separately; but if the IO functions were inside pyirf, lstchain would definitely profit from them.
