
added the IO functions removed from PR #141 #150

Open: jsitarek wants to merge 8 commits into main
Conversation

jsitarek (Contributor):

Since #141 is already merged, as announced there, I moved the IO functions into this separate PR.
The summary of the discussion with @maxnoe in the previous PR: Max was not in favor of having functions that read arrays of IRFs, while I think this is very useful. Reading is not trivial, since some transformations have to be done (separate lo/hi bins vs. a single list of bin edges, a change of the axis order), and they are common to all the IRFs, so not implementing it this way would result in quite some code repetition. There are two generic functions in this PR, read_irf_grid and read_fits_bins_lo_hi, which read the actual IRFs (of whatever type, including the transformations) and the bin values (including consistency checks), plus two wrapper functions for the effective area and the migration matrix.

Please let me know about any modifications to these functions that you would like.

@jsitarek jsitarek added the input/output Format and file extensions of the input/output data. label Apr 26, 2021
Review comment on pyirf/io/gadf.py (outdated):

```python
for this_file in files:
    # [0] because the IRFs are written as a single row of the table
    irfs_all.append(QTable.read(this_file, hdu=extname)[field_name][0])
```
maxnoe (Member):

The transpose is missing here; this is why you have to do non-trivial transposes in edisp.

This goes back to my point that I really don't understand why we need this read_irf_grid function. Simple functions that read a table once and return the arrays would suffice, and stacking can then just be edisp = np.stack(edisps).
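For concreteness, the stacking suggested here with np.stack; the array shapes and sizes are made up for illustration:

```python
import numpy as np

# one energy-dispersion array per input file;
# the axis order and sizes here are illustrative
edisps = [np.full((3, 4, 2), i, dtype=float) for i in range(5)]

# np.stack adds a new leading axis, so the result
# has shape (n_files, 3, 4, 2)
edisp = np.stack(edisps)
assert edisp.shape == (5, 3, 4, 2)
```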

maxnoe (Member):

Also, keeping the table around is good, since it contains the metadata. So this function is just not as general as you might think it is.

jsitarek (Contributor, Author):

I switched to np.stack.
Please note @maxnoe that the function also performs the required transposition, which is needed for all the IRFs, and extracts the 0-th row: all the operations needed to convert between the FITS format and the one used inside pyirf. And it works fine on either individual files or lists of files, so I think it is very useful and makes the custom functions that read individual IRFs simpler.

I agree that the table includes additional important information via metadata, but I think it would be more useful to extract this metadata and return the concrete information. Nevertheless, first we would need to agree (in #126) on what information should be put there, under what names, etc.

maxnoe (Member) commented Apr 28, 2021:

@jsitarek

> Nevertheless, first we would need to agree (in #126) on what information should be put there, under what names, etc.

Actually no. Since pyirf will at no point use this metadata, users of pyirf can fill and read any metadata they like and use it for whatever purpose.

That's also the problem with this read_irf_grid function. You would have to open all the files again to also read the metadata.

So, please, write functions that read each IRF type, validating that it is the correct type by looking at the HDUCLASX header keywords, and returning the metadata as well.

These functions can then easily be called for multiple files and the results stacked or compared.
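A minimal sketch of the kind of header validation being requested here; the helper name and the expected keyword values are my own illustration, not code from the PR. It accepts any mapping, e.g. an astropy.io.fits header:

```python
def validate_hduclas(header, expected):
    """Check HDUCLAS* keywords against expected values.

    `header` can be any mapping, e.g. an astropy.io.fits.Header.
    Raises ValueError on the first mismatch.
    """
    for key, value in expected.items():
        found = header.get(key)
        if found != value:
            raise ValueError(f"{key}: expected {value!r}, found {found!r}")

# example: an effective-area HDU (keyword values are illustrative)
header = {"HDUCLASS": "GADF", "HDUCLAS1": "RESPONSE", "HDUCLAS2": "EFF_AREA"}
validate_hduclas(header, {"HDUCLAS1": "RESPONSE", "HDUCLAS2": "EFF_AREA"})
```

Because the check is a plain loop over a mapping, the same helper can be shared by the per-IRF reading functions without any shared reader machinery.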

I think the advantages of this outweigh the little code repetition you avoid with this read_irf_grid function.

jsitarek (Contributor, Author):

> Actually no. Since pyirf will at no point use this metadata, users of pyirf can fill and read any metadata they like and use it for whatever purpose.

Well, I would not be so sure about this. If we really have standardized metadata, we can use it inside pyirf, e.g. to do standardized interpolation over those values.

> That's also the problem with this read_irf_grid function. You would have to open all the files again to also read the metadata.

As I explained above, the metadata processing can also be added to this function. Please note that the metadata should likely be the same for all types of IRFs (no matter whether you read the effective area or the energy migration, you still want to know at which zenith angle it was produced), so it makes perfect sense to read it in a standardized way.

> So, please, write functions that read each IRF type, validating that it is the correct type by looking at the HDUCLASX header keywords, and returning the metadata as well.

I just implemented checking of all the HDUCLASX headers.
Again, thanks to using one universal function this was pretty easy, especially since many of those fields are the same for different IRFs.

> These functions can then easily be called for multiple files and the results stacked or compared.
>
> I think the advantages of this outweigh the little code repetition you avoid with this read_irf_grid function.

With everything we have discussed in this thread, the amount of code repetition in the approach you suggest would grow larger and larger. Already now we have three issues that are repeated for all IRFs:

  • swapping of axis sequence
  • checks of HDUCLASX headers
  • combining IRFs produced for different parameters

and we have two IRFs already implemented, plus a few more that can be added easily. So in the end I think we are avoiding a lot of code repetition.
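To make the first item concrete: the axis swap between the on-disk FITS layout and the in-memory convention is the same transpose for every IRF type. The axis names and orders below are assumptions for illustration, not the actual GADF/pyirf conventions:

```python
import numpy as np

# as stored in the FITS table row, e.g. (n_theta, n_energy)
irf_fits = np.arange(8).reshape(2, 4)

# in-memory order with energy first, e.g. (n_energy, n_theta)
irf = irf_fits.T
assert irf.shape == (4, 2)
```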

codecov bot commented Apr 27, 2021:

Codecov Report

Merging #150 (d34f960) into master (0e7fcd2) will increase coverage by 0.68%.
The diff coverage is 97.76%.


```diff
@@            Coverage Diff             @@
##           master     #150      +/-   ##
==========================================
+ Coverage   89.04%   89.72%   +0.68%
==========================================
  Files          42       42
  Lines        1579     1713     +134
==========================================
+ Hits         1406     1537     +131
- Misses        173      176       +3
```

Impacted Files | Coverage Δ
pyirf/io/gadf.py | 98.14% <95.89%> (-1.86%) ⬇️
pyirf/io/tests/test_gadf.py | 100.00% <100.00%> (ø)

Δ = absolute <relative> (impact), ø = not affected, ? = missing data. Last update 0e7fcd2...d34f960.

jsitarek (Contributor, Author):

Hi @maxnoe,

Coming back to this old story of the I/O classes: I implemented your comments back in May. We still have a difference of opinion about code repetition vs. generic methods; please check my latest replies from May, and let's see if we can converge and merge the I/O classes.

maxnoe (Member) commented Jun 21, 2021:

@jsitarek I think we should revisit this in a more global refactoring, now that Gammapy has been chosen as the science tool for CTA.

I think it makes sense to use the Gammapy data structures inside pyirf; that will make many things a lot easier.

As for the IO parts, LST is already using Gammapy, so you should also just use Gammapy to load the IRFs.

jsitarek (Contributor, Author):

Hi @maxnoe, OK, so what is the proposal? Do you want to close the IO PR, or leave it open but without updates for the time being?
Merging after the refactoring would be messy, and if we agree that pyirf should have some IO classes inside, I think it would be good to have them in place during the refactoring, because all the tests of the individual writing functions are already implemented here, so when you refactor you would immediately get consistency checks.

As we discussed some months ago, all those IRF developments were urgent in LST, hence the IO was implemented separately; but if the IO functions were inside pyirf, lstchain would definitely profit from them.
