Standardize spectrum.meta.header #1125

cshanahan1 · 2024-03-08T15:14:26Z

cshanahan1
Mar 8, 2024

Related issues / PRs mentioning this topic:
#617
#1102
#1107

It has come up several times that there is no consistency in what is populated in spectrum.meta['header'] when reading in a FITS file. Consequently, writing out a Spectrum1D to FITS is unpredictable: which extension will the keywords in meta['header'] be written to, which keywords will be dropped, and what format should meta['header'] be in? For example, the tabular-fits reader populates meta['header'] with only some of the header keywords from the 1st extension (ignoring the 0th), stored as an OrderedDict. Writing that same Spectrum1D back out with tabular-fits, however, requires a Header() object in spectrum.meta['header'].

There has been some recent discussion about a multi-extension fits read/write format to preserve a PrimaryHDU on the output file to describe the data within, and doing this would require more structure in .meta['header'] to determine what should be written to the 0th extension, and what should be written to the data extension.

So, we need to standardize:

1. What format spectrum.meta['header'] should be in (currently there is a mix of Dict, Header, and HDUList amongst writers).
2. What happens to primary header data, if present in an input file. Currently, most formats ignore a primary header, but some, like jwst_reader, will combine the 0th and 1st headers from a file read in and put them all in the 1st when written out.

Thoughts?

cshanahan1 · 2024-03-08T15:29:59Z

cshanahan1
Mar 8, 2024
Author

I think that the solution to '1' should be that we decide everything in meta['header'] should be a Header object.
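As a rough illustration of what "everything is a Header object" could mean in practice, here is a minimal sketch of a normalization step. The helper name `as_header` is hypothetical and not part of specutils, and taking the primary header from an HDUList is an assumption made only for this example.

```python
from astropy.io import fits


def as_header(value):
    """Coerce a dict-like, Header, or HDUList into a fits.Header.

    Hypothetical helper for illustration; not part of specutils.
    """
    if isinstance(value, fits.Header):
        return value.copy()
    if isinstance(value, fits.HDUList):
        # Assumption: when handed a whole HDUList, keep the primary header.
        return value[0].header.copy()
    header = fits.Header()
    for key, val in dict(value).items():
        header[key] = val
    return header


# A plain dict, as some readers currently produce, becomes a Header.
meta = {"header": {"OBJECT": "HD 1234", "EXPTIME": 30.0}}
meta["header"] = as_header(meta["header"])
```

A writer could then rely on `meta['header']` always being a `Header`, whichever loader produced the spectrum.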

For '2' I think that structuring meta['header'] into meta['header']['primary-header'] and meta['header']['data-header'] makes sense. Tabular fits doesn't allow writing to the 0th extension, so any format that uses it to write will not actually be able to preserve the structure of the output file (it can either drop the primary header or concatenate it with the data header). Still, structuring the headers like this in Spectrum1D will make it easier to write formats that do preserve this structure.

0 replies

dhomeier · 2024-03-13T16:37:15Z

dhomeier
Mar 13, 2024
Maintainer

> There has been some recent discussion about a multi-extension fits read/write format to preserve a PrimaryHDU on the output file to describe the data within, and doing this would require more structure in .meta['header'] to determine what should be written to the 0th extension, and what should be written to the data extension.

Is this about just using its header – #617 (comment) indicates that there is common demand to have this available for general observation-related info – or also about using its data section as well?
I think there would be very limited use for the latter, since PrimaryHDU can only store image-type data; so unless for some reason the description needs some larger numerical array information, it does not seem to make much sense to "encrypt" it that way.

0 replies

dhomeier · 2024-03-13T22:45:04Z

dhomeier
Mar 13, 2024
Maintainer

> Tabular fits doesn't allow writing to the 0th extension, so any format that uses this to write will not actually be able to preserve the structure of the output file (can either drop primary or concatenate), but structuring them like this in Spectrum1D will make it easier to write formats that do preserve this structure.

More specifically, its current direct use of Table.write does not allow this, but replacing

specutils/specutils/io/default_loaders/tabular_fits.py, line 188 in 98dfcfd:

tab.write(file_name, format="fits", **kwargs)

with

fits.HDUList([primary_hdu, fits.BinTableHDU(tab)]).writeto(file_name, **kwargs)

(with whatever info we wish to have added to primary_hdu) should get around that limitation quite easily (and even allow writing to others than hdu=1, should there ever be need for it).

0 replies

rosteen · 2024-03-14T17:07:08Z

rosteen
Mar 14, 2024
Maintainer

FWIW I support both of @cshanahan1's proposals, it seems reasonable to store both primary and data headers in Header objects in the specutils objects' meta dictionary. That work can be separate from standardizing how the various writers use that information to generate the output file headers.

0 replies

dhomeier · 2024-03-14T21:00:49Z

dhomeier
Mar 14, 2024
Maintainer

I agree it makes sense to store all relevant headers. I am a bit uncertain about the naming – primary_header vs. data_header may not always be a meaningful designation, as some formats like plain wcs-fits have their data in the primary HDU, while others may have extensions with their headers that are storing uncertainties, masks etc. – not "data" in a certain sense.
From a quick glance it seems to me that the large majority of the readers in default_loaders actually read the primary header into meta['header']. So it may be least disruptive to keep that convention (or adopt it for those loaders not following it yet), and use a somewhat flatter structure that puts the extension header in meta['data_header'] or meta['ext_header_N']. With the second pattern we could even accommodate any number of additional extension HDU headers.
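A minimal sketch of the flatter `ext_header_N` pattern, assuming N is simply the HDU index. The function name `headers_to_meta` is hypothetical, and the example HDUs are placeholders.

```python
import numpy as np
from astropy.io import fits


def headers_to_meta(hdulist):
    """Collect all HDU headers into a flat meta dict (hypothetical helper)."""
    # Primary header stays under 'header', as most loaders already do.
    meta = {"header": hdulist[0].header.copy()}
    # Assumption: N in 'ext_header_N' is the index of the extension HDU.
    for index, hdu in enumerate(hdulist[1:], start=1):
        meta[f"ext_header_{index}"] = hdu.header.copy()
    return meta


# Placeholder file layout: flux in extension 1, uncertainties in extension 2.
example = fits.HDUList(
    [
        fits.PrimaryHDU(),
        fits.ImageHDU(data=np.zeros(3), name="FLUX"),
        fits.ImageHDU(data=np.ones(3), name="UNCERT"),
    ]
)
meta = headers_to_meta(example)
```

With this layout a loader that only knows about `meta['header']` keeps working, while the extra keys carry the per-extension information.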

0 replies

aragilar · 2024-03-18T00:52:04Z

aragilar
Mar 18, 2024

While I agree that moving towards some level of consistency in what's in meta is good for making it easier to use spectra from different places, I feel that doing that so that the writers can dump out the header is a mistake. Most of the code in https://github.com/astropy/specutils/blob/main/specutils/io/default_loaders/dc_common.py exists to work around the fact that people copied across headers without checking that the contents were valid or still made sense (and most of my time spent on getting new survey data goes into pushing back on survey teams that have done this and need to clean up their headers). Internally at Data Central we have a writer whose purpose is to provide a "quick-look" version of any spectra we have, and even though we know what's in the headers and have checked it (and, if needed, provide a fixed version), we use what's in the database (which is why it's currently not public: it's too coupled to our db structure).

I think if we want to improve what FITS writers we have, then we should require users to pass the metadata in as additional arguments, and warn when they miss out metadata we think they should have (e.g. ORIGIN, AUTHOR), rather than digging into meta. We should also look at what we can standardise in terms of other metadata (e.g. we don't specify anything about the time of observation, nor the position, and that's a common thing that people misencode in headers), with things like obscore and the various FITS dictionaries/conventions being a good place to start.
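A rough sketch of that opt-in approach, using only the standard library. The function name `build_header_cards` and the choice of recommended keywords beyond ORIGIN and AUTHOR are hypothetical; a real writer would turn the result into a FITS header.

```python
import warnings

# Keywords the writer would like to see; ORIGIN and AUTHOR come from the
# suggestion above, and the tuple can be extended.
RECOMMENDED_KEYWORDS = ("ORIGIN", "AUTHOR")


def build_header_cards(**metadata):
    """Collect explicitly passed metadata and warn about missing keywords.

    Hypothetical helper: nothing is pulled from spectrum.meta.
    """
    cards = {key.upper(): value for key, value in metadata.items()}
    missing = [key for key in RECOMMENDED_KEYWORDS if key not in cards]
    if missing:
        warnings.warn(
            "Recommended FITS keywords not provided: " + ", ".join(missing),
            UserWarning,
            stacklevel=2,
        )
    return cards


# Only what the caller passes in ends up in the output header.
cards = build_header_cards(origin="Example Observatory", author="A. Observer")
```

Calling `build_header_cards(origin="Example Observatory")` alone would still return the cards, but would emit a warning naming AUTHOR as missing.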

0 replies

dhomeier · 2024-03-18T23:35:23Z

dhomeier
Mar 18, 2024
Maintainer

Wouldn't that be caught to some extent by pulling exclusively from meta['header']? meta in general can contain all sorts of objects not related to FITS or even serialisable as a FITS header, but I'd think if people are adding something specifically to meta['header'], they'd do it with the point of writing it out to a header in mind. Of course blindly copying all header info from a file as originally read in may still happen, but I don't know if a full sanity check of that is within the scope of specutils.

1 reply

aragilar Mar 19, 2024

My feeling is that what appears should be opt-in, not opt-out (as would happen if you pull in meta["header"] rather than specific arguments), and we should push users (via warnings) toward having a complete set of metadata (at least setting ORIGIN and AUTHOR in the beginning). The files I've seen have come from different pipelines in different languages (at least Python/astropy and IDL), so I think requiring users to pass in what is needed, plus some basic sanity checks, would be a big improvement to the quality of what users produce.
