Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Control which SED goes in the catalog #219

Open
cdeil opened this issue Feb 16, 2018 · 6 comments
Open

Control which SED goes in the catalog #219

cdeil opened this issue Feb 16, 2018 · 6 comments
Assignees
Milestone

Comments

@cdeil
Copy link
Contributor

cdeil commented Feb 16, 2018

We have to implement a way to control which SED goes in the catalog.

Yesterday I made this change:

# TODO: fix this!

As you can see in these updated examples, there were cases before and after where the SED doesn't match the spectral model we have in the catalog:
cdeil/gamma-cat-status@218dfa9

This is a reminder issue for myself to implement a good solution for this and to document it properly.

@cdeil cdeil added this to the 0.1 milestone Feb 16, 2018
@cdeil cdeil self-assigned this Feb 16, 2018
@pdeiml
Copy link
Collaborator

pdeiml commented Apr 12, 2018

I read through the code which makes the collection and the catalog and I have a few suggestions.

We have https://github.com/gammapy/gamma-cat/blob/master/input/gammacat/gamma_cat_dataset.yaml which is intended to control which datasets (tev-*.yaml) go into the collection. But - as you can see in https://github.com/gammapy/gamma-cat/blob/master/gammacat/collection.py#L275 - it is not used to control it. The datasets which are being written to the collection are searched by going through the info.yaml files.
I think this is not the idea of https://github.com/gammapy/gamma-cat/blob/master/input/gammacat/gamma_cat_dataset.yaml
Moreover, https://github.com/gammapy/gamma-cat/blob/master/input/gammacat/gamma_cat_dataset.yaml is not used for the seds and lightcurves, too.

Therefore, my suggestion is the following:
We keep the CollectionMaker as it is, meaning that the list of leds, seds and datasets which go into the collection are produced by scanning through the info.yaml files and the folders, respectively (I don't see a reason why a sed, lc, dataset which exists in the input should not go to the collection).
To control which sed, lc, dataset goes to the catalog, we use https://github.com/gammapy/gamma-cat/blob/master/input/gammacat/gamma_cat_dataset.yaml for the datasets and we write two new yaml-files gamma_cat_lightcurve.yaml and gamma_cat_sed.yaml.
Another way to implement it whould be to only have one file for the control of the catalog which could be formatted like this (the references are arbitrary, I only want to show the format):

  • source_id: 1
    status: source
    sed: 2013ApJ...764...38A
    lc: 2015ApJ...456..40H
    dataset: 2017A&A..56...40A

@cdeil
Copy link
Contributor Author

cdeil commented Apr 12, 2018

@pdeiml - gamma_cat_dataset.yaml is the configuration file that controls the content of the catalog.

You can see that it's used from catalog.py here:
https://github.com/gammapy/gamma-cat/search?utf8=%E2%9C%93&q=gamma_cat_dataset.yaml+&type=

Of course, a fix for this issue and generally improvements to the catalog making / configuration are very welcome! If you take on this task, please make small pull requests that improve code / docs / content of the catalog that are easy and quick to review for me. (and not a large restructuring / rewrite, together with a fix for this issue)

Since it seems it wasn't clear to you how the catalog is built, starting by improving the contributor docs (and maybe renaming gamma_cat_dataset.yaml to gamma_cat_catalog_config.yaml or something) on that point would probably be the best first step.

For users, the most useful improvement (apart from fixing this issue which SED is filled in the catalog) would be to explain what's in the catalog here: https://gamma-cat.readthedocs.io/data/catalog.html

Again, I would suggest to make small improvements, e.g. just to link to https://github.com/gammapy/gamma-cat/blob/master/input/gammacat/gamma_cat_columns.yaml and say that those columns are available, or add a link to https://github.com/gammapy/gamma-cat/blob/master/input/gammacat/gamma_cat_dataset.yaml and explain that this controls the content of the catalog.

@pdeiml
Copy link
Collaborator

pdeiml commented Apr 12, 2018

Well, gamma_cat_dataset.yaml is written down in catalog.py but if you take a closer look to the CatalogMaker, you see in https://github.com/gammapy/gamma-cat/blob/master/gammacat/catalog.py#L576 that not the gamma_cat_dataset.yaml in the input folder is used but the one in the output folder which is basically a sum of all output resources.
This means that in fact input/gamma_cat_dataset.yaml is not used by the catalog (I am sure that make.py will break as soon as I delete the file but it is not really used).

But what about my suggestion above with one single file for all types of data and that we copy the datasets by scanning through the folders?

@cdeil
Copy link
Contributor Author

cdeil commented Apr 12, 2018

That sounds like a regression bug then. Initially when I wrote and used this, gamma_cat_dataset.yaml from the input folder was used. Can you start by making a PR fixing this?
In this case, please also update this output file (or all catalog output files) so that we can see what changes in the catalog:
https://github.com/gammapy/gamma-cat/blob/master/output/gammacat.yaml

It would also be good to have some checks, e.g. you could add a test_catalog.py here:
https://github.com/gammapy/gamma-cat/tree/master/gammacat/tests
which reads the catalog and puts some asserts to make sure it's filled OK.
Especially "regression tests", i.e. some information / number that is currently not filled correctly, is a good candidate to add an assert line.

But what about my suggestion above with one single file for all types of data and that we copy the datasets by scanning through the folders?

I don't understand. IMO the way it should work is that only input folder is scanned once to make an input index file. Then the output collection is made, starting with the input index. Then the catalog is made, from the output collection (via the output index), plus the config file what to put in the catalog. I thought it already worked that way; if not, please fix.

@pdeiml - I'm very busy this week, but if there's still questions, I'd prefer a 10 min call over Github back and forth.

@pdeiml
Copy link
Collaborator

pdeiml commented Apr 12, 2018

I will take a closer look to this but I think that is a regression bug, yes.

I totally agree with you about the procedure at the end of your comment but I think that is not working like that. I will look at this problem as well the next days.

@cdeil cdeil assigned pdeiml and unassigned cdeil Apr 12, 2018
@cdeil
Copy link
Contributor Author

cdeil commented Apr 12, 2018

@pdeiml - Thank you!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants