Add input values / values used to summaries output data #104

Closed
natemcintosh opened this issue Dec 3, 2024 · 4 comments

Comments

@natemcintosh
Collaborator

Goal

Save the input values used in the summary dataframe for later use.

Context

The current summaries contain estimated quantities. It will be useful for later plotting (and model evaluation?) to also save the raw input values that were used. Since the model currently takes 8 weeks of data, that's only 56 new data points to include.

Suggest calling them either input_value or value_used.

Required features

  • Add rows for input values. For each of the columns, I loosely suggest:
    • _variable: "input_value"
    • value: the value
    • _lower: either same as the value, or NA
    • _upper: either same as the value, or NA
    • _width: 0.5?
    • _point: "point"?
    • _interval: NA?
    • reference_date: the date that value is from
    • geo_value: the location of the value
    • model: EpiNow2
    • disease: the disease
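The row layout proposed above could be sketched roughly as follows. This is only an illustration of the row shape, not the pipeline's code: EpiNow2 is an R package, so the real change would most likely be made in R. The function name `input_value_rows`, the `"XX"` location, the `"COVID-19"` disease label, the dates, and the counts are all hypothetical, and the choices for `_lower`, `_upper`, `_width`, `_point`, and `_interval` simply pick one of the options floated in the list.

```python
from datetime import date, timedelta


def input_value_rows(observations, geo_value, disease, model="EpiNow2"):
    """Turn (reference_date, value) pairs into summary-style rows.

    Hypothetical helper: column names follow the list in this issue,
    and None stands in for NA.
    """
    rows = []
    for reference_date, value in observations:
        rows.append(
            {
                "_variable": "input_value",
                "value": value,
                "_lower": None,      # assumption: NA rather than repeating the value
                "_upper": None,      # assumption: NA rather than repeating the value
                "_width": 0.5,       # loosely suggested in the issue, not settled
                "_point": "point",   # loosely suggested in the issue, not settled
                "_interval": None,   # NA
                "reference_date": reference_date,
                "geo_value": geo_value,
                "model": model,
                "disease": disease,
            }
        )
    return rows


# Made-up example: 8 weeks of daily observations gives 8 * 7 = 56 rows.
start = date(2024, 10, 8)  # hypothetical start date
observations = [(start + timedelta(days=i), 100 + i) for i in range(8 * 7)]
rows = input_value_rows(observations, geo_value="XX", disease="COVID-19")
print(len(rows))  # 56
```

Keeping the new rows in the same long format as the estimates means they can be appended to the existing summary table without changing its columns.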

Out of scope

Anything beyond what is necessary for adding these new rows.

Related

  • This issue which discusses putting the input values in the metadata file.
@athowes
Collaborator

athowes commented Dec 9, 2024

I take "currently takes 8 weeks of data, that's only 56 new data points" to mean 8 * 7 = 56, i.e. one data point per day. So the "collection of input values" here refers to the data?

I think that for some functions that fit models with Stan, the fit object already contains the data, so we wouldn't need to do anything separate. I'm not sure whether this is the case for EpiNow2; I'm investigating. If it isn't, I think it's a sensible thing for them to return, so we could open an issue there.

Are there examples of plots and model evaluations you'd want? I wonder how hard it would be to get something compatible with https://cran.r-project.org/web/packages/loo/index.html.

@natemcintosh
Collaborator Author

> So the "collection of input values" here refers to the data?

Correct.

> Are there examples of plots and model evaluations you'd want?

The data anomaly plots we make every week would be a good example. Right now they require pulling data from two sources to make; saving the input values in the output would mean the plots could be made from a single data source.
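As a rough sketch of the single-source idea: if the summary table carried `input_value` rows, the observed and estimated values could be paired from that one table. The helper name `pair_observed_with_estimates`, the estimate variable name `"estimated_cases"`, and all the numbers below are made up for illustration; the real summary variables may be named differently.

```python
from datetime import date


def pair_observed_with_estimates(summary_rows, estimate_variable):
    """Join input_value rows to estimate rows on (reference_date, geo_value).

    Hypothetical helper: assumes the summary is a list of dict rows that use
    the column names discussed in this issue.
    """
    observed = {
        (row["reference_date"], row["geo_value"]): row["value"]
        for row in summary_rows
        if row["_variable"] == "input_value"
    }
    paired = []
    for row in summary_rows:
        if row["_variable"] != estimate_variable:
            continue
        key = (row["reference_date"], row["geo_value"])
        if key in observed:
            paired.append(
                {
                    "reference_date": row["reference_date"],
                    "geo_value": row["geo_value"],
                    "observed": observed[key],
                    "estimate": row["value"],
                    "_lower": row["_lower"],
                    "_upper": row["_upper"],
                }
            )
    return paired


# Made-up two-row summary: one estimate and one input value for the same day.
day = date(2024, 12, 1)
summary_rows = [
    {"_variable": "estimated_cases", "value": 10.5, "_lower": 8.0,
     "_upper": 13.0, "reference_date": day, "geo_value": "XX"},
    {"_variable": "input_value", "value": 12, "_lower": None,
     "_upper": None, "reference_date": day, "geo_value": "XX"},
]
paired = pair_observed_with_estimates(summary_rows, "estimated_cases")
print(paired[0]["observed"], paired[0]["estimate"])  # 12 10.5
```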

@athowes
Collaborator

athowes commented Dec 9, 2024

https://epiforecasts.io/EpiNow2/dev/reference/estimate_infections.html returns:

> A list of output including: posterior samples, summarised posterior samples, data used to fit the model, and the fit object itself.

So epinow should have the data already included.

But perhaps we are not saving the whole output of EpiNow2?

@natemcintosh
Collaborator Author

I just checked the output variables in the summary and samples parquet files and did not see the input data there. I think we may have to manually tack it onto the summary table from the EpiNow2 object.
