README for pipeline #235

Open
SamuelBrand1 opened this issue Dec 13, 2024 · 2 comments

Comments

@SamuelBrand1
Collaborator

I think we should have a README for the pipeline folder that covers, at a minimum:

  • Purpose of the pipeline
  • Expected data schema
  • Description of the method

This was referenced Dec 17, 2024
@damonbayer
Collaborator

This might be a good place to relocate the production pipeline diagram, though I question whether maintaining it is worth the effort.

#32

flowchart TD
    prep_forecast["Prepare Forecast Data<br/>(Python using Polars)"]
    prep_retro["Prepare Retro Data<br/>(Python using Polars)<br/>(currently part of the Prepare Forecast Data script)"]
    report["Report Generator<br/>(does different things depending on inputs)<br/>(Quarto)"]
    joint_hub_output["Joint hub submission<br/>(csv)"]
    report_output("[Lightweight or Detailed] [Retro or Forecast] Report<br/>(HTML)")

    combine_hub_output("Combine hub submissions<br/>(Python using Polars)")
    
    subgraph "For each location, in parallel"
        fit_model["Fit Model<br/>(Python using PyRenew)"]
        forecast["Forecast, prior predictive, and posterior predictive<br/>(Python using PyRenew)"]
        tidy["Tidy<br/>(Python using forecasttools)"]
        interest_figs["Summarize quantities of interest<br/>(R using tidybayes)"]
        score["Score<br/>(R using scoringutils)"]
        hub["Format for hub submission<br/>(Python using forecasttools)"]
        diagnostics["ArviZ Diagnostics<br/>(Python using ArviZ)"]

        data_fit("data for fit<br/>(json)")
        posterior_draws("Posterior MCMC draws<br/>(pickle)")
        all_mcmc_draws_ncdf("All MCMC draws with date coordinates<br/>(netCDF)")
        all_mcmc_draws_tab("All MCMC draws<br/>(parquet)")
        hub_output("Hub submission<br/>(csv)")
        retro_data("Retro data<br/>(tsv)")
        interest_figs_output("Tables and figures for quantities<br/>(parquet/tsv and svg/png)")
        diagnostics_output("Diagnostic tables and figures<br/>(parquet/tsv and svg/png)")
        scored("Scored dataset<br/>(parquet)")
    end
    
    prep_retro --> retro_data
    prep_forecast --> data_fit
    data_fit --> fit_model
    fit_model --> posterior_draws
    posterior_draws --> forecast
    forecast --> all_mcmc_draws_ncdf
    all_mcmc_draws_ncdf --> tidy
    tidy --> all_mcmc_draws_tab

    hub --> hub_output
    retro_data --> report    
    all_mcmc_draws_tab --> interest_figs
    all_mcmc_draws_tab --> score
    score --> scored
    all_mcmc_draws_tab --> hub
    all_mcmc_draws_ncdf --> diagnostics
    diagnostics --> diagnostics_output
    interest_figs --> interest_figs_output
    retro_data --> score

    diagnostics_output --> report
    interest_figs_output --> report
    scored --> report
    report --> report_output

    hub_output --> combine_hub_output
    combine_hub_output --> joint_hub_output
    %% Styling
    classDef script fill:;
    classDef file fill:#0099ff;

    class data_fit,posterior_draws,all_mcmc_draws_ncdf,all_mcmc_draws_tab,retro_data,interest_figs_output,diagnostics_output,scored,report_output,hub_output,joint_hub_output file
    class prep_forecast,prep_retro,fit_model,forecast,tidy,interest_figs,score,hub,diagnostics,report,combine_hub_output script
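If the diagram proves too costly to maintain, one lighter-weight option for the README is to record the step dependencies as plain data and derive a valid run order from them. The following is only a sketch under stated assumptions: the step names are the node ids from the flowchart above (not real script names in the repository), the intermediate files are collapsed into direct step-to-step edges, and `DEPENDS_ON` and `run_order` are hypothetical names introduced for illustration.

```python
from graphlib import TopologicalSorter

# Each step maps to the set of steps it depends on, following the arrows
# in the flowchart with the intermediate files (json, netCDF, parquet, ...)
# collapsed. Keys and values are the diagram's node ids, used here as
# hypothetical labels rather than actual script names.
DEPENDS_ON = {
    "fit_model": {"prep_forecast"},
    "forecast": {"fit_model"},
    "tidy": {"forecast"},
    "diagnostics": {"forecast"},
    "interest_figs": {"tidy"},
    "hub": {"tidy"},
    "score": {"tidy", "prep_retro"},
    "report": {"prep_retro", "diagnostics", "interest_figs", "score"},
    "combine_hub_output": {"hub"},
}


def run_order(depends_on):
    """Return one valid execution order in which every step follows its dependencies."""
    return list(TopologicalSorter(depends_on).static_order())


if __name__ == "__main__":
    print(" -> ".join(run_order(DEPENDS_ON)))
```

Several orders are valid for this graph (for example, `diagnostics` and `tidy` can run in either order once `forecast` has finished), so the printed sequence is one possibility rather than the prescribed one.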

@damonbayer
Collaborator

We should also document how to run the pipeline.
