Refactor Belgium DAG for template to generalise to other countries #95

Open
sgreenbury opened this issue May 17, 2024 · 1 comment

Comments

@sgreenbury
Collaborator

From discussion with @yongrenjie as part of #92

Currently, the individual census tables are filtered through the use of needed datasets and a corresponding partition.

As begun in #92 (see this section), the config for derived columns can be expanded to include:

  • Geography level
  • Aggregation column (IMO just define a (DF -> DF) function that gets called to generate the new statistic)

To enable the above, the type for the derivation config (currently dict[str, tuple[str, list[DerivedColumn]]]) can be updated to include the extra required items.

This could be something like:

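from dataclasses import dataclass
from typing import Callable

import pandas as pd

# One per derived table
@dataclass
class DerivedColumn:
    hxltag: str
    # (DF -> DF) function that gets called to generate the new statistic
    aggregation_func: Callable[[pd.DataFrame], pd.DataFrame]
    output_column_name: str
    human_readable_name: str

# One per source table
@dataclass
class MetricDerivationInstructions:
    geography_level: str
    geo_id_col_name: str
    derived_columns: list[DerivedColumn]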
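
To illustrate how such a config might be consumed, here is a minimal sketch. The proposed classes are repeated as dataclasses so the block runs on its own. Everything else is an assumption made for illustration and is not taken from the pipeline: the `derive_metrics` helper, the `sum_children` aggregation, the column names (`municipality_id`, `age_0_4`, `age_5_17`), the example HXL tag, and the convention that each `aggregation_func` returns a single-column DataFrame with one row per row of the source table.

```python
from dataclasses import dataclass
from typing import Callable

import pandas as pd


@dataclass
class DerivedColumn:
    hxltag: str
    aggregation_func: Callable[[pd.DataFrame], pd.DataFrame]
    output_column_name: str
    human_readable_name: str


@dataclass
class MetricDerivationInstructions:
    geography_level: str
    geo_id_col_name: str
    derived_columns: list[DerivedColumn]


def sum_children(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical (DF -> DF) aggregation: add two age bands into one total."""
    return (df["age_0_4"] + df["age_5_17"]).to_frame("value")


def derive_metrics(
    source: pd.DataFrame, instructions: MetricDerivationInstructions
) -> pd.DataFrame:
    """Hypothetical helper: build one output column per DerivedColumn.

    Assumes each aggregation_func returns a single-column DataFrame whose
    rows are in the same order as the rows of `source`.
    """
    out = source[[instructions.geo_id_col_name]].copy()
    for col in instructions.derived_columns:
        result = col.aggregation_func(source)
        out[col.output_column_name] = result.iloc[:, 0].to_numpy()
    return out


# Made-up source table at a made-up geography level.
source = pd.DataFrame(
    {
        "municipality_id": ["11001", "11002"],
        "age_0_4": [10, 20],
        "age_5_17": [30, 40],
    }
)

instructions = MetricDerivationInstructions(
    geography_level="municipality",
    geo_id_col_name="municipality_id",
    derived_columns=[
        DerivedColumn(
            hxltag="#population+children",  # illustrative tag only
            aggregation_func=sum_children,
            output_column_name="children",
            human_readable_name="Children (0-17)",
        )
    ],
)

derived = derive_metrics(source, instructions)
```

With this shape, adding a new statistic for a source table would only mean appending another `DerivedColumn` to the list, which is roughly the generalisation the issue is after.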

Also see if needed_datasets + source_metrics assets can be skipped entirely.

Following any refactoring, this pattern should be readily applicable both to other countries in the pipeline that are due to be updated (e.g. Scotland, NI, England/Wales, USA) and to new countries being added, provided their data conforms to this DAG pattern.

@sgreenbury
Collaborator Author

The original aim of this issue is superseded by the porting of Northern Ireland in #98. Consider whether to keep it open for incorporating all other census tables as metrics (@andrewphilipsmith for reference).

sgreenbury added a commit that referenced this issue Jul 2, 2024
…ium-class

Refactor Belgium to use new Country class (#95)