Deprecate sources in favour of origins #3331

Open
1 task
Marigold opened this issue Sep 30, 2024 · 1 comment

Comments

@Marigold
Collaborator

Marigold commented Sep 30, 2024

Problem

Maintaining both sources and origins is cumbersome. Migrating all sources to origins would allow us to deprecate a significant amount of code in both ETL and Grapher, potentially speeding up the process. We've already migrated the most critical datasets to origins and are currently migrating the rest as needed. However, a one-time full migration might be more efficient.

Solution

Full migration to origins in snapshots

This would involve migrating all sources in snapshots to origins. We’d need to devise an automatic source-to-origin transformation and modify all steps to use Table instead of DataFrame to ensure proper origin propagation.
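A minimal sketch of what an automatic source-to-origin transformation might look like. The `Source` and `Origin` classes here are simplified stand-ins defined only for illustration, and both the field names and the field-by-field mapping are assumptions, not the actual ETL metadata classes or an agreed migration rule.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Source:
    # Hypothetical subset of legacy source fields (assumed for this sketch).
    name: Optional[str] = None
    description: Optional[str] = None
    url: Optional[str] = None
    source_data_url: Optional[str] = None
    date_accessed: Optional[str] = None
    publication_date: Optional[str] = None
    published_by: Optional[str] = None


@dataclass
class Origin:
    # Hypothetical subset of origin fields (assumed for this sketch).
    producer: Optional[str] = None
    title: Optional[str] = None
    description: Optional[str] = None
    citation_full: Optional[str] = None
    url_main: Optional[str] = None
    url_download: Optional[str] = None
    date_accessed: Optional[str] = None
    date_published: Optional[str] = None


def source_to_origin(source: Source) -> Origin:
    """Map a legacy source onto an origin, field by field.

    The mapping is an assumption: a source has a single name, so it is
    reused for both producer and title, and published_by is treated as
    the full citation.
    """
    return Origin(
        producer=source.name,
        title=source.name,
        description=source.description,
        citation_full=source.published_by,
        url_main=source.url,
        url_download=source.source_data_url,
        date_accessed=source.date_accessed,
        date_published=source.publication_date,
    )
```

Fields that are missing on the source simply stay `None` on the origin, so anything that cannot be mapped automatically is easy to spot and fill in by hand afterwards.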

Adding origins to YAML files

For all garden and grapher datasets, we could extract sources, automatically migrate them to origins, and add them to the metadata YAML files (potentially as *.meta.override.yml).

TODO

  • How many (unarchived) datasets still use sources?
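One rough way to answer the question above, assuming the dataset metadata can be loaded as a list of plain dictionaries. The keys `sources` and `is_archived` are hypothetical names used for illustration and may not match the real schema.

```python
from typing import Any, Dict, List


def still_uses_sources(meta: Dict[str, Any]) -> bool:
    """Return True for an unarchived dataset that has at least one source.

    The keys "sources" and "is_archived" are assumed names for this sketch.
    """
    if meta.get("is_archived", False):
        return False
    return bool(meta.get("sources"))


def count_unarchived_with_sources(datasets: List[Dict[str, Any]]) -> int:
    """Count unarchived datasets that still carry sources."""
    return sum(1 for meta in datasets if still_uses_sources(meta))
```

For example, calling `count_unarchived_with_sources` on the loaded metadata would give a first estimate of the migration size before deciding between the two approaches.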
@larsyencken
Collaborator

It would be good to understand the size of the migration needed to work out if it's worthwhile now.
