-
Notifications
You must be signed in to change notification settings - Fork 1
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Improvements to publishing pipeline #108
Merged
Merged
Conversation
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
penelopeysm
force-pushed
the
publishing-improvements
branch
3 times, most recently
from
June 6, 2024 10:02
ee8fbb7
to
b31223a
Compare
- Also: Introduces `GeometryOutput` and `MetricsOutput` dataclasses to represent the output types of the assets that produce geometry and metrics respectively. These will hopefully be easier to understand, document, and use, compared to raw tuples. - Also: Updates Belgium DAG to use these new types. Closes #94
Partitions are 'cached' between different Dagster runs, so doing this helps to 'clean up' old partitions that are no longer applicable to the version of the code being presently worked on.
This commit implements a class method called fix_types, which is called whenever a list of metadata classes is serialised to a dataframe. cls.fix_types(df) returns a new df where the types of columns are properly coerced to what they should be in the resulting parquet file. This avoids issues with pandas automatically inferring e.g. a None type for a string column that just happens to all be Nones, and ensures that dataframes from different countries can be concatenated by the CLI. Closes #106
This makes it consistent with updates to the sensor
My poor Malaysian internet connection can't handle it otherwise
Just a simple change pending a full refactor of BE
penelopeysm
force-pushed
the
publishing-improvements
branch
from
June 23, 2024 06:29
b31223a
to
687cbbe
Compare
penelopeysm
requested review from
sgreenbury and
andrewphilipsmith
and removed request for
sgreenbury
June 24, 2024 10:03
sgreenbury
reviewed
Jun 26, 2024
sgreenbury
reviewed
Jun 26, 2024
sgreenbury
reviewed
Jun 26, 2024
sgreenbury
reviewed
Jun 26, 2024
sgreenbury
reviewed
Jun 26, 2024
sgreenbury
approved these changes
Jun 26, 2024
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for this @penelopeysm, this looks great to me (including the async addition) and I think the docs are very clear! Just added some small comments in the code and think this is good to merge.
Co-authored-by: Sam Greenbury <[email protected]>
CI broke because geopandas 1.0.0 was released 2 days ago 😄 |
penelopeysm
added a commit
that referenced
this pull request
Jun 26, 2024
Forgot to change this line as part of #108
Merged
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Commit messages have more detail, but the main changes of this PR are:
Closes make sure that correct types are encoded when serialising dataframes to parquet #106
to_supertypes()
call in the CLI, see config and logging popgetter-cli#36 (comment)Closes Add input validation for IO managers #94
Contributes towards Update contributing guidence #66 and Create guidence on how to add a new country to the pipeline #39
There is one commit (86746e5) which converts the NI catalog requests to use async HTTP requests. I had to implement this not because of anything wrong with the original implementation, but rather because my Internet connection over here is not quite the same as in the UK... 🥲 However, I am happy to drop it from this PR if this is deemed unnecessary.