Add a duplicates_strategy argument to read_df… #10130
…es we can keep the first occurrence, the last one or calculate the average values
- 'error': raise an error (default)
- 'keep_first': keep the first occurrence
- 'keep_last': keep the last occurrence
- 'avg': calculate the average numeric values
"""
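For illustration only, the four strategies listed in the docstring could be sketched with pandas roughly as follows. The helper `resolve_duplicates`, its `key_fields` parameter and the column names in the usage note are hypothetical and not taken from this PR; the actual `read_df` implementation may well differ.

```python
import pandas as pd

STRATEGIES = ('error', 'keep_first', 'keep_last', 'avg')


def resolve_duplicates(df, key_fields, duplicates_strategy='error'):
    """Hypothetical helper (not the PR's code): handle rows sharing key_fields."""
    if duplicates_strategy not in STRATEGIES:
        raise ValueError('Unknown duplicates_strategy: %r' % duplicates_strategy)
    # mark every row whose key appears more than once
    dupl = df.duplicated(subset=key_fields, keep=False)
    if not dupl.any():
        return df.reset_index(drop=True)
    if duplicates_strategy == 'error':
        raise ValueError('Found duplicates:\n%s' % df[dupl])
    if duplicates_strategy == 'avg':
        # average the numeric columns inside each group of duplicates;
        # non-numeric, non-key columns are dropped in this sketch
        return df.groupby(key_fields, as_index=False, sort=False).mean(
            numeric_only=True)
    keep = 'first' if duplicates_strategy == 'keep_first' else 'last'
    return df.drop_duplicates(subset=key_fields, keep=keep).reset_index(
        drop=True)
```

With a frame holding two rows for the same `lon`/`lat` pair, `'keep_first'` and `'keep_last'` each keep one of the two rows, `'avg'` collapses them into their mean, and `'error'` raises.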
I would add a line
assert duplicates_strategy in ('error', 'keep_first', 'keep_last', 'avg'), duplicates_strategy
just to guard against misspellings.
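A minimal sketch of how such a check fails fast on a typo; `check_strategy` and `VALID_STRATEGIES` are hypothetical names used only for this example:

```python
VALID_STRATEGIES = ('error', 'keep_first', 'keep_last', 'avg')


def check_strategy(duplicates_strategy):
    # hypothetical wrapper around the suggested assertion; the offending
    # value is passed as the assertion message so it shows in the traceback
    assert duplicates_strategy in VALID_STRATEGIES, duplicates_strategy
    return duplicates_strategy


check_strategy('keep_last')      # passes silently
try:
    check_strategy('keep_lats')  # misspelled on purpose
except AssertionError as exc:
    print('rejected:', exc)
```

Note that `assert` statements are skipped when Python runs with `-O`, so code that must validate user input in all cases may prefer raising a `ValueError` instead.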
Done
def concat_if_different(values):
    unique_values = values.dropna().unique().astype(str)
    # If all values are identical, return the single unique value,
    # otherwise join with "|"
return '|'.join(unique_values)
is shorter and does exactly the same thing.
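The equivalence can be checked with a small sketch. The branching version below, `concat_if_different_long`, is only an assumed reconstruction of the longer code (the original branch is not shown in this thread):

```python
import pandas as pd


def concat_if_different_long(values):
    # assumed shape of the longer version: explicit special case
    unique_values = values.dropna().unique().astype(str)
    if len(unique_values) == 1:
        return unique_values[0]
    return '|'.join(unique_values)


def concat_if_different(values):
    # joining a single element returns that element unchanged,
    # so no special case is needed
    unique_values = values.dropna().unique().astype(str)
    return '|'.join(unique_values)


same = pd.Series(['A', 'A', None])
mixed = pd.Series(['A', 'B', 'A'])
print(concat_if_different(same), concat_if_different(mixed))
```

Joining a one-element sequence never inserts the separator, which is why the single `join` call covers both cases.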
Done
openquake/calculators/base.py
Outdated
@@ -982,7 +982,8 @@ def _read_risk3(self):
  # NB: get_station_data is extending the complete sitecol
  # which then is associated to the site parameters below
  self.station_data, self.observed_imts = \
-     readinput.get_station_data(oq, self.sitecol)
+     readinput.get_station_data(oq, self.sitecol,
+                                duplicates_strategy='error')
I would set duplicates_strategy='avg'
so that the aristotle tests turn green.
Done
LGTM, but please also update the changelog.
…so in case of duplicates we can: