handle faulty DST logic with repeated indices #104
Comments
Here's my half-baked solution:

import numpy as np
import pandas as pd


def fix_st_to_dst(df):
    # assume duplicated only occurs at ST to DST transition
    duplicated = df.index.duplicated()
    num_duplicates = duplicated.sum()
    # position of the first repeated timestamp
    first_duplicate_index = np.argmax(duplicated)
    # start of the first pass through the repeated hour
    first_shift_point = first_duplicate_index - num_duplicates
    # move the first pass back one hour to fill the skipped hour
    new_hour = \
        df.index[first_shift_point:first_duplicate_index] - pd.Timedelta('1h')
    new_hour_df = df.iloc[first_shift_point:first_duplicate_index].copy()
    new_hour_df.index = new_hour
    df_fixed = pd.concat([
        df.iloc[:first_shift_point],
        new_hour_df,
        df.iloc[first_duplicate_index:]
    ])
    return df_fixed
Interesting, and weird that the "spring forward" duplicated 3-4am, as if the clock was reset to ST after one hour. I wonder if these data have already been modified in an attempt to fix time stamp issues, since a similar flaw doesn't appear on 1-Nov-2020. I could talk to people at the data source and find out why this all happened. That may help us understand how to structure a useful function for pvanalytics. The code above is different from the pattern in
I assumed it was happening within the datalogger, but yes, you may be right.
I don't have a preference; this is just what I hacked together for this script for arbiter insert. I'd like to say quickly hacked together, but alas, it was not so. A pvanalytics implementation following my approach might want to restrict the duplicates search to specific days.
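One way the "restrict the duplicates search to specific days" idea might look is sketched below. This is only an illustration, not a pvanalytics function: the helper name `duplicated_on_day` and the synthetic index are hypothetical, and the transition date (8 March 2020, the US spring-forward date) is an assumption about the data.

```python
import pandas as pd


def duplicated_on_day(index, day):
    """Mark repeated timestamps, but only those that fall on `day`."""
    # True for every timestamp whose calendar date equals `day`
    on_day = index.normalize() == pd.Timestamp(day)
    # duplicated() marks the second and later occurrences of a timestamp
    return index.duplicated() & on_day


# Synthetic index: a repeated 03:00 stamp on the assumed transition day,
# plus an unrelated repeat in June that should be left alone.
idx = pd.DatetimeIndex([
    "2020-03-08 01:00", "2020-03-08 03:00", "2020-03-08 03:00",
    "2020-03-08 04:00", "2020-06-01 12:00", "2020-06-01 12:00",
])
mask = duplicated_on_day(idx, "2020-03-08")
```

Only the repeat on the transition day is marked, so an unrelated duplicate elsewhere in the file would not be mistaken for a DST artifact.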
I can confirm that these errors are not happening within the datalogger, at least, not in this case. Here, the logger remains in local standard time (UTC-5) and does not change. So I believe the errors in the referenced file are due to an error in the system that exported the data from the database. However, I have seen plenty of times where a logger or data acquisition system switches from standard time to daylight time (or the other way), so I think it's probably a good idea to have a routine adept at correcting the issue. It should be trivial if the UTC offset is included in the time, but perhaps more difficult (and perilous!) if the UTC offset is not available with each time stamp.
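To illustrate the easy case, where each time stamp carries its UTC offset, here is a minimal sketch. The time stamps are made up for illustration (a logger assumed to switch from UTC-5 to UTC-4 on 8 March 2020); the approach is simply to parse to UTC and convert back to a fixed standard-time offset.

```python
from datetime import timedelta, timezone

import pandas as pd

# Hypothetical logger output: the wall clock jumps from 01:30 to 03:00,
# but each stamp records the offset that was in effect.
stamps = [
    "2020-03-08T01:30:00-05:00",
    "2020-03-08T03:00:00-04:00",
    "2020-03-08T03:30:00-04:00",
    "2020-03-08T04:00:00-04:00",
]

# Parsing with utc=True uses the offsets to place every stamp on one timeline.
utc = pd.to_datetime(stamps, utc=True)

# Convert to a fixed UTC-5 offset (local standard time, no DST rules applied).
standard = utc.tz_convert(timezone(timedelta(hours=-5)))
```

With the offsets available, the result is evenly spaced and free of gaps or repeats. Without them, the shift has to be inferred from the index itself, which is where the risk comes in.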
I have a file in which the data appears to not use DST, but there's a complication. The transition to DST does occur at 2 am. Then an hour later the data falls back to 3 am. So the effect is that the 3 am hour is repeated in the data.
I speculate that this happens with a non-trivial number of data loggers, so it would be nice if pvanalytics could flag it and fix it.
index.is_monotonic and index.duplicated() seem like good places to start, but I don't have a complete suggestion for how to implement. See example in the details tag below, and find the file in the zip archive below that.
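As a starting point for the flagging half, the two index checks mentioned above could be combined roughly as follows. This is a sketch on synthetic data, not an existing pvanalytics function; the name `summarize_index_problems` is hypothetical. It uses `is_monotonic_increasing`, since recent pandas versions no longer provide `is_monotonic`.

```python
import pandas as pd


def summarize_index_problems(index):
    """Report whether an index is out of order or has repeated timestamps."""
    repeated = index.duplicated()  # marks second and later occurrences
    return {
        "monotonic": bool(index.is_monotonic_increasing),
        "n_repeated": int(repeated.sum()),
        "first_repeat": index[repeated][0] if repeated.any() else None,
    }


# Synthetic half-hourly index that skips 02:00 and then repeats the 03:00 hour.
idx = pd.DatetimeIndex(
    list(pd.date_range("2020-03-08 01:00", periods=2, freq="30min"))
    + list(pd.date_range("2020-03-08 03:00", periods=2, freq="30min"))
    + list(pd.date_range("2020-03-08 03:00", periods=4, freq="30min"))
)
summary = summarize_index_problems(idx)
```

A non-monotonic index with repeats near a DST transition date is a hint that the problem described here is present, though it does not by itself say which pass through the repeated hour is the mislabeled one.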
FSEC_RTC_Weather_2020.csv.zip