I did a bit of sleuthing and shared the following in an email thread.
On Dec. 10:
Those duplicates come from two TDX-HydroRegions:
"1020018110": "13",  # (Congo River Basin, Africa)
"5020049720": "54",  # (Australia, Australia and Oceania)
It looks like most of the large counts are from the Congo. Both of these places are very flat, so there could be issues with the underlying data.
On Dec. 17:
I explored the raw GeoPackage files from NGA along with the processed GeoParquet files we sent to you.
The short answer is that the duplicate basin records in the Congo and Australia were duplicates in the raw data! I’m a bit puzzled how GeoPandas allowed us to use LINKNO as the index, but that’s another question…
The good news is that we can just drop them, as they are complete duplicates! Those same duplicates don’t exist in the streamnet files, which were the basis of all of our MNSI calculations for delineation and hydrologic groupings. So all of our processing is just fine. Yay!
So this is an easy fix, and worth noting to the NGA folks sooner rather than later!
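On the puzzle of LINKNO being accepted as the index: pandas indexes may contain repeated labels, and `set_index` only rejects them when `verify_integrity=True` is passed. The sketch below shows that behavior on a made-up table. The LINKNO values and the `area_km2` column are hypothetical, and it assumes the index was set with the `set_index` defaults, which the email thread does not confirm.

```python
import pandas as pd

# Hypothetical basin attribute table; LINKNO values and area_km2 are made up.
basins = pd.DataFrame(
    {
        "LINKNO": [101, 102, 102, 103],
        "area_km2": [5.0, 7.5, 7.5, 3.2],
    }
)

# set_index accepts repeated keys by default (verify_integrity=False).
indexed = basins.set_index("LINKNO")
index_is_unique = indexed.index.is_unique

# Collect the LINKNO values that occur more than once.
duplicated_linknos = sorted(indexed.index[indexed.index.duplicated()].unique())

# With verify_integrity=True the same call raises instead of passing silently.
try:
    basins.set_index("LINKNO", verify_integrity=True)
    raised = False
except ValueError:
    raised = True
```

Checking `index.is_unique` right after loading would have surfaced the duplicates before any downstream processing.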
Unfortunately, I hadn't noticed that the duplicate rows were not completely identical: as @rajadain noted, they are identical "except for geom, which is a slightly different square for each case".
So my recommendation to just use the first one may have been wrong. We will likely need to merge the geometries of the duplicates to get the right geometry for each TDX-Hydro Basin.
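One way the merge could look is a GeoPandas `dissolve` on LINKNO, which unions the geometries in each group and keeps one attribute row per basin. This is only a sketch on made-up data: the squares, the LINKNO values, and the `region_id` column are hypothetical, and whether a plain union gives the right basin geometry is exactly what still needs exploring.

```python
import geopandas as gpd
from shapely.geometry import box

# Hypothetical basins: LINKNO 102 appears twice, each with a different square.
basins = gpd.GeoDataFrame(
    {
        "LINKNO": [101, 102, 102],
        "region_id": ["13", "13", "13"],  # hypothetical attribute column
        "geometry": [box(0, 0, 1, 1), box(1, 0, 2, 1), box(2, 0, 3, 1)],
    },
    crs="EPSG:4326",
)

# Rows that share a LINKNO with at least one other row.
dupes = basins[basins.duplicated(subset="LINKNO", keep=False)]

# Union the geometries per LINKNO; attributes are identical across
# duplicates, so taking the first value of each is enough.
merged = basins.dissolve(by="LINKNO", aggfunc="first")
```

After the dissolve, `merged` is indexed by LINKNO with one row per basin, so the index is unique again.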
This issue is a placeholder to explore further for the next round of work.
As identified by @rajadain in WikiWatershed/model-my-watershed#3647 (comment), the TDX-Hydro basins datasets have a number of duplicate rows.