Automate setting range in config to values that best represent the palette #35

Open
julietcohen opened this issue Dec 28, 2023 · 1 comment

Comments

@julietcohen
Collaborator

In many datasets, we want to visualize an attribute of the data rather than one of the custom statistics (polygon count or coverage) that we use for the ice-wedge polygon data. An example is the lake area time series dataset, where we visualize the areas of permanent water and seasonal water as separate layers. These values are in hectares and are pulled straight from the vector data for each polygon. The attributes have a minimum of 0 and a max value that is an integer that varies per year, say 120,025. The data contain outliers and are not normally distributed, so the range in the config for these stats cannot simply be set to [0, 120025]. (If the range is not set by the user, the min and max for each z-level are calculated by default from the raster_summary.csv.)

If we let the range max be the max value of the attribute, the web tiles palette does not clearly represent all values of the data on the portal. Too many polygons take the color at the lower end of the palette (like light blue), and few polygons in the tileset take the color for the largest values in the range (like dark blue). For users to best understand the data, all colors of the palette should be represented. By excluding outliers from the range set in the config (that is, setting the max to a value lower than the attribute's max), we better represent the middle values in the palette too. Any values beyond the max in the config are given the same color as the max of the range, so this is just like winsorizing.
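To illustrate the effect described above, here is a minimal sketch using synthetic, right-skewed values (not the real lake data). The function name `palette_bin_shares`, the five equal-width color bins, and the 95th percentile are all assumptions made for illustration; they are not taken from viz-staging or the portal.

```python
import numpy as np

def palette_bin_shares(values, val_range, n_colors=5):
    """Share of values that land in each of n_colors equal-width palette bins.

    Values outside val_range are clamped to its ends, mimicking how values
    beyond the configured max take the same color as the max.
    """
    lo, hi = val_range
    clipped = np.clip(np.asarray(values, dtype=float), lo, hi)
    counts, _ = np.histogram(clipped, bins=n_colors, range=(lo, hi))
    return counts / counts.sum()

# Synthetic, right-skewed "areas" in hectares (illustrative only).
rng = np.random.default_rng(0)
areas = rng.lognormal(mean=3.0, sigma=1.5, size=10_000)

# Range max = attribute max: nearly everything falls in the lightest bin.
full = palette_bin_shares(areas, (0, areas.max()))

# Range max = 95th percentile: values spread across more of the palette.
trimmed = palette_bin_shares(areas, (0, np.percentile(areas, 95)))

print("max as range max:       ", np.round(full, 3))
print("95th pct. as range max: ", np.round(trimmed, 3))
```

With skewed data like this, the first bin holds a much larger share of values when the range max is the attribute max than when it is a high percentile.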

Determining the best value for the range max would best be done mathematically, for example with a percentile or with a more complicated, dataset-specific approach. Using a percentile was explored for the lake area time series data here. The percentile should be adjusted depending on the distribution of each particular attribute. Doing this by hand can be time consuming, so it would be best to integrate an approach into viz-staging that sets the range to values that exclude outliers.
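One possible shape for the percentile approach is sketched below. The function `suggest_val_range` and its parameters are hypothetical and do not exist in viz-staging; the default of the 95th percentile is an arbitrary starting point that would need tuning per attribute.

```python
import numpy as np

def suggest_val_range(values, upper_pct=95.0, lower=0.0):
    """Suggest a [min, max] range whose max is a percentile of the data.

    Hypothetical helper, not part of viz-staging. `lower` defaults to 0
    because the attributes discussed here have a minimum of 0.
    """
    values = np.asarray(values, dtype=float)
    values = values[np.isfinite(values)]  # ignore NaN and inf
    if values.size == 0:
        raise ValueError("no finite values to compute a range from")
    upper = float(np.percentile(values, upper_pct))
    return [lower, upper]

# Example with a single large outlier: the suggested max ignores it.
print(suggest_val_range([0, 1, 2, 3, 100], upper_pct=50))
```

The returned list could then be used as the range for that attribute, computed separately for each year if the distributions differ.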

@julietcohen
Collaborator Author

For clarity: when I refer to the val_range in the config, that is referring to this.
