An overview of potential avenues for performance enhancement #139
fBedecarrats started this conversation in Ideas
-
Thanks for starting this discussion and for the valuable ideas for improving performance! Two additional points come to mind:
-
Hi!

# Reproducible example: GFW tree-cover and loss statistics for Brazil on a
# regular grid, using VRTs over the tiles downloaded by mapme.biodiversity.
library(sf)
library(terra)
library(exactextractr)
library(mapme.biodiversity)

# Municipality-level boundaries for Brazil (GADM level 2)
brazil <- st_as_sf(raster::getData("GADM", country = "BRA", level = 2))
brazil <- st_cast(brazil, "POLYGON")

data_dir <- "./brazil-data"
dir.create(data_dir, showWarnings = FALSE)

# Let mapme.biodiversity download the GFW treecover and lossyear tiles
aoi <- init_portfolio(brazil, 2000:2021, outdir = data_dir)
aoi <- get_resources(aoi, c("gfw_treecover", "gfw_lossyear"))

# Build virtual rasters (VRTs) spanning the downloaded tiles
treecover_files <- list.files(data_dir, pattern = "treecover", full.names = TRUE, recursive = TRUE)
lossyear_files <- list.files(data_dir, pattern = "lossyear", full.names = TRUE, recursive = TRUE)
treecover_vrt <- vrt(grep("\\.tif$", treecover_files, value = TRUE))
lossyear_vrt <- vrt(grep("\\.tif$", lossyear_files, value = TRUE))

# Regular 0.1-degree grid over Brazil as zones for the extraction
grid <- st_make_grid(brazil, cellsize = c(0.1, 0.1)) |> st_as_sf()
grid$ID <- seq_len(nrow(grid))

gfw <- c(treecover_vrt, lossyear_vrt)
names(gfw) <- c("treecover", "lossyear")

# Per grid cell: remaining tree-cover area and annual loss, 2000-2021
gfw_stats <- exact_extract(gfw, grid, function(data, cover) {
  # keep pixels above the canopy-cover threshold, weight area by coverage
  data <- data[data$treecover > cover, ]
  data$area <- data$area * data$coverage_fraction
  loss_sum <- by(data$area, data$lossyear, sum)
  result <- data.frame(
    year = 0:21,
    area = sum(data$area),
    loss = 0
  )
  year <- as.numeric(names(loss_sum))
  value <- as.numeric(loss_sum)
  result$loss[year + 1] <- value
  result$loss[1] <- 0 # lossyear == 0 means "no loss", not loss in year 2000
  # remaining tree-cover area after subtracting cumulative loss
  result$area <- result$area - cumsum(result$loss)
  result
}, cover = 30, include_area = TRUE, summarize_df = TRUE, append_cols = "ID")
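A possible follow-up to the snippet above (a minimal sketch, assuming the grid and gfw_stats objects created there): since the ID column is appended via append_cols, the per-cell results can be joined back to the grid geometries and aggregated, for example per year.

# Join the per-cell statistics back to the grid geometries via ID;
# merge() on an sf object keeps the geometry column
grid_stats <- merge(grid, gfw_stats, by = "ID")

# Example aggregation: remaining tree-cover area per year across all cells
# (area is in square meters, as returned by exactextractr's include_area)
area_by_year <- aggregate(area ~ year, data = st_drop_geometry(grid_stats), FUN = sum)
head(area_by_year)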
-
Background: Currently the package offers an intelligible and coherent syntax for geodata manipulation, state-of-the-art processing methods, and a well-documented, guided process that avoids common mistakes in data preparation. It is great.
Processing performance is already good, but it reaches its limits for small-scale analyses over large areas. From my attempts, I have the impression that it is suitable (i.e. not taking days to process) for analyzing average-sized areas of interest (>= 5 km²?) at continental scale, or small areas of interest at national/regional scale. However, the performance levels it achieves might not be sufficient to process small-scale areas of interest (<= 1 km²?) at continental or global scale.
Some use cases, e.g. the replication of Wolf et al. (2021) or the computation of statistics for all protected areas (PAs) in the world, could require (or benefit from) better performance.
I propose to dedicate this discussion thread to identifying possible avenues for enhancing processing performance.
As a prerequisite, a few actions could help us share a common language and understanding of what this is about:
Then, different complementary avenues could be explored to enhance performance:
Any ideas on other options to consider and/or comments on these?
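One avenue that may be worth sketching concretely is chunked parallel extraction, building on the grid/VRT approach from the reply above. This is a minimal sketch, not package functionality: the chunk size of 1000 cells, the use of parallel::mclapply (forking, so not available on Windows), and the number of cores are illustrative assumptions. The VRTs are rebuilt inside each worker from the tile paths, because terra SpatRaster objects hold external pointers that do not transfer reliably across processes, whereas character vectors of file paths do.

library(parallel)

# file paths (plain character vectors) are safe to pass to workers
tc_tifs <- grep("\\.tif$", treecover_files, value = TRUE)
ly_tifs <- grep("\\.tif$", lossyear_files, value = TRUE)

# split the grid rows into consecutive chunks of 1000 cells (illustrative)
chunks <- split(seq_len(nrow(grid)), ceiling(seq_len(nrow(grid)) / 1000))

gfw_stats_list <- mclapply(chunks, function(idx) {
  # rebuild the VRT stack inside the worker instead of passing SpatRasters
  gfw <- c(vrt(tc_tifs), vrt(ly_tifs))
  names(gfw) <- c("treecover", "lossyear")
  exact_extract(gfw, grid[idx, ], function(data, cover) {
    data <- data[data$treecover > cover, ]
    data$area <- data$area * data$coverage_fraction
    data.frame(treecover_area = sum(data$area))
  }, cover = 30, include_area = TRUE, summarize_df = TRUE,
  append_cols = "ID", progress = FALSE)
}, mc.cores = 4)

gfw_stats_parallel <- do.call(rbind, gfw_stats_list)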