-
Notifications
You must be signed in to change notification settings - Fork 11
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
s2_union_agg()
is slow
#103
Comments
This is not a solution for the speed of s2_union_agg. I am posting a wrapper function that might be useful to someone landing on this issue, such as I did while looking for faster processing using The function below
library(s2)
library(sf)
#> Linking to GEOS 3.9.0, GDAL 3.2.2, PROJ 7.2.1
library(tibble)
library(dplyr)
#>
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#>
#> filter, lag
#> The following objects are masked from 'package:base':
#>
#> intersect, setdiff, setequal, union
s2_union_split_agg <- function(x, options = s2_options()){
# check for intersects
idx_intersects <- s2::s2_intersects_matrix(x, x, options = options)
# prapare groups of intersects
idx_list <- lapply(seq_along(idx_intersects)[lengths(idx_intersects) > 1], function(i) c(i, idx_intersects[[i]]))
group_list <- tibble::tibble(fid = integer(), gid = integer())
for(i in idx_list){
# check if intersecting fids already exist in the groups
gids <- dplyr::filter(group_list, fid %in% unique(i))
if(nrow(gids) > 0){
# get for new fids
new_ids <- unique(i)[!unique(i) %in% group_list$fid]
# get existing groups
new_gid <- unique(gids$gid)[1]
# add new fids to existing groups
group_list <- tibble::add_row(group_list, fid = new_ids, gid = new_gid)
# merge groups if new fids overlap several groups
group_list <- dplyr::mutate(group_list, gid = ifelse(gid %in% gids$gid, new_gid, gid))
} else {
# create a new group id
gid <- ifelse(nrow(group_list) == 0, 0, max(group_list$gid)) + 1
# add all new fids to the new group
group_list <- dplyr::bind_rows(tibble::tibble(fid = unique(i), gid = gid), group_list)
}
}
# split features based on groups of intersects
x_split <- split(x[group_list$fid], group_list$gid)
# spare features that do not intersect other features
x <- x[-group_list$fid]
# apply s2_union_agg to groups
x_union <- lapply(x_split, s2_union_agg, options = options)
# gather results into single object
result <- s2_rebuild_agg(c(x, do.call("c", x_union)), options = options)
return(result)
}
countries <- s2_data_countries()
system.time(res1 <- s2_union_split_agg(countries, options = s2_options(model = "closed")))
#> user system elapsed
#> 4.067 0.028 4.105
system.time(res2 <- s2_union_agg(countries, options = s2_options(model = "closed")))
#> user system elapsed
#> 4.527 0.002 4.530
# the output geometries have no difference
s2_difference(res1, res2)
#> <s2_geography[1]>
#> [1] <GEOMETRYCOLLECTION EMPTY>
# but the objects are not identical
identical(res1, res2)
#> [1] FALSE
plot(st_as_sf(res1)) plot(st_as_sf(res2)) Created on 2021-09-17 by the reprex package (v2.0.1) |
I never got around to thanking you for this example! I can and will implement something like this at the C++ level to speed up this type of union...hopefully for the next release! |
I still can and will make better use of this idea of intersecting groups, but I'm including a slight improvement in #165 that will make this a bit faster (although it's still O(n)). # remotes::install_github("r-spatial/s2#165")
library(s2)
library(ggplot2)
res <- bench::press(
n = c(1:5),
bench::mark(s2_union_agg(rep(s2_data_countries(), n)))
)
#> Running with:
#> n
#> 1 1
#> 2 2
#> 3 3
#> 4 4
#> 5 5
ggplot(res, aes(n, median)) +
geom_point() # install.packages("s2")
library(s2)
library(ggplot2)
res <- bench::press(
n = c(1:5),
bench::mark(s2_union_agg(rep(s2_data_countries(), n)))
)
#> Running with:
#> n
#> 1 1
#> 2 2
#> 3 3
#> 4 4
#> 5 5
ggplot(res, aes(n, median)) +
geom_point() Created on 2022-03-05 by the reprex package (v2.0.1) |
The strategy of accumulating a union is correct but slow! I assume there is a faster way to go about this that I haven't found in the documentation yet.
Created on 2021-04-28 by the reprex package (v0.3.0)
The text was updated successfully, but these errors were encountered: