
Smooths out the analysis workflow, especially when using mobile data collection systems (ODK/XLSForm/Kobo)

License

Two license files: LICENSE (type not recognized) and LICENSE.md (MIT).

impact-initiatives/butteR


butteR

butteR can be used to smooth out the analysis and visualization of spatial survey data collected using mobile data collection systems (ODK/XLSForm). butteR mainly consists of convenient wrappers and pipelines for the survey, srvyr, sf, and rtree packages.

Installation

You can install the development version from GitHub with:

# install.packages("devtools")
devtools::install_github("zackarno/butteR")

Examples

Example using the stratified_sampler function

The stratified_sampler function can be useful if you want to generate random samples from spatial point data. It has been most useful for me when I have shelter footprint data that I want to sample. For now, the function only reads in point data, so if your footprint data are polygons, convert them to points (centroids) first.
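As a minimal sketch of that polygon-to-point step, the code below uses sf's st_centroid() on two made-up square footprints. The footprints, the square() helper, and the output name pt_data_from_polygons are hypothetical, and planar coordinates without a CRS are used only to keep the example simple; with real longitude/latitude footprints, check the centroids your sf setup produces.

```r
library(sf)

# Hypothetical helper: square polygon with lower-left corner (x0, y0) and side length s
square <- function(x0, y0, s = 1) {
  sf::st_polygon(list(rbind(
    c(x0, y0), c(x0 + s, y0), c(x0 + s, y0 + s), c(x0, y0 + s), c(x0, y0)
  )))
}

# Two made-up shelter footprints (planar coordinates, no CRS, for simplicity)
footprints <- sf::st_sf(
  strata = c("A", "B"),
  geometry = sf::st_sfc(square(0, 0), square(10, 10, s = 2))
)

# Replace each polygon by its centroid and extract the X/Y coordinates
centroid_geom <- sf::st_centroid(sf::st_geometry(footprints))
xy <- sf::st_coordinates(centroid_geom)

# Plain data frame with lon/lat/strata columns, mirroring the simulated pt_data used below
pt_data_from_polygons <- data.frame(
  lon = xy[, "X"],
  lat = xy[, "Y"],
  strata = footprints$strata
)
pt_data_from_polygons
```

Calling st_centroid() on the geometry column alone (rather than the whole sf object) avoids sf's warning about attributes being assumed constant over geometries.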

I believe the most useful/powerful aspect of this function is the ability to write out well-labelled KML/KMZ files that can be loaded onto a phone and opened with maps.me or other applications. To use this function properly, it is important that you first familiarize yourself with some of the theory that underlies random sampling and that you learn how “seeds” can be set in R (with set.seed()) to make random sampling reproducible. The function generates a random seed and stores it as an attribute field (rnd_seed) of the spatial sample. There is also an option to write the seed to the working directory as a text file. Understanding how to use the seed becomes important if you want to reproduce your results, or if you need to do subsequent rounds of sampling that exclude the previous sample without having to read in the previous samples.
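The seed logic can be illustrated with a small base R sketch. It is not how butteR works internally; it only shows the general idea: with the same seed and the same inputs, a draw can be regenerated exactly, which lets a second round exclude the first without reading a saved file. The 20 IDs and the round-2 seed are made up.

```r
# Made-up IDs standing in for the points to be sampled
ids <- 1:20

# In practice this would be the seed reported by the sampler (e.g. 643039 in the example below)
seed <- 643039

# Round 1: draw 5 points
set.seed(seed)
round1 <- sample(ids, 5)

# Later: regenerate round 1 from the seed alone (same inputs, same RNG settings)
set.seed(seed)
round1_again <- sample(ids, 5)
identical(round1, round1_again)

# Round 2: drop the regenerated round-1 points, then sample from what remains
remaining <- setdiff(ids, round1_again)
set.seed(seed + 1)  # hypothetical choice of a new seed for round 2
round2 <- sample(remaining, 5)
```

Reproducing a draw also requires the same input order and RNG settings: R 3.6.0 changed the default method used by sample(), so the same seed gives different draws in older and newer R versions.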

To show how the function can be used, I will first simulate a spatial data set and a sample frame:

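library(butteR)
library(dplyr)
library(sf)

# simulate 1000 random point locations (longitude/latitude)
lon <- runif(n = 1000, min = 88.00863, max = 92.68031)
lat <- runif(n = 1000, min = 20.59061, max = 26.63451)
strata_options <- LETTERS[1:8]

# simulate datasets: points with a randomly assigned stratum, and a sample frame
# with a target sample size between 10 and 100 for each of the 8 strata
pt_data <- data.frame(lon = lon, lat = lat, strata = sample(strata_options, 1000, replace = TRUE))
sample_frame <- data.frame(strata = strata_options, sample_size = round(runif(n = 8, min = 10, max = 100), 0))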

Here are the first six rows of the simulated point data and the sample frame:

pt_data %>% head() %>% knitr::kable()
lon       lat       strata
88.21260  25.12462  F
88.38863  22.14892  G
91.14093  23.49458  H
89.72288  22.25252  G
89.90292  22.34828  D
89.84144  23.05792  B
sample_frame %>% head() %>% knitr::kable()
strata  sample_size
A       63
B       58
C       28
D       85
E       60
F       45
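Before running the function, it may help to see the basic idea of per-stratum sampling. The sketch below is a hypothetical base R helper, not butteR's implementation: for each row of a sample frame it draws sample_size points of that stratum at random without replacement, using a single seed so the whole draw is reproducible. The tiny pts and frame data sets are made up.

```r
# Hypothetical helper (not butteR's code): for each stratum in `frame`,
# draw `sample_size` rows of `pts` at random, without replacement
stratified_sample_sketch <- function(pts, frame, seed) {
  set.seed(seed)  # one seed for the whole draw makes the result reproducible
  strata <- as.character(frame$strata)
  out <- lapply(seq_along(strata), function(i) {
    rows <- which(pts$strata == strata[i])
    n <- frame$sample_size[i]
    if (n > length(rows)) {
      stop("sample size exceeds the number of points in stratum ", strata[i])
    }
    chosen <- rows[sample.int(length(rows), n)]
    pts[sort(chosen), , drop = FALSE]
  })
  names(out) <- strata
  out
}

# Usage on a tiny made-up data set: 4 points in each of 3 strata
pts <- data.frame(id = 1:12, strata = rep(c("A", "B", "C"), each = 4))
frame <- data.frame(strata = c("A", "B", "C"), sample_size = c(2, 1, 4))
res <- stratified_sample_sketch(pts, frame, seed = 42)
sapply(res, nrow)  # A = 2, B = 1, C = 4
```

Returning a named list with one data frame per stratum mirrors the shape of the results element shown below; using sample.int() on row positions avoids the surprise where sample() treats a single number as 1:n.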

Next we will run the stratified_sampler function using the two simulated data sets as input.

You can check the function's help file by typing ?stratified_sampler. There are quite a few parameters to set, particularly if you want to write out the KML file, so it is important to read the function's documentation (it will be worth it).

sampler_ouput <- butteR::stratified_sampler(
  sample.target.frame = sample_frame,
  sample.target.frame.strata = "strata",
  sample.target.frame.samp.size = "sample_size",
  pt.data = pt_data,
  pt.data.strata = "strata",
  pt.data.labels = "strata",
  write_kml = FALSE
)

The output is a list. Its results element is itself a list of data frames, each containing the sample for one stratum. Below are the first six rows for strata A, B, and C in our example.

sampler_ouput$results[1:3] %>% purrr::map(head) %>% knitr::kable()
Description  rnd_seed  uuid
1_A          643039    17
2_A          643039    23
3_A          643039    58
4_A          643039    69
5_A          643039    77
6_A          643039    90

Description  rnd_seed  uuid
1_B          643039    28
2_B          643039    43
3_B          643039    49
4_B          643039    84
5_B          643039    88
6_B          643039    116

Description  rnd_seed  uuid
1_C          643039    12
2_C          643039    37
3_C          643039    48
4_C          643039    81
5_C          643039    107
6_C          643039    122
sampler_ouput$results$D %>% head()
#>   Description rnd_seed uuid
#> 1         1_D   643039   10
#> 2         2_D   643039   29
#> 3         3_D   643039   33
#> 4         4_D   643039   38
#> 5         5_D   643039   68
#> 6         6_D   643039   71
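Because results is a list with one data frame per stratum, you may want a single table for export or further analysis. A minimal sketch using dplyr::bind_rows(), whose .id argument records which list element each row came from; the two-stratum results list below is a made-up stand-in with the same columns as the output above, not real sampler output.

```r
library(dplyr)

# Hypothetical stand-in for sampler_ouput$results: one data frame per stratum
results <- list(
  A = data.frame(Description = c("1_A", "2_A"), rnd_seed = 643039, uuid = c(17, 23)),
  B = data.frame(Description = c("1_B", "2_B"), rnd_seed = 643039, uuid = c(28, 43))
)

# Stack the per-stratum samples; .id adds a "strata" column from the list names
all_samples <- dplyr::bind_rows(results, .id = "strata")
all_samples
```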

The random seed is saved in the output list (random_seed) as well as in the rnd_seed field of each stratified sample. The random seed is very important for reproducing your work, particularly when you need to perform additional (sometimes unexpected) rounds of sampling for an assessment.

sampler_ouput$random_seed 
#> [1] 643039

The stratified_sampler output also stores the remaining (unsampled) points as a separate data frame, samp_remaining. It is often a good idea to write these to a shapefile or CSV as a backup, especially if you are not 100% sure how to use the random seed to reproduce your sampling.

sampler_ouput$samp_remaining %>% head() %>% knitr::kable()
   lon       lat       strata  uuid  rnd_seed
2  88.38863  22.14892  G       2     643039
4  89.72288  22.25252  G       4     643039
5  89.90292  22.34828  D       5     643039
6  89.84144  23.05792  B       6     643039
8  90.72916  21.23794  G       8     643039
9  92.38461  22.17366  A       9     643039
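A minimal sketch of such a backup, assuming the remaining points have lon/lat columns in WGS84 as in the table above. The three-row samp_remaining is a made-up stand-in, and the temporary file paths are placeholders for your own project folder. The CSV is written with base R's write.csv(); the shapefile with sf::st_write() after converting to sf points.

```r
library(sf)

# Hypothetical stand-in for sampler_ouput$samp_remaining (columns as in the table above)
samp_remaining <- data.frame(
  lon = c(88.38863, 89.72288, 89.90292),
  lat = c(22.14892, 22.25252, 22.34828),
  strata = c("G", "G", "D"),
  uuid = c(2, 4, 5),
  rnd_seed = 643039
)

# Temporary paths for illustration; point these at your project folder instead
csv_path <- tempfile(fileext = ".csv")
shp_path <- tempfile(fileext = ".shp")

# CSV backup (lon/lat stay as ordinary columns)
write.csv(samp_remaining, csv_path, row.names = FALSE)

# Shapefile backup: convert to sf points in WGS84 (EPSG:4326) first
samp_remaining_sf <- sf::st_as_sf(samp_remaining, coords = c("lon", "lat"), crs = 4326)
sf::st_write(samp_remaining_sf, shp_path, quiet = TRUE)
```

Shapefiles truncate field names to 10 characters; the column names here already fit.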

Example using the check_distances_from_target function

First, I will generate two fake point data sets. The sf package is great!

library(sf)

set.seed(799)
lon1<-runif(min=88.00863,max=92.68031, n=1000)
lat1<-runif(min=20.59061,max=26.63451, n=1000)
lon2<-runif(min=88.00863,max=92.68031, n=1000)
lat2<-runif(min=20.59061,max=26.63451, n=1000)
strata_options<-LETTERS[1:8]

#make a simulated dataset
pt_data1<-data.frame(lon=lon1, lat=lat1, strata=sample(strata_options,1000, replace=TRUE))
pt_data2<-data.frame(lon=lon2, lat=lat2, strata=sample(strata_options,1000, replace=TRUE))

# convert to simple feature object
coords<- c("lon", "lat")
pt_sf1<- sf::st_as_sf(x = pt_data1, coords=coords, crs=4326)
pt_sf2<- sf::st_as_sf(x = pt_data2, coords=coords, crs=4326)

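Next, I will show two spatial verification functions. The first one, closest_distance_rtree(), finds, for each point in the first data set, the closest point in the second data set and the distance between them (dist_m, in metres). It uses R-tree spatial indexing (via the rtree package), so it works quickly on fairly large data sets.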

closest_pts<- butteR::closest_distance_rtree(pt_sf1, pt_sf2)
#> Warning in rtree::knn.RTree(rTree = sf2_tree, st_coordinates(sf1)[,
#> c("X", : k was cast to integer, this may lead to unexpected results.

closest_pts %>% head() %>% knitr::kable()
     strata  geometry                               strata.1  geometry.1                                dist_m
755  C       c(88.5246591396806, 26.0766159565661)  H         c(88.542828683707, 25.8766529368377)   22228.020
798  C       c(91.3460825806255, 22.3494960887145)  F         c(91.3754625593381, 22.3643193468922)   3442.702
464  C       c(91.6884048353551, 26.0950136747809)  B         c(91.6959527733822, 26.0490176807472)   5151.514
902  B       c(88.782772209299, 22.2289078448025)   C         c(88.812609722456, 22.2312796777867)    3087.283
199  B       c(91.9385484030803, 22.9929798167442)  A         c(92.0439420932042, 22.9314622797974)  12776.161
419  D       c(88.6396377435045, 22.2862520419468)  C         c(88.7253538271838, 22.3836231110146)  13936.767
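To make clear what a closest-point search computes, here is a brute-force sketch in base R. It is deliberately the slow alternative: it checks every pair of points (O(n × m)) instead of using an R-tree index, and it measures great-circle distance with the haversine formula on a spherical Earth, so its distances may differ slightly from butteR's. The helper names and the example points are hypothetical.

```r
# Great-circle (haversine) distance in metres on a spherical Earth (mean radius ~6371 km)
haversine_m <- function(lon1, lat1, lon2, lat2, r = 6371008.8) {
  to_rad <- pi / 180
  dlat <- (lat2 - lat1) * to_rad
  dlon <- (lon2 - lon1) * to_rad
  a <- sin(dlat / 2)^2 + cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlon / 2)^2
  2 * r * asin(pmin(1, sqrt(a)))
}

# For each row of `from`, find the closest row of `to` by checking every pair (O(n * m))
closest_brute_force <- function(from, to) {
  to_row <- integer(nrow(from))
  dist_m <- numeric(nrow(from))
  for (i in seq_len(nrow(from))) {
    d <- haversine_m(from$lon[i], from$lat[i], to$lon, to$lat)
    to_row[i] <- which.min(d)
    dist_m[i] <- d[to_row[i]]
  }
  data.frame(from_row = seq_len(nrow(from)), to_row = to_row, dist_m = dist_m)
}

# Made-up example points (longitude/latitude in degrees)
from <- data.frame(lon = c(90, 91), lat = c(23, 24))
to <- data.frame(lon = c(90, 91.01, 95), lat = c(23.001, 24, 20))
closest_brute_force(from, to)
```

Here the first point is matched with the first target (about 111 m apart) and the second with the second target (about 1 km apart). With 1000 points on each side this loop is still fine, but for much larger data sets a spatial index is what keeps the search fast.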

You could easily filter the “closest_pts” output by a distance threshold of your choice. To make this simpler, however, I have wrapped closest_distance_rtree() in the function “check_distances_from_target” (I need to come up with a better name for this function). It returns every point in “dataset” whose closest point in “target_points” is farther away than the threshold (in metres), along with the distance to that closest target point. Because this is fake data, a large number of points are returned (only the first 6 rows are displayed); in your assessment data there should be far fewer.

set.seed(799)
# return points in pt_sf1 whose closest point in pt_sf2 is more than 50 m away
pts_further_than_50m_threshold_from_target <-
  butteR::check_distances_from_target(
    dataset = pt_sf1,
    target_points = pt_sf2,
    dataset_coordinates = coords,
    cols_to_report = "strata",
    distance_threshold = 50
  )
#> Warning in rtree::knn.RTree(rTree = sf2_tree, st_coordinates(sf1)[,
#> c("X", : k was cast to integer, this may lead to unexpected results.


pts_further_than_50m_threshold_from_target %>% head() %>% knitr::kable()
strata     dist_m
C       22228.020
C        3442.702
C        5151.514
B        3087.283
B       12776.161
D       13936.767
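The same filtering idea can be sketched with sf alone, without butteR: find the nearest target for each point with sf::st_nearest_feature(), measure the distance to it with sf::st_distance(..., by_element = TRUE), and keep the points above the threshold. This is an illustration under assumptions (sf 1.0 or later, where distances on longitude/latitude data come back in metres), with made-up dataset and target_points; it is not how check_distances_from_target() is implemented.

```r
library(sf)

# Made-up survey points and target points (longitude/latitude, WGS84)
dataset <- sf::st_as_sf(
  data.frame(id = 1:3, lon = c(90, 91, 92), lat = c(23, 24, 25)),
  coords = c("lon", "lat"), crs = 4326
)
target_points <- sf::st_as_sf(
  data.frame(lon = c(90, 91.001, 92.5), lat = c(23.0001, 24, 25)),
  coords = c("lon", "lat"), crs = 4326
)

distance_threshold <- 50  # metres

# Index of the nearest target point for each dataset point
nearest <- sf::st_nearest_feature(dataset, target_points)

# Element-wise distance from each point to its own nearest target (drop the units class)
dataset$dist_m <- as.numeric(
  sf::st_distance(dataset, target_points[nearest, ], by_element = TRUE)
)

# Points whose nearest target is farther than the threshold
too_far <- dataset[dataset$dist_m > distance_threshold, ]
too_far
```

The first point lies about 11 m from its target and is kept out of too_far; the other two (roughly 100 m and 50 km away) are returned.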
