butteR

butteR can be used to smooth out the analysis and visualization of spatial survey data collected using mobile data collection systems (ODK/XLSform). ButteR mainly consists of convenient wrappers and pipelines for the survey, srvyr, sf, and rtree packages.

Installation

You can install the the development version from GitHub with:

# install.packages("devtools")
devtools::install_github("zackarno/butteR")
## Example

Example using the stratified sampler function

The stratified sampler function can be useful if you want to generate random samples from spatial point data. It has been most useful for me when I have shelter footprint data that I want to sample. For now, the function only reads in point data. Therefore, if the footprint data you have is polygons it should first be converted to points (centroids).

I believe the most useful/powerful aspect of this function is the ability to write out well labelled kml/kmz files that can be loaded onto phone and opened with maps.me or other applications. To use this function properly it is important that you first familiarize yourself with some of the theory that underlies random sampling and that you learn how “seeds” can be used/set in R to make random sampling reproducible. The function generates random seeds and stores it as a an attribute field of the spatial sample. There is also the option to write the seed to the working directory as text file. Understanding how to use the seeds becomes important if you want to reproduce your results, or if you need to do subsequent rounds of sampling where you want to exclude the previous sample without having to read in the previous samples.

To show how the function can be used I will first simulate a spatial data set and sample frame

library(butteR)
library(dplyr)
library(sf)
lon<-runif(min=88.00863,max=92.68031, n=1000)
lat<-runif(min=20.59061,max=26.63451, n=1000)
strata_options<-LETTERS[1:8]

#simulate datasets
pt_data<-data.frame(lon=lon, lat=lat, strata=sample(strata_options,1000, replace=TRUE))
sample_frame<-data.frame(strata=strata_options,sample_size=round(runif(10,100,n=8),0))

Here are the first six rows of data for the sample frame and data set

pt_data %>% head() %>% knitr::kable()

lon	lat	strata
88.21260	25.12462	F
88.38863	22.14892	G
91.14093	23.49458	H
89.72288	22.25252	G
89.90292	22.34828	D
89.84144	23.05792	B

sample_frame %>% head() %>% knitr::kable()

strata	sample_size
A	63
B	58
C	28
D	85
E	60
F	45

Next we will run the stratified_sampler function using the two simulated data sets as input.

You can check the function help file by typing ?stratified_sampler. There are quite a few parameters to set particularly if you want to write out the kml file. Therefore, it is important to read the functions documentation (it will be worth it).

sampler_ouput<-butteR::stratified_sampler(sample.target.frame = sample_frame, 
                           sample.target.frame.strata = "strata",
                           sample.target.frame.samp.size = "sample_size",pt.data =pt_data,
                           pt.data.strata = "strata",pt.data.labels = "strata" ,write_kml = FALSE 
                            )

The output is stored in a list of data frames. Each data frame consists of the sample for one strata. Below I have printed the table of the first 6 results for strata A,B, and C in our example.

sampler_ouput$results[1:3] %>% purrr:::map(head) %>% knitr::kable()

Description	rnd_seed	uuid
1_A	643039	17
2_A	643039	23
3_A	643039	58
4_A	643039	69
5_A	643039	77
6_A	643039	90

Description	rnd_seed	uuid
1_B	643039	28
2_B	643039	43
3_B	643039	49
4_B	643039	84
5_B	643039	88
6_B	643039	116

Description	rnd_seed	uuid
1_C	643039	12
2_C	643039	37
3_C	643039	48
4_C	643039	81
5_C	643039	107
6_C	643039	122

sampler_ouput$results$D %>% head()
#>   Description rnd_seed uuid
#> 1         1_D   643039   10
#> 2         2_D   643039   29
#> 3         3_D   643039   33
#> 4         4_D   643039   38
#> 5         5_D   643039   68
#> 6         6_D   643039   71

The random_seed is saved in the list as well as an attribute of each stratified sample. The random seed is very important to be able to reproduce you work. This is particularly useful when you need to perform additional rounds (sometimes unexpected) of sampling for an assessment.

sampler_ouput$random_seed 
#> [1] 643039

The output of the stratified sampler object also stores the remaining sample as a separate data frame. It is often a good idea to write these to a shapefile or csv as back up, especially if you are not 100 % sure how to use the random seeds to reproduce your sampling.

sampler_ouput$samp_remaining %>% head() %>% knitr::kable()

	lon	lat	strata	uuid	rnd_seed
2	88.38863	22.14892	G	2	643039
4	89.72288	22.25252	G	4	643039
5	89.90292	22.34828	D	5	643039
6	89.84144	23.05792	B	6	643039
8	90.72916	21.23794	G	8	643039
9	92.38461	22.17366	A	9	643039

Example using the check_distance_from_target function

First I will generate 2 fake point data sets. The sf package is great!

library(sf)

set.seed(799)
lon1<-runif(min=88.00863,max=92.68031, n=1000)
lat1<-runif(min=20.59061,max=26.63451, n=1000)
lon2<-runif(min=88.00863,max=92.68031, n=1000)
lat2<-runif(min=20.59061,max=26.63451, n=1000)
strata_options<-LETTERS[1:8]

#make a simulated dataset
pt_data1<-data.frame(lon=lon1, lat=lat1, strata=sample(strata_options,1000, replace=TRUE))
pt_data2<-data.frame(lon=lon2, lat=lat2, strata=sample(strata_options,1000, replace=TRUE))

# convert to simple feature object
coords<- c("lon", "lat")
pt_sf1<- sf::st_as_sf(x = pt_data1, coords=coords, crs=4326)
pt_sf2<- sf::st_as_sf(x = pt_data2, coords=coords, crs=4326)

Next I will show two spatial verification functions. The first one just finds the closest distance between points. It uses rTree spatial indexing so it will work quickly on fairly large data sets.

closest_pts<- butteR::closest_distance_rtree(pt_sf1, pt_sf2)
#> Warning in rtree::knn.RTree(rTree = sf2_tree, st_coordinates(sf1)[,
#> c("X", : k was cast to integer, this may lead to unexpected results.

closest_pts %>% head() %>% knitr::kable()

	strata	geometry	strata.1	geometry.1	dist_m
755	C	c(88.5246591396806, 26.0766159565661)	H	c(88.542828683707, 25.8766529368377)	22228.020
798	C	c(91.3460825806255, 22.3494960887145)	F	c(91.3754625593381, 22.3643193468922)	3442.702
464	C	c(91.6884048353551, 26.0950136747809)	B	c(91.6959527733822, 26.0490176807472)	5151.514
902	B	c(88.782772209299, 22.2289078448025)	C	c(88.812609722456, 22.2312796777867)	3087.283
199	B	c(91.9385484030803, 22.9929798167442)	A	c(92.0439420932042, 22.9314622797974)	12776.161
419	D	c(88.6396377435045, 22.2862520419468)	C	c(88.7253538271838, 22.3836231110146)	13936.767

You could easily just filter the “closest_pts” output by a distance threshold of your choice. However to make it simpler I have wrapped this function in the function “check_distances_from_target” (I need to come up with a better name for this function). It will return all of the points in from “data set”that are further than the set threshold from any point in the “target_points”. It will also show you the distance to the closest target point. Obviously this is fake data so there are a ton of points returned (I will just display the first 6 rows). In your assessment data there should obviously be much less.

set.seed(799)
pts_further_than_50m_threshold_from_target<-
  butteR::check_distances_from_target(dataset = pt_sf1,target_points =pt_sf2,dataset_coordinates = coords,
                                      cols_to_report = "strata", distance_threshold = 50)
#> Warning in rtree::knn.RTree(rTree = sf2_tree, st_coordinates(sf1)[,
#> c("X", : k was cast to integer, this may lead to unexpected results.


pts_further_than_50m_threshold_from_target %>% head() %>% knitr::kable()

strata	dist_m
C	22228.020
C	3442.702
C	5151.514
B	3087.283
B	12776.161
D	13936.767

Name		Name	Last commit message	Last commit date
Latest commit History 113 Commits
R		R
man		man
.Rbuildignore		.Rbuildignore
.gitignore		.gitignore
DESCRIPTION		DESCRIPTION
LICENSE		LICENSE
LICENSE.md		LICENSE.md
NAMESPACE		NAMESPACE
README.Rmd		README.Rmd
README.md		README.md
butteR.Rproj		butteR.Rproj

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Licenses found

Repository files navigation

butteR

Installation

Example using the stratified sampler function

Example using the check_distance_from_target function

About

Licenses found

Releases

Packages

Languages

License

Licenses found

twesigye10/butteR

Folders and files

Latest commit

History

Repository files navigation

butteR

Installation

Example using the stratified sampler function

Example using the check_distance_from_target function

About

Resources

License

Licenses found

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages