This repository contains datasets along with R and Python packages for teaching statistics, data science, and related subjects. The goal is for the package to be usable from Python or R, on a local machine or in the cloud, with the same syntax. All files are cached, and dataset-specific functionality can be defined in an optional documentation markdown file.
Installing the Released Version (no release yet)
install.packages("https://github.com/tensorchiefs/data/releases/download/testrelease2/edudat_0.1.tar.gz", repos = NULL, type = "source")
Using the R Package:
Load the edudat package and datasets:
library(edudat)
df <- load_data("challenger.csv")
Showing the dataset and other functionality
show_data(df)
list_cache_files() #Lists all the cached files
Sourcing additional functions (currently only in R)
source_extra_code(df, verbose = TRUE)
plot(df) + ggtitle("Challenger dataset")
to_celcius(df$Temp)
Note that not all datasets have additional functions. They need to be defined in an accompanying qmd script, in a code section named extra; see data/challenger.qmd.
Installation of the Python Package:
Install the edudat package from PyPI:
pip install edudat
Using the Python Package:
Load the CSV data in Python:
from edudat import load_data
df1 = load_data("challenger.csv")
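The introduction notes that all files are cached. The following is a minimal sketch of what download-once caching could look like; it is not edudat's actual implementation. The names `cached_file`, `BASE_URL`, and `CACHE_DIR` are hypothetical and chosen only for illustration, and the `fetch` parameter exists so the download step can be swapped out.

```python
from pathlib import Path
from typing import Callable
import urllib.request

# Hypothetical values for illustration; edudat's real URL and cache path may differ.
BASE_URL = "https://example.org/data/"
CACHE_DIR = Path.home() / ".cache" / "edudat_sketch"


def _download(url: str) -> bytes:
    """Fetch the raw bytes behind a URL."""
    with urllib.request.urlopen(url) as response:
        return response.read()


def cached_file(
    name: str,
    cache_dir: Path = CACHE_DIR,
    fetch: Callable[[str], bytes] = _download,
) -> Path:
    """Return a local path for `name`, fetching it only if it is not cached yet."""
    cache_dir.mkdir(parents=True, exist_ok=True)
    target = cache_dir / name
    if not target.exists():
        # First request: fetch once and store the bytes on disk.
        target.write_bytes(fetch(BASE_URL + name))
    # Later requests: reuse the stored file without touching the network.
    return target
```

Under these assumptions, a second call such as `cached_file("challenger.csv")` would return the same path without downloading again.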
The datasets can be described by Quarto (qmd) files. These files contain additional information about a dataset, such as a description, the source, the variables, and the data types. The qmd files are located in the data/ directory and are rendered into the docs branch. The rendered files can be found at https://github.com/tensorchiefs/data/tree/main/docs.
In the qmd files, it is also possible to provide additional code for the datasets. Have a look at the challenger.qmd file for an example, where the R code plot_data is defined as a named code chunk.
{r plot_data, echo=TRUE, eval=FALSE}
Please ensure that eval=FALSE is set in the code chunk options if the code is not supposed to be executed in the automatic rendering.
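To illustrate how a named chunk can be located inside a qmd file, here is a small sketch that pulls out the body of an R chunk by its label. It is an assumption about how such a lookup could work, not the mechanism edudat uses; `extract_chunk` is a hypothetical helper, and the chunk body in the example is made up.

```python
import re
from typing import Optional

# Built programmatically so this example nests cleanly inside Markdown.
FENCE = "`" * 3


def extract_chunk(qmd_text: str, label: str) -> Optional[str]:
    """Return the body of the R chunk named `label`, or None if it is absent."""
    pattern = re.compile(
        r"^" + FENCE + r"\{r\s+" + re.escape(label) + r"\b[^}]*\}[ \t]*\n"
        r"(.*?)"
        r"^" + FENCE + r"[ \t]*$",
        re.DOTALL | re.MULTILINE,
    )
    match = pattern.search(qmd_text)
    return match.group(1) if match else None


# A made-up qmd document with one named chunk, mirroring the header shown above.
example = "\n".join([
    "# Challenger",
    FENCE + "{r plot_data, echo=TRUE, eval=FALSE}",
    "plot(df$Temp)",
    FENCE,
])

print(extract_chunk(example, "plot_data"))
```

The regular expression matches only on the chunk label, so options such as echo=TRUE or eval=FALSE do not affect the lookup.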
data/: Contains the CSV data.
R/: Contains the R package edudat.
python/: Contains the Python package edudat.
docs/: Contains documentation on the datasets.
Installation of the R Package (as on GitHub main):
install.packages("devtools") # Install the `devtools` package if you haven't already
# Install the `edudat` package directly from GitHub:
devtools::install_github("tensorchiefs/data/R/edudat")
Contributions will be welcome at a later stage. In the meantime, have a look at the contribution howto.