Tensorchiefs Data Collection and Tools

This repository contains datasets as well as R and Python packages for teaching statistics, data science, and related subjects. The goal is for the package to work with the same syntax from Python or R, on a local machine or in the cloud. All files are cached, and dataset-specific functionality can be defined in the optional documentation markdown file.
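To make the caching idea concrete, here is a minimal Python sketch of a fetch-once cache. It is not the edudat implementation: the names `cached_fetch` and `list_cached`, the `fetch` callback, and the cache directory layout are all assumptions made for illustration.

```python
from pathlib import Path
from typing import Callable, List


def cached_fetch(name: str, fetch: Callable[[str], bytes], cache_dir: Path) -> Path:
    """Return a local path for `name`, calling `fetch` only on a cache miss.

    Hypothetical helper: `fetch` stands in for whatever downloads the file.
    """
    cache_dir.mkdir(parents=True, exist_ok=True)
    target = cache_dir / name
    if not target.exists():
        # Cache miss: fetch the bytes once and store them locally.
        target.write_bytes(fetch(name))
    return target


def list_cached(cache_dir: Path) -> List[str]:
    """List the file names currently stored in the cache directory."""
    if not cache_dir.is_dir():
        return []
    return sorted(p.name for p in cache_dir.iterdir() if p.is_file())
```

Passing the downloader in as a callback keeps the sketch independent of any particular URL or network library, and it means a second request for the same file name never triggers another download.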

Installation and Simple Usage

R

Installing the Released Version (no release yet)

install.packages("https://github.com/tensorchiefs/data/releases/download/testrelease2/edudat_0.1.tar.gz", repos = NULL, type = "source")

Using the R Package:

Load the edudat package and datasets:

library(edudat)
df <- load_data("challenger.csv")

Showing the dataset and other functionality

show_data(df)
list_cache_files() # Lists all cached files

Sourcing additional functions (currently only in R)

source_extra_code(df, verbose = TRUE)
plot(df) + ggtitle("Challenger dataset")
to_celcius(df$Temp)

Note that not all datasets have additional functions. They must be defined in an accompanying qmd script, in a code section named extra; see data/challenger.qmd.

Python

Installation of the Python Package: Install the edudat package from PyPI:

pip install edudat

Using the Python Package: Load the CSV data in Python:

from edudat import load_data
df1 = load_data("challenger.csv")

Additional information/functionality on data sets

The data sets can be described by Quarto (qmd) files. These files contain additional information about the data set, such as a description, the source, the variables, and the data types. The qmd files are located in the data/ directory and are rendered into the docs branch. The rendered files can be found at https://github.com/tensorchiefs/data/tree/main/docs.

In the qmd files, it is also possible to provide additional code for the data sets. Have a look at the challenger.qmd file for an example, where the R code plot_data is defined as a named code chunk.

{r plot_data, echo=TRUE, eval=FALSE}

Please ensure that eval=FALSE is set in the code chunk options if the code is not supposed to be executed in the automatic rendering.
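As an illustration of how named chunks can be located in a qmd file, the following Python sketch extracts R chunks by name with a regular expression. It is an assumption-laden example, not the mechanism edudat uses: `named_chunks`, `FENCE`, and `CHUNK_RE` are hypothetical names, and the sample chunk body is made up rather than copied from challenger.qmd.

```python
import re
from typing import Dict

# Three backticks, built programmatically so this example nests cleanly in docs.
FENCE = "`" * 3

# Matches a fenced R chunk whose header starts with a name,
# e.g. {r plot_data, echo=TRUE, eval=FALSE}. Unnamed chunks are skipped.
CHUNK_RE = re.compile(
    r"^" + FENCE + r"\{r\s+(?P<name>[\w.-]+)\s*(?:,[^}\n]*)?\}[ \t]*\n"
    r"(?P<body>.*?)"
    r"^" + FENCE + r"[ \t]*$",
    re.MULTILINE | re.DOTALL,
)


def named_chunks(qmd_text: str) -> Dict[str, str]:
    """Map each named R chunk in `qmd_text` to its source code."""
    return {m.group("name"): m.group("body") for m in CHUNK_RE.finditer(qmd_text)}


# Made-up sample document with one named and one unnamed chunk.
sample = "\n".join([
    "# Challenger",
    FENCE + "{r plot_data, echo=TRUE, eval=FALSE}",
    "plot(df)",
    FENCE,
    "",
    FENCE + "{r, echo=FALSE}",
    "summary(df)",
    FENCE,
])

chunks = named_chunks(sample)
print(sorted(chunks))
```

Only the chunk with an explicit name is returned, which is why naming the chunk in the header matters if the code is meant to be found and reused.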

Structure

  • data/: Contains the CSV data.
  • R/: Contains the R package edudat.
  • python/: Contains the Python package edudat.
  • docs/: Contains documentation on the datasets.

Advanced Issues

R

Installation of the R Package (as in GitHub main):

# Install the `devtools` package if you haven't already:
install.packages("devtools")
# Install the `edudat` package directly from GitHub:
devtools::install_github("tensorchiefs/data/R/edudat")

Contributing

Contributions will be welcome at a later stage; have a look at the contribution howto.