pythia-datasets

Data repository with sample data for the Pythia Foundations book.

Sample data sets

These files are used as sample data in Pythia Foundations and are downloaded by the pythia_datasets package:

NARR_19930313_0000.nc
enso_data.csv
jan-17-co-asos.txt.xz
CESM2_sst_data.nc
CESM2_grid_variables.nc
daymet_v4_precip_sept_2013.nc

Adding new datasets

The scope of this data collection is to serve examples for Pythia Foundations. If you are adding new content to Foundations that requires a new dataset file, please follow these steps:

- Add the dataset file to the data/ directory.
- From the command line, run python make_registry.py to update the registry file, pythia_datasets/registry.txt.
- Commit and push your changes to GitHub.
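
The registry update described above can be pictured with a minimal sketch. It assumes, without confirmation from this repository, that the registry is a plain-text file with one `filename hash` pair per line and that the hash is SHA-256. `sha256_of` and `build_registry` are hypothetical helpers, not the actual contents of make_registry.py.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 65536) -> str:
    """Return the hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_registry(data_dir: Path, registry_file: Path) -> list[str]:
    """Write one '<filename> <sha256>' line per file in data_dir (assumed format)."""
    lines = [
        f"{p.name} {sha256_of(p)}"
        for p in sorted(data_dir.iterdir())
        if p.is_file()
    ]
    registry_file.write_text("\n".join(lines) + "\n")
    return lines
```

Under these assumptions, calling `build_registry(Path("data"), Path("pythia_datasets/registry.txt"))` would regenerate the file after a new dataset is added. Run the repository's own make_registry.py for real contributions.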

Using datasets in notebooks and/or scripts

Ensure the pythia_datasets package is installed in your environment:

```
python -m pip install pythia-datasets

# or

python -m pip install git+https://github.com/ProjectPythia/pythia-datasets
```

Import DATASETS and inspect the registry to find out which datasets are available:

```
In [1]: from pythia_datasets import DATASETS

In [2]: DATASETS.registry_files
Out[2]: ['jan-17-co-asos.txt.xz', 'NARR_19930313_0000.nc']
```

To fetch a data file of interest, use the .fetch method and provide the filename of the data file. This will:
- download and cache the file if it doesn't already exist
- retrieve and return the local path
```
In [4]: filepath = DATASETS.fetch('jan-17-co-asos.txt.xz')

In [5]: filepath
Out[5]: '/Users/abanihi/Library/Caches/pythia-datasets/jan-17-co-asos.txt.xz'
```
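
The download-and-cache behavior described above can be illustrated with a small stand-alone sketch. This is not the package's implementation: `fetch_cached` and the `download` callback are hypothetical names, and the real .fetch method may do more, for example verifying the downloaded file.

```python
from pathlib import Path
from typing import Callable


def fetch_cached(name: str, cache_dir: Path, download: Callable[[str, Path], None]) -> str:
    """Return the local path for `name`, downloading it only if it is not cached."""
    target = cache_dir / name
    if not target.exists():
        cache_dir.mkdir(parents=True, exist_ok=True)
        download(name, target)  # hypothetical callback that writes the file to `target`
    return str(target)
```

A second call with the same filename finds the file on disk and returns the path without invoking the callback again, which is the point of caching.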
Once you have access to the local filepath, you can then use it to load your dataset into pandas or xarray or your package of choice:
```
In [6]: import pandas as pd

In [7]: df = pd.read_csv(filepath)
```
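
Which reader to use depends on the file type. The sketch below guesses one from the file extension and returns its name rather than calling it, so it runs without pandas or xarray installed. `reader_for` is a hypothetical helper, and the extension mapping is an assumption based on the sample file names listed in this README.

```python
from pathlib import Path


def reader_for(filename: str) -> str:
    """Guess a reader from the extension, ignoring a trailing compression suffix."""
    suffixes = [s.lower() for s in Path(filename).suffixes]
    # Drop a compression suffix such as the ".xz" in "jan-17-co-asos.txt.xz".
    if suffixes and suffixes[-1] in {".xz", ".gz", ".bz2"}:
        suffixes = suffixes[:-1]
    if suffixes and suffixes[-1] == ".nc":
        return "xarray.open_dataset"
    if suffixes and suffixes[-1] in {".csv", ".txt"}:
        return "pandas.read_csv"
    raise ValueError(f"no reader known for {filename!r}")
```

For the compressed text file used above, the sketch points to `pandas.read_csv`, which by default infers xz compression from the `.xz` extension.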

Changing the default data cache location

The default cache location (where the data are saved on your local system) depends on the operating system. You can use the locate() function to identify it:

```
from pythia_datasets import locate
locate()
```

You can override this location by setting the PYTHIA_DATASETS_DIR environment variable to the desired destination.
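
One way to set the variable from Python is sketched below. It assumes the variable is read when pythia_datasets is imported, so it is set before the import. The cache path is a hypothetical example, and the import is skipped if the package is not installed.

```python
import os
import tempfile
from pathlib import Path

# Hypothetical cache location; replace with any writable directory.
cache_dir = Path(tempfile.gettempdir()) / "pythia-datasets-cache"
cache_dir.mkdir(parents=True, exist_ok=True)

# Assumption: the variable must be set before pythia_datasets is imported.
os.environ["PYTHIA_DATASETS_DIR"] = str(cache_dir)

try:
    from pythia_datasets import locate
except ImportError:
    locate = None  # package not installed in this environment
else:
    print(locate())  # prints the cache location currently in use
```

Setting the variable in the shell or in the environment configuration before starting Python avoids any dependence on import order.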
