Example data for the CCDH project

This repository is intended to act as a store of example data files, in a number of formats, from across the NCI Cancer Research Data Commons nodes. Each directory represents a single dataset downloaded from a node, and contains a Jupyter Notebook documenting how it was downloaded. CCDH will use this example data to build and test the CRDC-H data model.

GDC Head and Mouth Dataset and conversion to CRDC-H

Our first example is based on a dataset of 560 cases that we downloaded from the GDC Public API. In a Jupyter Notebook, we describe how to load this data into Python Data Classes and then export it as YAML, JSON-LD or Turtle. This is not yet intended to be a comprehensive transform of all the retrieved GDC cases, but to showcase the features made available through the Python Data Classes that are part of the artifacts generated from the CRDC-H model. The JSON-LD and Turtle exports of the data are also available.
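
To give a rough feel for the load-then-export pattern described above, here is a minimal sketch that uses only the Python standard library. It is not the code from the notebook: the `ResearchSubject` and `Diagnosis` classes, the `subject_from_gdc_case` helper and the sample record are all hypothetical stand-ins, and plain JSON is used in place of the YAML, JSON-LD and Turtle exports that the generated CRDC-H classes support.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List, Optional


@dataclass
class Diagnosis:
    # Hypothetical, simplified stand-in for a generated CRDC-H class.
    id: str
    condition: Optional[str] = None


@dataclass
class ResearchSubject:
    # Hypothetical, simplified stand-in for a generated CRDC-H class.
    id: str
    primary_site: Optional[str] = None
    diagnoses: List[Diagnosis] = field(default_factory=list)


def subject_from_gdc_case(case: dict) -> ResearchSubject:
    """Map one GDC-style case dictionary onto the illustrative classes."""
    return ResearchSubject(
        id=case["case_id"],
        primary_site=case.get("primary_site"),
        diagnoses=[
            Diagnosis(id=d["diagnosis_id"], condition=d.get("primary_diagnosis"))
            for d in case.get("diagnoses", [])
        ],
    )


# Made-up record shaped loosely like a GDC case; this is not real data.
example_case = {
    "case_id": "example-case-1",
    "primary_site": "Example site",
    "diagnoses": [
        {
            "diagnosis_id": "example-diagnosis-1",
            "primary_diagnosis": "Example diagnosis",
        }
    ],
}

subject = subject_from_gdc_case(example_case)

# Plain JSON stands in here for the richer export formats.
exported = json.dumps(asdict(subject), indent=2)
print(exported)
```

The point of the sketch is only that, once a case has been loaded into typed data classes, serializing it is a single call; the notebook itself works with the classes generated from the CRDC-H model rather than hand-written ones.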

This example is based on v1.0-pre1 of the CRDC-H model, which is included in this repository. We will continue to update the example as the model develops, but it may be out of sync with the latest version of the model until we have time to update it.

Using Jupyter Notebooks

Many of the processes in this repository are documented as Jupyter Notebook files, which have an .ipynb extension. These files can be viewed directly in GitHub or in the Jupyter Notebook viewer (the CDA example for subject 09CO022 is one such notebook).

If you would like to execute a notebook, you will need to install Jupyter Notebook (also available on Homebrew for Mac). You can then download the .ipynb file and open it in Jupyter Notebook on your computer by running:

$ jupyter notebook cptac2-subject-09CO022/CDA\ example\ for\ subject\ 09CO022.ipynb

This repository uses Poetry for dependency management. You can therefore also install Poetry, then run:

$ poetry install
$ poetry run jupyter notebook
