The aim of the project is to use Bayesian inference on a graphical model to estimate the propensity to gain or lose DNA methylation. This repository contains the data analysis pipeline (written in Snakemake) that prepares the input data for the model from (possibly multiplexed) bisulfite sequencing data.
Prerequisites:

- conda (Miniconda3 or Anaconda3)
To make a local copy ready to run, do the following steps only once:
```shell
git clone [email protected]:okartal/population_epigenetics.git
cd population_epigenetics
conda env create -f requirements.yaml
```
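Because the environment only needs to be created once, a small guard can skip `conda env create` when an environment named `pop-epi` already exists. This is a minimal sketch, not part of the repository: the function name `ensure_pop_epi_env` is hypothetical, and it assumes that `requirements.yaml` names the environment `pop-epi` and that the function is called from the repository root.

```shell
# Hypothetical helper (not part of this repository): create the pop-epi
# environment only if conda does not already list an environment of that name.
# Assumes it is called from the repository root, next to requirements.yaml.
ensure_pop_epi_env() {
    # The first column of `conda env list` holds the environment names;
    # grep -x requires an exact match, so e.g. "pop-epi-old" does not count.
    if conda env list | awk '{print $1}' | grep -qx 'pop-epi'; then
        echo "pop-epi already exists; skipping creation"
    else
        conda env create -f requirements.yaml
    fi
}
```

Usage: after `cd population_epigenetics`, run `ensure_pop_epi_env`; repeated calls leave an existing environment untouched.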
To run the pipeline on test data:
- Download the test data from here and move its contents into `test/data/primary/`.
- Activate the `pop-epi` conda environment:

  ```shell
  conda activate pop-epi
  ```
- Run the test:

  ```shell
  cd test/
  snakemake -pj --use-conda --snakefile ../code/Snakefile
  ```
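Before the full test run, a Snakemake dry run (`-n`, long form `--dry-run`) lists the jobs that would be executed without running them, which is a cheap way to check that the test data ended up in the right place. The function below is a hypothetical convenience wrapper (the name `run_test_pipeline` is illustrative), assuming it is called from the repository root with the `pop-epi` environment active; its second command is the test command from the steps above.

```shell
# Hypothetical wrapper (not part of this repository): dry-run the test
# workflow, then execute it only if the dry run succeeded.
# Assumes the current directory is the repository root.
run_test_pipeline() {
    (
        # Subshell, so the caller's working directory is left unchanged
        cd test/ || exit 1
        # -n / --dry-run: print the planned jobs without executing them
        snakemake -n --use-conda --snakefile ../code/Snakefile &&
            snakemake -pj --use-conda --snakefile ../code/Snakefile
    )
}
```

If the dry run fails (for example because input files are missing), the real run is skipped and the function returns a non-zero status.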
To run the pipeline on real data:
- Set up your project folder with the following folders and files (the `data` folder is structured like the one in `test/`; you can use symbolic links in `data/primary`):

  ```
  .
  └── data
      ├── primary
      ├── multiplex_samples.csv
      ├── multiplex_units.csv
      ├── <genome data>
      └── results
          └── config.yaml
  ```
- Adapt the CSV files and the `params` and `threads` directives in the config file. To store runs with different configurations, set up additional results folders, each with its own `config.yaml`. However, do not change the `data` directive in the config file; doing so would break the workflow.
- Ensure that the pop-epi environment is active.
- Run the pipeline from the appropriate results folder:

  ```shell
  cd <path/to/your/project/folder>/data/results_XY/
  snakemake -pj --use-conda --snakefile <path/to/Snakefile/from/this/repo>
  ```
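The setup and run steps for real data can be scripted. The sketch below is hypothetical: the function names, their arguments and the `results*` folder pattern are illustrative assumptions. `setup_project` creates the skeleton from the tree, assuming that the CSV files sit directly in `data/` and that the results folders live under `data/` (as the `cd` path suggests), and links raw files into `data/primary` with absolute symbolic links. `run_all_results` then runs the pipeline once per results folder that has a `config.yaml`, so each configuration gets its own run. The CSV and config files it creates are empty placeholders that still have to be filled in.

```shell
# Hypothetical helpers (not part of this repository).
# Assumed layout: CSV files directly in data/, results folders under data/.

# setup_project PROJECT_DIR RESULTS_NAME [RAW_FILE...]
setup_project() {
    local project_dir=$1 results_name=$2 raw target
    shift 2
    mkdir -p "$project_dir/data/primary" "$project_dir/data/$results_name" || return 1
    for raw in "$@"; do
        # Absolute link targets keep the links valid wherever the project lives
        target="$(cd "$(dirname "$raw")" && pwd)/$(basename "$raw")"
        ln -s "$target" "$project_dir/data/primary/" || return 1
    done
    # Empty placeholders: adapt the CSVs and config.yaml before running
    touch "$project_dir/data/multiplex_samples.csv" \
          "$project_dir/data/multiplex_units.csv" \
          "$project_dir/data/$results_name/config.yaml"
}

# run_all_results PROJECT_DIR SNAKEFILE
# Runs the pipeline in every data/results* folder that contains a config.yaml.
run_all_results() {
    local dir snakefile
    # Make the Snakefile path absolute, because each run changes directory
    snakefile="$(cd "$(dirname "$2")" && pwd)/$(basename "$2")"
    for dir in "$1"/data/results*/; do
        [ -f "$dir/config.yaml" ] || continue
        ( cd "$dir" && snakemake -pj --use-conda --snakefile "$snakefile" ) || return 1
    done
}
```

Usage, with the `pop-epi` environment active: `setup_project myproject results_XY /path/to/*.fastq.gz`, fill in the placeholder files, then `run_all_results myproject /path/to/population_epigenetics/code/Snakefile`.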