(Currently in development)
This pipeline measures the performance of different methods in finding trans-eQTLs on real data, as well as their performance on null data (i.e. genotype with shuffled donors). The following methods will be included:
- MatrixEQTL
- GNetLMM (see Installation instructions)
- CPMA (the authors did not provide updated software, so the pipeline uses JPA scores from TEJAAS)
- TEJAAS
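
To illustrate what "genotype with shuffled donors" means for the null data, here is a minimal sketch. It is not the pipeline's own code: the function name `shuffle_donors` and the list-of-rows dosage layout (one row per SNP, one column per donor) are assumptions made for this example. The same permutation is applied to every SNP, so correlations between SNPs are kept while the link between genotype and expression is broken.

```python
import random

def shuffle_donors(dosage_rows, seed=0):
    """Permute donor columns of a dosage matrix (hypothetical helper).

    dosage_rows: list of rows, one per SNP, each with one dosage per donor.
    Returns the shuffled matrix and the permutation that was applied.
    """
    n_donors = len(dosage_rows[0])
    perm = list(range(n_donors))
    random.Random(seed).shuffle(perm)  # one permutation shared by all SNPs
    shuffled = [[row[j] for j in perm] for row in dosage_rows]
    return shuffled, perm

# Toy example: 2 SNPs, 4 donors
dosage = [[0.0, 1.0, 2.0, 1.0],
          [2.0, 0.0, 0.0, 1.0]]
null_dosage, perm = shuffle_donors(dosage, seed=42)
```

The expression matrix is left untouched; only the donor order of the genotype changes.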
We also want to compare:
- effect of different pre-filtering methods
- kNN
- effect of sparsity in TEJAAS
Finally, we plot everything together:
- Plot
We use the gene expression of two different tissues within the same population. We find trans-eQTLs using different methods and then compare the methods using precision and recall, assuming that the tissue-consistent trans-eQTLs (those found in both tissues) are true positives and everything else is a false positive.
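
The comparison described above can be sketched as follows. This is only an illustration under stated assumptions, not the pipeline's implementation: it assumes each method yields a list of SNP IDs for one tissue ranked from most to least significant, and that the set of tissue-consistent SNPs is already known. The function name `precision_recall` and the SNP IDs are hypothetical.

```python
def precision_recall(ranked_snps, consistent_snps):
    """Precision and recall at every rank cutoff.

    ranked_snps: SNP IDs sorted from most to least significant (one tissue).
    consistent_snps: set of SNP IDs found in both tissues (treated as true).
    Returns a list of (precision, recall) tuples, one per cutoff.
    """
    n_true = len(consistent_snps)
    true_positives = 0
    curve = []
    for k, snp in enumerate(ranked_snps, start=1):
        if snp in consistent_snps:
            true_positives += 1
        precision = true_positives / k
        recall = true_positives / n_true if n_true else 0.0
        curve.append((precision, recall))
    return curve

# Toy example with made-up SNP IDs
ranked = ["rs1", "rs2", "rs3", "rs4"]
consistent = {"rs1", "rs3"}
curve = precision_recall(ranked, consistent)
```

Each tuple in `curve` corresponds to calling the top `k` SNPs significant, so plotting precision against recall gives one curve per method.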
The pipeline expects the following input files:
- Genotype (in gzipped dosage format)
- Expression (tab-separated text file: genes in rows, samples in columns. Header row with sample IDs, first column with gene names. In the header row, the first column is named `gene_id`).
- Sample (a dummy sample file in Oxford format)
- GENCODE file
- gene position file (for MatrixEQTL)
- MAF file from 1000Genomes
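
As a quick check of the expression file layout described above, the following sketch reads a tab-separated table and verifies that the first header column is `gene_id`. It is a hedged example, not part of the pipeline: the function name `read_expression`, the gene names, and the sample IDs are made up for illustration.

```python
import csv
import io

def read_expression(handle):
    """Read a genes-by-samples expression table (hypothetical helper).

    Expects a tab-separated file whose header row starts with 'gene_id'
    followed by sample IDs, and whose rows start with a gene name.
    Returns (sample_ids, {gene_name: [values]}).
    """
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader)
    if header[0] != "gene_id":
        raise ValueError("first header column must be 'gene_id'")
    sample_ids = header[1:]
    expression = {}
    for row in reader:
        if len(row) != len(header):
            raise ValueError("row for %s has wrong number of columns" % row[0])
        expression[row[0]] = [float(value) for value in row[1:]]
    return sample_ids, expression

# Toy file content with made-up gene and sample names
example = "gene_id\tS1\tS2\nGENE_A\t1.5\t-0.3\nGENE_B\t0.0\t2.1\n"
sample_ids, expression = read_expression(io.StringIO(example))
```

For a real file, pass an open file handle (for example `open(path, newline="")`) instead of the `io.StringIO` object.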
- Python >3.6 (numpy, mpmath)
- TEJAAS
- LDSTORE
- GNetLMM
- R v3.4.1 (MatrixEQTL)
- Python >3.6
- VCFtools (v0.1.15)
- htslib (v1.4.1, for `tabix` and `bgzip`)
- Within the `bsubfiles` folder, change the job submission criteria and module loadings as per your requirements (GWDG users, skip this step).
- Modify `main/utils/submit_job` to your own job scheduling mechanism (`bsub` users, skip this step).
- Update the paths of external programs in `main/EXTERNAL`.
- Update the paths of your datasets in `main/DATA`.
- Create a `CONFIG` file (see example in `configs/CONFIG`).
- Run the pipeline from within the `main` directory.
cd main
./01_validation_pipeline.sh configs/CONFIG
./02_process_chunks.sh configs/CONFIG