This is a pipeline for association analysis of data from annotated VCF files in the 100,000 Genomes Project. This repository was set up for personal use and for the marker of my research project, "Harnessing the power of the UK 100-thousand genome project to investigate the role of Fanconi anemia genes in heritable cancer predisposition".
The main steps include:
- Select genes (BioMart)
- Select relevant samples (LabKey)
- Variant QC and filtering
- Variant annotation
- VCF to R
- Exploring and additional filtering
- Population stratification (PCA)
- Consolidate and filter dataset
- Association analysis per disease (logistic regression and SKAT)
Anyone may use the repository; the pipeline is adapted to the Genomics England Research Environment.
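
To illustrate the idea behind the association step, here is a minimal, dependency-free sketch of a logistic regression that tests variant carrier status against disease status while adjusting for two principal components. It is not the repository's code: the data are simulated, and every name in it (`solve`, `fit_logistic`, `carrier`, `pc1`, `pc2`) is hypothetical. It does not cover SKAT.

```python
import math
import random


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_logistic(rows, y, n_iter=25, tol=1e-8):
    """Fit a logistic regression by Newton-Raphson; an intercept is added as the first coefficient."""
    X = [[1.0] + list(r) for r in rows]
    k = len(X[0])
    beta = [0.0] * k
    for _ in range(n_iter):
        grad = [0.0] * k
        hess = [[0.0] * k for _ in range(k)]
        for xi, yi in zip(X, y):
            eta = sum(b * v for b, v in zip(beta, xi))
            p = 1.0 / (1.0 + math.exp(-eta))
            w = p * (1.0 - p)
            for a in range(k):
                grad[a] += (yi - p) * xi[a]
                for c in range(k):
                    hess[a][c] += w * xi[a] * xi[c]
        step = solve(hess, grad)
        beta = [b + s for b, s in zip(beta, step)]
        if max(abs(s) for s in step) < tol:
            break
    return beta


# Simulated cohort (assumption): 10% carriers, true log odds ratio of 1.0 for carriers.
random.seed(0)
rows, y = [], []
for _ in range(2000):
    carrier = 1.0 if random.random() < 0.1 else 0.0
    pc1, pc2 = random.gauss(0, 1), random.gauss(0, 1)
    eta = -1.0 + 1.0 * carrier + 0.3 * pc1
    prob = 1.0 / (1.0 + math.exp(-eta))
    y.append(1 if random.random() < prob else 0)
    rows.append((carrier, pc1, pc2))

beta = fit_logistic(rows, y)  # [intercept, carrier, pc1, pc2]
odds_ratio = math.exp(beta[1])
print(f"carrier log-odds: {beta[1]:.3f}, odds ratio: {odds_ratio:.3f}")
```

Including the principal components as covariates is what lets the carrier coefficient be read as an effect adjusted for population structure. In practice, a statistics package would also report standard errors and p-values.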