Bayesian approach to infer the fetal fraction, the fetal genotype and the fetal allele origin using sequencing data from the mother, the father and cfDNA in the maternal plasma [1].
[1] C. Liautard-Haag, G. Durif, C. Vangoethem, D. Baux, A. Louis, et al. Noninvasive prenatal diagnosis of genetic diseases induced by triplet repeat expansion by linked read haplotyping and Bayesian approach. Scientific Reports, 2022, 12 (1), pp.11423. ⟨10.1038/s41598-022-15307-2⟩. ⟨hal-03716132⟩.
- Ghislain DURIF [a]
- Cathy LIAUTARD HAAG [b]
- Marie-Claire VINCENT [b,c]
[a] IMAG, Université de Montpellier, CNRS, Montpellier, France;
[b] Laboratoire de Génétique Moléculaire, Institut Universitaire de Recherche Clinique, Université de Montpellier, CHU Montpellier, Montpellier, France;
[c] PhyMedExp Univ. Montpellier, CNRS, INSERM, Montpellier, France;
The pipeline is available as a Python package called prediag (see the src directory), which provides command line interface (CLI) tools. See below for installation and usage instructions.
The prediag package is released under the GPL-v3 license. See the attached files LICENSE.txt and COPYING.txt for full license details.
External software:
- Python 3+
- git (optional)
Note: To avoid interfering with your system Python installation, we recommend installing the prediag Python package inside a Python virtual environment or a Conda environment (see below for more details).
You can install the prediag package from the remote repository using pip with:
pip install git+https://github.com/gdurif/nipd.git#subdirectory=src
- Get the package sources
git clone https://github.com/gdurif/nipd
- Installation
# go to source directory
cd nipd/src
# install
pip install -e .
Before using the package (which provides command line tools), activate your Python virtual environment or Conda environment if you installed the package in one (see below).
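A quick way to confirm that the right environment is active is to check that the command line tools are visible on the PATH. The following is a minimal sketch, not part of the prediag package: the helper name `missing_tools` is hypothetical, and the command names are the ones used in the usage examples of this README.

```python
import shutil

# Command names taken from the usage examples in this README.
PREDIAG_TOOLS = [
    "prediag_fetal_fraction",
    "prediag_fetal_genotyping",
    "prediag_fetal_allele_origin_bayesian",
]


def missing_tools(names=PREDIAG_TOOLS):
    """Return the command names that cannot be found on the current PATH."""
    return [name for name in names if shutil.which(name) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Not found on PATH:", ", ".join(missing))
    else:
        print("All prediag command line tools are available.")
```

If some tools are reported as missing, the most likely causes are an inactive environment or an installation made in a different one.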
You can find bash script examples showing how to use the prediag command line interface in the dedicated script/cli folder.
Some arguments are optional; see the help of each tool (available with the -h option).
- Command line tool help
prediag_fetal_fraction -h
- Using command line tool for fetal fraction estimation
prediag_fetal_fraction \
-cfdna_vcf cfdna_vcf_file \
-mat_vcf mother_vcf_file \
-pat_vcf father_vcf_file \
--output ff_output_file.csv \
--min_coverage 50 \
--min_rel_depth 0.02 \
--min_abs_depth 2 \
-v
- Command line tool help
prediag_fetal_genotyping -h
- Using command line tool for fetal genotype inference on a given region (here chr4:3000000:3200000):
prediag_fetal_genotyping \
-cfdna_vcf cfdna_vcf_file \
-mat_vcf mother_vcf_file \
-pat_vcf father_vcf_file \
--output gt_output_file.csv \
--region "chr4:3000000:3200000" \
--min_coverage 50 \
--min_rel_depth 0.02 \
--min_abs_depth 2 \
--fetal_fraction_file ff_output_file.csv \
--ff_smoothing_window 50e3 \
-v
Note: you can provide the output of the fetal fraction estimation (potentially computed on a wider set of loci) with the option --fetal_fraction_file.
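The two-step workflow described in this note (estimate the fetal fraction first, then reuse its output for genotyping) can be scripted. The sketch below is only an illustration: the function names `build_commands` and `run_pipeline` and the VCF file names are hypothetical, while the tool names, options and values are copied from the examples above. By default nothing is executed and the commands are only printed.

```python
import subprocess


def build_commands(cfdna_vcf, mat_vcf, pat_vcf, region,
                   ff_csv="ff_output_file.csv", gt_csv="gt_output_file.csv"):
    """Build the fetal fraction and genotyping command lines as argument lists."""
    common = ["-cfdna_vcf", cfdna_vcf, "-mat_vcf", mat_vcf, "-pat_vcf", pat_vcf]
    filters = ["--min_coverage", "50",
               "--min_rel_depth", "0.02",
               "--min_abs_depth", "2"]
    # Step 1: fetal fraction, run without --region (possibly a wider set of loci).
    ff_cmd = ["prediag_fetal_fraction", *common, "--output", ff_csv, *filters]
    # Step 2: genotyping on the region, reusing the step 1 output file.
    gt_cmd = ["prediag_fetal_genotyping", *common, "--output", gt_csv,
              "--region", region, *filters,
              "--fetal_fraction_file", ff_csv,
              "--ff_smoothing_window", "50e3"]
    return ff_cmd, gt_cmd


def run_pipeline(*args, dry_run=True, **kwargs):
    """Print the commands (dry run) or execute them in order, stopping on error."""
    commands = build_commands(*args, **kwargs)
    for cmd in commands:
        if dry_run:
            print(" ".join(cmd))
        else:
            subprocess.run(cmd, check=True)
    return commands


if __name__ == "__main__":
    run_pipeline("cfdna.vcf", "mother.vcf", "father.vcf", "chr4:3000000:3200000")
```

Passing argument lists to `subprocess.run` avoids shell quoting issues, and `check=True` prevents the genotyping step from running on a missing fetal fraction file if the first step fails.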
- Command line tool help
prediag_fetal_allele_origin_bayesian -h
- Using command line tool for fetal allele origin inference on a given region (here chr4:3000000:3200000):
prediag_fetal_allele_origin_bayesian \
-cfdna_vcf cfdna_vcf_file \
-mat_vcf mother_vcf_file \
-pat_vcf father_vcf_file \
--output allele_origin_output_file.csv \
--region "chr4:3000000:3200000" \
--min_coverage 50 \
--min_rel_depth 0.02 \
--min_abs_depth 2 \
--ff_smoothing_window 50e3 \
--fetal_fraction_file ff_output_file.csv \
--recombination_rate 1e-8 \
--max_genetic_dist 1e-2 \
--ncore 0 \
--nsample 5000 \
--nburn 2000 \
--sampling_lag 50 \
--both_parent_phased \
-v
Note: you can provide the output of the fetal fraction estimation (potentially computed on a wider set of loci) with the option --fetal_fraction_file.
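The options --nburn and --sampling_lag in the command above have names that match the usual burn-in and thinning steps of MCMC sampling. Assuming they follow that usual meaning (this README does not spell it out, and it does not say whether --nsample counts total or retained draws), the generic idea can be sketched as follows. The function `thin_chain`, the toy random-walk chain and its length of 7000 are purely illustrative and are not prediag's sampler.

```python
import random


def thin_chain(chain, nburn, lag):
    """Discard the first `nburn` draws, then keep one draw every `lag` steps."""
    if nburn < 0 or lag < 1:
        raise ValueError("nburn must be >= 0 and lag must be >= 1")
    return chain[nburn::lag]


# Toy chain: a random walk standing in for correlated MCMC draws.
rng = random.Random(0)
chain, state = [], 0.0
for _ in range(7000):
    state += rng.gauss(0.0, 1.0)
    chain.append(state)

# Drop the first 2000 draws, then keep every 50th draw of the remainder.
kept = thin_chain(chain, nburn=2000, lag=50)
print(len(kept))  # 100
```

Burn-in removes draws taken before the chain has settled, and a larger lag reduces the correlation between retained draws at the cost of more iterations per retained sample.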
You can find Python script examples showing how to use the prediag Python package in the dedicated script/python folder.
- allow a fetal fraction estimate computed on a wider region to be passed to fetal genotyping and fetal allele origin inference
- filter loci by minimum coverage
- correct the cfDNA genotype by checking that each allelic depth is higher than an absolute threshold and higher than a given percentage of the coverage
- window-based estimation of the fetal fraction as a weighted average of locus-based fetal fraction estimates, with weights proportional to coverage
- account for chromosome breaks in fetal allele origin inference
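The depth-based genotype correction and the coverage-weighted window average listed above can be illustrated with a short sketch. It is not the prediag implementation: the function names are hypothetical, the coverage is assumed to be the sum of the allelic depths, the threshold comparisons are assumed to be inclusive, and the window is assumed to be centered on each locus. Default values mirror the command line examples.

```python
def supported_alleles(allelic_depths, min_rel_depth=0.02, min_abs_depth=2):
    """Return the indices of alleles whose depth passes both thresholds.

    Assumptions: coverage is the sum of allelic depths, comparisons are inclusive.
    """
    coverage = sum(allelic_depths)
    if coverage == 0:
        return []
    return [i for i, depth in enumerate(allelic_depths)
            if depth >= min_abs_depth and depth >= min_rel_depth * coverage]


def windowed_fetal_fraction(positions, ff_estimates, coverages,
                            window=50e3, min_coverage=50):
    """Smooth locus-level fetal fraction estimates with a coverage-weighted mean.

    Loci below `min_coverage` are dropped. For each remaining locus, the
    estimates of all retained loci within `window / 2` bases are averaged,
    with weights proportional to coverage (centered window is an assumption).
    """
    kept = [(p, f, c) for p, f, c in zip(positions, ff_estimates, coverages)
            if c >= min_coverage and c > 0]
    smoothed = []
    for pos, _, _ in kept:
        neighbours = [(f, c) for p, f, c in kept if abs(p - pos) <= window / 2]
        total = sum(c for _, c in neighbours)
        smoothed.append((pos, sum(f * c for f, c in neighbours) / total))
    return smoothed


if __name__ == "__main__":
    # Third allele has a single read and is treated as noise.
    print(supported_alleles([95, 5, 1]))
    # Two close loci are averaged together; the distant one stands alone.
    print(windowed_fetal_fraction([100, 200, 100000],
                                  [0.10, 0.20, 0.30],
                                  [100, 300, 60]))
```

Weighting by coverage gives more influence to loci whose locus-level estimate rests on more reads, which is the motivation stated in the list above.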
To avoid interfering with your system Python installation, we recommend that you install the prediag Python package inside a Python virtual environment or a Conda environment.
- Create Python virtual environment
# edit the path
python -m venv /path/to/.pyenv
- Activate the Python virtual environment (should be done before every (re)installation/use of the package):
# edit the path
source /path/to/.pyenv/bin/activate
- You should first install the Anaconda or Miniconda Python distribution. To check whether Conda is available, run:
conda list
- Create a Conda environment:
# replace `<name_of_the_environment>` by a name of your choice
conda create -n <name_of_the_environment>
- Activate the Conda environment (should be done before every (re)installation/use of the package):
# replace `<name_of_the_environment>` by the name you chose
conda activate <name_of_the_environment>