These scripts annotate a MAF with whether a given variant is a possible false positive. All of them read from stdin and can write to stdout, and all are standalone with two exceptions, for which a fillout operation needs to be run first. Filter flags are added to the FILTER column as a comma-separated list. These filters operate almost exclusively on SNVs. Additionally, this repo contains a wrapper for running a VCF-based false-positive filter, which populates the FILTER field of a VCF file; these flags can be retained if conversion to MAF is carried out with vcf2maf.
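As a rough illustration of the comma-separated FILTER convention, the sketch below appends a flag to an existing FILTER value. The function name and the flag names are hypothetical, and treating an empty value, "." or "PASS" as "no flags yet" is an assumption rather than documented behavior of these scripts.

```python
def add_filter_flag(current, flag):
    """Return a FILTER value with `flag` appended, comma-separated."""
    # Assumption: "", "." and "PASS" all mean that no flag has been set yet.
    existing = [f for f in current.split(",") if f and f not in (".", "PASS")]
    if flag not in existing:
        existing.append(flag)  # avoid adding the same flag twice
    return ",".join(existing)


# Hypothetical flag names, for illustration only.
print(add_filter_flag("PASS", "common_variant"))
print(add_filter_flag("common_variant", "low_confidence"))
```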
applyFilter.sh is a wrapper that runs any of the R-based filters in this repository. The output MAF is annotated with header lines that indicate which filter was used and which version of the repository.
Usage:
applyFilter.sh FILTER_NAME INPUT_MAF OUTPUT_MAF [Additional Parameters]
example:
applyFilter.sh filter_blacklist_regions.R \
Proj_1234_CMO_MAF.txt filteredMAF.txt
The first lines of the output MAF will look as follows:
#version 2.4
#ngs-filters/applyFilter.sh VERSION=v1.0.1-2-g4d3694b FILTER=filter_blacklist_regions.R
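To check which filters have already been applied to a MAF, the header lines can be parsed. The following is a minimal sketch that assumes the header format is exactly as in the example above; the function name is hypothetical.

```python
import re

# Pattern based on the example header line shown above.
HEADER_RE = re.compile(
    r"^#ngs-filters/applyFilter\.sh VERSION=(\S+) FILTER=(\S+)$"
)


def applied_filters(lines):
    """Return (version, filter) pairs found in the leading '#' lines."""
    found = []
    for line in lines:
        if not line.startswith("#"):
            break  # the header ends at the first non-comment line
        match = HEADER_RE.match(line.rstrip())
        if match:
            found.append((match.group(1), match.group(2)))
    return found


example = [
    "#version 2.4\n",
    "#ngs-filters/applyFilter.sh VERSION=v1.0.1-2-g4d3694b "
    "FILTER=filter_blacklist_regions.R\n",
    "Hugo_Symbol\tChromosome\n",
]
print(applied_filters(example))
```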
run_ngs-filters.py currently runs the following scripts, in the given order, using applyFilter.sh:
- tag_hotspots
- filter_blacklist_regions
- filter_dmp
- filter_normal_panel (if a fillout MAF for the panel of standard normals is given)
- filter_cohort_normals (if a fillout MAF for the cohort normals is given)
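The ordering logic in the list above can be sketched as follows. This is not the wrapper's actual code: the function name is hypothetical, and the step names follow the script file names used elsewhere in this document.

```python
def planned_steps(normal_panel_maf=None, normal_cohort_maf=None):
    """Return the filter steps in the order described above."""
    # These three steps always run.
    steps = ["tag_hotspots", "filter_blacklist_regions", "filter_dmp"]
    # The fillout-based filters run only if their fillout MAF is supplied.
    if normal_panel_maf:
        steps.append("filter_normal_panel")
    if normal_cohort_maf:
        steps.append("filter_cohort_normals")
    return steps


print(planned_steps(normal_panel_maf="data/sample_input_fill.maf"))
```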
Usage:
usage: run_ngs-filters.py [options]
This tool helps to tag hotspot events
optional arguments:
-h, --help show this help message and exit
-v, --verbose make lots of noise
-m SomeID.maf, --input-maf SomeID.maf
Input maf file which needs to be tagged
-o SomeID.maf, --output-maf SomeID.maf
Output maf file name
-outdir /somepath/output, --outDir /somepath/output
Full Path to the output dir.
-npmaf /somepath/to/normalpanel.maf, --normal-panel-maf /somepath/to/normalpanel.maf
Path to fillout maf file of panel of standard normals
-ncmaf /somepath/to/normalcohort.maf, --normal-cohort-maf /somepath/to/normalcohort.maf
Path to fillout maf file of cohort normals
-nsf /somepath/to/normalcohort.list, --normalSamplesFile /somepath/to/normalcohort.list
File with list of normal samples
-hsp hotspots.txt, --input-hotspot hotspots.txt
Input txt file which has hotspots
example:
python run_ngs-filters.py --verbose --input-maf data/sample_input.maf --output-maf sample_output.maf --normal-panel-maf data/sample_input_fill.maf --input-hotspot data/hotspot-list-union-v1-v2.txt
- Common variants
A variant is considered common if its minor allele frequency in ExAC exceeds 0.0004. This filter needs an ExAC_AF column, which is most easily added to a MAF by running maf2maf. Since maf2maf now also annotates the FILTER column, this filter script will hopefully become obsolete. With the -f flag, this filter annotates a MAF with information from another MAF.
./filter_common_variants.R -m input.maf -o output.maf
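The cut-off described above amounts to a single comparison. The sketch below is a Python illustration, not the R script itself; treating a missing ExAC_AF value as "not common" is an assumption.

```python
def is_common_variant(exac_af, cutoff=0.0004):
    """Return True if the ExAC allele frequency exceeds the cut-off."""
    # Assumption: variants without an ExAC_AF value are not flagged.
    if exac_af in (None, "", "NA"):
        return False
    return float(exac_af) > cutoff


print(is_common_variant("0.001"))
print(is_common_variant(""))
```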
- Low-confidence calls
A variant is considered a low-confidence call if it fulfills the condition
n_alt_count > 1 | t_depth < 20 | t_alt_count <= 3
Interpretation and use of this filter depend on the nature of the sequencing experiment.
./filter_low_conf.R -m input.maf -o output.maf
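Written out in Python, the condition above looks like this. The function name is hypothetical; the argument names match the MAF columns used in the condition.

```python
def is_low_confidence(n_alt_count, t_depth, t_alt_count):
    """Return True if any of the three low-confidence criteria holds."""
    return n_alt_count > 1 or t_depth < 20 or t_alt_count <= 3


# A well-supported tumor variant with no alternate reads in the normal.
print(is_low_confidence(n_alt_count=0, t_depth=150, t_alt_count=30))
# Only three supporting reads in the tumor.
print(is_low_confidence(n_alt_count=0, t_depth=150, t_alt_count=3))
```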
- Presence in study normals
Flags a variant if it is supported by 3 or more reads in any of the normals sequenced in the same study. The cut-off for supporting reads can be set with the -n flag. See the instructions below for how to generate a fillout file.
./filter_cohort_normals.R -m input.maf -o output.maf -f study.fillout
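A minimal sketch of the rule above, assuming the fillout has already been reduced to a mapping from normal sample name to alternate read count for one variant. That mapping, the sample names and the function name are hypothetical; the real fillout file format is not shown here.

```python
def in_study_normals(normal_alt_counts, min_reads=3):
    """Return True if any study normal has at least `min_reads` alt reads."""
    return any(count >= min_reads for count in normal_alt_counts.values())


# Hypothetical per-variant counts taken from a fillout file.
counts = {"normal_A": 0, "normal_B": 4, "normal_C": 1}
print(in_study_normals(counts))
print(in_study_normals(counts, min_reads=5))
```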
- Presence in pool of normals
Similarly to the previous filter, this filter flags a variant if it is supported by 3 or more reads in at least 3 samples in a pool of normals. See the instructions below for how to generate a fillout file.
./filter_normal_panel.R -m input.maf -o output.maf -f pon.fillout
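The pool-of-normals rule differs from the study-normals rule in that several samples must support the variant. A sketch under the assumption that the alternate read counts per pool sample have already been extracted from the fillout (the function name and input shape are hypothetical):

```python
def in_normal_panel(panel_alt_counts, min_reads=3, min_samples=3):
    """Return True if enough pool samples carry enough alt reads."""
    supporting = sum(1 for count in panel_alt_counts if count >= min_reads)
    return supporting >= min_samples


# Three of five pool samples have 3 or more supporting reads.
print(in_normal_panel([3, 0, 5, 7, 1]))
# Only two pool samples reach the read cut-off.
print(in_normal_panel([3, 0, 5, 2, 1]))
```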
- Presence in FFPE pool
Flags a variant if it is supported by 3 or more reads in a fillout against an FFPE pool. The cut-off for supporting reads can be set with the -n flag. See the instructions below for how to generate a fillout file.
./filter_ffpe_pool.R -m input.maf -o output.maf -f ffpe.fillout
- FFPE artifact
Flags a variant if it looks like an FFPE artifact, i.e. it occurs at low VAF and is a C>T substitution. With the -i flag, this script can also help identify samples suffering from FFPE artifacts.
./filter_ffpe.R -m input.maf -o output.maf
### or
./filter_ffpe.R -m input.maf -i
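The description above can be sketched as a simple predicate. The text does not say what counts as "low VAF", so the max_vaf default of 0.1 below is a placeholder assumption, not the value used by filter_ffpe.R, and the function name is hypothetical. The sketch checks only C>T, as stated above.

```python
def looks_like_ffpe_artifact(ref, alt, t_alt_count, t_depth, max_vaf=0.1):
    """Return True for a C>T substitution observed at low VAF."""
    if t_depth <= 0:
        return False  # VAF is undefined without coverage
    vaf = t_alt_count / t_depth
    # max_vaf is a placeholder cut-off chosen for illustration only.
    return ref == "C" and alt == "T" and vaf < max_vaf


print(looks_like_ffpe_artifact("C", "T", t_alt_count=4, t_depth=100))
print(looks_like_ffpe_artifact("C", "T", t_alt_count=40, t_depth=100))
```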
- Low-mappability ("blacklisted") regions
Flags variants located in regions to which sequencing reads are hard to map, as defined by ENCODE and RepeatMasker. See data/source.txt for details on the files used for this annotation.
./filter_blacklist_regions.R -m input.maf -o output.maf
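Conceptually, this annotation is an interval-overlap test. The sketch below assumes the blacklist has been loaded into a dictionary of inclusive (start, end) intervals per chromosome, in the same coordinate convention as the MAF. The data structure, the example coordinates and the function name are all hypothetical.

```python
def in_blacklist(chrom, start, end, regions):
    """Return True if [start, end] overlaps any region on `chrom`."""
    return any(
        start <= region_end and end >= region_start
        for region_start, region_end in regions.get(chrom, [])
    )


# Hypothetical regions; real ones come from the files in data/source.txt.
regions = {"1": [(1000, 2000), (5000, 5500)]}
print(in_blacklist("1", 1500, 1500, regions))
print(in_blacklist("1", 3000, 3000, regions))
```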
- Hotspot ("whitelisted") sites
Tags variants located at sites defined as hotspots by hotspot-whitelist.
./tag_hotspots.py -m input.maf -itxt hotspot-list-union-v1-v2.txt -o output.maf
- Add DMP Filter Tag
Flags a variant if it does not pass the allele count thresholds set at DMP, i.e. AD>=8; DP>=20; VF>=0.02 for hotspots and AD>=10; DP>=20; VF>=0.05 for non-hotspots.
./filter_dmp.R -m input.maf -o output.maf
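The DMP thresholds can be sketched as below. This assumes that AD corresponds to t_alt_count, DP to t_depth and VF to AD/DP, and that the non-hotspot read threshold refers to AD>=10. The function name is hypothetical, and this is an illustration, not the code of filter_dmp.R.

```python
def passes_dmp_thresholds(t_alt_count, t_depth, is_hotspot):
    """Return True if the variant meets the DMP allele count thresholds."""
    if t_depth <= 0:
        return False  # VF is undefined without coverage
    vf = t_alt_count / t_depth
    # Hotspots get more lenient AD and VF thresholds; DP is the same.
    min_ad, min_vf = (8, 0.02) if is_hotspot else (10, 0.05)
    return t_alt_count >= min_ad and t_depth >= 20 and vf >= min_vf


print(passes_dmp_thresholds(8, 100, is_hotspot=True))
print(passes_dmp_thresholds(8, 100, is_hotspot=False))
```

A variant for which this returns False is the one that would receive the flag.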
maf_fillout.py wraps GetBaseCountsMultiSample on luna and can be used to generate fillout files (i.e. allele counts for the variants in the input MAF) across a set of BAM files. The -n flag sets the number of threads. The genome of the MAF and the BAMs must be consistent and must be specified with the -g flag, which knows where the assemblies for GRCh37, hg19, b37, and b37_dmp are located on luna. The script convert-maf-to-hg19.sh can be used to fake an hg19 MAF.
./maf_fillout.py -m input.maf -b file1.bam file2.bam [..] -g genome -n threads -o output.fillout
run-fpfilter.py wraps fpfilter.pl from variant-filter. Filter parameters in fpfilter.pl might need to be adjusted according to the nature of the sequencing experiment. Temporary files are removed upon completion. Like the fillout wrapper, this script knows where GRCh37, hg19, b37, and b37_dmp are located on luna.
./run-fpfilter.py -v input.vcf -b tumor.bam -g genome -f path/to/fpfilter.pl
Required R Packages:
$ Rscript install-packages.R
Required Python Libraries:
$ pip install -r requirements.txt