We need a workflow to do the following, starting with output from the DRAGEN tumor/normal analysis. Here is an example location with the files that will be used for this analysis:
/storage1/fs1/dspencer/Active/spencerlab/dhs/projects/cs1cart_wgs/WSCS1CART_Validation1May2022
The workflow should do this:
-Filter all passing variants from *.hard-filtered.vcf.gz, *.sv.vcf.gz, and *.cnv.vcf.gz into a new combined VCF file. Note that in the CNV file, records should be gain or loss (not DRAGEN:REF) and they can have the 'lowModelConfidence' filter flag, but no others.
-Annotate the above file with VEP
-Generate a text file from both VCF files so we can open them in Excel, etc.; see: /storage1/fs1/dspencer/Active/spencerlab/dhs/scripts/vep2txt.py
-Extract and count hits to transgene sequences, if provided as input. Here is a command line way to do this (running in the above directory):
samtools view -T ../hg38cs1car.fa -F 0x400 WSCS1CART_Validation1May2022.cram cs1car | awk -v WIN=$WINDOWSIZE '$7!="=" && $5>0 { print $7,sprintf("%d",$8/WIN)*WIN; }' | sort | uniq -c | awk '{ print $2,$3,$3+1,$1; }' | awk '{ print $1,$2,$3,"INS","+","sv"c++"-"$4; }' > carhits.svformat.bed
(where $WINDOWSIZE is the size in bp over which to collapse multiple hits into 1)
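For anyone porting this step into the workflow, here is a rough Python sketch of what the awk pipeline above appears to do: keep reads with MAPQ > 0 whose mate maps to a different reference, round the mate position down to a multiple of the window size, count reads per window, and emit one space-separated line per window. The function name `car_hits` is hypothetical, and the sort order (chromosome name, then numeric position) is an assumption that may differ from the `sort` command's ordering.

```python
from collections import Counter


def car_hits(sam_lines, window):
    """Collapse mate positions of transgene-mapped reads into fixed windows.

    sam_lines: iterable of tab-separated SAM records (header lines are skipped).
    window:    window size in bp (the $WINDOWSIZE of the shell command).
    """
    counts = Counter()
    for line in sam_lines:
        if line.startswith("@"):
            continue  # SAM header line
        fields = line.rstrip("\n").split("\t")
        mapq, rnext, pnext = int(fields[4]), fields[6], int(fields[7])
        # Same test as the awk filter: mate on another reference, MAPQ > 0.
        if rnext != "=" and mapq > 0:
            counts[(rnext, (pnext // window) * window)] += 1

    rows = []
    # Assumption: sort by chromosome name, then numeric window start.
    for i, ((chrom, start), n) in enumerate(sorted(counts.items())):
        rows.append(f"{chrom} {start} {start + 1} INS + sv{i}-{n}")
    return rows
```

Each returned line has the same six columns as carhits.svformat.bed (chromosome, start, end, INS, strand, and an identifier of the form sv<index>-<read count>).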
Then annotate the carhits.svformat.bed file with VEP:
/usr/bin/perl -I /opt/vep/lib/perl/VEP/Plugins /opt/vep/src/ensembl-vep/vep --plugin Downstream --fasta $HG38 --hgvs --symbol --term SO --flag_pick -i carhits.svformat.bed --offline --cache --max_af --dir /storage1/fs1/gtac-mgi/Active/CLE/reference/VEP_cache -o carhits.svformat.output.bed
After this analysis, the data should be used to make the figures/tables in the attached document:
CS1-Validation-1-analysis.docx
It would be good to make the read hit analysis optional in the workflow.
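To make the filtering rule in the first step concrete, here is a minimal Python sketch of the per-record decision. It assumes the CNV records carry IDs that start with DRAGEN:GAIN, DRAGEN:LOSS or DRAGEN:REF, and it reads "gain or loss" as "anything except DRAGEN:REF"; both points should be checked against the real files. The names `passes`, `keep_record` and the `source` labels are hypothetical.

```python
# Flags tolerated on CNV records in addition to PASS (from the request above).
CNV_ALLOWED = frozenset({"lowModelConfidence"})


def passes(filter_field, allowed=frozenset()):
    """True if FILTER is PASS, or if every flag it lists is in `allowed`."""
    if filter_field == "PASS":
        return True
    return set(filter_field.split(";")) <= allowed


def keep_record(line, source):
    """Decide whether one VCF data line goes into the combined VCF.

    source: "snv", "sv" or "cnv" (hypothetical labels for the three inputs).
    """
    fields = line.rstrip("\n").split("\t")
    record_id, filter_field = fields[2], fields[6]
    if source == "cnv":
        # Assumption: reference-state segments have IDs starting with DRAGEN:REF.
        if record_id.startswith("DRAGEN:REF"):
            return False
        return passes(filter_field, CNV_ALLOWED)
    return passes(filter_field)
```

A CNV record flagged only with lowModelConfidence is kept, while the same flag on a small-variant or SV record, or any additional flag on a CNV record, causes the record to be dropped.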