-
Notifications
You must be signed in to change notification settings - Fork 42
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
am I running this wrong? #183
Comments
Hello @shinlin77 , Can you provide a little more information about this?
Another way to compare is to take one sample, run PMDV and GATK and then compare before rounding up GVCFs, as this pipeline is modular, it should be easy to compare. You can also compare the VCFs using hap.py to derive a concordance set. We usually expect very high concordance if you have sufficient coverage. |
Here are my answers...
I don't think the aggregation of the gvcfs step is making that big of a difference. |
@shinlin77 , what happens if you use GATK GenotypeGVCFs with PMDV gvcfs? This behavior is inconsistent. |
GATK GenotypeGVCFs does not work on PMDV gvcfs. There are too many incompatibilities. For example PMDV gvcfs have <*> instead of <NON_REF> I changed those, which was easy enough, but then I ran into other incompatibilities that I couldn't figure out. |
hi @shinlin77 , One of the easier way to reduce variability between pipeline and compare the variant calls is to take these files from one sample (not combined):
And run: docker run -it -v /input:/input \
jmcdani20/hap.py:v0.3.12 /opt/hap.py/bin/hap.py \
/input/short_read.vcf.gz \
/input/pmdv.vcf.gz \
-r /input/human_reference.fasta \
-o /input/happy_output \
--pass-only \
--no-roc \
--no-json \
--engine=vcfeval \
--threads=$(nproc) The TP number will tell you how many variants are exactly the same (GT and allele-wise) between the two VCFs. You can do the same with clair3 vcf and compare the numbers. |
The numbers look very discordant here. I am also unsure what is happening. Is it possible for you to share the bams so I can run them to see if there's something we are missing? Only chr20 would do in this case I think as the numbers are so off. |
I have SR and LR on the same genomes. Previously, I used the following pipelines on the
SR: bowtie2 -> picard -> GATK
LR: minimap2 -> clair3 -> GATK (to round up the gvcfs)
and got reasonable concordance between the two, meaning most of the variants in one were in the other. Moreover, for variants called by both, the majority of the individual calls were the same, too.
Now, I was told PMD was a great caller for ONT data, so I used the following updated pipeline
LR updated: minimap2 -> PMD -> GLnexus (to round up the gvcfs)
however, the SR and LR updated results had almost no concordance. I feel maybe I'm not running PMD right. This is the commandline
singularity exec --bind /where_data_is/ --bind /local/scratch/ --bind /where_genome_is/ /server/apps/software/PEPPER_Margin_DeepVariant/0.8/docker/pepper_deepvariant run_pepper_margin_deepvariant call_variant -b LR_aligned.sorted.bam -f human_genome.fasta -o /output/PMD -p pepper_margin_deepvariant -t 16 --ont_r9_guppy5_sup --gvcf
Interestingly, the result has very few SNVs (they are almost all SVs). Do you have any suggestions (e.g. parameters to tweak)? Thanks.
The text was updated successfully, but these errors were encountered: