Hi team,
I am trying to replicate the HPRC year1v2 PGGB steps described here: https://github.com/pangenome/HPRCyear1v2genbank
I am using all of the HPRC assemblies, 20 haplotypes from my own data, and the CHM13 and GRCh38 references, which brings the dataset to a total of 116 assemblies.
I followed the steps to divide the dataset into chromosome-specific FASTA files (partitions), making combined files for the sex chromosomes and for the acrocentric chromosomes, as described in the link above.
Quoting the methods of the draft human pangenome paper:
“We then applied PGGB (v.0.2.0+531f85f) to each partition to build a chromosome-specific graph. Run in parallel over 6 PowerEdge R6515 AMD EPYC 7402P 24-core nodes with 384 GB of RAM, this process requires 22.49 system days, or around 3.7 days wallclock.”
However, in my case the acrocentric chromosome community exceeds 96 hours of walltime (the limit on my shared HPC), even when given significantly more resources: a full node with 48 cores and 1440 GB of RAM.
This is the PGGB command I am launching:
pggb -I chrAcrocentric.fasta.gz -o "${PBS_JOBFS}/chrAcrocentric.pggb.out" -n 116 -p 98 -s 100000 -k 331 -O 0.03 -m -A -S -V chm13,grch38 -t 48 -T 48
Am I doing something wrong here? Given these resources, I would not expect it to exceed 96 hours.
Would masking out the ribosomal DNA (rDNA) of these chromosomes and building a separate graph for it be a way to go?
Hi @kiratalreja3, are you using the latest version of PGGB? A lot has changed in the last year or so, including the sensitivity of the mapping phase in wfmash. Higher sensitivity can lead to graphs that represent more variation, which are harder to handle, especially for the acrocentric chromosomes. Which PGGB step is taking up most of your runtime?
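As a rough check on the expectation that 96 hours should be enough, the figures quoted from the paper work out as below. This assumes that "system days" means the summed runtime over all six nodes and that the partitions were spread roughly evenly across them; the quoted excerpt does not state either explicitly. Since the whole set of partitions finished in about 3.7 days of wallclock, any partition that ran as a single job on one node (including the acrocentric one) would have finished in roughly 89 hours or less on a 24-core, 384 GB node with that PGGB version.

```math
\frac{22.49\ \text{system days}}{6\ \text{nodes}} \approx 3.75\ \text{days per node},
\qquad
3.7\ \text{days} \times 24\ \text{h/day} \approx 89\ \text{h} < 96\ \text{h} = 4\ \text{days}
```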
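One quick sanity check before comparing runtimes is to confirm that the partition really contains the number of haplotypes passed to `-n` (116 in the command above, matching the 116 assemblies described). The sketch below is a hypothetical helper (`count_haplotypes` is not part of PGGB) and assumes PanSN-style sequence names of the form `sample#haplotype#contig`; headers with only two `#`-separated fields, such as a reference named like `chm13#chr13`, are counted once per sample. If your headers follow a different convention, the awk key needs adjusting.

```shell
#!/usr/bin/env bash
# Hypothetical helper: count distinct haplotypes in a FASTA (gzipped or plain).
# Assumes PanSN-style headers ">sample#haplotype#contig"; headers with fewer
# than three '#'-separated fields (e.g. ">chm13#chr13") count by sample alone.
count_haplotypes() {
  # gzip -cdf decompresses gzip/bgzip input and passes plain text through unchanged
  gzip -cdf -- "$1" | awk -F'#' '
    /^>/ {
      sample = substr($1, 2)                    # drop the leading ">"
      key = (NF >= 3) ? sample "#" $2 : sample  # sample#haplotype, or sample
      if (!(key in seen)) { seen[key] = 1; n++ }
    }
    END { print n + 0 }'
}

# Usage: compare the printed count with the value passed to pggb -n
# count_haplotypes chrAcrocentric.fasta.gz
```

Keying on `sample#haplotype` rather than on full sequence names means that an assembly split into many contigs is still counted once.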