PGGB takes more than 96 hours of walltime on HPC #407

Open
kiratalreja3 opened this issue Sep 2, 2024 · 1 comment

kiratalreja3 commented Sep 2, 2024

Hi team,

I am trying to replicate the HPRC year1v2 PGGB steps described here: https://github.com/pangenome/HPRCyear1v2genbank

I am using all of the HPRC assemblies, 20 haplotypes from my own data, and the CHM13 and GRCh38 references, which brings the dataset to a total of 116 assemblies.

I followed the steps to partition the dataset into chromosome-specific FASTA files, making combined files for the sex chromosomes and the acrocentric chromosomes, as described in the link above.

Quoting the methods of the draft human pangenome paper:
“We then applied PGGB (v.0.2.0+531f85f) to each partition to build a chromosome-specific graph. Run in parallel over 6 PowerEdge R6515 AMD EPYC 7402P 24-core nodes with 384 GB of RAM, this process requires 22.49 system days, or around 3.7 days wallclock.”
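As a rough sanity check on the quoted figures, here is a minimal sketch of the arithmetic. It assumes the system time was split evenly across the 6 nodes, which the paper does not state explicitly; the variable names are only illustrative.

```shell
# Figures quoted from the paper's methods section.
system_days=22.49   # total system time in days
nodes=6             # number of nodes run in parallel (even split is an assumption)

# Wallclock estimate = system time / number of nodes.
wall_days=$(awk -v s="$system_days" -v n="$nodes" 'BEGIN { printf "%.2f", s / n }')
wall_hours=$(awk -v s="$system_days" -v n="$nodes" 'BEGIN { printf "%.1f", s / n * 24 }')

echo "${wall_days} days (${wall_hours} hours)"
# prints: 3.75 days (90.0 hours)
```

That is roughly 90 hours of wallclock for all partitions together on those 6 nodes, which matches the "around 3.7 days" in the quote.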

However, in my case the acrocentric chromosome community exceeds 96 hours of walltime (the limit on my shared HPC), even when given significantly more resources: a full node with 48 cores and 1440 GB of RAM.

This is the PGGB command I am launching:
pggb -I chrAcrocentric.fasta.gz -o "${PBS_JOBFS}/chrAcrocentric.pggb.out" -n 116 -p 98 -s 100000 -k 331 -O 0.03 -m -A -S -V chm13,grch38 -t 48 -T 48

Am I doing something wrong here? With the resources given, I would not expect it to exceed 96 hours.
Would masking out the ribosomal DNA of these chromosomes and assembling it as a separate graph be a way to go?

@AndreaGuarracino
Member

Hi @kiratalreja3, are you using the latest version of PGGB? A lot has changed in the last year or so, including the sensitivity of the mapping phase in WFMASH. Higher sensitivity can lead to graphs that represent more variation, which is harder to handle, especially with acrocentric chromosomes. Which PGGB step is taking up the majority of your runtime?
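One way to see which step a long-running job is currently in is to look at the processes on the compute node. The sketch below is only an illustration, not an official PGGB feature: the stage names (wfmash, seqwish, smoothxg, odgi, gfaffix) are assumed to match the executables PGGB launches, and `filter_stages` is a hypothetical helper name.

```shell
# Assumed names of the executables that PGGB runs, one per pipeline stage.
pggb_stages='wfmash|seqwish|smoothxg|odgi|gfaffix'

# Keep only ps-style rows (pid, elapsed, %cpu, rss, command) whose
# command column is exactly one of the assumed stage names.
filter_stages() {
    awk -v pat="^(${pggb_stages})\$" '$5 ~ pat'
}

# List your own processes on this node and show only the PGGB stages.
# Columns: PID, elapsed time, CPU %, resident memory (KB), command name.
if command -v ps >/dev/null 2>&1; then
    ps -u "$(id -un)" -o pid=,etime=,pcpu=,rss=,comm= | filter_stages
fi
```

Running this a few times during the job (for example from an interactive session on the same node) shows which stage is active and how long it has been running.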
