performance issues with accessing bgzipped ancestral fastas #11

Open
wsdewitt opened this issue Jul 30, 2020 · 3 comments

@wsdewitt
Collaborator

Accessing later regions of a bgzipped fasta through a mutyper.Ancestor object (a child class of pyfaidx.Fasta) is very slow, likely because of this pyfaidx issue: mdshw5/pyfaidx#153.

This is particularly problematic for the mutyper targets subcommand, since it scans every site in a fasta record or in a sequence of bed regions.

The current workaround is to use an uncompressed fasta. For example, a bgzipped fasta named ancestor.fa.gz can be decompressed with bgzip -d ancestor.fa.gz, producing the uncompressed fasta ancestor.fa.
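
If bgzip is not at hand, the same decompression can probably be done from Python, since bgzip output is a series of concatenated gzip members and the standard gzip module reads those. This is only a minimal sketch: the helper name decompress_fasta is hypothetical and not part of mutyper or pyfaidx, and it assumes the input path ends in .gz.

```python
import gzip
import shutil
from pathlib import Path


def decompress_fasta(gz_path, out_path=None):
    """Write an uncompressed copy of a gzip/bgzip-compressed fasta.

    Hypothetical helper for illustration; not part of mutyper.
    """
    gz_path = Path(gz_path)
    # Default output name drops the final suffix: ancestor.fa.gz -> ancestor.fa
    out_path = Path(out_path) if out_path is not None else gz_path.with_suffix("")
    # Stream the data in chunks so large genomes are not loaded into memory
    with gzip.open(gz_path, "rb") as src, open(out_path, "wb") as dst:
        shutil.copyfileobj(src, dst)
    return out_path
```

Calling decompress_fasta("ancestor.fa.gz") should write ancestor.fa next to the input and leave the compressed file in place; the uncompressed path can then be passed to mutyper as the ancestral fasta.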

@wsdewitt
Collaborator Author

This continues to cause problems, so I suggest raising a warning that links to this issue whenever a .gz file is supplied.
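
A rough sketch of what such a check could look like, using only the standard library. The function name warn_if_compressed and the message text are assumptions for illustration, not existing mutyper code, and the check looks only at the file extension.

```python
import warnings
from pathlib import Path


def warn_if_compressed(fasta_path):
    """Warn if the ancestral fasta path ends in .gz.

    Hypothetical helper for illustration; returns True if a warning was issued.
    """
    if Path(fasta_path).suffix.lower() == ".gz":
        warnings.warn(
            f"{fasta_path} appears to be compressed; access to bgzipped "
            "ancestral fastas is very slow (see mutyper issue #11). "
            "Consider decompressing it first, e.g. with bgzip -d.",
            UserWarning,
            stacklevel=2,
        )
        return True
    return False
```

Something like this could run once where the ancestral fasta path is first read, so the message appears before a long job starts. Swapping warnings.warn for a raised exception would turn it into a hard error instead.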

@ab08028

ab08028 commented Mar 30, 2023

I ran into this issue using mutyper variants. Good to know there's a workaround! Thanks to Luke for helping me troubleshoot!

@ab08028

ab08028 commented Aug 17, 2023

Running into this again with a new dataset, and the difference in performance is wild. The job ran with the bgzipped fasta for more than 2 days and only got 400 MB through a vcf file; with the unzipped fasta it is already at 1.5 GB after an hour. Maybe consider throwing a warning or error if someone tries to input a compressed ancestral fasta? It's virtually unusable when it's that slow.
