performance issues with accessing bgzipped ancestral fastas #11

wsdewitt · 2020-07-30T02:03:02Z

Accessing later regions of a bgzipped fasta via a mutyper.Ancestor object (a child class of pyfaidx.Fasta) is slow, likely because of this issue in pyfaidx: mdshw5/pyfaidx#153.

This is particularly problematic for the mutyper targets subcommand, since it scans through all sites in a fasta record, or a sequence of bed regions.

The current workaround is to work with decompressed fasta data. For example, a bgzipped fasta named ancestor.fa.gz can be decompressed with bgzip -d ancestor.fa.gz to produce the uncompressed fasta ancestor.fa.
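If bgzip is not at hand, the same decompression can be sketched in Python. This relies on bgzipped files being gzip-compatible, so the standard-library gzip module can read them. The helper name decompress_fasta is hypothetical and not part of mutyper or pyfaidx.

```python
import gzip
import shutil
from pathlib import Path


def decompress_fasta(gz_path, out_path=None):
    """Decompress a (b)gzipped fasta to a plain fasta and return the new path.

    Hypothetical helper for illustration; not part of mutyper or pyfaidx.
    """
    gz_path = Path(gz_path)
    if out_path is None:
        if gz_path.suffix != ".gz":
            raise ValueError(f"expected a .gz file, got {gz_path}")
        # ancestor.fa.gz -> ancestor.fa
        out_path = gz_path.with_suffix("")
    out_path = Path(out_path)
    # Stream the data so large genomes are never held fully in memory.
    with gzip.open(gz_path, "rb") as src, open(out_path, "wb") as dst:
        shutil.copyfileobj(src, dst)
    return out_path
```

Calling decompress_fasta("ancestor.fa.gz") writes ancestor.fa next to the input and leaves the compressed file in place. The uncompressed path can then be passed to mutyper instead of the .gz file.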


wsdewitt · 2021-08-25T22:45:51Z

This continues to cause problems, so I suggest raising a warning with a link to this issue if a .gz file is supplied.
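A minimal sketch of what such a check could look like, assuming the fasta path is available before the file is opened. The function name warn_if_compressed and the message wording are hypothetical, and this is not how mutyper currently behaves.

```python
import warnings
from pathlib import Path


def warn_if_compressed(fasta_path):
    """Warn when the ancestral fasta looks compressed; return True if it does.

    Hypothetical helper for illustration; not part of mutyper.
    """
    if Path(fasta_path).suffix == ".gz":
        warnings.warn(
            f"{fasta_path} appears to be compressed. Access to bgzipped "
            "ancestral fastas is very slow (see mutyper issue #11); "
            "consider decompressing it first, e.g. with bgzip -d.",
            UserWarning,
            stacklevel=2,  # attribute the warning to the caller
        )
        return True
    return False
```

The check only looks at the file extension, so a compressed file without a .gz suffix would pass silently. Inspecting the gzip magic bytes would be stricter, at the cost of opening the file.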

ab08028 · 2023-03-30T16:01:53Z

I ran into this issue using mutyper variants, good to know there's a workaround! Thanks to Luke for helping me troubleshoot!

ab08028 · 2023-08-17T18:40:26Z

Running into this again with a new dataset, and it's wild how much difference this makes to performance. Ran the job with the bgzipped fasta for >2 days and only got 400MB through a vcf file, and now with the unzipped fasta am already at 1.5GB after an hour. Maybe consider throwing a warning or error if someone tries to input a compressed ancestral fasta? It's virtually unusable when it's that slow.
