BGZip slow performance near end of chromosomes #153
Comments
It seems that the lookup time scales with the distance from the "start" of a contig. I only quickly scanned the internals and can't say I fully understand them, but this appears to be due to the way bgzip is implemented in Biopython: https://github.com/biopython/biopython/blob/master/Bio/bgzf.py#L699 It seems to read the whole part before the contig you need...?
@Maarten-vd-Sande this is definitely not due to the Bio.bgzf implementation; it is due to my incomplete implementation of virtual offset calculations from the start of each contig. I started work to fully support using the Lines 766 to 776 in f878775
You can see that I was still trying to figure out how this works, and was never able to make an entire round-trip (read a
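For anyone following along, here is a minimal sketch of how a BGZF virtual offset is put together, since that is the calculation discussed above. A virtual offset packs the compressed file offset of a block's start into the upper 48 bits and the position inside the decompressed block into the lower 16 bits. The two packing helpers are standalone reimplementations named after the ones in `Bio.bgzf`. The `virtual_offset_for` function and its `block_index` argument are hypothetical: they assume you already have a sorted table of block starts, and they are not how pyfaidx currently does it.

```python
from bisect import bisect_right


def make_virtual_offset(block_start: int, within_block: int) -> int:
    """Pack a block's compressed start and an in-block offset into one integer."""
    if not 0 <= within_block < 2**16:
        raise ValueError("within-block offset must fit in 16 bits")
    if not 0 <= block_start < 2**48:
        raise ValueError("block start must fit in 48 bits")
    return (block_start << 16) | within_block


def split_virtual_offset(virtual_offset: int) -> tuple[int, int]:
    """Unpack a virtual offset into (block_start, within_block)."""
    return virtual_offset >> 16, virtual_offset & 0xFFFF


def virtual_offset_for(uncompressed_pos: int, block_index: list[tuple[int, int]]) -> int:
    """Map an uncompressed byte position to a virtual offset.

    block_index is a hypothetical sorted list of
    (uncompressed_start, compressed_start) pairs, one per BGZF block.
    """
    starts = [u_start for u_start, _ in block_index]
    # Find the last block whose uncompressed start is <= the requested position.
    i = bisect_right(starts, uncompressed_pos) - 1
    if i < 0:
        raise ValueError("position lies before the first indexed block")
    u_start, c_start = block_index[i]
    return make_virtual_offset(c_start, uncompressed_pos - u_start)
```

With a table like this the lookup is a binary search over blocks, so the cost no longer depends on how far the position is from the start of the contig.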
@mdshw5 thanks for the reply, that makes sense! I guess I'll just load the whole fasta in memory for now 😄
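A rough sketch of that workaround, using only the standard library rather than pyfaidx. It relies on the fact that a BGZF file is a series of concatenated gzip members, which Python's `gzip` module reads as one stream. The function name `load_fasta` is made up for illustration, and the whole genome has to fit in RAM.

```python
import gzip


def load_fasta(path: str) -> dict[str, str]:
    """Read a gzip/BGZF-compressed FASTA into a {name: sequence} dict."""
    sequences: dict[str, str] = {}
    name = None
    chunks: list[str] = []
    with gzip.open(path, "rt") as handle:
        for line in handle:
            line = line.strip()
            if line.startswith(">"):
                # Store the previous record before starting a new one.
                if name is not None:
                    sequences[name] = "".join(chunks)
                parts = line[1:].split()
                name = parts[0] if parts else ""
                chunks = []
            elif name is not None and line:
                chunks.append(line)
    # Store the final record.
    if name is not None:
        sequences[name] = "".join(chunks)
    return sequences
```

Once loaded, a lookup is plain string slicing with 0-based, end-exclusive coordinates, for example `load_fasta("genome.fa.gz")["chr1"][1000:1010]`, where the file name and contig name are placeholders.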
It can take over a minute to retrieve a few bases:
Low coordinates are fine:
You said in a previous issue:
I can't find that issue, so am raising this one. Good luck!