LexicMap v0.5.0

@shenwei356 released this 18 Dec 12:27

v0.5.0 - 2024-12-18

This version is compatible with indexes created by LexicMap v0.4.0, but rebuilding the index is recommended for more accurate results.

  • New commands:
    • lexicmap utils remerge: Rerun the merging step for an unfinished index.
  • lexicmap index:
    • Big genomes with thousands of contigs (big yet fragmented assemblies) are automatically split into multiple chunks, and alignments from these chunks will be merged.
    • Change the default value of --partitions from 1024 to 4096, which increases seed-matching speed at the cost of 2 GiB more memory.
      For existing LexicMap indexes, run lexicmap utils reindex-seeds --partitions 4096 to re-create the seed indexes.
    • Do not save low-complexity seeds.
    • Fix high memory usage in writing seed data.
    • Change the default value of -c/--chunks from all available CPUs to the value of -j/--threads.
    • Change the default value of --max-open-files from 512 to 1024.
    • Add a new flag --debug.
  • lexicmap search:
    • Improve chaining, pseudoalignment, and alignment for highly repetitive sequences.
    • Compute a more accurate chaining score through better chaining of overlapping anchors, which produces more accurate results with -n/--top-n-genomes:
      • Merge two overlapping non-gapped anchors into a longer one.
      • For those with gaps, use only the non-overlapping part of the second anchor to compute the weight.
      • Use the score of the best chain (rather than the sum) for sorting genomes when using -n.
    • Fix positions and alignment texts for queries with highly repetitive sequences in end regions. #9
    • Skip low-complexity seeds.
    • Change the default value of --max-open-files from 512 to 1024.
    • Change the default value of --align-band from 50 to 100.
    • Improve the speed of anchor deduplication, genome information extraction, and result ordering.
    • Improve the speed of chaining for long queries.
    • Improve the speed of seed matching when using -w/--load-whole-seeds.
    • Improve the speed of alignment and reduce memory usage.
    • Remain compatible after the change of lexicmap index.
    • Add a new flag --debug.
  • lexicmap utils genomes:
    • Do not sort genome ids.
    • Add a header line and another column showing whether the reference genome is chunked.
  • lexicmap utils subseq:
    • Remain compatible after the change of lexicmap index.
  • lexicmap utils seed-pos:
    • Remain compatible after the change of lexicmap index, while histograms are plotted separately for multiple genome chunks.
  • lexicmap utils reindex-seeds:
    • Change the default value of --partitions from 1024 to 4096.
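
The chaining-score changes listed under lexicmap search can be illustrated with a small Python sketch. This is not LexicMap's code: the Anchor type and the chain_weight and rank_genomes functions are hypothetical names, the way overlap is measured (the larger of the query and target overlaps) is an assumption, and "the sum" is read here as the sum of a genome's chain scores. It only shows the three ideas: merging overlapping non-gapped anchors, counting just the non-overlapping part of a gapped anchor, and ranking genomes by their best chain.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Anchor:
    """An ungapped match between query and target (hypothetical type)."""
    q_start: int  # 0-based start on the query
    t_start: int  # 0-based start on the target
    length: int   # number of matched bases

    @property
    def q_end(self) -> int:
        return self.q_start + self.length

    @property
    def t_end(self) -> int:
        return self.t_start + self.length

    @property
    def diagonal(self) -> int:
        # Anchors on the same diagonal have no gap between them.
        return self.t_start - self.q_start


def chain_weight(anchors):
    """Sum anchor weights along one chain without double-counting overlaps."""
    total = 0
    prev = None
    for cur in sorted(anchors, key=lambda a: (a.q_start, a.t_start)):
        if prev is None:
            total += cur.length
            prev = cur
            continue
        # Assumption: overlap is the larger of the query and target overlaps.
        overlap = max(0, prev.q_end - cur.q_start, prev.t_end - cur.t_start)
        if overlap > 0 and cur.diagonal == prev.diagonal:
            # Non-gapped overlap: merge both anchors into a longer one.
            new_len = max(prev.q_end, cur.q_end) - prev.q_start
            total += new_len - prev.length
            prev = Anchor(prev.q_start, prev.t_start, new_len)
        else:
            # Gapped (or no overlap): count only the non-overlapping part.
            total += max(0, cur.length - overlap)
            prev = cur
    return total


def rank_genomes(chain_scores, top_n):
    """Rank genomes by their best chain score, not the sum of all chains."""
    best = {g: max(scores) for g, scores in chain_scores.items() if scores}
    return sorted(best, key=lambda g: best[g], reverse=True)[:top_n]


if __name__ == "__main__":
    a = Anchor(0, 100, 20)
    print(chain_weight([a, Anchor(10, 110, 20)]))  # same diagonal, merged
    print(chain_weight([a, Anchor(15, 118, 20)]))  # gapped, trimmed weight
    print(rank_genomes({"g1": [30, 30, 30], "g2": [80]}, top_n=1))
```

In the last call, g1 has the larger summed score (90) but g2 has the better single chain (80), so ranking by the best chain selects g2. A genome with many weak, repetitive chains no longer outranks one with a single strong chain.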