covsonar 2 runtime (and memory usage?) #98

matthuska · 2023-08-31T19:21:54Z

It would be great if covsonar 2 were faster than covsonar 1, but we don't expect that to be the case because covsonar 2 is much more flexible than covsonar 1. Nevertheless, covsonar 2 has to be fast enough to be useful for us.

The following commands have to run in a reasonable amount of time* (and with a reasonable amount of memory?):

- extract all metadata and mutation profiles from a large database (~16M sequences in GISAID global 2023-08-31)
- add a large number of new sequences and metadata to a new database
- add a large number of sequences and metadata to a database, most of which are already present in the database
- extract (or count) sequences that match a given genomic profile with a set of mutations
- extract (or count) sequences that match a given lineage and all sublineages
- delete a small number of sequences from a large database

* where "reasonable" is defined as < 1.5x the runtime of covsonar 1, or as a fixed amount of time that is deemed reasonable
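
To make the criterion concrete, here is a minimal sketch of how it could be checked for a single command. The function name `is_reasonable`, its parameters, and the example runtimes are all hypothetical; only the 1.5x factor comes from the definition above, and the fixed limit is left as an optional value because no number has been agreed on.

```python
from typing import Optional


def is_reasonable(
    v2_seconds: float,
    v1_seconds: float,
    factor: float = 1.5,
    fixed_limit_seconds: Optional[float] = None,
) -> bool:
    """Return True if a covsonar 2 runtime meets either criterion.

    Criterion 1: strictly less than `factor` times the covsonar 1 runtime.
    Criterion 2: within a fixed limit, if one has been agreed on
    (hypothetical; no such limit is defined in this issue).
    """
    within_factor = v2_seconds < factor * v1_seconds
    within_fixed = (
        fixed_limit_seconds is not None and v2_seconds <= fixed_limit_seconds
    )
    return within_factor or within_fixed


# Made-up runtimes, in seconds, purely for illustration.
print(is_reasonable(140.0, 100.0))
print(is_reasonable(150.0, 100.0))
print(is_reasonable(150.0, 100.0, fixed_limit_seconds=200.0))
```

Runtimes for the commands listed above would have to be measured separately for both versions on the same data and hardware before a check like this means anything.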

matthuska · 2023-09-04T08:44:35Z

In case it's useful in the future, I profiled the addition of 10 sequences to the current covsonar2 version using pyinstrument. Nothing to do here, just wanted to keep it somewhere in case we need to optimize this process at some point. It looks like alignment takes ~26 seconds out of 42 seconds total, with most of the remaining time split roughly equally between lift_vars (~7 seconds) and parse_cigar (~6 seconds):

Program: sonar import --threads 1 --db output/covsonar2.db --fasta seqs-10.fasta --no-progress

41.884 <module>  sonar:2
├─ 40.941 main  covsonar/sonar.py:1100
│  └─ 40.934 execute_commands  covsonar/sonar.py:1058
│     └─ 40.929 handle_import  covsonar/sonar.py:718
│        └─ 40.929 import_data  covsonar/utils.py:549
│           └─ 40.914 _import_fasta  covsonar/utils.py:748
│              └─ 40.693 sonarAligner.process_cached_sample  covsonar/align.py:260
│                 ├─ 25.939 sonarAligner.align  covsonar/align.py:56
│                 │  └─ 25.872 sg_trace_striped_32  parasail/bindings_v2.py:3429
│                 ├─ 7.267 <listcomp>  covsonar/align.py:303
│                 │  └─ 7.265 sonarAligner.lift_vars  covsonar/align.py:403
│                 │     └─ 7.119 sonarAligner.update_nuc_positions  covsonar/align.py:343
│                 │        ├─ 4.205 Series.between  pandas/core/series.py:5411
│                 │        │     [14 frames hidden]  pandas
│                 │        ├─ 2.400 _LocIndexer.__setitem__  pandas/core/indexing.py:831
│                 │        │     [10 frames hidden]  pandas
│                 │        └─ 0.438 DataFrame.__getitem__  pandas/core/frame.py:3713
│                 ├─ 6.027 sonarAligner.parse_cigar  covsonar/align.py:83
│                 │  └─ 6.013 handle_deletion  covsonar/align.py:176
│                 │     └─ 6.013 is_frameshift_del  covsonar/align.py:119
│                 │        └─ 5.835 DataFrame.groupby  pandas/core/frame.py:8130
│                 │              [29 frames hidden]  pandas
│                 └─ 1.442 Result.__del__  parasail/bindings_v2.py:273
└─ 0.911 <module>  covsonar/sonar.py:5
   └─ 0.497 <module>  covsonar/cache.py:5
      └─ 0.441 <module>  pandas/__init__.py:1
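
For reference, the shares of total runtime can be worked out from the numbers in the profile above. This is only arithmetic on the reported timings (a single run, 10 sequences, 1 thread); the variable names are mine, and the per-sequence figure is a simple average, not a prediction for larger imports.

```python
# Timings in seconds, copied from the pyinstrument output above.
total_seconds = 41.884
n_sequences = 10

step_seconds = {
    "sonarAligner.align": 25.939,
    "sonarAligner.lift_vars": 7.265,
    "sonarAligner.parse_cigar": 6.027,
}

# Percentage of the total runtime spent in each step.
shares = {
    name: round(100 * seconds / total_seconds, 1)
    for name, seconds in step_seconds.items()
}

# Average wall-clock time per imported sequence in this run.
seconds_per_sequence = total_seconds / n_sequences

for name, share in shares.items():
    print(f"{name}: {share}% of total")
print(f"average per sequence: {seconds_per_sequence:.2f} s")
```

By this count, alignment accounts for a bit over 60% of the run, and the two pandas-heavy steps together for roughly another 30%.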

matthuska · 2023-10-31T08:38:11Z

Closed because we do not plan to continue covsonar 2 development.

In summary, the performance was much worse than that of covsonar 1. Some work was put into improving the situation (see #110), but it was abandoned in favor of a different solution using PostgreSQL.

matthuska added this to the covsonar 2.0.0 milestone Aug 31, 2023

matthuska closed this as not planned Oct 31, 2023
