Bug: Update dropped strains file to list accession instead of strain #24

j23414 · 2024-02-07T18:37:31Z

Current Behavior

Since 8ab810f, strains listed in phylogenetic/config/dropped_strains.txt are no longer being dropped.

Expected behavior

Strains listed in dropped_strains.txt are excluded from the final phylogenetic tree.

How to reproduce

Possible solution

Perhaps cherry-pick a commit like:

67016d1



victorlin · 2024-02-07T19:17:50Z

Good catch! If I understand correctly, phylogenetic/config/dropped_strains.txt is used for augur filter --exclude, so it should have been updated alongside 8ab810f. Looking more carefully at #12, the --sequences input should also have been updated to match the new ID values, but I don't see that it was changed. Does it still work?

j23414 · 2024-02-07T19:38:27Z

Correct, and yes the --sequences input still works :D

git clone https://github.com/nextstrain/dengue.git
cd dengue/phylogenetic
nextstrain build . data/sequences_all.fasta
grep ">" data/sequences_all.fasta | head -n5

Which shows the sequences are ID'd by accession, not strain name:

>NC_075403
>NC_075435
>OQ919688
>ON123563
>ON123564
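
To illustrate why entries written as strain names stop matching once records are ID'd by accession, here is a rough sketch that lists the entries of an exclude file that match no record ID in a FASTA file. `unmatched_excludes` is a hypothetical helper, not part of augur or the Nextstrain CLI. It assumes one entry per line in the exclude file, that `#` starts a comment, and that the record ID is the header text up to the first whitespace.

```shell
# Hypothetical helper (not part of augur): print exclude-file entries
# that match no record ID in a FASTA file.
# Usage: unmatched_excludes <exclude_file> <fasta_file>
unmatched_excludes() {
    awk '
        # FASTA file (read first): remember each record ID, taken as the
        # header text after ">" up to the first whitespace (an assumption).
        FILENAME == ARGV[1] {
            if (substr($0, 1, 1) == ">") seen[substr($1, 2)] = 1
            next
        }
        # Exclude file: strip "#" comments and surrounding whitespace,
        # then report non-empty entries that were never seen as an ID.
        {
            entry = $0
            sub(/#.*/, "", entry)
            sub(/^[ \t\r]+/, "", entry)
            sub(/[ \t\r]+$/, "", entry)
            if (entry != "" && !(entry in seen)) print entry
        }
    ' "$2" "$1"
}
```

Run from dengue/phylogenetic, `unmatched_excludes config/dropped_strains.txt data/sequences_all.fasta` would print every entry that has no matching record, which is what one would expect for entries still written as strain names.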

victorlin · 2024-02-07T19:52:46Z

Good to know! But how did it work before #12 if the sequences were ID'd by accession and augur filter was using strain as the ID column?

j23414 · 2024-02-07T19:59:17Z

> augur filter was using strain as the ID column?

This worked when we were still pulling strains from fauna (which also let us take advantage of fauna's deduplication of strain names).

However, we shifted to using the ingest folder and pulling data using NCBI Datasets. After ingest was merged, the files under data.nextstrain.org/files were updated so we could have a smooth transition.

victorlin · 2024-02-14T02:14:45Z

Closed by #26

j23414 added the bug Something isn't working label Feb 7, 2024

j23414 mentioned this issue Feb 10, 2024

Fix: update dropped strains file to list accession instead of strain names #26

Merged

3 tasks

victorlin closed this as completed Feb 14, 2024
