Synonymous mutations #1535

Open
claudia-as opened this issue Oct 16, 2024 · 1 comment

Comments

@claudia-as

Hello!

I am working with the CSV reports that can be exported from Nextclade, and I am wondering whether there is any way to know when a synonymous mutation has occurred, and if so, which mutation it was.

Example:
substitutions | aasubstitutions
A22296G,C22414T | S:H245R

In this case we know that one nucleotide mutation corresponds to the amino acid mutation S:H245R, but the other nucleotide mutation corresponds to a synonymous amino acid mutation and we do not know which one it was.

Thanks in advance!

@ivan-aksamentov
Member

ivan-aksamentov commented Oct 16, 2024

Hi @claudia-as,

This info is not available in the CSV/TSV tables.

We do try to associate nucleotide and amino acid mutations by position, and we write this information to the JSON/NDJSON output files in aaChangesGroups. This is the data used to display the "context" and the "mutations nearby" when hovering over amino acid mutations in Nextclade Web. By looking at the JSON, you can deduce which nucleotide mutations fall near the triplet for a particular amino acid mutation.
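As a rough starting point for exploring that data, here is a minimal Python sketch. It only assumes that the key is called aaChangesGroups, as mentioned above; the helper names `find_key` and `load_groups` are made up for illustration. Because the JSON layout is not guaranteed to be stable, the sketch searches for the key at any nesting depth rather than assuming where it lives or what fields it contains.

```python
import json


def find_key(node, key="aaChangesGroups"):
    """Yield every value stored under `key`, at any nesting depth."""
    if isinstance(node, dict):
        for k, v in node.items():
            if k == key:
                yield v
            else:
                yield from find_key(v, key)
    elif isinstance(node, list):
        for item in node:
            yield from find_key(item, key)


def load_groups(path):
    """Read a JSON file and return all aaChangesGroups values found in it."""
    with open(path) as f:
        return list(find_key(json.load(f)))
```

For NDJSON, the same `find_key` could be applied to each line after `json.loads(line)`. Inspecting the returned values for your Nextclade version is the safest way to learn which fields are actually present.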

Alternatively, you could try to calculate it yourself from the lists of mutations, the alignment and the translations in the Nextclade output.
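A minimal sketch of such a calculation, in Python, might look like the following. It is not how Nextclade does it internally. It assumes a forward-strand CDS in a single segment, no insertions or deletions shifting the frame, and the standard genetic code. The CDS start of 21563 (1-based, S gene in the Wuhan-Hu-1 reference) and the function names are assumptions added for illustration; check them against your dataset's genome annotation.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: aa for (a, b, c), aa in zip(product(BASES, repeat=3), AMINO)
}


def locate_in_cds(nuc_pos, cds_start):
    """Map a 1-based genome position to (codon number, position in codon),
    both 1-based, for a CDS starting at 1-based `cds_start`."""
    offset = nuc_pos - cds_start
    if offset < 0:
        raise ValueError("position lies before the CDS start")
    return offset // 3 + 1, offset % 3 + 1


def is_synonymous(ref_codon, codon_pos, new_base):
    """True if replacing the base at 1-based `codon_pos` keeps the amino acid."""
    mutated = ref_codon[: codon_pos - 1] + new_base + ref_codon[codon_pos:]
    return CODON_TABLE[ref_codon] == CODON_TABLE[mutated]


S_START = 21563  # assumed 1-based start of the S gene, not taken from this thread
print(locate_in_cds(22296, S_START))  # A22296G
print(locate_in_cds(22414, S_START))  # C22414T
```

Under the assumed start coordinate, A22296G lands on the second base of codon 245, which is consistent with S:H245R from the question, and C22414T lands on the third base of codon 284. To decide whether a change is synonymous, take the reference codon from the alignment and pass it to `is_synonymous`; for instance, a hypothetical reference codon CAT with G at position 2 gives CGT (H to R), which is not synonymous.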

Important notes:

  • in real sequencing data it's not always easy to establish which nucleotide substitution caused which change in the triplet. Multiple mutation events can occur in different orders, and there could be deletions, insertions and data loss due to sequencing defects. So we are currently careful to formulate it as "mutations nearby" rather than assert any particular causality. It's up to the user to interpret this data and draw a conclusion.

  • positions in JSON/NDJSON files are 0-based (the convention used in most programming languages), while CSV/TSV files contain 1-based positions (the convention used in most bioinformatics research). For example, substitution A123C from CSV will look like { "ref": "A", "pos": 122, "qry": "C" } in JSON. Depending on which base you are working in, you might need to add or subtract 1.

  • JSON/NDJSON formats are snapshots of the internal state of the Nextclade program. Although they contain a lot more data than the tabular files, sadly we cannot ensure stability of the format: it might change between versions of Nextclade. Feel free to explore it, but be aware of this limitation.
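
To illustrate the 0-based versus 1-based note above, here is a small Python sketch that converts between the CSV-style string and the JSON-style object shown in the example. The function names are hypothetical, and the regular expression assumes simple single-letter substitutions of the form A123C.

```python
import re

# One reference letter, a 1-based position, one query letter (e.g. "A123C").
_SUB_RE = re.compile(r"^([A-Z])(\d+)([A-Z])$")


def csv_sub_to_json(sub):
    """Convert a CSV-style substitution (1-based) to a JSON-style dict (0-based)."""
    m = _SUB_RE.match(sub.strip())
    if m is None:
        raise ValueError(f"unrecognised substitution: {sub!r}")
    ref, pos, qry = m.groups()
    return {"ref": ref, "pos": int(pos) - 1, "qry": qry}


def json_sub_to_csv(obj):
    """Convert a JSON-style dict (0-based) back to a CSV-style string (1-based)."""
    return f"{obj['ref']}{obj['pos'] + 1}{obj['qry']}"


def parse_csv_cell(cell):
    """Split a comma-separated CSV cell such as 'A22296G,C22414T'."""
    return [csv_sub_to_json(s) for s in cell.split(",") if s.strip()]
```

For example, `csv_sub_to_json("A123C")` gives the object from the note above, with `pos` equal to 122.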
