Output mash tree or equivalent? #50

Open
flashton2003 opened this issue May 12, 2020 · 4 comments
Labels
enhancement New feature or request

Comments

@flashton2003

Hi,

It's nice that mob-suite tells me the plasmid in the reference database which is closest to each plasmid it identifies in my sample, but it would be good if it output a tree of mash distances so I can see how it relates to multiple plasmids in the database. Possibly within some mash distance threshold so that I don't end up with a tree with 12000 tips.

Or even just output the mash distance matrix of my plasmid vs everything in the database, so I can easily see how far it is from another plasmid of interest.
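As a rough sketch of the "plasmid vs everything" idea: `mash dist` writes tab-separated rows of reference ID, query ID, distance, p-value and shared hashes, so the output of something like `mash dist reference.msh plasmid.fasta` can be filtered by a distance threshold. The file names, the `parse_mash_dist` helper, the 0.06 default and the example rows below are illustrative assumptions, not MOB-suite output.

```python
import csv
import io


def parse_mash_dist(text, max_distance=0.06):
    """Return (reference, distance) pairs at or below max_distance, closest first.

    `text` is the tab-separated output of `mash dist`:
    reference ID, query ID, distance, p-value, shared hashes.
    """
    hits = []
    for row in csv.reader(io.StringIO(text), delimiter="\t"):
        if len(row) < 3:
            continue  # skip blank or malformed lines
        reference, distance = row[0], float(row[2])
        if distance <= max_distance:
            hits.append((reference, distance))
    return sorted(hits, key=lambda hit: hit[1])


# Made-up rows in the `mash dist` column layout, for illustration only.
example = (
    "refA.fasta\tplasmid.fasta\t0.012\t0\t600/1000\n"
    "refB.fasta\tplasmid.fasta\t0.31\t1e-5\t3/1000\n"
    "refC.fasta\tplasmid.fasta\t0.058\t0\t200/1000\n"
)

for reference, distance in parse_mash_dist(example):
    print(f"{reference}\t{distance}")
```

Raising `max_distance` widens the set of database plasmids that are reported; keeping it small avoids ending up with thousands of rows.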

I can roll my own using mashtree, but others might find it useful.

Just a thought, thanks for the nice tool.

Best,

Phil

@jrober84 added the enhancement label on May 22, 2020
@jrober84
Collaborator

I will label this one as an enhancement for future versions. The clusters.txt file in the databases/ directory contains the typing information for all of the plasmids in the reference database. We have a primary cluster designation that is meant for aggregating similar plasmids together at a mash distance of 0.06, and a secondary cluster designation at a mash distance of 0.025, which should capture near duplicates of sequences. You can select members of the same cluster in the file and build a tree with mash to see larger patterns. In our experience, draft and complete versions of the same plasmid can differ by up to 0.025 in mash distance, so if your plasmid shares a cluster with another plasmid, you will want to use a more sensitive technique like SNP typing to distinguish them further.
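Selecting the members of one cluster from clusters.txt could look roughly like the sketch below. It assumes the file is tab-separated with a header row; the column names `sample_id` and `primary_cluster_id` are assumptions to check against your copy of the file, and the rows and cluster IDs in the demo are invented.

```python
import csv
import io


def cluster_members(handle, cluster_id,
                    id_column="sample_id",
                    cluster_column="primary_cluster_id"):
    """Return the IDs of all rows whose cluster column equals cluster_id.

    Column names are assumptions; pass the ones used in your clusters.txt.
    """
    reader = csv.DictReader(handle, delimiter="\t")
    return [row[id_column] for row in reader if row[cluster_column] == cluster_id]


# Invented rows standing in for databases/clusters.txt.
demo = io.StringIO(
    "sample_id\tprimary_cluster_id\tsecondary_cluster_id\n"
    "p1\tc1\ts1\n"
    "p2\tc2\ts2\n"
    "p3\tc1\ts3\n"
)

members = cluster_members(demo, "c1")
print(members)
```

With a real file you would pass `open("databases/clusters.txt", newline="")` as the handle, then give the matching sequences to mash or mashtree.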

@kbessonov1984
Collaborator

Dear Phil, this is not exactly what you need in terms of a distance matrix against all database entries, but you can try our previous version (2.1.0) of MOB-Suite, which has a plasmid host-range phylogenetic tree reconstruction feature. It builds a phylogenetic tree based on plasmid features (replicon and cluster ID) and overlays it against all plasmid sequences and the corresponding taxonomy information in our database.

Thank you for the feature suggestion.

$ mob_typer -i plasmid.fasta -o mob-typer --host_range_detailed

@flashton2003
Author

Thanks both!

@jrober84
Collaborator

I am thinking of creating a series of single-linkage flat clusters based on mash distances between your input and the reference database, and providing basic summary statistics on the average pairwise distance within the primary mob_cluster. This will constrain the number of samples and make the comparisons sensible.
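To make the idea concrete, here is a minimal sketch of single-linkage flat clustering at a fixed distance cutoff, plus an average pairwise distance per cluster. It is not the planned MOB-suite implementation: the function names, the union-find approach and the example distances are all assumptions for illustration.

```python
def single_linkage_clusters(names, distances, threshold):
    """Group names into flat single-linkage clusters.

    Two names share a cluster if a chain of pairs, each with
    distance <= threshold, connects them. `distances` maps
    (name_a, name_b) tuples to a distance.
    """
    parent = {name: name for name in names}

    def find(x):
        # Follow parent links to the cluster root, shortening the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for (a, b), d in distances.items():
        if d <= threshold:
            root_a, root_b = find(a), find(b)
            if root_a != root_b:
                parent[root_b] = root_a

    clusters = {}
    for name in names:
        clusters.setdefault(find(name), []).append(name)
    return sorted(sorted(members) for members in clusters.values())


def mean_pairwise_distance(members, distances):
    """Average distance over all pairs in members; 0.0 if there are no pairs."""
    values = []
    for i, a in enumerate(members):
        for b in members[i + 1:]:
            d = distances.get((a, b), distances.get((b, a)))
            if d is not None:
                values.append(d)
    return sum(values) / len(values) if values else 0.0


# Invented mash distances between an input plasmid and three references.
names = ["query", "refA", "refB", "refC"]
distances = {
    ("query", "refA"): 0.02, ("refA", "refB"): 0.05, ("query", "refB"): 0.07,
    ("query", "refC"): 0.30, ("refA", "refC"): 0.28, ("refB", "refC"): 0.29,
}

clusters = single_linkage_clusters(names, distances, threshold=0.06)
for members in clusters:
    print(members, round(mean_pairwise_distance(members, distances), 4))
```

Note that single linkage chains: here "query" and "refB" end up together through "refA" even though their direct distance (0.07) exceeds the cutoff, which is why a per-cluster average distance is a useful companion statistic.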
