Output mash tree or equivalent? #50
I will label this one as an enhancement for future versions. The clusters.txt file in the databases/ directory contains the typing information for all of the plasmids in the reference database. The primary cluster designation groups similar plasmids at a mash distance of 0.06, and the secondary cluster designation (0.025) should capture near-duplicate sequences. To see larger patterns, you can select the members of a cluster from that file and build a tree from their mash distances. In our experience, draft and complete versions of the same plasmid can differ by up to 0.025 in mash distance, so if two plasmids share the same secondary cluster, you will need a more sensitive technique, such as SNP typing, to distinguish them further.
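As a minimal sketch of that selection step, the Python below pulls the IDs of one primary cluster out of a tab-delimited table. It assumes clusters.txt has a header row; the column names `sample_id` and `primary_cluster_id` and the function name `cluster_members` are hypothetical, so check the real header and pass the actual column names in.

```python
import csv
import io
from typing import List, TextIO


def cluster_members(handle: TextIO, cluster_id: str,
                    id_col: str = "sample_id",
                    cluster_col: str = "primary_cluster_id") -> List[str]:
    """Return the IDs of all rows whose cluster column equals cluster_id.

    Assumes a tab-delimited table with a header row. The default column
    names are hypothetical; override them to match your clusters.txt.
    """
    reader = csv.DictReader(handle, delimiter="\t")
    missing = {id_col, cluster_col} - set(reader.fieldnames or [])
    if missing:
        raise KeyError(f"columns not found in header: {sorted(missing)}")
    return [row[id_col] for row in reader if row[cluster_col] == cluster_id]


if __name__ == "__main__":
    # Made-up rows standing in for databases/clusters.txt.
    example = io.StringIO(
        "sample_id\tprimary_cluster_id\n"
        "plasmidA\t101\n"
        "plasmidB\t101\n"
        "plasmidC\t202\n"
    )
    print(cluster_members(example, "101"))  # ['plasmidA', 'plasmidB']
```

The returned IDs can then be used to pull the matching sequences from the reference set and hand them to mash or mashtree, keeping the tree limited to one cluster.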
Dear Phil, although it is not exactly what you need (a distance matrix against all database entries), you can try our previous version (2.1.0) of MOB-Suite, which has a plasmid host-range phylogenetic tree reconstruction feature. It builds a phylogenetic tree based on plasmid features (replicon and cluster ID) and overlays it against all plasmid sequences and their corresponding taxonomy information in our database. Thank you for the feature suggestion.
Thanks both!
I am thinking of creating a series of single-linkage flat clusters based on the mash distances between your input and the reference database, and providing basic summary statistics on the average pairwise distance within the primary mob_cluster. This will constrain the number of samples and keep the comparisons meaningful.
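A rough sketch of that idea, assuming a full pairwise distance table is already in memory: cutting a single-linkage tree at a fixed threshold is equivalent to taking the connected components of the graph whose edges join pairs at or below that threshold, so a union-find is enough. The function names and toy distances are hypothetical and this is not how MOB-suite implements it; the 0.06 and 0.025 cut-offs are the primary and secondary thresholds mentioned in the thread.

```python
from itertools import combinations
from statistics import mean
from typing import Dict, List, Tuple

Pair = Tuple[str, str]


def single_linkage_flat(names: List[str], dist: Dict[Pair, float],
                        threshold: float) -> List[List[str]]:
    """Flat single-linkage clusters: two items share a cluster when a chain
    of pairwise distances <= threshold connects them."""
    parent = {n: n for n in names}

    def find(x: str) -> str:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for (a, b), d in dist.items():
        if d <= threshold:
            parent[find(a)] = find(b)  # merge the two components

    groups: Dict[str, List[str]] = {}
    for n in names:
        groups.setdefault(find(n), []).append(n)
    # Largest clusters first; ties broken by member names.
    return sorted(groups.values(), key=lambda g: (-len(g), g))


def mean_pairwise(members: List[str], dist: Dict[Pair, float]) -> float:
    """Average distance over all pairs in a cluster.

    Needs at least two members and a distance for every pair, in either order.
    """
    def lookup(a: str, b: str) -> float:
        return dist[(a, b)] if (a, b) in dist else dist[(b, a)]

    return mean(lookup(a, b) for a, b in combinations(members, 2))


if __name__ == "__main__":
    # Toy distances (made up), cut at the two thresholds from the thread.
    names = ["A", "B", "C", "D"]
    dist = {("A", "B"): 0.01, ("B", "C"): 0.05, ("A", "C"): 0.055,
            ("C", "D"): 0.2, ("A", "D"): 0.21, ("B", "D"): 0.22}
    for t in (0.06, 0.025):
        print(t, single_linkage_flat(names, dist, t))
    # 0.06 [['A', 'B', 'C'], ['D']]
    # 0.025 [['A', 'B'], ['C'], ['D']]
    print(round(mean_pairwise(["A", "B", "C"], dist), 4))  # 0.0383
```

Note that single linkage chains: A and C end up together at 0.06 because each is close to B, so the within-cluster average is a useful check on how spread out a cluster really is.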
Hi,
It's nice that mob-suite tells me which plasmid in the reference database is closest to each plasmid it identifies in my sample, but it would be good if it also output a tree of mash distances so I can see how each one relates to multiple plasmids in the database, possibly limited to some mash distance threshold so that I don't end up with a tree with 12000 tips.
Or even just output the mash distance matrix of my plasmid against everything in the database, so I can easily see how far it is from another plasmid of interest.
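Until something like this is built in, here is a hedged sketch of the filtering step: run `mash dist` yourself and keep only the references within a chosen distance of each query. It assumes the standard five tab-delimited columns of `mash dist` output (reference ID, query ID, distance, p-value, shared hashes); `neighbours_within` and the demo rows are made up for illustration.

```python
from typing import Dict, Iterable, List, Tuple


def neighbours_within(lines: Iterable[str], max_dist: float
                      ) -> Dict[str, List[Tuple[str, float]]]:
    """Group `mash dist` hits by query, keeping references within max_dist.

    Assumes tab-delimited rows: reference-ID, query-ID, distance,
    p-value, shared-hashes. Hits are sorted by distance, closest first.
    """
    hits: Dict[str, List[Tuple[str, float]]] = {}
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        ref, query, dist, _pvalue, _shared = line.rstrip("\n").split("\t")
        d = float(dist)
        if d <= max_dist:
            hits.setdefault(query, []).append((ref, d))
    for refs in hits.values():
        refs.sort(key=lambda item: (item[1], item[0]))
    return hits


if __name__ == "__main__":
    # Made-up rows in the `mash dist` output layout.
    demo = [
        "refA\tquery1\t0.0200\t0\t700/1000\n",
        "refB\tquery1\t0.0550\t1e-100\t300/1000\n",
        "refC\tquery1\t0.3000\t0.5\t1/1000\n",
    ]
    print(neighbours_within(demo, 0.06))
    # {'query1': [('refA', 0.02), ('refB', 0.055)]}
```

For example, `mash dist reference.msh my_plasmids.fasta > dists.tsv` (file names are placeholders) followed by `neighbours_within(open("dists.tsv"), 0.06)` gives a short list of nearby references per query, small enough to pass to mashtree.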
I can roll my own using mashtree, but might others find this useful?
Just a thought, thanks for the nice tool.
Best,
Phil