blooming taxa analysis #39
Comments
Sure, let me know how I can help. Ashley Shade, Ph.D.
|
Thanks, Ashley! Can you send a link to the manuscript Rob is referring to? Luke
|
A meta-analysis of changes in bacterial and archaeal communities with time. http://www.ncbi.nlm.nih.gov/pubmed/23575374
|
Thanks, Rob!
|
http://mbio.asm.org/content/5/4/e01371-14 Ashley Shade, Ph.D.
|
Just want to reiterate - would love to contribute to this! Let me know if and how we can! |
Ashley, we would love to have you help also! We are working toward a paper on 20k samples from the EMP, focusing on those with the best metadata plus a few other factors. We are currently working to identify the final set of 20k samples, and meanwhile putting together a set of 2k samples for testing analysis methods. Most of the samples in the EMP do not have a specific temporal component. Do you have some thoughts for how your work on conditionally rare taxa (CRT) would translate to a dataset without a major temporal component? |
The temporal component is key for the "blooming" interpretation. However, the algorithm is generic in that it could easily be applied over space instead of time, but then the interpretation would concern the biogeography of populations that are typically rare but abundant in some localities. In these cases, we would hypothesize that contextual data would explain any local high abundance of otherwise rare taxa. In any one particular habitat, this would be a bit boring (the local environment drives differences in community structure - duh), but with the large EMP dataset it may be interesting to determine macroscale patterns of rarity. Perhaps we could use a model of the biogeographic distributions (based on the whole dataset) to test whether these occurrences differ from what is expected under neutral assumptions. |
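To make the "over space instead of time" idea concrete, here is a minimal sketch of flagging taxa that are usually rare but abundant in a few samples. It is not the published CRT implementation: the function names, the input layout (a dict of relative abundances per taxon), the population-moment form of the bimodality coefficient, and both default thresholds are illustrative assumptions that do not come from this thread.

```python
from statistics import mean


def bimodality_coefficient(values):
    """Population-moment form of the bimodality coefficient:
    (skewness^2 + 1) / kurtosis, with non-excess kurtosis."""
    n = len(values)
    mu = mean(values)
    m2 = sum((v - mu) ** 2 for v in values) / n
    if m2 == 0:
        return 0.0  # constant abundance: no bimodality signal
    m3 = sum((v - mu) ** 3 for v in values) / n
    m4 = sum((v - mu) ** 4 for v in values) / n
    skewness = m3 / m2 ** 1.5
    kurtosis = m4 / m2 ** 2
    return (skewness ** 2 + 1) / kurtosis


def conditionally_rare(table, min_peak=0.005, min_bimodality=0.9):
    """Return taxa whose abundance is strongly bimodal across samples
    and whose peak relative abundance reaches min_peak.

    table: dict mapping taxon -> list of relative abundances, one per
    sample (sites or time points). Both thresholds are assumed defaults.
    """
    flagged = []
    for taxon, abundances in table.items():
        if (max(abundances) >= min_peak
                and bimodality_coefficient(abundances) > min_bimodality):
            flagged.append(taxon)
    return flagged


# Hypothetical example: ten sites, three taxa.
example = {
    "rare_but_locally_abundant": [0.0001] * 9 + [0.05],
    "always_abundant": [0.10, 0.12, 0.11, 0.09, 0.10,
                        0.11, 0.12, 0.09, 0.10, 0.11],
    "always_rare": [0.0001] * 9 + [0.001],
}
flagged = conditionally_rare(example)
```

Nothing in the function depends on the order of the samples, which is why the same screen can run across sites as well as across time points; only the interpretation of a flagged taxon changes.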
Hi @ashleyshade -- we are mobilizing forces to analyze the first 20k EMP samples (28,685 samples with >30,000 reads per sample -> 20,347 samples). How do the next few weeks look for you? Your idea to look at macroscale patterns of rarity sounds promising. Is there a version of the BIOM table that you'd prefer to work with? We have closed-reference (Greengenes and Silva), open-reference (Greengenes), "deblurred" actual sequences, and hope to also have de novo (Swarm). |
One clarification: The sequences per sample will vary depending on the OTU picking method. The 30k seqs/sample is for the closed-ref GG table. We are comparing GG to Silva, and also looking at the other OTU tables, to decide on a reasonable set of ~20k samples to include in all analyses. |
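Because per-sample depth differs between OTU tables, one possible way to arrive at a common sample set is to apply the depth cutoff to each table and keep only samples that pass in all of them. This is a sketch under assumptions: the thread does not say how the final ~20k samples will be chosen, and the function names and the nested-dict table layout are hypothetical.

```python
def deep_samples(table, min_reads=30_000):
    """Return the sample IDs whose total read count exceeds min_reads.

    table: dict mapping sample_id -> dict of otu_id -> read count
    (an assumed in-memory layout, not the BIOM format itself).
    """
    return {sample for sample, otus in table.items()
            if sum(otus.values()) > min_reads}


def shared_deep_samples(tables, min_reads=30_000):
    """Return the samples that pass the depth cutoff in every table."""
    kept = None
    for table in tables:
        passing = deep_samples(table, min_reads)
        kept = passing if kept is None else kept & passing
    return kept or set()


# Hypothetical toy tables standing in for two closed-reference runs.
gg_table = {
    "s1": {"otu_a": 20_000, "otu_b": 15_000},
    "s2": {"otu_a": 10_000},
    "s3": {"otu_a": 40_000},
}
silva_table = {
    "s1": {"otu_x": 31_000},
    "s3": {"otu_x": 29_000},
}
common = shared_deep_samples([gg_table, silva_table])
```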
Hi Luke, Ashley Shade, Ph.D.
|
Oh yeah, and I think closed reference is best for this. Ashley Shade, Ph.D.
|
OK great! We will get the closed-reference OTU table on the FTP site for you (probably Silva because it assigns more reads than GG). I will email a condensed version of the mapping file, and you can let me know if it has what you need (the full one is 117MB or ~6x larger than this one). |
Ashley has posed these questions:
|
I have passed to @ashleyshade the closed-ref table with 5000 samples plus mapping file of those samples only. The files are on my Dropbox: https://www.dropbox.com/sh/pd5zjrfb0i7li6r/AAA1Wj7fnEQPucAWxE9QdDE-a?dl=0 |
@rob-knight suggested: we should definitely do some blooming taxa analysis as per the Shade paper and maybe encourage Ashley to do that.
Ashley Shade is at Michigan State.