Refining MAGs #440
Replies: 16 comments
-
Cool that you point this out. To get started, the command
would translate to
I suggest using metabat as the final binner, so you don't need to pass the binning through DAS Tool and then refinem. I'm eager to know whether it improves your bins. Maybe I can add it to Atlas if it is convincing.
-
OK, great! It worked after first running a samtools index step on the .bam files. Thanks a lot.
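For anyone hitting the same issue, the indexing step can be sketched roughly as below. This is a minimal sketch, assuming all .bam files sit in one folder and that `samtools` is on the PATH; the function names and the folder layout are hypothetical and not part of Atlas or RefineM.

```python
import shutil
import subprocess
from pathlib import Path


def index_commands(bam_dir):
    """Build one `samtools index` command per .bam file that has no .bai yet."""
    commands = []
    for bam in sorted(Path(bam_dir).glob("*.bam")):
        # `samtools index x.bam` writes x.bam.bai next to the input by default
        index = Path(str(bam) + ".bai")
        if not index.exists():
            commands.append(["samtools", "index", str(bam)])
    return commands


def index_bams(bam_dir):
    """Run the commands built above; raises if samtools is missing or fails."""
    if shutil.which("samtools") is None:
        raise RuntimeError("samtools not found on PATH")
    for command in index_commands(bam_dir):
        subprocess.run(command, check=True)
```

Calling `index_bams("path/to/bams")` (a placeholder path) before RefineM should leave an index next to every .bam file; files that already have one are skipped.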
-
No, this is more or less the reason. I like to know which tool is doing what. What I would find the best way is to create a folder
Here is the code to create such a file if you have only the fasta files of the bins.
Then if you set
If you want, and you have many samples, I can help you implement this in the Atlas Snakemake workflow, e.g. so that enrichM is performed as part of Atlas. But maybe it is worth testing first whether there is an improvement.
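The snippet mentioned above is not visible in this thread. As a rough stand-in, here is a minimal Python sketch, assuming the goal is a tab-separated table with one contig name and its bin per line, and that each bin is one FASTA file in a single folder. The function names, the `.fasta` extension and the table layout are assumptions for illustration, not necessarily what Atlas expects.

```python
from pathlib import Path


def contig_to_bin(bin_dir, extension=".fasta"):
    """Map each contig name to the bin (file name without extension) containing it."""
    mapping = {}
    for fasta in sorted(Path(bin_dir).glob(f"*{extension}")):
        with open(fasta) as handle:
            for line in handle:
                if line.startswith(">"):
                    # Contig name = header text up to the first whitespace
                    fields = line[1:].split()
                    if fields:
                        mapping[fields[0]] = fasta.stem
    return mapping


def write_table(mapping, out_path):
    """Write the mapping as `contig<TAB>bin`, one pair per line, no header."""
    with open(out_path, "w") as out:
        for contig, bin_id in mapping.items():
            out.write(f"{contig}\t{bin_id}\n")
```

If a contig name appears in more than one bin file, the last file read wins in this sketch, so it is worth checking for duplicates before relying on the table.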
-
Could you solve the cyclic dependency problem? I also encountered it; I should fix it in an update of Atlas.
-
Let's say bins with contamination > 10-20% or completeness < 50% are uninteresting. Then you have two before and three after refineM, right?
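To make that comparison reproducible, the cutoff can be written down as a small filter. This is only a sketch: the dictionary keys `completeness` and `contamination` and the function names are hypothetical, and the default of 10% is just the stricter end of the 10-20% range mentioned above.

```python
def is_interesting(bin_stats, max_contamination=10.0, min_completeness=50.0):
    """True if a bin passes both cutoffs (values in percent)."""
    return (
        bin_stats["completeness"] >= min_completeness
        and bin_stats["contamination"] <= max_contamination
    )


def count_interesting(bins, max_contamination=10.0, min_completeness=50.0):
    """Count the bins in a list of stats dictionaries that pass both cutoffs."""
    return sum(
        is_interesting(b, max_contamination, min_completeness) for b in bins
    )
```

Running `count_interesting` once on the bin statistics before refinement and once after, with the same cutoffs, gives the two numbers being compared here.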
-
Yep. I don't like it either. I will check some other samples and show you. I will also try the phylogeny option, because this run used the tetranucleotide and coverage option...
-
I had similar experiences with magpurify.
Maybe you also want to try this tool: https://github.com/dib-lab/charcoal/tree/latest/doc
-
I could run refineM in phylogeny mode after correcting a Python bug it had. So, in order to run refinem as final binner, do I only need to change this in the config file and then just run "atlas run all", or should I remove something first? Let's see if it improves or not.
-
Cool. Above I explained how to integrate a new binner into Atlas. Then you can run 'atlas run all' or 'atlas run binning'; both produce a binning report in the reports directory. I'm looking forward to seeing whether it helps.
-
I'm just pinging @ctb to say that you tried charcoal without success.
-
I'm sending you some ugly plots for 5 samples, comparing completeness and contamination before (red) and after (blue/green) refinem, in taxonomy mode. It improves a little in general, but I would say that the contamination is mostly from very similar strains with similar TNF and coverage... I guess this must be THE huge problem in genome reconstruction from metagenomes... right? Link to plots: https://drive.google.com/file/d/1pGmEKG2Q01ayoT_TOU_QCUUwTd2xMl_p/view?usp=sharing
-
Thank you very much for your results. So you would also say the results don’t seem convincing?
I don’t know if one can say in general that there is strain contamination. But yes, the more similar the TNF and abundance are, the more complicated it is to bin genomes correctly.
There is a field in the CheckM results that states whether the contamination is expected to come from a similar strain or not.
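In the CheckM quality table this should be the strain heterogeneity column. A minimal sketch of how one could flag bins whose contamination looks mostly strain-level, assuming a tab-separated table with columns named `Bin Id`, `Contamination` and `Strain heterogeneity`; the column names should be checked against your own output, and the two cutoffs and the function name are arbitrary choices for illustration.

```python
import csv
import io


def strain_like_contamination(table_text, min_contamination=5.0,
                              min_heterogeneity=50.0):
    """Return bin ids that are contaminated and show high strain heterogeneity."""
    reader = csv.DictReader(io.StringIO(table_text), delimiter="\t")
    flagged = []
    for row in reader:
        contamination = float(row["Contamination"])
        heterogeneity = float(row["Strain heterogeneity"])
        # Both values are percentages; the cutoffs are assumptions, not CheckM defaults
        if contamination >= min_contamination and heterogeneity >= min_heterogeneity:
            flagged.append(row["Bin Id"])
    return flagged
```

The function takes the table contents as a string, so a file can be passed with `strain_like_contamination(open(path).read())`, where `path` points to your own table.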
-
first, yes, sorry, charcoal is in a broken state at the moment :(
It's definitely one of them :). You would certainly expect this to be a problem in every MAG workflow I have seen, based on the way assembly and binning work. I think it's unresolvable in that sense, without doing something quite different in the graph (see e.g. https://github.com/chrisquince/STRONG for a promising approach that could be applied to contamination). But we also see a surprising amount of cross-everything contamination in large MAG data sets, e.g. see some charcoal output here. I think that's a big practical problem because the databases are getting contaminated with wildly divergent taxonomic classifications...
-
Thank you for your comment. Do you think co-assembly can really help to disentangle the genomes?
-
this is territory where I only have the vaguest of data, so mainly based on intuition, but - the information is in the reads, and we should be able to disentangle it! I don't think we can rely on co-assembly the way it's currently done tho. (for some data on this with our tools, see this comment, where we can clearly see multiple peaks for two different strain variants in the read abundance data from a single sample; with multiple samples, colored De Bruijn graphs should be able to disentangle the strains with some reasonable precision, and I think STRONG is a promising step towards actually showing that can work, albeit in a ~reference-based way.) (I'm in no way claiming that our tools are special here, it's just a figure I had ready to link :) (Also, I am not an author on STRONG!)
-
Is it possible to use refinem (https://github.com/dparks1134/RefineM) directly on ATLAS output? Which files should I use as the scaffolds/bins/bam files? Do they have similar headers so that this tool can be used?
Thanks