Refining MAGs #440

botellaflotante · 2020-10-22T21:31:25Z

botellaflotante
Oct 22, 2020
Collaborator

Is it possible to run RefineM (https://github.com/dparks1134/RefineM) directly on ATLAS output? Which files should I use as the scaffold, bin, and BAM inputs? Do they have matching headers so that the tool can be used?

Thanks

SilasK · 2020-10-23T09:41:28Z

SilasK
Oct 23, 2020
Maintainer

Cool that you point this out.

To get started, the command

refinem scaffold_stats -c 16 <scaffold_file> <bin_dir> <stats_output_dir> <bam_files>

would translate to

refinem scaffold_stats -c 16 {sample}/{sample}_contigs.fasta {sample}/{final_binner}/bins {your choice} {sample}/sequence_alignment/*.bam
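The translation above can be scripted per sample. The following is only a minimal sketch: the function name `refinem_scaffold_stats_cmd` and its `workdir`, `stats_dir`, and `threads` parameters are hypothetical, and the paths simply mirror the placeholders in the command above rather than a verified ATLAS layout.

```python
from pathlib import Path


def refinem_scaffold_stats_cmd(sample, final_binner, stats_dir, workdir=".", threads=16):
    """Build the `refinem scaffold_stats` argument list for one ATLAS sample.

    Paths follow the placeholders used in the thread; adjust them if your
    ATLAS working directory is laid out differently (assumption).
    """
    base = Path(workdir) / sample
    # Expand the *.bam glob ourselves, because no shell is involved here.
    bams = sorted((base / "sequence_alignment").glob("*.bam"))
    if not bams:
        raise FileNotFoundError(f"no BAM files under {base / 'sequence_alignment'}")
    return [
        "refinem", "scaffold_stats", "-c", str(threads),
        str(base / f"{sample}_contigs.fasta"),   # scaffold file
        str(base / final_binner / "bins"),       # bin directory
        str(stats_dir),                          # output directory of your choice
        *(str(bam) for bam in bams),             # one or more BAM files
    ]
```

The returned list can be passed to `subprocess.run(cmd, check=True)`; building a list instead of a shell string avoids quoting problems with unusual sample names.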

I suggest using metabat as the final binner, so that you don't need to pass the binning through DAS Tool and then RefineM.

I'm eager to know if it improves your bins. Maybe I can add it to atlas if it is convincing.

botellaflotante · 2020-10-23T13:56:51Z

botellaflotante
Oct 23, 2020
Collaborator Author

OK, great! It worked once I had run samtools index on the .bam files first.
But then I would exclude the maxbin output, right? Can I ask why it is better to use metabat instead of the DAS Tool bins? Is it just to save a step, or is there another reason?
Then, if it improves the bins, is it possible to feed them back into atlas easily to get the final MAGs as usual?

thanks a lot

SilasK · 2020-10-23T14:44:16Z

SilasK
Oct 23, 2020
Maintainer

> But then I would exclude the maxbin output, right? Can I ask why it is better to use metabat instead of the DAS Tool bins? Is it just to save a step, or is there another reason?

No, that is more or less the reason. I like to know which tool is doing what.

I think the best way is to create a folder {sample}/binning/refinem/ and, inside it, a file called cluster_attribution.tsv that maps contigs to bins.

Here is the code to create such a file if you have only the fasta files of the bins.
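For readers who cannot reach the linked code, here is a minimal sketch of the same idea. It is not the code referred to above. It assumes that cluster_attribution.tsv is a headerless, tab-separated file with one `contig<TAB>bin` pair per line, that each bin is a single FASTA file named after the bin, and that the bin files end in `.fasta`; the function names are hypothetical.

```python
from pathlib import Path


def contigs_in_fasta(fasta_path):
    """Yield the contig ID (first word of each header line) of a FASTA file."""
    with open(fasta_path) as handle:
        for line in handle:
            if line.startswith(">"):
                fields = line[1:].split()
                if fields:  # skip empty headers
                    yield fields[0]


def write_cluster_attribution(bin_dir, out_tsv, extension=".fasta"):
    """Write one 'contig<TAB>bin' line per contig; return the number of contigs.

    The headerless two-column layout is an assumption about what atlas expects.
    """
    n_contigs = 0
    with open(out_tsv, "w") as out:
        for fasta in sorted(Path(bin_dir).glob(f"*{extension}")):
            bin_id = fasta.name[: -len(extension)]  # file name without extension
            for contig in contigs_in_fasta(fasta):
                out.write(f"{contig}\t{bin_id}\n")
                n_contigs += 1
    return n_contigs
```

Pass a different `extension` (for example `.fna`) if the refined bins are written with another suffix, and check that the contig IDs match the headers in {sample}/{sample}_contigs.fasta.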

Then, if you set final_binner: refinem, atlas should take this file, run CheckM, and continue the pipeline.

If you want, and you have many samples, I can help you implement this in the atlas Snakemake workflow, e.g. so that RefineM is run as part of atlas. But maybe it is worth testing first whether there is an improvement.

SilasK · 2020-10-27T19:02:55Z

SilasK
Oct 27, 2020
Maintainer

Were you able to solve the cyclic dependency problem? I also ran into it; I should fix it in an update of atlas.

botellaflotante · 2020-10-28T19:42:42Z

botellaflotante
Oct 28, 2020
Collaborator Author

Yes. I had deleted an important directory (genomes) before the binning step, and I think that was the problem. Now I am just changing the config file to use metabat as the final binner and repeating the run without removing anything. I want to run some other samples to be sure whether there is a real improvement with RefineM or not. I just got this for one sample (before RefineM: red, after RefineM: blue)...

SilasK · 2020-10-28T20:27:03Z

SilasK
Oct 28, 2020
Maintainer

Let's say bins with contamination > 10-20% or completeness < 50% are uninteresting.

Then you have two good bins before and three after RefineM, right?
However, your two best bins lose 10% completeness.
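The counting rule above can be written down as a small filter. This is only a sketch: the function names are hypothetical, and the default contamination cutoff of 10% is one choice from the 10-20% range mentioned above.

```python
def passes_quality(completeness, contamination,
                   min_completeness=50.0, max_contamination=10.0):
    """Return True if a bin is 'interesting' under the thresholds from the thread.

    A bin is kept when completeness is at least min_completeness (%) and
    contamination is at most max_contamination (%).
    """
    return completeness >= min_completeness and contamination <= max_contamination


def count_good_bins(bins, **thresholds):
    """Count bins that pass; `bins` is an iterable of (completeness, contamination)."""
    return sum(
        passes_quality(completeness, contamination, **thresholds)
        for completeness, contamination in bins
    )
```

Calling `count_good_bins` on the CheckM values before and after refinement, with the same thresholds, gives the two numbers being compared; passing `max_contamination=20.0` applies the looser end of the range.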

botellaflotante · 2020-10-29T00:33:05Z

botellaflotante
Oct 29, 2020
Collaborator Author

Yep, I don't like it either. I will check some other samples and show you. I will also try the phylogeny option, because this run used the tetranucleotide and coverage option...

SilasK · 2020-10-29T06:02:34Z

SilasK
Oct 29, 2020
Maintainer

I had similar experiences with magpurify. Maybe you also want to try this tool: https://github.com/dib-lab/charcoal/tree/latest/doc

botellaflotante · 2020-11-11T19:05:01Z

botellaflotante
Nov 11, 2020
Collaborator Author

I was able to run RefineM in phylogeny mode after fixing a Python bug it had. So, to use refinem as the final binner, do I only need to change this in the config file and run "atlas run all", or should I remove something first? Let's see whether it improves things or not.
I tried charcoal but could not get it running...

SilasK · 2020-11-11T19:51:05Z

SilasK
Nov 11, 2020
Maintainer

Cool. Above I explained how to integrate a new binner into atlas. After that you can run 'atlas run all' or 'atlas run binning'; both produce a binning report in the reports directory. I'm looking forward to seeing whether it helps.

SilasK · 2020-11-11T19:52:50Z

SilasK
Nov 11, 2020
Maintainer

I'm just pinging @ctb to let him know that you tried charcoal without success.

botellaflotante · 2020-11-13T15:07:52Z

botellaflotante
Nov 13, 2020
Collaborator Author

I'm sending you some ugly plots for 5 samples, comparing completeness and contamination before (red) and after (blue/green) RefineM in taxonomy mode. It improves things a little in general, but I would say that the contamination comes mostly from very similar strains with similar TNF and coverage... I guess this must be THE huge problem in genome reconstruction from metagenomes... right?

link to plots:

https://drive.google.com/file/d/1pGmEKG2Q01ayoT_TOU_QCUUwTd2xMl_p/view?usp=sharing

SilasK · 2020-11-13T15:24:45Z

SilasK
Nov 13, 2020
Maintainer

Thank you very much for your results. You would also say the results don't seem convincing. I don't know if one can say in general that there is strain contamination. But yes, the more similar the TNF and abundance are, the harder it is to bin genomes correctly. There is a field in the CheckM results that states whether the contamination is expected to come from a similar strain or not.
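The field meant here is most likely CheckM's "Strain heterogeneity" column. As a rough sketch of how it could be inspected, the function below reads a tab-separated CheckM table and lists bins whose contamination looks mostly strain-level. The column names "Bin Id", "Contamination", and "Strain heterogeneity" are assumed to match your CheckM output, and both cutoffs are arbitrary illustrative values, not recommendations.

```python
import csv


def strain_level_contamination(checkm_table, min_contamination=5.0,
                               min_heterogeneity=50.0):
    """List bins that are contaminated mainly by closely related strains.

    Returns (bin_id, contamination, strain_heterogeneity) tuples for bins with
    contamination >= min_contamination and strain heterogeneity
    >= min_heterogeneity. Column names and cutoffs are assumptions.
    """
    flagged = []
    with open(checkm_table, newline="") as handle:
        for row in csv.DictReader(handle, delimiter="\t"):
            contamination = float(row["Contamination"])
            heterogeneity = float(row["Strain heterogeneity"])
            if contamination >= min_contamination and heterogeneity >= min_heterogeneity:
                flagged.append((row["Bin Id"], contamination, heterogeneity))
    return flagged
```

Bins that show up in this list before and after refinement would support the suspicion that the remaining contamination comes from similar strains rather than from unrelated taxa.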

ctb · 2020-11-13T15:38:19Z

ctb
Nov 13, 2020

first, yes, sorry, charcoal is in a broken state at the moment :(

> I would say that contamination is mostly from very similar strains with similar TNF and coverages... I guess this must be THE huge problem in genome reconstruction from metagenomes... right?

It's definitely one of them :). You would certainly expect this to be a problem in every MAG workflow I have seen, based on the way assembly and binning work. I think it's unresolvable in that sense, without doing something quite different in the graph (see e.g. https://github.com/chrisquince/STRONG for a promising approach that could be applied to contamination).

But we also see a surprising amount of cross-everything contamination in large MAG data sets, e.g. see some charcoal output here. I think that's a big practical problem because the databases are getting contaminated with wildly divergent taxonomic classifications...

SilasK · 2020-11-13T16:00:47Z

SilasK
Nov 13, 2020
Maintainer

Thank you for your comment. Do you think co-assembly can really help to disentangle the genomes?

ctb · 2020-11-13T16:08:50Z

ctb
Nov 13, 2020

this is territory where I only have the vaguest of data, so mainly based on intuition, but - the information is in the reads, and we should be able to disentangle it! I don't think we can rely on co-assembly the way it's currently done tho.

(for some data on this with our tools, see this comment, where we can clearly see multiple peaks for two different strain variants in the read abundance data from a single sample; with multiple samples, colored De Bruijn graphs should be able to disentangle the strains with some reasonable precision, and I think STRONG is a promising step towards actually showing that can work, albeit in a ~reference-based way.)

(I'm in no way claiming that our tools are special here, it's just a figure I had ready to link :)

(Also, I am not an author on STRONG!)
