Best practice for large datasets? #92
Comments
That should work in principle. If you don't hear back from Isaac, you might try asking @kvg.
Thanks for the answer. I have been trying to merge all the clean graphs together in a single command, and it has taken a lot of time.
McCortex loads all the graphs into memory before joining them, and yes, this can be a bit slow. I think what you've outlined would be faster, but it's not clear that the improvement would be particularly significant (I'd imagine it depends on the contents of the graphs, particularly the number of shared k-mers between each sample).
An alternate strategy that might help you is the "Join" command we wrote in a companion tool, Corticall. This assumes your graphs are stored in sorted order (with the '-s' option in mccortex commands), and then the graphs are merged linearly. This tends to be much faster than the built-in McCortex join command; I've used this to merge a couple hundred microbial genomes. The resulting joined graphs will remain compatible with all of mccortex's subcommands.
After downloading and building Corticall, the command line for this would be:
$ java -jar build/jars/corticall.jar Join -g <graph_1.sorted.ctx> -g <graph_2.sorted.ctx> ... -g <graph_N.sorted.ctx> -o joined.ctx
Please let me know if that does or doesn't work for you.
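For readers wondering why sorted input allows a linear merge, here is a minimal Python sketch of the general idea. It is not Corticall's actual code and does not read the .ctx format: each graph is simplified to a hypothetical list of (kmer, coverage) tuples, assumed to be sorted by k-mer in plain string order with each k-mer appearing at most once per graph. The function name `merge_sorted_graphs` is made up for illustration.

```python
import heapq
from itertools import groupby
from operator import itemgetter


def tag(records, sample):
    """Attach a sample index to each (kmer, coverage) record of one sorted graph."""
    for kmer, coverage in records:
        yield kmer, sample, coverage


def merge_sorted_graphs(graphs):
    """Merge k-mer-sorted graphs in one pass.

    Yields (kmer, coverages), where coverages[i] is the coverage of that
    k-mer in graph i, or 0 if graph i does not contain it.
    """
    n = len(graphs)
    streams = [tag(records, i) for i, records in enumerate(graphs)]
    # heapq.merge only ever holds one pending record per input stream,
    # so the inputs never need to be loaded into memory all at once.
    merged = heapq.merge(*streams, key=itemgetter(0))
    # Records for the same k-mer come out adjacent, so groupby can
    # combine them without any lookup table.
    for kmer, group in groupby(merged, key=itemgetter(0)):
        coverages = [0] * n
        for _, sample, coverage in group:
            coverages[sample] += coverage
        yield kmer, coverages


if __name__ == "__main__":
    sample_a = [("AAC", 3), ("ACG", 1)]
    sample_b = [("AAC", 2), ("CGT", 5)]
    for kmer, coverages in merge_sorted_graphs([sample_a, sample_b]):
        print(kmer, coverages)
```

Because every input is read front to back exactly once, the work grows with the total number of records rather than requiring all graphs to be resident in memory, which is the property the sorted-graph approach relies on.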
Dear KVG, Thanks for your response, I will try it. Meanwhile, I am still working on running through the whole workflow using a subset of the data. Thank you so much for the help!!
Dear Isaac,
I would like to apply mccortex to a large-scale resequencing project (~400 individuals, 1GB genome size).
I read through the wiki, and here is what I think a possible workflow might look like
Do you have any suggestions about the workflow, or are there any pitfalls I need to be aware of?
Thank you so much.