I am annotating some large animal and plant genomes. For homology-based annotation I want to use the OrthoDB proteins as homologous proteins, but there are too many protein sequences (5,000,000+ for vertebrates) and MetaEuk is slow.
May I cut the whole protein database into tens or hundreds of pieces and run MetaEuk against each piece separately? I would then combine all target sequences that appear in the MetaEuk results and run MetaEuk again against this combined set of target sequences to get the final results.
Best,
Kun
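The first step of the proposed workflow is to cut the protein database into pieces. Below is a minimal pure-Python sketch of that step. It assumes a plain (uncompressed) FASTA file as input. The function names `read_fasta` and `split_fasta` and the `chunk_NNN.fasta` naming scheme are illustrative choices, not part of MetaEuk. Records are dealt out round-robin, so every chunk gets roughly the same number of sequences.

```python
from pathlib import Path


def read_fasta(path):
    """Yield (header, sequence) tuples from a FASTA file; header excludes '>'."""
    header, seq_lines = None, []
    with open(path) as handle:
        for line in handle:
            line = line.rstrip()  # also drops '\r' from Windows line endings
            if line.startswith(">"):
                if header is not None:
                    yield header, "".join(seq_lines)
                header, seq_lines = line[1:], []
            elif line:
                seq_lines.append(line.strip())
    if header is not None:
        yield header, "".join(seq_lines)


def split_fasta(path, n_chunks, out_dir):
    """Deal records round-robin into n_chunks FASTA files and return their paths."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    out_paths = [out_dir / f"chunk_{i:03d}.fasta" for i in range(n_chunks)]
    handles = [open(p, "w") for p in out_paths]  # one open handle per chunk
    try:
        for i, (header, seq) in enumerate(read_fasta(path)):
            handles[i % n_chunks].write(f">{header}\n{seq}\n")
    finally:
        for h in handles:
            h.close()
    return out_paths
```

For example, `split_fasta("orthodb_vertebrata.fasta", 100, "db_chunks")` (the input file name is hypothetical) would write 100 chunk files. Each one could then serve as the target database for a separate MetaEuk run. The same function could split a contig file instead of a protein file. This sketch keeps all chunk files open at once, which is fine for tens or hundreds of pieces but may hit the open-file limit for much larger counts.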
I am very sorry for the late reply. This issue somehow escaped me.
What you suggest sounds reasonable. Basically, it is a way to pre-filter the target database and retain only the sequences that have the potential to contribute something at a later stage. However, if the idea is too involved to implement, here are other things you could try:
- Divide your contigs into several input files and run each of them against the large target database.
- Cluster your target database and use only the representative sequences as a slimmer version of the target (or construct profiles from each cluster).
- Choose a different, smaller target database. You can find some options using the `databases` command.
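The pre-filtering idea discussed above has a second step: gather every target that produced a prediction in any per-chunk run, then write a reduced target FASTA for the final MetaEuk run. Below is a minimal sketch of that step, under two assumptions.

- The predicted-protein FASTA headers follow MetaEuk's documented `T_acc|C_acc|S|...` layout, with the target accession before the first `|`.
- That accession equals the first whitespace-delimited word of the original protein header.

Check both assumptions against your own output before relying on this. The function names are hypothetical.

```python
def read_fasta(path):
    """Yield (header, sequence) tuples from a FASTA file; header excludes '>'."""
    header, seq_lines = None, []
    with open(path) as handle:
        for line in handle:
            line = line.rstrip()
            if line.startswith(">"):
                if header is not None:
                    yield header, "".join(seq_lines)
                header, seq_lines = line[1:], []
            elif line:
                seq_lines.append(line.strip())
    if header is not None:
        yield header, "".join(seq_lines)


def collect_target_ids(prediction_fastas):
    """Collect target accessions from predicted-protein FASTA files.

    Assumption: each header starts with the target accession followed by '|'
    (the T_acc|C_acc|S|... layout); adapt the parsing if your headers differ.
    """
    ids = set()
    for path in prediction_fastas:
        for header, _ in read_fasta(path):
            ids.add(header.split("|", 1)[0])
    return ids


def write_subset(db_fasta, keep_ids, out_path):
    """Copy records whose first header word is in keep_ids; return how many."""
    written = 0
    with open(out_path, "w") as out:
        for header, seq in read_fasta(db_fasta):
            accession = (header.split() or [""])[0]
            if accession in keep_ids:
                out.write(f">{header}\n{seq}\n")
                written += 1
    return written
```

The reduced file written by `write_subset` would then be the target database for the final MetaEuk run. Matching on the first header word avoids depending on free-text descriptions, but it breaks if accessions themselves contain `|`.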