Viral contigs containing RdRP
https://serratus-public.s3.amazonaws.com/rdrp_contigs/rdrp_contigs.tar.gz (1 GB tarball)
The tarball contains two files:
rdrp_contigs.fa (2.9 GB FASTA): Serratus contigs in which palmscan detected a palmprint.
rdrp_contigs.tsv (265 MB tab-separated text): per-contig annotations; the columns are listed below.
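Because rdrp_contigs.fa is 2.9 GB, it is easier to scan as a stream than to load whole. The following is a minimal sketch of a streaming FASTA reader in plain Python. It assumes the tarball has already been extracted (for example with `tar -xzf rdrp_contigs.tar.gz`) and that each contig label is the text after `>` on the header line. The function name `read_fasta` is hypothetical and not part of any Serratus tool.

```python
from typing import Iterable, Iterator, List, Optional, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (label, sequence) pairs one record at a time.

    Only the current record is held in memory, so a multi-gigabyte file
    such as rdrp_contigs.fa can be scanned without loading it whole.
    """
    label: Optional[str] = None
    chunks: List[str] = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            if label is not None:
                yield label, "".join(chunks)  # emit the previous record
            label = line[1:]  # everything after '>' is kept as the label
            chunks = []
        else:
            chunks.append(line)  # sequence may be wrapped over several lines
    if label is not None:
        yield label, "".join(chunks)  # emit the final record
```

Any iterable of lines works, including an open file handle. For example, inside `with open("rdrp_contigs.fa") as fh:`, the expression `sum(1 for _ in read_fasta(fh))` counts the contigs.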
A contig is classified as viral if (1) palmscan reports a high-confidence RdRP, and (2) it has an E-value <= 1e-6 in a diamond search of the named viral subset of PalmDB. Otherwise, it is classified as undetermined (undet).
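The viral/undet rule above reduces to a two-condition test. The sketch below assumes the palmscan confidence call has already been reduced to a boolean and that a missing diamond hit is represented as `None`. The function name `viral_status` is hypothetical.

```python
from typing import Optional

NV_EVALUE_CUTOFF = 1e-6  # from the rule above: E-value <= 1e-6


def viral_status(high_confidence_rdrp: bool, nv_evalue: Optional[float]) -> str:
    """Return "viral" or "undet" following the two criteria above.

    high_confidence_rdrp: whether palmscan reported a high-confidence RdRP
        (assumed to be available as a boolean).
    nv_evalue: E-value of the best diamond hit against the named viral
        subset of PalmDB, or None if there was no hit.
    """
    if high_confidence_rdrp and nv_evalue is not None and nv_evalue <= NV_EVALUE_CUTOFF:
        return "viral"
    return "undet"
```

Both conditions must hold. A strong PalmDB hit without a high-confidence RdRP still yields undet, as does a high-confidence RdRP with no sufficiently significant hit.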
A contig is classified as known if its palmprint has >= 90% identity in a diamond search of the NCBI non-redundant protein database (NR); otherwise, it is classified as novel.
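The known/novel rule can be sketched in the same way. This sketch assumes identity is expressed as a percentage (0 to 100) and treats "no NR hit" as novel. The helper `category_label` shows how the two classifications combine into labels such as viral/novel. Both function names are hypothetical.

```python
from typing import Optional

KNOWN_MIN_PCTID = 90.0  # percent identity threshold from the rule above


def novelty_status(nr_pctid: Optional[float]) -> str:
    """Return "known" if the palmprint's NR hit is >= 90% identical, else "novel".

    nr_pctid: percent identity (0-100) of the top diamond hit in NR,
        or None if there was no hit (assumed here to count as novel).
    """
    if nr_pctid is not None and nr_pctid >= KNOWN_MIN_PCTID:
        return "known"
    return "novel"


def category_label(viral_or_undet: str, nr_pctid: Optional[float]) -> str:
    """Combine both classifications into a label such as "viral/novel"."""
    return f"{viral_or_undet}/{novelty_status(nr_pctid)}"
```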
Number of contigs in each category:
1016347 viral/novel
326942 viral/known
96359 undet/novel
6197 undet/known
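These counts can be re-derived from rdrp_contigs.tsv by tallying the Category column, which is column 5 in the column table on this page. The sketch below assumes plain tab-separated lines with no quoting and skips a header row if one is present (whether the file has a header is an assumption). The function name `count_categories` is hypothetical.

```python
from collections import Counter
from typing import Iterable

CATEGORY_INDEX = 4  # column 5 ("Category") in the column table, 0-based


def count_categories(lines: Iterable[str]) -> Counter:
    """Tally the Category column of rdrp_contigs.tsv-style lines."""
    counts: Counter = Counter()
    for line in lines:
        fields = line.rstrip("\r\n").split("\t")
        if len(fields) <= CATEGORY_INDEX:
            continue  # blank or truncated line
        if fields[0] == "Contig":
            continue  # header row, if present (assumed column name)
        counts[fields[CATEGORY_INDEX]] += 1
    return counts
```

For example, `with open("rdrp_contigs.tsv") as fh: print(count_categories(fh))` prints the tallies. If the file holds exactly one row per classified contig, they would be expected to match the four counts above, which sum to 1,445,845 contigs.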
Tentative taxonomies were predicted by a simple consensus method. The usearch_global command in usearch was used to search each palmprint against the named viral (NV) subset of PalmDB release 2021-03-14 (named.fa.gz). The top 10 hits were considered for each palmprint, and the majority name was assigned at each rank; if there was no majority, no name was assigned at that rank. Identity thresholds were applied per rank: phylum=0%, class=30%, order=30%, family=40%, genus=70%, species=90%. If a hit's identity was below the threshold for a rank, that hit's name at that rank was excluded.
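A minimal sketch of the consensus rule above, written in plain Python; it is not the actual Serratus code. Assumptions: hits arrive best-first as (percent identity, {rank: name}) pairs already parsed from usearch_global output; "majority" means strictly more than half of the names that survive the identity filter at that rank; hits with no name at a rank are ignored. The names `consensus_taxonomy` and `Hit` are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Optional, Tuple

RANKS = ["phylum", "class", "order", "family", "genus", "species"]
# Minimum percent identity for a hit's name to count at each rank.
THRESHOLDS = {"phylum": 0.0, "class": 30.0, "order": 30.0,
              "family": 40.0, "genus": 70.0, "species": 90.0}

# (percent identity, {rank: name}); ranks missing from the dict are unnamed.
Hit = Tuple[float, Dict[str, str]]


def consensus_taxonomy(hits: List[Hit], top_n: int = 10) -> Dict[str, Optional[str]]:
    """Assign a majority name (or None) at each rank from the top hits."""
    top = hits[:top_n]  # hits are assumed to be ordered best-first
    result: Dict[str, Optional[str]] = {}
    for rank in RANKS:
        # Exclude hits below this rank's identity threshold, and hits with no name here.
        names = [taxa[rank] for pctid, taxa in top
                 if pctid >= THRESHOLDS[rank] and rank in taxa]
        if not names:
            result[rank] = None
            continue
        name, count = Counter(names).most_common(1)[0]
        # Strict majority of the surviving names; otherwise leave the rank unassigned.
        result[rank] = name if count > len(names) / 2 else None
    return result
```

Under this reading, the per-rank filter is applied before voting. A palmprint whose only hits above 70% identity disagree on the genus therefore gets no genus, even if the family vote succeeds.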
Columns of rdrp_contigs.tsv:
#   Column        Description
1   Contig        FASTA label of the contig
2   SRA           SRA accession
3   Length        Contig length
4   Depth         Mean coverage (read depth)
5   Category      One of viral/novel, viral/known, undet/novel, undet/known
6   NR_label      Label of the top hit in NR (NCBI non-redundant protein database)
7   NR_pctid      Percent identity of the top hit in NR
8   NR_evalue     E-value of the top hit in NR
9   NV_label      Label of the top hit in the named viral (NV) subset of PalmDB
10  NV_pctid      Percent identity of the top hit in NV
11  NV_evalue     E-value of the top hit in NV
12  PalmDB_label  Label of the top-hit PalmDB species-like OTU (sOTU)
13  PalmDB_pctid  Percent identity to that PalmDB sOTU
14  phylum        Tentative phylum
15  class         Tentative class
16  order         Tentative order
17  family        Tentative family
18  genus         Tentative genus
19  species       Tentative species
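The following sketch loads rdrp_contigs.tsv into one dictionary per contig, using the column order in the table above. It assumes tab-separated fields in exactly this order and skips a header row if one is present. A field that does not parse as a number in a numeric column, and any empty text field, is treated as missing. How the real file encodes missing values is not stated on this page, so those choices are assumptions. The names `read_contig_table`, `COLUMNS` and `NUMERIC` are hypothetical.

```python
from typing import Callable, Dict, Iterable, Iterator, Optional, Union

# Column order as listed in the table above (positions 1..19).
COLUMNS = [
    "Contig", "SRA", "Length", "Depth", "Category",
    "NR_label", "NR_pctid", "NR_evalue",
    "NV_label", "NV_pctid", "NV_evalue",
    "PalmDB_label", "PalmDB_pctid",
    "phylum", "class", "order", "family", "genus", "species",
]

# Columns converted to numbers; all others are kept as text.
NUMERIC: Dict[str, Callable[[str], Union[int, float]]] = {
    "Length": int, "Depth": float,
    "NR_pctid": float, "NR_evalue": float,
    "NV_pctid": float, "NV_evalue": float,
    "PalmDB_pctid": float,
}

Value = Union[str, int, float, None]


def _to_number(text: str, kind: Callable[[str], Union[int, float]]) -> Optional[Union[int, float]]:
    try:
        return kind(text)
    except ValueError:
        return None  # empty or placeholder value, assumed to mean "missing"


def read_contig_table(lines: Iterable[str]) -> Iterator[Dict[str, Value]]:
    """Yield one dict per data row, keyed by the column names above."""
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        fields = line.rstrip("\r\n").split("\t")
        if fields[0] == "Contig":
            continue  # skip a header row, if the file has one (assumption)
        fields += [""] * (len(COLUMNS) - len(fields))  # pad short rows with empty fields
        record: Dict[str, Value] = {}
        for name, text in zip(COLUMNS, fields):
            kind = NUMERIC.get(name)
            record[name] = _to_number(text, kind) if kind else (text or None)
        yield record
```

For example, inside `with open("rdrp_contigs.tsv") as fh:`, the expression `[r for r in read_contig_table(fh) if r["Category"] == "viral/novel" and r["family"]]` selects the viral/novel contigs that received a tentative family.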