It is expected that the read counts in the salmon.merged.gene_counts.tsv output file are not integers. This is how Salmon reports its results in the quant.sf and quant.genes.sf files for each sample.
NumReads — This is salmon’s estimate of the number of reads mapping to each transcript that was quantified. It is an “estimate” insofar as it is the expected number of reads that have originated from each transcript given the structure of the uniquely mapping and multi-mapping reads and the relative abundance estimates for each transcript.
Also, see this comment from Rob Patro, the main Salmon developer, on the rationale behind this (emphasis added):
Regarding outputting "original read counts"; salmon does output the estimates for the number of reads deriving from each transcript. If the question is, why is this number not an integer, that's because the best estimate (the maximum likelihood estimate) is often not integral. Tools that simply count reads (e.g. HTSeq) produce integer counts, but these are in no way "original read counts" for the corresponding genes, and are usually less accurate (farther from the true number of fragments deriving from a transcript / gene) than the estimates produced by salmon. The fact that the best estimate is often not an integer is a direct result of the fact one is considering a statistical model and taking expectations.
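To make the point about expectations concrete, here is a minimal toy sketch of how fractional counts can arise when some reads are compatible with more than one transcript. It is not Salmon's actual algorithm, only a simplified expectation-maximization loop; the function name `expected_counts`, the transcript names `A` and `B`, and the read compatibilities are all hypothetical and chosen for illustration.

```python
def expected_counts(reads, lengths, n_iter=200):
    """Toy EM: estimate expected read counts per transcript.

    reads   -- list of tuples; each tuple names the transcripts a read is
               compatible with (hypothetical input, not a Salmon format)
    lengths -- dict mapping transcript name to its (effective) length
    """
    names = sorted(lengths)
    # Start from a uniform guess of the fraction of reads per transcript.
    abundance = {t: 1.0 / len(names) for t in names}
    counts = {t: 0.0 for t in names}
    for _ in range(n_iter):
        counts = {t: 0.0 for t in names}
        for compat in reads:
            # E-step: split each read across its compatible transcripts,
            # weighted by current abundance divided by transcript length.
            weights = {t: abundance[t] / lengths[t] for t in compat}
            total = sum(weights.values())
            for t, w in weights.items():
                counts[t] += w / total
        # M-step: update abundances from the fractional assignments.
        n_reads = sum(counts.values())
        abundance = {t: counts[t] / n_reads for t in names}
    return counts


# Three reads map only to A, one only to B, two are ambiguous between A and B.
toy_reads = [("A",)] * 3 + [("B",)] + [("A", "B")] * 2
toy_lengths = {"A": 1000, "B": 1000}
toy_counts = expected_counts(toy_reads, toy_lengths)
print({t: round(c, 3) for t, c in toy_counts.items()})  # {'A': 4.5, 'B': 1.5}
```

The six reads are all accounted for, but the two ambiguous reads are shared in proportion to the estimated abundances, so neither transcript ends up with a whole number. A simple counting tool would have to assign or discard those two reads outright, which is the trade-off described in the quote above.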
Description of the bug
Dear nf-core team,
First of all, I would like to express my gratitude for this incredible community and the nf-core initiative as a cornerstone for reproducibility in computational biology.
I am reaching out to highlight an issue I encountered in versions 3.12.0 and 3.18.0.
Specifically, the file "salmon.merged.gene_counts.tsv" does not appear to contain raw integer counts. Instead, it contains what seem to be floating-point values, as illustrated in the screenshot below:
I have attached the relevant files below (v 3.18.0) and am working with the following sample:
SRX1874029 - NCBI SRA
Thank you for your attention to this matter. I look forward to your insights and any potential solutions.
Best regards,
Christian Andersen
Command used and terminal output
Relevant files
JobParameters.json
stdout.txt
samplesheet_1_sample_peters.csv
System information
Hardware: Ucloud
OS: Linux