`pipeline:differential_expression:deAnalysis (1)` terminated with an error exit status (1) #127

afazhra · 2024-10-29T05:59:25Z

I’m facing an issue with the run reported below.

We have a total of six samples divided into two groups: treated and control. Each sample has been processed independently up to the DE Analysis step, with no issues observed in earlier stages. All FASTQ files were concatenated without errors, and the sample sheet is configured correctly, ensuring proper sample separation between groups.

The entire workflow progressed smoothly up until the DE Analysis step, where it failed with the error shown in the log.

I ran it with 20 threads.

Thank you.

This is my command:
nextflow run epi2me-labs/wf-transcriptomes --fastq fastq_pass/ --transcriptome_source precomputed --ref_genome ../DATA/GCF_012489685.1_LjGifu_v1.2_genomic.fna.gz --ref_transcriptome ../DATA/GCF_012489685.1_LjGifu_v1.2_rna.fna.gz --ref_annotation ../DATA/GCF_012489685.1_LjGifu_v1.2_genomic.gtf.gz --de_analysis --threads 20 --cdna_kit SQK-PCB114 --sample_sheet sample_sheet.csv -c memory.config -resume

This is the log report:

`ERROR ~ Error executing process > 'pipeline:differential_expression:deAnalysis (1)'

Caused by:
Process pipeline:differential_expression:deAnalysis (1) terminated with an error exit status (1)

Command executed:

mkdir merged
mkdir de_analysis
de_analysis.R annotation.gtf 3 1 10 3 "sample_sheet.csv"

Command exit status:
1

Command output:
Loading counts, conditions and parameters.
Checking annotation file type.
Annotation file type is gtf.
Checking annotation file for presence of transcript_id versions.
Annotation file transcript_ids include versions.
Loading annotation database.
Filtering counts using DRIMSeq.
Building model matrix.
Sum transcript counts into gene counts.
Running differential gene expression analysis using edgeR.
Running differential transcript usage analysis using DEXSeq.

Command error:
package 'DRIMSeq' was built under R version 4.3.2
Warning messages:
1: package 'GenomicFeatures' was built under R version 4.3.2
2: package 'BiocGenerics' was built under R version 4.3.2
3: package 'S4Vectors' was built under R version 4.3.3
4: package 'IRanges' was built under R version 4.3.3
5: package 'GenomeInfoDb' was built under R version 4.3.2
6: package 'GenomicRanges' was built under R version 4.3.3
7: package 'AnnotationDbi' was built under R version 4.3.2
8: package 'Biobase' was built under R version 4.3.3
Warning messages:
1: package 'edgeR' was built under R version 4.3.3
2: package 'limma' was built under R version 4.3.3
Loading counts, conditions and parameters.
Checking annotation file type.
Annotation file type is gtf.
Checking annotation file for presence of transcript_id versions.
Annotation file transcript_ids include versions.
Loading annotation database.
Import genomic features from the file as a GRanges object ... OK
Prepare the 'metadata' data frame ... OK
Make the TxDb object ... OK
Warning message:
In .get_cds_IDX(mcols0$type, mcols0$phase) :
The "phase" metadata column contains non-NA values for features of type
stop_codon. This information was ignored.
'select()' returned 1:many mapping between keys and columns
Filtering counts using DRIMSeq.
Building model matrix.
Warning message:
package 'dplyr' was built under R version 4.3.3
Sum transcript counts into gene counts.
Running differential gene expression analysis using edgeR.
Warning messages:
1: package 'DEXSeq' was built under R version 4.3.3
2: package 'BiocParallel' was built under R version 4.3.3
3: package 'SummarizedExperiment' was built under R version 4.3.2
4: package 'MatrixGenerics' was built under R version 4.3.3
5: package 'matrixStats' was built under R version 4.3.3
6: package 'DESeq2' was built under R version 4.3.3
7: package 'RColorBrewer' was built under R version 4.3.3
Running differential transcript usage analysis using DEXSeq.
converting counts to integer mode
Warning message:
In DESeqDataSet(rse, design, ignoreRank = TRUE) :
some variables in design formula are characters, converting to factors
Error in estimateSizeFactorsForMatrix(featureCounts(object), locfunc, :
every gene contains at least one zero, cannot compute log geometric means
Calls: estimateSizeFactors ... estimateSizeFactors -> .local -> estimateSizeFactorsForMatrix
Execution halted

Container:
ontresearch/wf-transcriptomes:shad8671ea3a8ed52f2c0f40355e8eb5c6f00d2cbda

Tip: when you have fixed the problem you can continue the execution adding the option -resume to the run command line

-- Check '.nextflow.log' file for details
WARN: Killing running tasks (1)
`
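For readers who hit the same message: the final error in the log is raised during size-factor estimation, which works from the log geometric mean of each feature across samples. The sketch below is a plain-Python illustration of the median-of-ratios idea, not the DESeq2 or DEXSeq code; the function name and the toy matrices are made up for illustration. It shows why a feature with a zero in any sample cannot contribute, and why nothing is left when every feature has at least one zero.

```python
import math
from statistics import median


def median_of_ratios_size_factors(counts):
    """Illustrative median-of-ratios size factors (not the package code).

    counts: list of rows (features), each a list of per-sample counts.
    Rows containing a zero are skipped because log(0) is undefined.
    """
    usable = [row for row in counts if all(c > 0 for c in row)]
    if not usable:
        raise ValueError(
            "every row contains at least one zero, "
            "cannot compute log geometric means"
        )
    # Log geometric mean of each usable row across samples.
    log_geo_means = [sum(math.log(c) for c in row) / len(row) for row in usable]
    n_samples = len(counts[0])
    factors = []
    for j in range(n_samples):
        # Per-feature log ratio of this sample to the row's geometric mean.
        log_ratios = [
            math.log(row[j]) - lgm for row, lgm in zip(usable, log_geo_means)
        ]
        factors.append(math.exp(median(log_ratios)))
    return factors


# Toy data (hypothetical): 3 features x 3 samples.
ok = [[10, 20, 30], [5, 10, 15], [0, 4, 8]]
print(median_of_ratios_size_factors(ok))

# Every feature has a zero somewhere, so no row is usable.
sparse = [[0, 20, 30], [5, 0, 15], [7, 4, 0]]
try:
    median_of_ratios_size_factors(sparse)
except ValueError as err:
    print("failed:", err)
```

In the second toy matrix no single sample is empty, yet the calculation still fails, so sparse counts spread across samples can trigger the error just as an empty column can.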


fgponce · 2024-10-31T03:16:59Z

Some samples? Do you mean some experiments? I think DE analysis needs multiple samples to calculate differences. Alternatively, perhaps fastcat is combining multiple samples into one? It will do that if the directory structure isn't what it's expecting, i.e. it merges demultiplexed files into one sample if the files are in the same folder.

afazhra · 2024-10-31T08:57:26Z

Apologies for the confusion @fgponce. To clarify, we have a total of six samples divided into two groups: treated and control. Each sample has been processed independently up to the DE Analysis step, with no issues observed in earlier stages. All FASTQ files were concatenated without errors, and the sample sheet is configured correctly, ensuring proper sample separation between groups.

The entire workflow progressed smoothly up until the DE Analysis step, so we don’t suspect any problems related to sample handling or the sample sheet format.

fgponce · 2024-10-31T20:10:53Z

That's great news @afazhra. It caused me problems because most of the count columns ended up empty, so the DE analysis couldn't run. I just noticed that the error mentions lots of zero entries, and that's how my mistake broke the pipeline. I also had a problem where one of my sample names was numeric. The code alters such names for the R steps by adding an x to the column header during processing, and removes it again afterwards. However, at a check step that makes sure the sample sheet and the counts file have the same column names, it errors saying they are different. They aren't, hahaha, but it must be using the column headers from an earlier step (with the x) instead of the actual column headers in the files it's about to merge.
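The renaming described above matches how R treats column headers that are not syntactically valid names (for example, `make.names` prepends `X` to a name that starts with a digit). A minimal sketch for spotting such names before a run is given below. The regular expression is a simplified approximation of R's rule, not a port of `make.names`, and the function name and example aliases are hypothetical.

```python
import re

# Simplified stand-in for R's syntactic-name rule: letters, digits, '.' and
# '_' only, starting with a letter (or a '.' not followed by a digit).
# Approximation only: reserved words and locale details are ignored.
_R_SAFE = re.compile(r"^(?:[A-Za-z]|\.(?![0-9]))[A-Za-z0-9._]*$")


def names_r_may_rewrite(names):
    """Return the names that R would likely alter when used as column headers."""
    return [n for n in names if not _R_SAFE.match(n)]


# Hypothetical aliases: the first two are left alone, the rest are flagged.
aliases = ["Wild_1", "H34_1", "1", "2024_ctrl", "sample-3"]
print(names_r_may_rewrite(aliases))
```

An empty result means none of the names would be touched under this approximation; it does not rule out other causes of a header mismatch.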

afazhra · 2024-11-01T02:31:26Z

Thank you for the insights @fgponce! My sample sheet currently has the following format:

barcode,sample_id,alias,condition
barcode04,Wild_1,Wild_1,control
barcode05,Wild_2,Wild_2,control
barcode06,Wild_3,Wild_3,control
barcode01,H34_1,H34_1,treated
barcode02,H34_2,H34_2,treated
barcode03,H34_3,H34_3,treated

Each sample name contains an underscore, and none is purely numeric. Do you think this format could still cause any issues? Also, were there specific steps or settings that helped you prevent empty count columns from impacting the DE analysis? Just making sure I understand fully before re-running the pipeline.
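A quick structural check of a sheet like the one above can be sketched as follows. It assumes, without confirmation from this thread, that the DE step wants `barcode`, `alias` and `condition` columns and exactly two conditions, one of them named `control`; the function name is hypothetical and the required columns can be changed through the `required` argument.

```python
import csv
import io
from collections import Counter


def summarise_sample_sheet(text, required=("barcode", "alias", "condition")):
    """Parse sample-sheet CSV text; return (problems, replicates per condition)."""
    rows = list(csv.DictReader(io.StringIO(text)))
    header = rows[0].keys() if rows else []
    problems = [f"missing column: {col}" for col in required if col not in header]
    if problems:
        return problems, Counter()
    per_condition = Counter((r["condition"] or "").strip() for r in rows)
    # Assumed requirement: two conditions, one of which is 'control'.
    if len(per_condition) != 2:
        problems.append(f"expected 2 conditions, found {len(per_condition)}")
    if "control" not in per_condition:
        problems.append("no condition named 'control'")
    # Duplicate barcodes or aliases would make two rows indistinguishable.
    for column in ("barcode", "alias"):
        seen = Counter((r[column] or "").strip() for r in rows)
        dupes = sorted(value for value, n in seen.items() if n > 1)
        if dupes:
            problems.append(f"duplicate {column}: {', '.join(dupes)}")
    return problems, per_condition


sheet = """barcode,sample_id,alias,condition
barcode04,Wild_1,Wild_1,control
barcode05,Wild_2,Wild_2,control
barcode06,Wild_3,Wild_3,control
barcode01,H34_1,H34_1,treated
barcode02,H34_2,H34_2,treated
barcode03,H34_3,H34_3,treated
"""
problems, replicates = summarise_sample_sheet(sheet)
print(problems, dict(replicates))
```

For the sheet quoted in this comment the check reports no problems and three replicates per condition, which suggests looking at the counts themselves next.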

sarahjeeeze · 2024-11-04T16:19:30Z

Hi, I suspect that for some reason it does not like the counts file being input. Are you able to go to the work directory of the deAnalysis process, grab the counts.tsv and share it here? I think you can attach it as a zip file.
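Before sharing the file, a first look at it can be sketched as below. The layout is an assumption (tab-separated, a header row, a feature ID in the first column, one numeric column per sample), and the function name, the `Reference` header and the toy rows are hypothetical. The summary reports how many features have no zero in any sample and the total count per sample column.

```python
import csv


def summarise_counts(lines, delimiter="\t"):
    """Summarise a count table given as an iterable of text lines.

    Assumes a header row, a feature ID in the first column, and one numeric
    column per sample after that (an assumption about the file layout).
    """
    reader = csv.reader(lines, delimiter=delimiter)
    samples = next(reader)[1:]
    totals = {s: 0.0 for s in samples}
    n_rows = 0
    n_zero_free = 0
    for row in reader:
        if not row:
            continue  # skip blank lines
        values = [float(v) for v in row[1:1 + len(samples)]]
        n_rows += 1
        if len(values) == len(samples) and all(v > 0 for v in values):
            n_zero_free += 1
        for s, v in zip(samples, values):
            totals[s] += v
    return {
        "features": n_rows,
        "zero_free_features": n_zero_free,
        "column_totals": totals,
    }


# Toy table (hypothetical values) standing in for the real file.
demo = [
    "Reference\tWild_1\tWild_2\tH34_1",
    "tx1\t12\t0\t7",
    "tx2\t3\t5\t9",
    "tx3\t0\t0\t4",
]
print(summarise_counts(demo))
```

To run it on the real file, pass an open handle, for example `with open("counts.tsv", newline="") as fh: print(summarise_counts(fh))`. A `zero_free_features` value of 0, or a sample whose column total is 0, would be consistent with the error in the log, although the pipeline filters counts before the failing step, so the numbers it sees may differ.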

afazhra · 2024-11-05T03:52:28Z

counts.zip
Here's the counts.zip file from the work directory as requested @sarahjeeeze. Let me know if you need any further details.

Thank you.

afazhra added the question Further information is requested label Oct 29, 2024
