pipeline:differential_expression:deAnalysis (1) terminated with an error exit status (1) #127

Open
afazhra opened this issue Oct 29, 2024 · 6 comments
Labels
question Further information is requested


afazhra commented Oct 29, 2024

I’m facing an issue with the run reported below.

We have a total of six samples divided into two groups: treated and control. Each sample has been processed independently up to the DE analysis step, with no issues observed in earlier stages. All FASTQ files were concatenated without errors, and the sample sheet is configured correctly, ensuring proper sample separation between groups.

The entire workflow progressed smoothly up until the DE analysis step, where it failed.

I tried running it with 20 threads.

Thank you.

This is my command:
nextflow run epi2me-labs/wf-transcriptomes --fastq fastq_pass/ --transcriptome_source precomputed --ref_genome ../DATA/GCF_012489685.1_LjGifu_v1.2_genomic.fna.gz --ref_transcriptome ../DATA/GCF_012489685.1_LjGifu_v1.2_rna.fna.gz --ref_annotation ../DATA/GCF_012489685.1_LjGifu_v1.2_genomic.gtf.gz --de_analysis --threads 20 --cdna_kit SQK-PCB114 --sample_sheet sample_sheet.csv -c memory.config -resume

This is the log report:

`ERROR ~ Error executing process > 'pipeline:differential_expression:deAnalysis (1)'

Caused by:
Process pipeline:differential_expression:deAnalysis (1) terminated with an error exit status (1)

Command executed:

mkdir merged
mkdir de_analysis
de_analysis.R annotation.gtf 3 1 10 3 "sample_sheet.csv"

Command exit status:
1

Command output:
Loading counts, conditions and parameters.
Checking annotation file type.
Annotation file type is gtf.
Checking annotation file for presence of transcript_id versions.
Annotation file transcript_ids include versions.
Loading annotation database.
Filtering counts using DRIMSeq.
Building model matrix.
Sum transcript counts into gene counts.
Running differential gene expression analysis using edgeR.
Running differential transcript usage analysis using DEXSeq.

Command error:
package 'DRIMSeq' was built under R version 4.3.2
Warning messages:
1: package 'GenomicFeatures' was built under R version 4.3.2
2: package 'BiocGenerics' was built under R version 4.3.2
3: package 'S4Vectors' was built under R version 4.3.3
4: package 'IRanges' was built under R version 4.3.3
5: package 'GenomeInfoDb' was built under R version 4.3.2
6: package 'GenomicRanges' was built under R version 4.3.3
7: package 'AnnotationDbi' was built under R version 4.3.2
8: package 'Biobase' was built under R version 4.3.3
Warning messages:
1: package 'edgeR' was built under R version 4.3.3
2: package 'limma' was built under R version 4.3.3
Loading counts, conditions and parameters.
Checking annotation file type.
Annotation file type is gtf.
Checking annotation file for presence of transcript_id versions.
Annotation file transcript_ids include versions.
Loading annotation database.
Import genomic features from the file as a GRanges object ... OK
Prepare the 'metadata' data frame ... OK
Make the TxDb object ... OK
Warning message:
In .get_cds_IDX(mcols0$type, mcols0$phase) :
The "phase" metadata column contains non-NA values for features of type
stop_codon. This information was ignored.
'select()' returned 1:many mapping between keys and columns
Filtering counts using DRIMSeq.
Building model matrix.
Warning message:
package 'dplyr' was built under R version 4.3.3
Sum transcript counts into gene counts.
Running differential gene expression analysis using edgeR.
Warning messages:
1: package 'DEXSeq' was built under R version 4.3.3
2: package 'BiocParallel' was built under R version 4.3.3
3: package 'SummarizedExperiment' was built under R version 4.3.2
4: package 'MatrixGenerics' was built under R version 4.3.3
5: package 'matrixStats' was built under R version 4.3.3
6: package 'DESeq2' was built under R version 4.3.3
7: package 'RColorBrewer' was built under R version 4.3.3
Running differential transcript usage analysis using DEXSeq.
converting counts to integer mode
Warning message:
In DESeqDataSet(rse, design, ignoreRank = TRUE) :
some variables in design formula are characters, converting to factors
Error in estimateSizeFactorsForMatrix(featureCounts(object), locfunc, :
every gene contains at least one zero, cannot compute log geometric means
Calls: estimateSizeFactors ... estimateSizeFactors -> .local -> estimateSizeFactorsForMatrix
Execution halted

Container:
ontresearch/wf-transcriptomes:shad8671ea3a8ed52f2c0f40355e8eb5c6f00d2cbda

Tip: when you have fixed the problem you can continue the execution adding the option -resume to the run command line

-- Check '.nextflow.log' file for details
WARN: Killing running tasks (1)
`

@afazhra afazhra added the question Further information is requested label Oct 29, 2024

fgponce commented Oct 31, 2024

Some samples? Do you mean some experiments? I think DE analysis needs multiple samples to calculate differences. Alternatively, perhaps fastcat is combining multiple samples into one? It will do that if the directory structure isn't what it's expecting, i.e. it merges demultiplexed files into one sample if the files are in the same folder.
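To rule out the merging scenario described above, the input directory can be inspected before the run. The following is a minimal sketch, assuming the expected layout is one subdirectory per barcode under the `--fastq` directory; the function name and the list of file suffixes are illustrative and not part of the workflow.

```python
from pathlib import Path

# File endings treated as FASTQ in this sketch (an assumption, not the workflow's list).
FASTQ_SUFFIXES = (".fastq", ".fq", ".fastq.gz", ".fq.gz")


def summarize_fastq_layout(fastq_dir):
    """Return (fastq files sitting directly in fastq_dir, {subdirectory: fastq file count})."""
    root = Path(fastq_dir)
    # FASTQ files at the top level are not separated by sample or barcode.
    top_level = sorted(
        p.name for p in root.iterdir()
        if p.is_file() and p.name.endswith(FASTQ_SUFFIXES)
    )
    per_subdir = {}
    for sub in sorted(p for p in root.iterdir() if p.is_dir()):
        # Count FASTQ files anywhere below this subdirectory.
        per_subdir[sub.name] = sum(
            1 for f in sub.rglob("*")
            if f.is_file() and f.name.endswith(FASTQ_SUFFIXES)
        )
    return top_level, per_subdir
```

Calling `summarize_fastq_layout("fastq_pass/")` and comparing the subdirectory names with the `barcode` column of the sample sheet would show whether any files sit loose at the top level or whether a barcode directory is missing or empty.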

Author

afazhra commented Oct 31, 2024

Apologies for the confusion @fgponce. To clarify, we have a total of six samples divided into two groups: treated and control. Each sample has been processed independently up to the DE analysis step, with no issues observed in earlier stages. All FASTQ files were concatenated without errors, and the sample sheet is configured correctly, ensuring proper sample separation between groups.

The entire workflow progressed smoothly up until the DE Analysis step, so we don’t suspect any problems related to sample handling or the sample sheet format.


fgponce commented Oct 31, 2024

That's great news @afazhra. The merging caused me problems because most of the count columns ended up empty, so DE couldn't run. I just noticed that the error mentions lots of zero entries, and that's how my mistake broke the pipeline. I also had a problem where one of my sample names was numeric. The code tries to alter this for the R steps and adds an x to the column header during processing, then removes it after processing. However, at a check step where it makes sure the sample sheet and the counts file have the same column names, it errors saying they are different. They aren't hahaha, but it must be using the column headers from an earlier step (with the x) instead of the actual column headers in the files it's about to merge info from.
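The link between zero counts and the error in the log ("every gene contains at least one zero, cannot compute log geometric means") can be illustrated with a simplified median-of-ratios calculation. This is a sketch of the general idea only, written in plain Python; it is not the code that DESeq2 or DEXSeq runs, and the function name is hypothetical.

```python
import math
from statistics import median


def size_factors(counts):
    """Median-of-ratios size factors.

    counts: list of rows (one per feature), each a list of per-sample counts.
    """
    # Only features with a non-zero count in every sample can be used,
    # because the logarithm of zero is undefined.
    usable = [row for row in counts if all(c > 0 for c in row)]
    if not usable:
        raise ValueError(
            "every gene contains at least one zero, "
            "cannot compute log geometric means"
        )
    # Log of the geometric mean of each usable feature across samples.
    log_geo_means = [sum(math.log(c) for c in row) / len(row) for row in usable]
    n_samples = len(counts[0])
    factors = []
    for j in range(n_samples):
        # Log ratio of each feature's count in sample j to its geometric mean.
        log_ratios = [math.log(row[j]) - lgm for row, lgm in zip(usable, log_geo_means)]
        factors.append(math.exp(median(log_ratios)))
    return factors
```

If one or more sample columns are empty or nearly empty, no feature survives the "non-zero in every sample" filter and the calculation stops, which matches the situation described in this comment.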

Author

afazhra commented Nov 1, 2024

Thank you for the insights @fgponce! My sample sheet currently has the following format:

barcode,sample_id,alias,condition
barcode04,Wild_1,Wild_1,control
barcode05,Wild_2,Wild_2,control
barcode06,Wild_3,Wild_3,control
barcode01,H34_1,H34_1,treated
barcode02,H34_2,H34_2,treated
barcode03,H34_3,H34_3,treated

Each sample name contains an underscore, and none is numeric-only. Do you think this format could still cause any issues? Also, were there specific steps or settings that helped you prevent empty count columns from impacting the DE analysis? I just want to make sure I understand fully before re-running the pipeline.
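A quick way to check a sample sheet for the naming problem raised earlier in the thread is sketched below. The checks are assumptions drawn from this discussion (duplicate aliases, and aliases starting with a digit, which R's default name handling would prefix with an X); they are not the workflow's own validation, and the function name is hypothetical.

```python
import csv
import io
from collections import Counter


def check_sample_sheet(text):
    """Return (list of problem descriptions, {condition: number of samples})."""
    rows = list(csv.DictReader(io.StringIO(text)))
    problems = []
    aliases = [row["alias"].strip() for row in rows]
    # Each alias should identify exactly one sample.
    for alias, n in Counter(aliases).items():
        if n > 1:
            problems.append(f"alias {alias!r} appears {n} times")
    # Names starting with a digit are the case that R would rename.
    for alias in aliases:
        if alias[:1].isdigit():
            problems.append(f"alias {alias!r} starts with a digit")
    per_condition = Counter(row["condition"].strip() for row in rows)
    return problems, dict(per_condition)
```

Passing the contents of `sample_sheet.csv` as a string returns any flagged aliases together with the number of samples per condition, so the replicate count in each group can be confirmed at the same time.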

@sarahjeeeze
Contributor

Hi, I suspect that for some reason it does not like the counts file being input. Are you able to go to the work directory of the deAnalysis process, grab the counts.tsv and share it here? I think you can put it in a zip file.
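Before sharing the file, it can also be summarized locally. The sketch below assumes that counts.tsv is tab-separated with a header row, a feature identifier in the first column, and one numeric column per sample; that layout is an assumption and has not been checked against the workflow's actual output. The function name is hypothetical.

```python
import csv
import io


def summarize_counts(tsv_text):
    """Count zeros per sample and features that are non-zero in every sample."""
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t")
    header = next(reader)
    samples = header[1:]  # assumed: first column holds the feature identifier
    zeros_per_sample = {s: 0 for s in samples}
    n_features = 0
    n_all_nonzero = 0
    for row in reader:
        if not row:
            continue  # skip blank lines
        values = [float(v) for v in row[1:]]
        n_features += 1
        for sample, value in zip(samples, values):
            if value == 0:
                zeros_per_sample[sample] += 1
        if all(v > 0 for v in values):
            n_all_nonzero += 1
    return {
        "features": n_features,
        "features_nonzero_in_all_samples": n_all_nonzero,
        "zeros_per_sample": zeros_per_sample,
    }
```

A sample whose zero count equals the number of features would point to an empty column, and a value of 0 for `features_nonzero_in_all_samples` would be consistent with the size-factor error in the log.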

Author

afazhra commented Nov 5, 2024

counts.zip
Here's the counts.zip file from the work directory as requested @sarahjeeeze. Let me know if you need any further details.

Thank you.


3 participants