
ERROR ~ Error executing process > 'pipeline:reference_assembly:map_reads (1)' #121

Open
physnano opened this issue Sep 30, 2024 · 5 comments
Labels: question (Further information is requested)


physnano commented Sep 30, 2024

My workflow keeps failing at the reference_assembly:map_reads step:

ERROR ~ Error executing process > 'pipeline:reference_assembly:map_reads (1)'

Caused by:
  Process `pipeline:reference_assembly:map_reads (1)` terminated with an error exit status (140)

Command executed:

  minimap2 -t 1 -ax splice -uf genome_index.mmi seqs.fastq.gz \
      | samtools view -q 40 -F 2304 -Sb - \
      | seqkit bam -j 1 -x -T 'AlnContext: { Ref: "GRCh38.primary_assembly.genome.fa", LeftShift: -24, RightShift: 24, RegexEnd: "[Aa]{8,}", Stranded: True, Invert: True, Tsv: "internal_priming_fail.tsv"} ' - \
      | samtools sort --write-index -@ 1 -o "E3_rep2_reads_aln_sorted.bam##idx##E3_rep2_reads_aln_sorted.bam.bai" - ;
  ((cat "E3_rep2_reads_aln_sorted.bam" | seqkit bam -s -j 1 - 2>&1)  | tee E3_rep2_read_aln_stats.tsv ) || true
  
  # Add sample id header and column
  sed "s/$/E3_rep2/" "E3_rep2_read_aln_stats.tsv"         | sed "1 s/E3_rep2/sample_id/" > tmp
  mv tmp "E3_rep2_read_aln_stats.tsv"
  
  if [[ -s "internal_priming_fail.tsv" ]];
      then
          tail -n +2 "internal_priming_fail.tsv" | awk '{print ">" $1 "\n" $4 }' - > "context_internal_priming_fail_start.fasta"
          tail -n +2 "internal_priming_fail.tsv" | awk '{print ">" $1 "\n" $6 }' - > "context_internal_priming_fail_end.fasta"
  fi

Command exit status:
  140

Command output:
  (empty)

Error code 140 suggests a memory/CPU constraint imposed by the scheduler; however, adding the following to the config file has not resolved the issue:

process {
    withName: 'makeReport' {
        queue = 'himem'
        memory = '512.GB'
    }

    withName: 'reference_assembly:map_reads' {
        memory = '32.GB'
    }
}

This only results in the following warning:

WARN: There's no process matching config selector: reference_assembly:map_reads
nrhorner (Contributor) commented:
Hi @physnano

Only the process name, without the workflow prefix, should be used in the process selector, like so:

    withName: 'map_reads' {
        memory = '32.GB'
    }
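
If the step still gets killed as inputs grow, a common Nextflow pattern is to retry the process with escalating resources. A minimal sketch (the memory figure and maxRetries value are illustrative, not workflow defaults):

process {
    withName: 'map_reads' {
        // task.attempt starts at 1, so the first run gets 32 GB,
        // the first retry 64 GB, and so on
        memory = { 32.GB * task.attempt }
        errorStrategy = 'retry'
        maxRetries = 2
    }
}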

physnano (Author) commented Oct 3, 2024

Thanks @nrhorner, that, along with clusterOptions = '--qos=long', seemed to help. Now, however, I am seeing the following:

ERROR ~ Error executing process > 'pipeline:split_bam (2)'

Caused by:
  Process `pipeline:split_bam (2)` terminated with an error exit status (137)

Command executed:

  n=`samtools view -c isob11_rep2_reads_aln_sorted.bam`
  if [[ n -lt 1 ]]
  then
      echo 'There are no reads mapping for isob11_rep2. Exiting!'
      exit 1
  fi
  
  re='^[0-9]+$'
  
  if [[ 50000 =~ $re ]]
  then
      echo "Bundling up the bams"
      seqkit bam -j 4 -N 50000 isob11_rep2_reads_aln_sorted.bam -o  bam_bundles/
      let i=1
      for b in bam_bundles/*.bam; do
          echo $b
          newname="isob11_rep2_batch_${i}.bam"
          mv $b $newname
         ((i++))
      done
  else
      echo 'no bundling'
      ln -s isob11_rep2_reads_aln_sorted.bam isob11_rep2_batch_1.bam
  fi

Command exit status:
  137

It seems that many of the steps of this workflow do not have sufficient default memory allocated to the (sub)processes...
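
Exit status 137 is SIGKILL (128 + 9), which on most schedulers means the out-of-memory killer, so the same bare-name selector fix should apply here too. A minimal sketch (the 16.GB figure is an illustrative starting point to tune against your data, not a workflow default):

process {
    withName: 'split_bam' {
        // raise the allocation for the BAM-splitting step
        memory = '16.GB'
    }
}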

nrhorner (Contributor) commented:
Hi @physnano

Ok, thanks for the update. We will review memory allocations for this workflow. Would you be able to share a bit of information about your data? How many samples and what total number of reads are you using? Also, which version of the workflow are you on, and what command did you use?

Thanks,

Neil

physnano (Author) commented:

Hi @nrhorner, in my case 3 replicates for 2 samples (6 total) were split across two PromethION flow cells, so ~40-50M raw reads per individual barcode. The makeReport step spikes to ~200 GB of memory according to my monitoring. I am using the latest version, v1.4.0. Command used:

nextflow run ${wfPath}wf-transcriptomes \
    --fastq ${fqPath} \
    --de_analysis \
    --ref_genome ${refPath}GRCh38.primary_assembly.genome.fa \
    --ref_annotation ${refPath}gencode.v46.primary_assembly.annotation.gtf \
    --ref_transcriptome ${refPath}gencode.v46.transcripts.fa \
    --sample_sheet ${wfPath}sample_sheet.csv \
    --cdna_kit SQK-PCB114 \
    --out_dir ${wfPath}outdir-de \
    -profile singularity \
    -c ${wfPath}wf-transcriptomes/nextflow.config \
    --threads 4 \
    -resume
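
For reference, a consolidated sketch of the custom config passed via -c above, reflecting the fixes from this thread (bare process names in the selectors; the values are what worked for this dataset and may need tuning elsewhere):

process {
    withName: 'makeReport' {
        // the report step spiked to ~200 GB on this dataset
        queue = 'himem'
        memory = '512.GB'
    }

    withName: 'map_reads' {
        memory = '32.GB'
        clusterOptions = '--qos=long'
    }
}

clusterOptions is shown per-process here, but it can equally be set once at the top of the process scope if every job should request the long QOS.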

nrhorner (Contributor) commented Nov 6, 2024

Hi @physnano

It's not good that the report generation step is using so much memory. I will investigate this.
