virusbreakend gridsscache file exists #22

Closed
iagooteroc opened this issue May 6, 2024 · 9 comments
Labels
bug Something isn't working

Comments

@iagooteroc

Description of the bug

Process NFCORE_ONCOANALYSER:WGTS:VIRUSBREAKEND_CALLING:VIRUSBREAKEND tries to create a symlink, but the file already exists. This causes the pipeline to fail.

I see that this was a change made last week:
https://github.com/nf-core/oncoanalyser/blame/d8bbe1ca8f9b0f43a511d82b8666bd26eb4a100d/modules/local/virusbreakend/main.nf#L32

Command used and terminal output

nextflow run nf-core/oncoanalyser -r dev -c ${CONFIG} --input ${INPUT} --mode wgts --genome GRCh38_hmf  --ref_data_virusbreakenddb_path ${VIRUS_PATH}/virusbreakenddb_20210401.tar.gz --ref_data_hmf_data_path ${HMF_DATA_PATH}/5.34_38--2 --outdir ${OUTDIR} -resume -profile singularity

[Task monitor] ERROR nextflow.processor.TaskProcessor - Error executing process > 'NFCORE_ONCOANALYSER:WGTS:VIRUSBREAKEND_CALLING:VIRUSBREAKEND (G1)'
Caused by:
  Process `NFCORE_ONCOANALYSER:WGTS:VIRUSBREAKEND_CALLING:VIRUSBREAKEND (G1)` terminated with an error exit status (1)

Command executed:

  # Symlink indices next to assembly FASTA
  ln -s $(find -L GCA_000001405.15_GRCh38_no_alt_analysis_set.fna.gridsscache -type f) ./
  
  virusbreakend \
       \
      --gridssargs "--jvmheap 73443940762" \
      --threads 12 \
      --db virusbreakenddb_20210401/ \
      --output A.virusbreakend.vcf \
      --reference GCA_000001405.15_GRCh38_no_alt_analysis_set.fna \
      A.markdups.bam
  
  cat <<-END_VERSIONS > versions.yml
  "NFCORE_ONCOANALYSER:WGTS:VIRUSBREAKEND_CALLING:VIRUSBREAKEND":
      gridss: $(CallVariants --version 2>&1 | sed 's/-gridss$//')
  END_VERSIONS

Command exit status:
  1

Command output:
  (empty)

Command error:
  INFO:    Converting SIF file to temporary sandbox...
  WARNING: While bind mounting '/scratch:/scratch': destination is already in the mount point list
  ln: ./GCA_000001405.15_GRCh38_no_alt_analysis_set.fna.gridsscache: File exists
  INFO:    Cleaning up image...
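One possible reading of the error above, not confirmed anywhere in this thread: if the `.gridsscache` input is staged as a single file rather than as a directory of index files, `find -L ... -type f` prints the file itself, and `ln -s` is then asked to create a link whose name already exists in the task directory. The sketch below reproduces both layouts with placeholder names (`ref.fna.gridsscache`, `index_dir`, and `ref.fna.img` are assumptions for illustration, not the pipeline's real inputs).

```shell
#!/usr/bin/env bash
# Sketch: reproduce the symlink step with placeholder files in a scratch directory.
set -u
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Layout 1 (assumed): the .gridsscache input is a single file in the task directory.
touch ref.fna.gridsscache
# find prints the file itself, so ln must create ./ref.fna.gridsscache, which already exists.
if ln -s $(find -L ref.fna.gridsscache -type f) ./ 2>/dev/null; then
    file_layout="linked"
else
    file_layout="failed"
fi

# Layout 2 (assumed): the input is a directory that holds the index files.
mkdir index_dir
touch index_dir/ref.fna.img
# find prints index_dir/ref.fna.img, so the link ./ref.fna.img is new and ln succeeds.
if ln -s $(find -L index_dir -type f) ./ 2>/dev/null; then
    dir_layout="linked"
else
    dir_layout="failed"
fi

echo "single-file layout: $file_layout"
echo "directory layout:   $dir_layout"
```

Under these assumptions only the single-file layout fails, which would be consistent with the reference data being organised differently from what the dev branch expects.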

Relevant files

No response

System information

Nextflow: 23.04.2
Hardware: HPC
Executor: slurm and local
Container engine: Singularity
Version of nf-core/oncoanalyser: dev revision d8bbe1c

@iagooteroc iagooteroc added the bug Something isn't working label May 6, 2024
@scwatts
Collaborator

scwatts commented May 6, 2024

Thanks for the report. There have been a few changes to the reference data and how that is handled internally recently. To first rule out some causes, can I ask:

  • had this VIRUSBreakend stage been resumed from a previous interruption?
  • have you configured oncoanalyser to use locally staged reference data?
  • has this issue occurred in just this sample or in multiple?

I also strongly recommend that you use -r 0.4.5 since the dev branch isn't considered stable and encountering potential breakage like this is expected.

@iagooteroc
Author

  • The pipeline was resuming from a previous run, but it hadn't reached the VIRUSBreakend stage by the time it stopped.
  • Yes, I changed the fasta, fai, dict and bwa_index values for the GRCh38_hmf genome in hmf_genomes.config to use local files.
  • This is my first time running this pipeline, so I can't tell. But I now see the same error in some of the later processes too, such as NFCORE_ONCOANALYSER:WGTS:GRIDSS_SVPREP_CALLING:ASSEMBLE

@iagooteroc iagooteroc reopened this May 6, 2024
@scwatts
Collaborator

scwatts commented May 6, 2024

With the recent changes on the dev branch, the reference data is expected to be organised differently to what I'd guess you have configured.

Your setup should work if you stick to 0.4.5 rather than dev:

nextflow run nf-core/oncoanalyser \
  -r 0.4.5 \
  -c ${CONFIG} \
  --input ${INPUT} \
  --mode wgts \
  --genome GRCh38_hmf  \
  --ref_data_virusbreakenddb_path ${VIRUS_PATH}/virusbreakenddb_20210401.tar.gz \
  --ref_data_hmf_data_path ${HMF_DATA_PATH}/5.34_38--2 \
  --outdir ${OUTDIR} \
  -profile singularity

I would avoid resuming from the existing run you have as you're effectively changing between two different versions of oncoanalyser.

Let me know how it goes!

@iagooteroc
Author

Okay thanks! I tried that but I'm getting this error, not sure why:

ERROR ~ ERROR: You used a core Nextflow option with two hyphens: '--version'. Please resubmit with '-version'

 -- Check '.nextflow.log' file for details

WARN: Found unexpected parameters:
* --validationShowHiddenParams: false
* --validationSchemaIgnoreParams: igenomes_base,lint_ignore,genome_type,genome_version,genomes,hmf_data_paths,panel,panel_data_paths,ref_data,ref_data_genome_bwa_index,ref_data_genome_bwa_index_biidx,ref_data_genome_bwa_index_bseq,ref_data_genome_bwa_index_image,ref_data_genome_dict,ref_data_genome_fai,ref_data_genome_fasta,ref_data_genome_gridss_index,ref_data_genome_star_index,ref_data_hla_slice_bed,ref_data_hmf_data_path,ref_data_panel_data_path,ref_data_virusbreakenddb_path
* --validationLenientMode: true
* --validationFailUnrecognisedParams: false
* --lint_ignore: [lint_ignore, genome_type, genome_version, genomes, hmf_data_paths, panel, panel_data_paths, ref_data, ref_data_genome_bwa_index, ref_data_genome_bwa_index_biidx, ref_data_genome_bwa_index_bseq, ref_data_genome_bwa_index_image, ref_data_genome_dict, ref_data_genome_fai, ref_data_genome_fasta, ref_data_genome_gridss_index, ref_data_genome_star_index, ref_data_hla_slice_bed, ref_data_hmf_data_path, ref_data_panel_data_path, ref_data_virusbreakenddb_path]
* --version: false
- Ignore this warning: params.schema_ignore_params = "validationShowHiddenParams,validationSchemaIgnoreParams,validationLenientMode,validationFailUnrecognisedParams,lint_ignore,version" 

@scwatts
Collaborator

scwatts commented May 6, 2024

Can you post both the .nextflow.log and configuration file?

@iagooteroc
Author

Sure, thank you for your help.

nextflow.log
nextflow.config.txt

@scwatts
Collaborator

scwatts commented May 6, 2024

I think I see the problem - you only need to include your custom settings in the config file, so if you change it to contain just the following (and any other changes I might have missed) it should work:

executor {
    name = 'slurm'
    queueSize = 10
}
process {
    executor = 'slurm'
    clusterOptions = '-N 1 -n 1'
}
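As a rough way to check a custom config before a run, the sketch below greps for assignments to some of the names listed in the "Found unexpected parameters" warning earlier in the thread (`version`, `lint_ignore`, and the `validation*` settings). It assumes the problem came from a custom config that carried over pipeline-level settings; the thread does not spell that out. `find_pipeline_params` and both demo files are hypothetical and not part of oncoanalyser or Nextflow.

```shell
#!/usr/bin/env bash
# Sketch: flag pipeline-internal parameter assignments in a custom Nextflow config.
set -u

# Hypothetical helper: print matching lines (with line numbers) from the config in $1.
# Exit status is 0 when at least one such assignment is found, 1 otherwise.
find_pipeline_params() {
    grep -nE '^[[:space:]]*(version|lint_ignore|validation[A-Za-z]+)[[:space:]]*=' "$1"
}

demo=$(mktemp -d)

# Trimmed config holding only site-specific settings.
cat > "$demo/custom.config" <<'EOF'
executor {
    name = 'slurm'
    queueSize = 10
}
process {
    executor = 'slurm'
    clusterOptions = '-N 1 -n 1'
}
EOF

# Stand-in for a config that also carries pipeline-level params (assumed shape).
cat > "$demo/copied.config" <<'EOF'
params {
    version = false
    validationLenientMode = true
}
process {
    executor = 'slurm'
}
EOF

if find_pipeline_params "$demo/custom.config" >/dev/null; then
    trimmed_status="has pipeline params"
else
    trimmed_status="clean"
fi

if find_pipeline_params "$demo/copied.config" >/dev/null; then
    copied_status="has pipeline params"
else
    copied_status="clean"
fi

echo "custom.config: $trimmed_status"
echo "copied.config: $copied_status"
```

The pattern is deliberately narrow and only covers names that appear in the warning, so a clean result does not prove the config is free of other carried-over settings.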

@iagooteroc
Author

You're right, it works now. Thanks for your help and quick response!

@scwatts
Collaborator

scwatts commented May 7, 2024

No worries. I'd also recommend decompressing virusbreakenddb_20210401.tar.gz and providing the resulting directory to oncoanalyser rather than having it decompressed at runtime with every oncoanalyser analysis.
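
A minimal sketch of that one-time extraction, assuming the archive unpacks to a directory named after the tarball (the `--db virusbreakenddb_20210401/` argument in the log above suggests this, but it is not verified here). `extract_db_once` is a hypothetical helper, and the demonstration uses a small stand-in archive rather than the real database.

```shell
#!/usr/bin/env bash
# Sketch: extract the database tarball once and reuse the resulting directory.
set -u

# Hypothetical helper, not part of oncoanalyser.
# $1: path to the .tar.gz archive; $2: directory to extract into.
extract_db_once() {
    local archive=$1 dest=$2
    # Assumption: the archive unpacks to a directory named after the archive.
    local db_dir="$dest/$(basename "$archive" .tar.gz)"
    if [ ! -d "$db_dir" ]; then
        mkdir -p "$dest"
        tar -xzf "$archive" -C "$dest"
    fi
    printf '%s\n' "$db_dir"
}

# Demonstration with a small stand-in archive instead of the real database.
demo=$(mktemp -d)
mkdir -p "$demo/src/virusbreakenddb_20210401"
touch "$demo/src/virusbreakenddb_20210401/placeholder"
tar -czf "$demo/virusbreakenddb_20210401.tar.gz" -C "$demo/src" virusbreakenddb_20210401

db_path=$(extract_db_once "$demo/virusbreakenddb_20210401.tar.gz" "$demo/extracted")
echo "$db_path"
```

The printed directory could then be passed via `--ref_data_virusbreakenddb_path` in place of the `.tar.gz` path.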
