Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Enable organelles subworkflow #27

Merged
merged 45 commits into from
Mar 22, 2024
Merged

Conversation

ksenia-krasheninnikova
Copy link
Contributor

PR checklist

  • This comment contains a description of changes (with reason).
  • If you've fixed a bug or added code that should be tested, add tests!
  • If you've added a new tool - have you followed the pipeline conventions in the contribution docs
  • Make sure your code lints (nf-core lint).
  • Ensure the test suite passes (nextflow run . -profile test,docker --outdir <OUTDIR>).
  • Usage Documentation in docs/usage.md is updated.
  • Output Documentation in docs/output.md is updated.
  • CHANGELOG.md is updated.
  • README.md is updated (including new tool citations and authors/contributors).

Copy link

github-actions bot commented Dec 12, 2023

nf-core lint overall result: Passed ✅ ⚠️

Posted for pipeline commit d99f5cc

+| ✅ 130 tests passed       |+
#| ❔  19 tests were ignored |#
!| ❗   3 tests had warnings |!

❗ Test warnings:

  • files_exist - File not found: conf/igenomes.config
  • pipeline_todos - TODO string in base.config: Check the defaults for all processes
  • pipeline_todos - TODO string in base.config: Customise requirements for specific processes.

❔ Tests ignored:

  • files_exist - File is ignored: assets/nf-core-genomeassembly_logo_light.png
  • files_exist - File is ignored: docs/images/nf-core-genomeassembly_logo_light.png
  • files_exist - File is ignored: docs/images/nf-core-genomeassembly_logo_dark.png
  • files_exist - File is ignored: .github/ISSUE_TEMPLATE/config.yml
  • files_exist - File is ignored: .github/workflows/awstest.yml
  • files_exist - File is ignored: .github/workflows/awsfulltest.yml
  • nextflow_config - Config variable ignored: manifest.name
  • nextflow_config - Config variable ignored: manifest.homePage
  • files_unchanged - File ignored due to lint config: LICENSE or LICENSE.md or LICENCE or LICENCE.md
  • files_unchanged - File ignored due to lint config: .github/CONTRIBUTING.md
  • files_unchanged - File ignored due to lint config: .github/ISSUE_TEMPLATE/bug_report.yml
  • files_unchanged - File ignored due to lint config: .github/workflows/linting.yml
  • files_unchanged - File ignored due to lint config: assets/sendmail_template.txt
  • files_unchanged - File does not exist: assets/nf-core-genomeassembly_logo_light.png
  • files_unchanged - File does not exist: docs/images/nf-core-genomeassembly_logo_light.png
  • files_unchanged - File does not exist: docs/images/nf-core-genomeassembly_logo_dark.png
  • files_unchanged - File ignored due to lint config: lib/NfcoreTemplate.groovy
  • files_unchanged - File ignored due to lint config: .gitignore or .prettierignore or pyproject.toml
  • actions_awstest - 'awstest.yml' workflow not found: /home/runner/work/genomeassembly/genomeassembly/.github/workflows/awstest.yml

✅ Tests passed:

Run details

  • nf-core/tools version 2.8
  • Run at 2024-03-22 17:17:09

@ksenia-krasheninnikova
Copy link
Contributor Author

@gq1 The purpose of this PR is to try setting the borrowed NCBI API key to check if it solves request errors on tower.

conf/test.config Outdated Show resolved Hide resolved
.github/workflows/ci.yml Outdated Show resolved Hide resolved
.github/workflows/ci.yml Outdated Show resolved Hide resolved
@muffato
Copy link
Member

muffato commented Dec 13, 2023

I've done some digging. The environment variable NCBI_API_KEY is only recognised by the Entrez CLI, not BioPython.

See this part of the stack-trace

    File "/opt/MitoHiFi/findMitoReference.py", line 39, in find_full_mito
      handle = Entrez.esearch(db="nucleotide",term=term, idtype="acc")

The documentation https://biopython.org/docs/1.75/api/Bio.Entrez.html#module-Bio.Entrez says there is an api_key parameter that needs to be set.

MitoHifi itself (specifically the findMitoReference.py script) needs to be changed to take the API key (from the command-line or the environment) and pass it to the Entrez module:

  • Either by adding api_key=... to every Entrez function call like the one above
  • Or by setting it once for all
    Entrez.api_key = ...
    
    at the beginning of the script

@ksenia-krasheninnikova
Copy link
Contributor Author

@muffato thank you very much for looking into it
I misread the docs and hoped setting the env variable will be enough. In this case I can update the findMitoReference.py script but this will require rebuilding of the docker/singularity image which might take time for back and forth with the MitoHiFi maintainer. I suggest I switch off organelles_on flag in github testing for now and we release the pipeline as is with the further updates on organelles subworkflow in release 1.1 that would also contain the oatk module so it all adds up nicely.

@@ -49,6 +53,10 @@ jobs:
mkdir -p $NXF_SINGULARITY_CACHEDIR
mkdir -p $NXF_SINGULARITY_LIBRARYDIR

- name: Fix Longranger repo
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

https://github.com/sanger-tol/genomeassembly/blob/fixes_before_release/conf/test_github.config

Maybe it is easier to change the image for the process in the profile config file.

This profile is only for Github, so you can change it here.

@ksenia-krasheninnikova
Copy link
Contributor Author

  • NCBI API key for ToL API is added to the repo by @muffato
  • A docker image for the new release of mitohifi supporting NCBI API key in findMitoReference.py script is being pulled.

Pipeline logic slightly changed:

  • organelles workflow is run over the raw contigs instead of purged;
  • purging-wise: haplotigs are not purged, instead haplotigs removed from the primary assembly are joined with hifiasm haplotigs

@@ -51,4 +61,4 @@ jobs:

- name: Run pipeline with test data
run: |
nextflow run ${GITHUB_WORKSPACE} -profile test_github,singularity --outdir ./results
nextflow run ${GITHUB_WORKSPACE} -profile test_github,singularity -c conf/test_github_params.config --outdir ./results
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we merge 2 test_github config files together?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The function of having conf/test_github_params.config is to override HIFIASM and OATK settings from the ones set in the modules.conf file. It wouldn't work by setting them in the config corresponding to the test_github profile. Do you know how to force overriding the ext.args in the profile config?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If it works in 2 config files, there is no reason for it not working in one file.

You can see the the extra setting for oatk in the one config being picked up.

Command executed:

  oatk \
      -k1001 -c90 -Ttmp \
      -m insecta_mito.fam \
       \
      -t 2 \
      -o baUndUnlc1 \
      baUndUnlc1.fasta

The actual error message is
.command.run: line 24: /usr/bin/awk: No such file or directory

If you go to the container, you can see the awk being installed here: /usr/local/bin/oatk

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@gq1 It failed in the first place because it didn't pick up the right setting: it must have been -c5 from conf/test_github.config instead of -c 90 from conf/modules.config

[E::pathfinder] no organelle component found
[E::main] pathfinder program failed

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry I didn't see the details.

I tried it on the farm and have the same problem. However, I didn't get this error: /usr/bin/awk: No such file or directory

Maybe @muffato knows, can we overwrite the process's ext.args in the profile config file, which is already defined in the modules.config?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep ! I had to do that in the genome-note pipeline to add some BUSCO options in test mode:
https://github.com/sanger-tol/genomenote/blob/1.1.1/conf/modules.config#L52-L59

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

modules.config is loaded after profile config file so it will overwrite the setting there.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For genome-note I faced that same problem. My first solution was to define a profiles { test { ... } } block in conf/modules.config. It worked in some cases, but not always, but then I undid the change in sanger-tol/genomenote@f04a524 to what I posted in my comment above.
The solution I have now is that conf/modules.config is made aware of the profiles and changes ext.args depending on the profile. I leave the profile definition and conf/test.config unchanged.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems working now.

conf/modules.config Outdated Show resolved Hide resolved
container = "ghcr.io/sanger-tol/longranger:2.2.2-c4"
}
}
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should align to process.

@ksenia-krasheninnikova
Copy link
Contributor Author

Thank you @gq1 and @muffato !
Merging into dev.

@ksenia-krasheninnikova ksenia-krasheninnikova merged commit 85f0dbf into dev Mar 22, 2024
6 checks passed
@muffato muffato deleted the fixes_before_release branch March 22, 2024 18:38
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants