Added spinal cord analysis pipeline (Nextflow) #13

Draft
wants to merge 14 commits into
base: main

jcohenadad
Collaborator

@jcohenadad jcohenadad commented Apr 20, 2021

Context

This PR introduces a spinal cord analysis pipeline built with SCT (Spinal Cord Toolbox) and NF (Nextflow).

TODO

  • Create spinal cord analysis pipeline in neuromod-process-spinalcord.nf
  • Update documentation

This file is a copy of neuromod-process-anat.nf and will be adapted in subsequent commits.
@jcohenadad jcohenadad marked this pull request as draft April 20, 2021 17:33
@agahkarakuzu
Collaborator

@jcohenadad I set up a skeleton workflow for the spinal cord dataset, with example processes:

nextflow -C neuromod-process-spinalcord.config run neuromod-process-spinalcord.nf --bids /neuromod/bids/directory

I added a lot of comments in the workflow file to explain general principles from a practical standpoint.

// the files will be moved (alternatives are copying or symlinking)
// We set overwrite true, if there are files sharing the same name with the
// outputs, they'll be overwritten.
publishDir "${derivativesDir}/${out.sub}/${out.ses}anat", mode: 'move', overwrite: true
Collaborator Author

@agahkarakuzu when running the pipeline, i noticed that the published outputs were actually symlinks pointing to the work/ directory. To get physical files (not symlinks), i had to change mode: 'move' to mode: 'copy'. Can you confirm?
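
One way to check what a publish step actually left behind is to list the symlinks under the derivatives folder. The following is a minimal sketch run against a throwaway directory; the `list_symlinks` helper and every path in it are hypothetical and not part of the pipeline:

```shell
# Hypothetical helper: print every symlink found under a directory.
list_symlinks() {
    find "$1" -type l
}

# Throwaway tree standing in for work/ and derivatives/ (illustrative names).
demo=$(mktemp -d)
mkdir -p "$demo/work/ab/1234" "$demo/derivatives/sub-01/anat"
echo "data" > "$demo/work/ab/1234/seg.nii.gz"

# One output published as a symlink, one as a physical copy.
ln -s "$demo/work/ab/1234/seg.nii.gz" "$demo/derivatives/sub-01/anat/seg_link.nii.gz"
cp "$demo/work/ab/1234/seg.nii.gz" "$demo/derivatives/sub-01/anat/seg_copy.nii.gz"

# Only the symlinked output is reported.
found=$(list_symlinks "$demo/derivatives")
echo "$found"
```

An empty result from `list_symlinks` would suggest the published files are physical files rather than links into work/.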

Collaborator

Yes @jcohenadad, IIRC copy will create physical copies. I will double-check the references to confirm.

Collaborator Author

is there any reason why you created symlinks for the brain pipeline you created? the problem i see is:

  • bob runs the pipeline
  • bob thinks that the output files are created under derivatives/
  • bob packages the whole dataset and shares it with elsa
  • elsa opens the dataset and fails to read the derivatives (because they are symlinks pointing to bob's computer)
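
The scenario in the list above can be reproduced in a throwaway directory. This is only a sketch under assumed names (`bob`, `elsa`, the `dataset` layout and the archive names are all made up); it also shows that asking `tar` to dereference links with `-h` is one possible way to package physical files:

```shell
# Stand-in for bob's machine: a work/ dir and a dataset with a symlinked derivative.
bob=$(mktemp -d)
mkdir -p "$bob/work" "$bob/dataset/derivatives"
echo "segmentation" > "$bob/work/seg.nii.gz"
ln -s "$bob/work/seg.nii.gz" "$bob/dataset/derivatives/seg.nii.gz"

# By default tar stores the link itself, not the content it points to.
elsa=$(mktemp -d)
tar -cf "$elsa/shared.tar" -C "$bob" dataset
# With -h, tar follows links and stores the file content instead.
tar -chf "$elsa/shared_deref.tar" -C "$bob" dataset

# Simulate elsa's computer, where bob's work/ directory does not exist.
rm -rf "$bob/work"
mkdir "$elsa/plain" "$elsa/deref"
tar -xf "$elsa/shared.tar" -C "$elsa/plain"
tar -xf "$elsa/shared_deref.tar" -C "$elsa/deref"

# -e follows symlinks, so it is false for a dangling link.
if [ -e "$elsa/plain/dataset/derivatives/seg.nii.gz" ]; then plain=readable; else plain=broken; fi
if [ -e "$elsa/deref/dataset/derivatives/seg.nii.gz" ]; then deref=readable; else deref=broken; fi
echo "plain archive: $plain, dereferenced archive: $deref"
```

Publishing physical copies in the first place avoids relying on whoever packages the dataset to remember the dereferencing flag.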

Collaborator

is there any reason why you created symlinks for the brain pipeline you created?

According to the documentation, the publishDir directive's default mode creates symlinks. I was expecting the move option to move the file itself; previously I was using copy. I did not intend to create symlinks under the derivatives folder. I will look into this.

@jcohenadad jcohenadad mentioned this pull request May 5, 2021
@jcohenadad
Collaborator Author

jcohenadad commented Jun 7, 2021

@agahkarakuzu i started putting all the processing inside a single NF module. I got the following error with 9a8408e:

Terminal output
julien-macbook:~/code/neuromod-anat-processing $ nextflow -C neuromod-process-spinalcord.config run neuromod-process-spinalcord.nf --bids ~/data/neuromod
N E X T F L O W  ~  version 20.10.0
Launching `neuromod-process-spinalcord.nf` [cranky_mirzakhani] - revision: 5d8ecaa21c
███    ██ ███████ ██    ██ ██████   ██████  ███    ███  ██████  ██████        ███████  ██████ ████████ 
████   ██ ██      ██    ██ ██   ██ ██    ██ ████  ████ ██    ██ ██   ██       ██      ██         ██    
██ ██  ██ █████   ██    ██ ██████  ██    ██ ██ ████ ██ ██    ██ ██   ██ █████ ███████ ██         ██ 
██  ██ ██ ██      ██    ██ ██   ██ ██    ██ ██  ██  ██ ██    ██ ██   ██            ██ ██         ██ 
██   ████ ███████  ██████  ██   ██  ██████  ██      ██  ██████  ██████        ███████  ██████    ██ 
Session-level organization has been ENABLED.
Input: /Users/julien/data/neuromod
Derivatives: /Users/julien/data/neuromod/derivatives/SCT
Nextflow Work Dir: /Users/julien/code/neuromod-anat-processing/work
WARN: Input `set` must define at least two component -- Check process `SpinalCord`
executor >  local (2)
executor >  local (3)
[93/a382f1] process > SpinalCord (sub-01_ses-003) [  0%] 0 of 19
[-        ] process > publishOutputs              -
executor >  local (3)
[93/a382f1] process > SpinalCord (sub-01_ses-003) [ 11%] 2 of 18, failed: 2
[-        ] process > publishOutputs              -
WARN: Input tuple does not match input set cardinality declared by process `SpinalCord` -- offending value: [sub-01_ses-001, /Users/julien/data/neuromod/sub-01/ses-001/anat/sub-01_ses-001_bp-cspine_T2w.nii.gz]
Error executing process > 'SpinalCord (sub-01_ses-002)'

Caused by:
  Process `SpinalCord (sub-01_ses-002)` terminated with an error exit status (1)

Command executed:

  # Segment spinal cord on the T2w image
  sct_deepseg_sc -i sub-01_ses-002_bp-cspine_T2w.nii.gz -c t2 -qc /Users/julien/data/neuromod/derivatives/SCT/qc -qc-subject sub-01_ses-002

Command exit status:
  1

Command output:
  
  --
  Spinal Cord Toolbox (git-master-a685d7f0d8032f9df685a08d22947947ac416e71)
  
  sct_deepseg_sc -i sub-01_ses-002_bp-cspine_T2w.nii.gz -c t2 -qc /Users/julien/data/neuromod/derivatives/SCT/qc -qc-subject sub-01_ses-002
  --

Command error:
  Traceback (most recent call last):
    File "/Users/julien/code/sct/python/envs/venv_sct/lib/python3.6/site-packages/nibabel/loadsave.py", line 42, in load
      stat_result = os.stat(filename)
  FileNotFoundError: [Errno 2] No such file or directory: 'sub-01_ses-002_bp-cspine_T2w.nii.gz'
  
  During handling of the above exception, another exception occurred:
  
  Traceback (most recent call last):
    File "/Users/julien/code/sct/spinalcordtoolbox/scripts/sct_deepseg_sc.py", line 214, in <module>
      main(sys.argv[1:])
    File "/Users/julien/code/sct/spinalcordtoolbox/scripts/sct_deepseg_sc.py", line 190, in main
      check_dim(fname_image, dim_lst=[2, 3])
    File "/Users/julien/code/sct/spinalcordtoolbox/image.py", line 1438, in check_dim
      dim = Image(fname).hdr['dim'][:4]
    File "/Users/julien/code/sct/spinalcordtoolbox/image.py", line 285, in __init__
      self.loadFromPath(param, verbose)
    File "/Users/julien/code/sct/spinalcordtoolbox/image.py", line 405, in loadFromPath
      self.im_file = nib.load(path)
    File "/Users/julien/code/sct/python/envs/venv_sct/lib/python3.6/site-packages/nibabel/loadsave.py", line 44, in load
      raise FileNotFoundError(f"No such file or no access: '{filename}'")
  FileNotFoundError: No such file or no access: 'sub-01_ses-002_bp-cspine_T2w.nii.gz'

Work dir:
  /Users/julien/code/neuromod-anat-processing/work/4d/dae364da76fb9bfda117c6a98167d0

Tip: you can replicate the issue by changing to the process work dir and entering the command `bash .command.run`

This suggests the file is not there. However, it is: when i go to the directory, i can run SCT:

julien-macbook:~/data/neuromod/sub-01/ses-002/anat $ sct_deepseg_sc -i sub-01_ses-002_bp-cspine_T2w.nii.gz -c t2 -qc /Users/julien/data/neuromod/derivatives/SCT/qc -qc-subject sub-01_ses-002

--
Spinal Cord Toolbox (git-master-a685d7f0d8032f9df685a08d22947947ac416e71)

sct_deepseg_sc -i sub-01_ses-002_bp-cspine_T2w.nii.gz -c t2 -qc /Users/julien/data/neuromod/derivatives/SCT/qc -qc-subject sub-01_ses-002
--

Config deepseg_sc:
  Centerline algorithm: svm
  Brain in image: True
  Kernel dimension: 2d
  Contrast: t2
  Threshold: 0.7
Creating temporary folder (/var/folders/s8/4qnm5q1n261ch35b5kkclsb00000gn/T/sct-20210607184831.907951-kkr6o9ho)
...

Maybe the issue is due to the file being a hardlink?

julien-macbook:~/data/neuromod/sub-01/ses-002/anat $ ll sub-01_ses-002_bp-cspine_T2w.nii.gz
lrwxr-xr-x  1 julien  staff  143  2 Feb 14:45 sub-01_ses-002_bp-cspine_T2w.nii.gz -> ../../../.git/annex/objects/J8/x7/MD5E-s7547341--913c8227c5a466ae370115e517031138.nii.gz/MD5E-s7547341--913c8227c5a466ae370115e517031138.nii.gz
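
The leading `l` in `lrwxr-xr-x` marks the file as a symbolic link (here into `.git/annex/objects`) rather than a hard link. A quick way to tell whether such a link resolves is to combine the shell tests `-L` and `-e`. The sketch below uses a hypothetical `link_status` helper and a made-up tree that imitates the relative link layout; it does not touch git-annex itself:

```shell
# Hypothetical helper: classify a path as a regular file, a resolvable symlink,
# a dangling symlink, or missing.
link_status() {
    if [ -L "$1" ]; then
        # -e follows the link, so it is false when the target is absent.
        if [ -e "$1" ]; then echo "symlink-ok"; else echo "symlink-dangling"; fi
    elif [ -e "$1" ]; then
        echo "regular"
    else
        echo "missing"
    fi
}

# Throwaway tree with relative links three levels up, as in the listing.
demo=$(mktemp -d)
mkdir -p "$demo/objects" "$demo/sub-01/ses-002/anat"
echo "image" > "$demo/objects/content.nii.gz"
ln -s ../../../objects/content.nii.gz "$demo/sub-01/ses-002/anat/T2w.nii.gz"
ln -s ../../../objects/absent.nii.gz "$demo/sub-01/ses-002/anat/T1w.nii.gz"

link_status "$demo/sub-01/ses-002/anat/T2w.nii.gz"
link_status "$demo/sub-01/ses-002/anat/T1w.nii.gz"
```

Because the link is relative, it only resolves while the file stays at the same depth inside the dataset, which is worth keeping in mind when a file is accessed from another directory.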

EDIT 2021-06-07 18:57:06: The problem is that the processing runs in a special directory (e.g. /Users/julien/code/neuromod-anat-processing/work/67/a6d950ca3da94ccae7add4efcc5890) and there is no image there... so, images need to be copied there before processing.

@agahkarakuzu i'm just going to copy each image to the "work" directory. How can i get the "source path" of the data inside the NF module file?
i know this is completely outside the philosophy of NF, but i need to move forward and i have no more time to spend getting familiar with NF. so, my goal is to have everything under "script", including:

  • copy files under work
  • process files
  • copy files where they should be (derivatives/sct, etc.)
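
The three steps above could look roughly like the following when written as plain shell. This is a sketch only: `process_image` is a hypothetical stand-in for the real processing command, and `run_staged` with its three arguments is an illustrative name, not something from the pipeline:

```shell
# Hypothetical stand-in for the real processing command (e.g. an SCT call):
# it just writes a fake "segmentation" next to the input.
process_image() {
    cp "$1" "${1%.nii.gz}_seg.nii.gz"
}

# Sketch of the three steps; arguments are source file, work dir, output dir.
run_staged() {
    st_src=$1; st_work=$2; st_out=$3
    st_name=$(basename "$st_src")
    mkdir -p "$st_work" "$st_out"
    # 1. Copy the input into the work dir; -L asks cp to follow a symlinked source.
    cp -L "$st_src" "$st_work/$st_name"
    # 2. Process inside the work dir (the subshell leaves the caller's cwd alone).
    ( cd "$st_work" && process_image "$st_name" )
    # 3. Copy the result to its final location as a physical file.
    cp "$st_work/${st_name%.nii.gz}_seg.nii.gz" "$st_out/"
}

# Demo on a throwaway tree whose input is a symlink, as in a git-annex dataset.
demo=$(mktemp -d)
mkdir -p "$demo/objects" "$demo/bids/sub-01/anat"
echo "image" > "$demo/objects/blob.nii.gz"
ln -s ../../../objects/blob.nii.gz "$demo/bids/sub-01/anat/T2w.nii.gz"

run_staged "$demo/bids/sub-01/anat/T2w.nii.gz" "$demo/work/task1" "$demo/derivatives/sct"
ls "$demo/derivatives/sct"
```

Copying in steps 1 and 3 trades disk space for outputs that do not depend on the work directory or on the original link targets.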

@agahkarakuzu
Collaborator

agahkarakuzu commented Jun 7, 2021

Does the hash named folder in the nextflow workdir contain that file?

Another question, does that process work with other files from different subjects/sessions?

I am not sure why it would be a hardlink, or why that would cause an issue. Nextflow can usually follow softlinks as long as the files they point to are found.

If the file exists in the source dir but Nextflow can't copy it over to the work dir, it usually logs why it could not fetch that file (at least as a warning).

@agahkarakuzu
Collaborator

Now that I read the file listing again, it shows the file is a symlink whose size is 143 bytes. Which terminal did you call it from? I once saw a Nextflow call in the VS Code terminal fail to find files that the macOS terminal could find.

If Nextflow had an issue with symlinks, we could not process any of the DataLad files, so this is confusing.

@jcohenadad
Collaborator Author

thank you for chipping in @agahkarakuzu, i’ve since edited my previous message: the cause was that the file was not present in the work dir. I’ve established a strategy, although that strategy really bypasses the whole point of nextflow...

@agahkarakuzu
Collaborator

agahkarakuzu commented Jun 8, 2021

so, images need to be copied there before processing.

Normally the ball is in Nextflow's court to copy files over there (that is what channels are supposed to handle). Manually copying files to a folder under work can be tricky because of hash generation: I am not sure whether there are variables dynamically holding the work directories, like 4d/8432y7hjhg8234h23942 for sub-01_ses-002, then 7y/287yhfkbsd7yr8i289isuehr for sub-06_ses-001, etc. Without knowing this, copying files to where a process can find them would be really difficult.
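
For inspecting things after a run, one possible workaround is to search the hashed folders for the task script that mentions a given sample. The sketch below assumes each task folder holds a `.command.sh` file containing the executed command (the tip in the log above refers to the sibling `.command.run`); the `find_task_dirs` helper and the fake folder names are hypothetical:

```shell
# Hypothetical helper: print the task dirs whose .command.sh mentions a sample ID.
find_task_dirs() {
    ft_root=$1; ft_sample=$2
    for ft_script in "$ft_root"/*/*/.command.sh; do
        # Skip the unexpanded pattern when nothing matches.
        [ -f "$ft_script" ] || continue
        if grep -q "$ft_sample" "$ft_script"; then
            dirname "$ft_script"
        fi
    done
}

# Fake work tree with two task folders (shortened, made-up hash names).
demo=$(mktemp -d)
mkdir -p "$demo/work/4d/aaaa" "$demo/work/7y/bbbb"
echo "sct_deepseg_sc -i sub-01_ses-002_bp-cspine_T2w.nii.gz -c t2" > "$demo/work/4d/aaaa/.command.sh"
echo "sct_deepseg_sc -i sub-06_ses-001_bp-cspine_T2w.nii.gz -c t2" > "$demo/work/7y/bbbb/.command.sh"

find_task_dirs "$demo/work" "sub-01_ses-002"
```

This only helps once the task folders exist, so it is a debugging aid rather than a way to pre-stage inputs. Inside a running task the script already executes in its own work folder, so `$PWD` there should point at it.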

If you somehow managed to establish this strategy and it works for now, that's great 😲 In the long run we can open an issue on nextflow and ask for help to understand why those files could not be found in the first place.

@jcohenadad jcohenadad changed the title Added spinal cord analysis pipeline Added spinal cord analysis pipeline (Nextflow) Jun 8, 2021