Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Slow distance matrix generation for volumetric parcellations & improved code suggestion #64

Open
1 task done
LeonDLotter opened this issue May 23, 2022 · 2 comments
Open
1 task done

Comments

@LeonDLotter
Copy link

LeonDLotter commented May 23, 2022

Description of issue

Hi neuromaps team,
I am currently working with BrainSMASH to create null maps for both surface and volumetric data and had a look into your code. I realized that your solution for generating distance matrices for volumetric parcellations is unnecessary slow. I came up with a much faster solution and wanted to share it with you. I hope, you don't mind that I did not implement it directly in neuromaps and put up a pull request. I also changed some variable names in the code snippets for my own convenience.

Data preparation:

import numpy as np
import nibabel as nib
from scipy import ndimage
from scipy.spatial.distance import cdist
from neuromaps.images import load_nifti
from tqdm.auto import tqdm

parc = load_nifti("parcellation.nii")
parc_data = parc.get_fdata()
parc_affine = parc.affine
parcels = np.trim_zeros(np.unique(parc_data))
n_parcels = len(parcels)

Your solution

(13 - 14 minutes with a 116parcels-2mm-resolution-parcellation on my M1 Pro Mac):

mask = np.logical_not(np.logical_or(np.isclose(parc_data, 0), np.isnan(parc_data)))
ijk = nib.affines.apply_affine(parc_affine, np.column_stack(np.where(mask)))
parc_data_m = parc_data[mask]

row_dist = np.zeros((len(ijk), n_parcels), dtype='float32')
dist = np.zeros((n_parcels, n_parcels), dtype='float32')
for n, row in enumerate(tqdm(ijk, desc="xyz loop")):
    ijk_dist = cdist(row[None], ijk).astype('float32')
    row_dist[n] = ndimage.mean(ijk_dist, parc_data_m, parcels)
for n in range(n_parcels):
    dist[n] = ndimage.mean(row_dist[:, n], parc_data_m, parcels)

dist[0,:10]

Output:

array([15.151454, 46.026604, 33.721268, 65.14529 , 65.7071  , 43.989674,
       53.251812, 68.88147 , 68.9169  , 42.83058 ], dtype=float32)

My solution

Iterates parcels and calculates the mean pairwise distance between all voxels in each parcel, only for the upper right triangle of the matrix, then mirrors the triangle to the lower left of the matrix (~ 14 seconds):

mask = np.logical_not(np.logical_or(np.isclose(parc_data, 0), np.isnan(parc_data)))
parc_data_m = parc_data * mask

ijk_parcels = dict()
for i_parcel in parcels:
    xyz_parcel = np.column_stack(np.where(parc_data_m==i_parcel))
    ijk_parcels[i_parcel] = nib.affines.apply_affine(parc_affine, xyz_parcel)

dist = np.zeros((n_parcels, n_parcels), dtype='float32')
for i, i_parcel in enumerate(tqdm(parcels)):
    j = i
    for _ in range(n_parcels - j):
        dist[i,j] = cdist(ijk_parcels[i_parcel], ijk_parcels[parcels[j]]).mean().astype('float32')
        j += 1

dist = dist + dist.T - np.diag(np.diag(dist))

dist[0,:10]

Output ( (completely equal to your code):

array([15.151454, 46.026604, 33.721268, 65.14529 , 65.7071  , 43.989674,
       53.251812, 68.88147 , 68.9169  , 42.83058 ], dtype=float32)

Much faster alternative

(< .5 sec) calculate distances between centroid coordinates of each parcel. That should actually not make much difference regarding spatial properties of the distance matrix, allthough I didn't test that. Obviously, we'll now have a zero diagonal.

mask = np.logical_not(np.logical_or(np.isclose(parc_data, 0), np.isnan(parc_data)))
parc_data_m = parc_data * mask

xyz = np.zeros((n_parcels, 3), float)
for i, i_parcel in enumerate(parcels):
    xyz[i,:] = np.column_stack(np.where(parc_data_m==i_parcel)).mean(axis=0)
ijk = nib.affines.apply_affine(parc_affine, xyz)

dist = np.zeros((n_parcels, n_parcels), dtype='float32')
for i, row in enumerate(ijk):
    dist[i] = cdist(row[None], ijk).astype('float32')

dist[0,:10]

Output:

array([ 0.      , 44.091846, 31.6248  , 63.11293 , 64.568054, 42.52424 ,
       51.843227, 67.06589 , 67.84859 , 38.684334], dtype=float32)

Feel free to incorporate the code snippet if you like! :)
Best,
Leon

Code of Conduct

  • I agree to follow the neuromaps Code of Conduct
@justinehansen
Copy link
Contributor

Hi Leon!

Sorry for the super late reply. Just want to say that this is awesome and thanks so much for sharing your improvement. We will work on getting it implemented! Please leave the issue open so I don't forget 😉

Best,
Justine

@LeonDLotter
Copy link
Author

Hi Justine,
no problem, you're welcome!
BW,
Leon

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants