
Whitening filter removing spikes #815

Open
gjin239 opened this issue Nov 10, 2024 · 19 comments
@gjin239

gjin239 commented Nov 10, 2024

Hi all,

I know Kilosort/Phy is specialised for data with large channel counts, but currently we are only running 4-channel tetrode recordings through this software. As such, the whitening filter appears to be nullifying our valid spikes, even while using low whitening ranges:
(raw vs. whitened)
[screenshots: raw and whitened traces]

Is there a way to somehow disable the whitening filter or minimise its effects on our data? Thank you!

@morales-gregorio

Hi @gjin239,

I think you can still see spikes in the lower figure, just with a much smaller amplitude. Can you z-score the signals in time?

Whitening removes the correlation between the channels, so a spike will only remain on whichever channel had the highest amplitude for that spike. Whitening also drastically reduces the amplitude of the whole signal (though not necessarily the ratio between noise and spikes).
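A quick way to see both effects is to whiten a toy 4-channel signal yourself. This is an illustrative numpy sketch of ZCA whitening, not Kilosort's actual whitening code, and the signal parameters are made up:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy tetrode: 4 channels sharing most of their noise, plus a spike on all of them
n = 10_000
shared = rng.normal(size=n)
x = np.stack([shared + 0.3 * rng.normal(size=n) for _ in range(4)])
x[:, 5000:5010] += 50.0  # a large spike visible on every channel

# ZCA whitening: W = C^(-1/2), with C the channel covariance matrix
cov = np.cov(x)
evals, evecs = np.linalg.eigh(cov)
W = evecs @ np.diag(1.0 / np.sqrt(evals)) @ evecs.T
xw = W @ x

# Channels are now decorrelated (covariance ~ identity), and the spike,
# because it was shared across channels, is much smaller than in the raw trace
print(np.abs(x[:, 5005]).max(), np.abs(xw[:, 5005]).max())
```

Z-scoring each whitened channel (subtract its mean, divide by its standard deviation) then puts all channels back on a comparable scale, which makes the surviving spikes easier to see by eye.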

@gjin239
Author

gjin239 commented Nov 14, 2024

Hi @morales-gregorio, thank you for the information on whitening.

We're more accustomed to using the trace view of KS2, which appears to show that good spikes are sometimes ignored after pre-processing. I'm not quite sure how to go about z-scoring the signals yet: we're trying to reproduce our data outputs in KS4, but keep running into a TruncatedSVD ValueError.

@jacobpennington
Collaborator

@gjin239 Can you please upload kilosort4.log from the results directory so I can see where you're encountering the error?

@gjin239
Author

gjin239 commented Nov 16, 2024

@jacobpennington Nothing is saved to the results directory yet, but I can copy-paste the log straight from the message logbox:

C:\Users\gjin0008\AppData\Local\anaconda3\envs\kilosort\lib\site-packages\kilosort\io.py:498: UserWarning: The given NumPy array is not writable, and PyTorch does not support non-writable tensors. This means writing to this tensor will result in undefined behavior. You may want to copy the array to protect its data or make it writable before converting it to a tensor. This type of warning will be suppressed for the rest of this program. (Triggered internally at C:\actions-runner\_work\pytorch\pytorch\builder\windows\pytorch\torch\csrc\utils\tensor_numpy.cpp:212.)
  X[:, self.nt : self.nt+nsamp] = torch.from_numpy(data).to(self.device).float()

Preprocessing filters computed in 0.00s; total 0.00s

computing drift
nblocks = 0, skipping drift correction
drift computed in 0.00s; total 0.01s

Extracting spikes using templates
Re-computing universal templates from data.

Traceback (most recent call last):
  File "C:\Users\gjin0008\AppData\Local\anaconda3\envs\kilosort\lib\site-packages\kilosort\gui\sorter.py", line 82, in run
    st, tF, Wall0, clu0 = detect_spikes(ops, self.device, bfile, tic0=tic0,
  File "C:\Users\gjin0008\AppData\Local\anaconda3\envs\kilosort\lib\site-packages\kilosort\run_kilosort.py", line 392, in detect_spikes
    st0, tF, ops = spikedetect.run(ops, bfile, device=device, progress_bar=progress_bar)
  File "C:\Users\gjin0008\AppData\Local\anaconda3\envs\kilosort\lib\site-packages\kilosort\spikedetect.py", line 193, in run
    ops['wPCA'], ops['wTEMP'] = extract_wPCA_wTEMP(
  File "C:\Users\gjin0008\AppData\Local\anaconda3\envs\kilosort\lib\site-packages\kilosort\spikedetect.py", line 70, in extract_wPCA_wTEMP
    model = TruncatedSVD(n_components=ops['settings']['n_pcs']).fit(clips)
  File "C:\Users\gjin0008\AppData\Local\anaconda3\envs\kilosort\lib\site-packages\sklearn\decomposition\_truncated_svd.py", line 209, in fit
    self.fit_transform(X)
  File "C:\Users\gjin0008\AppData\Local\anaconda3\envs\kilosort\lib\site-packages\sklearn\utils\_set_output.py", line 295, in wrapped
    data_to_wrap = f(self, X, *args, **kwargs)
  File "C:\Users\gjin0008\AppData\Local\anaconda3\envs\kilosort\lib\site-packages\sklearn\base.py", line 1474, in wrapper
    return fit_method(estimator, *args, **kwargs)
  File "C:\Users\gjin0008\AppData\Local\anaconda3\envs\kilosort\lib\site-packages\sklearn\decomposition\_truncated_svd.py", line 229, in fit_transform
    X = self._validate_data(X, accept_sparse=["csr", "csc"], ensure_min_features=2)
  File "C:\Users\gjin0008\AppData\Local\anaconda3\envs\kilosort\lib\site-packages\sklearn\base.py", line 633, in _validate_data
    out = check_array(X, input_name="X", **check_params)
  File "C:\Users\gjin0008\AppData\Local\anaconda3\envs\kilosort\lib\site-packages\sklearn\utils\validation.py", line 1072, in check_array
    raise ValueError(
ValueError: Found array with 0 sample(s) (shape=(0, 51)) while a minimum of 1 is required by TruncatedSVD.

Just in case they're important, here are the settings I'm currently using. While there are only 4 active channels, as a carry-over from KS2 I've padded the data with 12 blank dummy channels so that it could be read as a Linear x16 probe.
[screenshot: settings]

@jacobpennington
Copy link
Collaborator

@gjin239 Unfortunately I will need to see the full log; there is a lot more information in there. Please upload it when you are able.

@gjin239
Author

gjin239 commented Nov 20, 2024

@jacobpennington Turns out I was using an old version of KS4; here's the error log now that I've updated it.
kilosort4.log

@jacobpennington
Collaborator

I just noticed your comment about concatenating dummy channels. That can cause all sorts of unexpected behavior for Kilosort4. If whitening is removing spikes that occur across all channels, the best solution is to concatenate multiple tetrode recordings to be sorted simultaneously, i.e. sort 4 recordings together for 16 channels total (or more).

@gjin239
Author

gjin239 commented Nov 20, 2024

@jacobpennington Huh, interesting! Is there any chance that 4 recordings would influence each other's sorting outputs?
(And bonus question: does this apply to both KS4 and earlier versions of KS?)

@jacobpennington
Collaborator

jacobpennington commented Nov 22, 2024

I'm not sure about earlier versions of Kilosort. We're not providing support for those anymore, but you could try asking another user that has sorted tetrode data on previous versions.

As for KS4, sorting the recordings together should not influence sorting results provided that the probe layout is set up correctly. On that note, you should not use the 16-channel linear probe that gets included by default. We use that for some testing, but the contacts for that probe are only spaced 1um apart, which will result in some strange behavior when sorting real data. You should create your own probe map, with channels on the same tetrode spaced close together on the y-axis (maybe ~20um or so) and a gap of at least ~100um between tetrodes so that they're sorted separately.

This is different from previous versions of Kilosort, for which only the relative spacing of contacts mattered. For KS4, the absolute distances between channels make a difference.
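A probe map along those lines could be generated like this. This is a hedged sketch: the geometry values are only illustrative, and the JSON fields (chanMap, xc, yc, kcoords, n_chan) follow the Kilosort4 probe format as I understand it; check the KS4 docs before using it:

```python
import json

n_tetrodes = 4
chans_per_tet = 4

xc, yc, kcoords = [], [], []
for tet in range(n_tetrodes):
    for ch in range(chans_per_tet):
        xc.append(20.0 * (ch % 2))                 # 2x2 square of contacts,
        yc.append(200.0 * tet + 20.0 * (ch // 2))  # ~20 um apart within a tetrode,
        kcoords.append(tet)                        # with >=100 um between tetrodes

probe = {
    'chanMap': list(range(n_tetrodes * chans_per_tet)),
    'xc': xc,
    'yc': yc,
    'kcoords': kcoords,
    'n_chan': n_tetrodes * chans_per_tet,
}

with open('tetrodes_4x4.json', 'w') as f:
    json.dump(probe, f)
```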

@gjin239
Author

gjin239 commented Nov 23, 2024

@jacobpennington Sounds good, I'll give that a go.
Just one more question: the raw version of the data we use contains a lot of large-amplitude artifacts. In KS2, we had to crop these out for effective spike detection, causing recordings to have different lengths. Does artifact thresholding (or another preprocessing feature) allow KS4 to automatically remove these large artifacts and make it easier for us to concatenate multiple recordings?

@jacobpennington
Collaborator

There is a parameter for very basic artifact removal, artifact_threshold. Any batch containing at least one sample with an absolute value greater than that number will be zeroed out. It's a ham-handed approach that we hope to improve later on, but that's what's there for now. As long as it doesn't end up removing long stretches all at once (i.e. several batches in a row), it seems to work well enough.
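The behavior described above (zeroing any batch that contains a sample beyond the threshold) can be sketched as follows; this is my own numpy illustration, not Kilosort4's actual implementation:

```python
import numpy as np

def zero_artifact_batches(data, batch_size, artifact_threshold):
    """Zero out every batch containing at least one sample whose absolute
    value exceeds artifact_threshold (illustrative sketch only)."""
    out = data.copy()
    n_samples = data.shape[-1]
    for start in range(0, n_samples, batch_size):
        batch = out[..., start:start + batch_size]
        if np.abs(batch).max() > artifact_threshold:
            batch[...] = 0.0  # wipe the whole batch, not just the artifact
    return out

# One large artifact wipes out only the 100-sample batch containing it
x = np.ones((4, 300))
x[2, 150] = 1000.0
y = zero_artifact_batches(x, batch_size=100, artifact_threshold=500.0)
```

Note that the entire batch is lost, which is why a single artifact spanning several batches can remove a long stretch of data.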

@gjin239
Author

gjin239 commented Nov 24, 2024

@jacobpennington I've created the new file and probe, and now I'm getting a different error:
kilosort4.log
Another thing I'm curious about: does the CMR affect anything here at all, and should I be disabling it?
(Thank you for all the information so far, by the way; it's been very insightful!)

@jacobpennington
Collaborator

jacobpennington commented Dec 3, 2024

@gjin239 your batch size is set to the entire duration of the recording, which is going to cause all sorts of problems. As a general rule, you should start by running KS4 without changing any of the parameters from their defaults, except for the sampling rate and number of channels. The default batch size of 60000 should work fine, or 50000 if you want it to be a round number of seconds for your sampling rate. If you run into issues with that (i.e. not enough data per batch because of a low channel count) you can try doubling or tripling that number, but anything more than a ~20 second batch size is well outside the expected range.
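For example, the round-number arithmetic works out like this. The 25 kHz sampling rate here is purely an assumption for illustration (substitute your own), and the settings keys follow the usual KS4 names:

```python
# Illustrative settings sketch; 25 kHz is an assumed sampling rate.
fs = 25_000
settings = {
    'n_chan_bin': 16,      # total channels stored in the binary file
    'fs': fs,              # sampling rate in Hz
    'batch_size': 2 * fs,  # 50_000 samples = exactly 2 seconds per batch
}
print(settings['batch_size'] / settings['fs'])  # batch length in seconds
```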

@jacobpennington
Collaborator

As for your other question, I don't see any reason why you would need to disable CMR.

@gjin239
Author

gjin239 commented Dec 3, 2024

@jacobpennington KS4 can run through the entirety of spike sorting now, and the phy outputs are consistent with individual files. Thank you!
However, whether the diagnostic diagrams and phy parameters get saved at all currently seems inconsistent; am I supposed to be clicking the 'run' button multiple times to get all the outputs I need?

@jacobpennington
Collaborator

jacobpennington commented Dec 3, 2024

I'm not sure what you mean about the inconsistent output. You should only click run once for each sorting job. It's possible there's a bug related to running multiple jobs back to back in the same GUI session (i.e. without closing the GUI and then launching it again), but it's hard to say without seeing what outputs (or lack of outputs) you're referring to.

If you're trying to sort many recordings in sequence, it would be simpler to use the API. I.e. something like:

from kilosort import run_kilosort

data_paths = ['/path/to/data1.bin', '/path/to/data2.bin', ...]
probe_paths = ['/path/to/probe1.json', '/path/to/probe2.json', ...]
results_paths = ['/path/to/save_results1', '/path/to/save_results2', ...]

for data, probe, results in zip(data_paths, probe_paths, results_paths):
    settings = {
        'n_chan_bin': 16, 'probe_path': probe, 'filename': data,
        'results_dir': results, ...  # other settings
        }
    ops, st, clu, ... = run_kilosort(settings, ...)  # etc.
    # ... do whatever you need to do with the results, if anything
    del ops, st, clu  # ... etc.

You can see more details about that here: https://kilosort.readthedocs.io/en/latest/tutorials/basic_example.html

@gjin239
Author

gjin239 commented Dec 7, 2024

@jacobpennington I solved this issue; it looks like it just takes a while longer to generate all the phy files because of the data size. Thank you for this information nonetheless, it is useful.

KS4 is currently generating a total of ~12 clusters for a combined dataset, with each cluster clearly consisting of several different merged cells, e.g. these spikes from the same initially identified cluster.
[screenshots: spike waveforms from one cluster]
Is there a setting similar to AUC Split from earlier versions of KS that can be adjusted, or should I keep playing with Th_universal until I reach a happy medium?

@jacobpennington
Collaborator

Changes to Th_universal would not directly affect how clusters are merged. If small fluctuations are being detected as spikes but are actually noise, you should increase the threshold. If you're missing spikes at lower amplitudes, decrease the threshold. If you think units are being overmerged or undermerged, you could try tweaking ccg_threshold. There's a full list of parameters with their descriptions in kilosort/parameters.py, or you can mouse over the parameter names in the GUI to see the same descriptions.

@gjin239
Author

gjin239 commented Dec 14, 2024

@jacobpennington I see, thank you. However, tweaking ccg_threshold doesn't really seem to change much on my end. Could that be because the clusters' features are too similar?
E.g. a threshold of 0.5 (figure 1, 15 units detected) vs. 0.05 (figure 2, 14 units detected).
[figure 1: diagnostics with ccg_threshold = 0.5]
[figure 2: diagnostics with ccg_threshold = 0.05]

Another query I have is whether KS4 has trouble with certain chunks of data: you can see clearly here that there are two sections where no spikes were detected in any part of the recordings.
[screenshot: spike raster with two empty sections]
In the KS4 GUI itself the display appears blank for these sections.
[screenshot: blank trace view in the KS4 GUI]
Since the raw data in these sections is perfectly fine, I was wondering what might be causing this.
