
Whitening filter removing spikes #815

Open
gjin239 opened this issue Nov 10, 2024 · 19 comments
@gjin239

gjin239 commented Nov 10, 2024

Hi all,

I know Kilosort/Phy is specialised for data with large channel counts, but currently we are only running 4-channel tetrode recordings through this software. As such, the whitening filter appears to be nullifying our valid spikes, even while using low whitening ranges:
(raw vs. whitened)
[screenshots: raw and whitened traces]

Is there a way to somehow disable the whitening filter or minimise its effects on our data? Thank you!

@morales-gregorio

Hi @gjin239,

I think you can still see spikes in the lower figure, just with a much smaller amplitude. Can you z-score the signals in time?

Whitening removes the correlation between the channels, so a spike will only remain on whichever channel had the highest amplitude for that spike. Whitening also drastically reduces the amplitude of the whole signal (though not necessarily the ratio between noise and spikes).
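A quick way to see both effects is to whiten a toy 4-channel signal yourself. This is an illustrative numpy sketch of ZCA whitening, not Kilosort's actual whitening code, and the signal parameters are made up:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy tetrode: 4 channels sharing most of their noise, plus a spike on all of them
n = 10_000
shared = rng.normal(size=n)
x = np.stack([shared + 0.3 * rng.normal(size=n) for _ in range(4)])
x[:, 5000:5010] += 50.0  # a large spike visible on every channel

# ZCA whitening: W = C^(-1/2), with C the channel covariance matrix
cov = np.cov(x)
evals, evecs = np.linalg.eigh(cov)
W = evecs @ np.diag(1.0 / np.sqrt(evals)) @ evecs.T
xw = W @ x

# Channels are now decorrelated (covariance ~ identity), and the spike,
# because it was shared across channels, is much smaller than in the raw trace
print(np.abs(x[:, 5005]).max(), np.abs(xw[:, 5005]).max())
```

Z-scoring each whitened channel (subtract its mean, divide by its standard deviation) then puts all channels back on a comparable scale, which makes the surviving spikes easier to see by eye.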

@gjin239
Author

gjin239 commented Nov 14, 2024

Hi @morales-gregorio, thank you for the information on whitening.

We're more accustomed to using the trace view of KS2, which appears to show that good spikes are sometimes ignored after pre-processing. I'm not quite sure how to go about z-scoring the signals yet: we're trying to reproduce our data outputs in KS4, but keep running into a TruncatedSVD ValueError.

@jacobpennington
Collaborator

@gjin239 Can you please upload kilosort4.log from the results directory so I can see where you're encountering the error?

@gjin239
Author

gjin239 commented Nov 16, 2024

@jacobpennington Nothing is saved to the results directory yet, but I can copy-paste the log straight from the message logbox:

C:\Users\gjin0008\AppData\Local\anaconda3\envs\kilosort\lib\site-packages\kilosort\io.py:498: UserWarning: The given NumPy array is not writable, and PyTorch does not support non-writable tensors. This means writing to this tensor will result in undefined behavior. You may want to copy the array to protect its data or make it writable before converting it to a tensor. This type of warning will be suppressed for the rest of this program. (Triggered internally at C:\actions-runner\_work\pytorch\pytorch\builder\windows\pytorch\torch\csrc\utils\tensor_numpy.cpp:212.)
  X[:, self.nt : self.nt+nsamp] = torch.from_numpy(data).to(self.device).float()

Preprocessing filters computed in 0.00s; total 0.00s

computing drift
nblocks = 0, skipping drift correction
drift computed in 0.00s; total 0.01s

Extracting spikes using templates
Re-computing universal templates from data.

Traceback (most recent call last):
  File "C:\Users\gjin0008\AppData\Local\anaconda3\envs\kilosort\lib\site-packages\kilosort\gui\sorter.py", line 82, in run
    st, tF, Wall0, clu0 = detect_spikes(ops, self.device, bfile, tic0=tic0,
  File "C:\Users\gjin0008\AppData\Local\anaconda3\envs\kilosort\lib\site-packages\kilosort\run_kilosort.py", line 392, in detect_spikes
    st0, tF, ops = spikedetect.run(ops, bfile, device=device, progress_bar=progress_bar)
  File "C:\Users\gjin0008\AppData\Local\anaconda3\envs\kilosort\lib\site-packages\kilosort\spikedetect.py", line 193, in run
    ops['wPCA'], ops['wTEMP'] = extract_wPCA_wTEMP(
  File "C:\Users\gjin0008\AppData\Local\anaconda3\envs\kilosort\lib\site-packages\kilosort\spikedetect.py", line 70, in extract_wPCA_wTEMP
    model = TruncatedSVD(n_components=ops['settings']['n_pcs']).fit(clips)
  File "C:\Users\gjin0008\AppData\Local\anaconda3\envs\kilosort\lib\site-packages\sklearn\decomposition\_truncated_svd.py", line 209, in fit
    self.fit_transform(X)
  File "C:\Users\gjin0008\AppData\Local\anaconda3\envs\kilosort\lib\site-packages\sklearn\utils\_set_output.py", line 295, in wrapped
    data_to_wrap = f(self, X, *args, **kwargs)
  File "C:\Users\gjin0008\AppData\Local\anaconda3\envs\kilosort\lib\site-packages\sklearn\base.py", line 1474, in wrapper
    return fit_method(estimator, *args, **kwargs)
  File "C:\Users\gjin0008\AppData\Local\anaconda3\envs\kilosort\lib\site-packages\sklearn\decomposition\_truncated_svd.py", line 229, in fit_transform
    X = self._validate_data(X, accept_sparse=["csr", "csc"], ensure_min_features=2)
  File "C:\Users\gjin0008\AppData\Local\anaconda3\envs\kilosort\lib\site-packages\sklearn\base.py", line 633, in _validate_data
    out = check_array(X, input_name="X", **check_params)
  File "C:\Users\gjin0008\AppData\Local\anaconda3\envs\kilosort\lib\site-packages\sklearn\utils\validation.py", line 1072, in check_array
    raise ValueError(
ValueError: Found array with 0 sample(s) (shape=(0, 51)) while a minimum of 1 is required by TruncatedSVD.

Just in case they're important, here are the settings I'm currently using. While there are only 4 active channels, as a carry-over from KS2 I've padded the data with 12 blank dummy channels so that it could be read as a Linear x16 probe.
[screenshot: settings]

@jacobpennington
Copy link
Collaborator

@gjin239 Unfortunately I will need to see the full log; there is a lot more information in there. Please upload it when you are able.

@gjin239
Author

gjin239 commented Nov 20, 2024

@jacobpennington Turns out I was using an old version of KS4; here's the error log now that I've updated it.
kilosort4.log

@jacobpennington
Collaborator

I just noticed your comment about concatenating dummy channels. That can cause all sorts of unexpected behavior for Kilosort4. If whitening is removing spikes that occur across all channels, the best solution is to concatenate multiple tetrode recordings to be sorted simultaneously, i.e. sort 4 recordings together for 16 channels total (or more).

@gjin239
Author

gjin239 commented Nov 20, 2024

@jacobpennington Huh, interesting! Is there any chance that 4 recordings would influence each other's sorting outputs?
(And bonus question: does this apply to both KS4 and earlier versions of KS?)

@jacobpennington
Collaborator

jacobpennington commented Nov 22, 2024

I'm not sure about earlier versions of Kilosort. We're not providing support for those anymore, but you could try asking another user that has sorted tetrode data on previous versions.

As for KS4, sorting the recordings together should not influence sorting results provided that the probe layout is set up correctly. On that note, you should not use the 16-channel linear probe that gets included by default. We use that for some testing, but the contacts for that probe are only spaced 1um apart, which will result in some strange behavior when sorting real data. You should create your own probe map, with channels on the same tetrode spaced close together on the y-axis (maybe ~20um or so) and a gap of at least ~100um between tetrodes so that they're sorted separately.

This is different from previous versions of Kilosort, for which only the relative spacing of contacts mattered. For KS4, the absolute distances between channels make a difference.
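A probe map along those lines could be generated like this. This is a hedged sketch: the geometry values are only illustrative, and the JSON fields (chanMap, xc, yc, kcoords, n_chan) follow the Kilosort4 probe format as I understand it; check the KS4 docs before using it:

```python
import json

n_tetrodes = 4
chans_per_tet = 4

xc, yc, kcoords = [], [], []
for tet in range(n_tetrodes):
    for ch in range(chans_per_tet):
        xc.append(20.0 * (ch % 2))                 # 2x2 square of contacts,
        yc.append(200.0 * tet + 20.0 * (ch // 2))  # ~20 um apart within a tetrode,
        kcoords.append(tet)                        # with >=100 um between tetrodes

probe = {
    'chanMap': list(range(n_tetrodes * chans_per_tet)),
    'xc': xc,
    'yc': yc,
    'kcoords': kcoords,
    'n_chan': n_tetrodes * chans_per_tet,
}

with open('tetrodes_4x4.json', 'w') as f:
    json.dump(probe, f)
```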

@gjin239
Author

gjin239 commented Nov 23, 2024

@jacobpennington Sounds good, I'll give that a go.
Just one more question: the raw version of the data we use contains a lot of large-amplitude artifacts. In KS2, we had to crop these out for effective spike detection, causing recordings to have different lengths. Does artifact thresholding (or another preprocessing feature) allow KS4 to automatically remove these large artifacts and make it easier for us to concatenate multiple recordings?

@jacobpennington
Collaborator

There is a parameter for very basic artifact removal, artifact_threshold. Any batch containing at least one sample with an absolute value greater than that number will be zeroed out. It's a ham-handed approach that we hope to improve later on, but that's what's there for now. As long as it doesn't end up removing long stretches all at once (i.e. several batches in a row), it seems to work well enough.
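The behavior described above (zeroing any batch that contains a sample beyond the threshold) can be sketched as follows; this is my own numpy illustration, not Kilosort4's actual implementation:

```python
import numpy as np

def zero_artifact_batches(data, batch_size, artifact_threshold):
    """Zero out every batch containing at least one sample whose absolute
    value exceeds artifact_threshold (illustrative sketch only)."""
    out = data.copy()
    n_samples = data.shape[-1]
    for start in range(0, n_samples, batch_size):
        batch = out[..., start:start + batch_size]
        if np.abs(batch).max() > artifact_threshold:
            batch[...] = 0.0  # wipe the whole batch, not just the artifact
    return out

# One large artifact wipes out only the 100-sample batch containing it
x = np.ones((4, 300))
x[2, 150] = 1000.0
y = zero_artifact_batches(x, batch_size=100, artifact_threshold=500.0)
```

Note that the entire batch is lost, which is why a single artifact spanning several batches can remove a long stretch of data.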

@gjin239
Author

gjin239 commented Nov 24, 2024

@jacobpennington I've created the new file and probe, and now I'm getting a different error:
kilosort4.log
Another thing I'm curious about: does the CMR affect anything here at all, and should I be disabling it?
(Thank you for all the information so far, by the way; it's been very insightful!)

@jacobpennington
Collaborator

jacobpennington commented Dec 3, 2024

@gjin239 your batch size is set to the entire duration of the recording, which is going to cause all sorts of problems. As a general rule, you should start by running KS4 without changing any of the parameters from their defaults, except for the sampling rate and number of channels. The default batch size of 60000 should work fine, or 50000 if you want it to be a round number of seconds for your sampling rate. If you run into issues with that (i.e. not enough data per batch because of a low channel count) you can try doubling or tripling that number, but anything more than a ~20 second batch size is well outside the expected range.
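For example, the round-number arithmetic works out like this. The 25 kHz sampling rate here is purely an assumption for illustration (substitute your own), and the settings keys follow the usual KS4 names:

```python
# Illustrative settings sketch; 25 kHz is an assumed sampling rate.
fs = 25_000
settings = {
    'n_chan_bin': 16,      # total channels stored in the binary file
    'fs': fs,              # sampling rate in Hz
    'batch_size': 2 * fs,  # 50_000 samples = exactly 2 seconds per batch
}
print(settings['batch_size'] / settings['fs'])  # batch length in seconds
```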

@jacobpennington
Collaborator

As for your other question, I don't see any reason why you would need to disable CMR.

@gjin239
Author

gjin239 commented Dec 3, 2024

@jacobpennington KS4 can run through the entirety of spike sorting now, and the phy outputs are consistent with individual files. Thank you!
However, whether the diagnostic diagrams and phy parameters get saved at all currently seems inconsistent; am I supposed to be clicking the 'run' button multiple times to get all the outputs I need?

@jacobpennington
Collaborator

jacobpennington commented Dec 3, 2024

I'm not sure what you mean about the inconsistent output. You should only click run once for each sorting job. It's possible there's a bug related to running multiple jobs back to back in the same GUI session (i.e. without closing the GUI and then launching it again), but it's hard to say without seeing what outputs (or lack of outputs) you're referring to.

If you're trying to sort many recordings in sequence, it would be simpler to use the API. I.e. something like:

from kilosort import run_kilosort

data_paths = ['/path/to/data1.bin', '/path/to/data2.bin', ...]
probe_paths = ['/path/to/probe1.json', '/path/to/probe2.json', ...]
results_paths = ['/path/to/save_results1', '/path/to/save_results2', ...]

for data, probe, results in zip(data_paths, probe_paths, results_paths):
    settings = {
        'n_chan_bin': 16, 'probe_path': probe, 'filename': data,
        'results_dir': results, ...  # other settings
        }
    ops, st, clu, ... = run_kilosort(settings, ...)  # etc.
    # ... do whatever you need to do with the results, if anything
    del ops, st, clu  # ... etc.

You can see more details about that here: https://kilosort.readthedocs.io/en/latest/tutorials/basic_example.html

@gjin239
Author

gjin239 commented Dec 7, 2024

@jacobpennington I solved this issue; it looks like it just takes a while longer to generate all the phy files because of the data size. Thank you for this information nonetheless, it is useful.

KS4 is currently generating a total of ~12 clusters for a combined dataset, with each cluster clearly consisting of several different merged cells, e.g. these spikes from the same initially identified cluster.
[screenshots: spike waveforms from one cluster]
Is there a setting similar to AUC Split from earlier versions of KS that can be adjusted, or should I keep playing with Th_universal until I reach a happy medium?

@jacobpennington
Collaborator

Changes to Th_universal would not directly affect how clusters are merged. If small fluctuations are being detected as spikes but are actually noise, you should increase the threshold. If you're missing spikes at lower amplitudes, decrease the threshold. If you think units are being overmerged or undermerged, you could try tweaking ccg_threshold. There's a full list of parameters with their descriptions in kilosort/parameters.py, or you can mouse over the parameter names in the GUI to see the same descriptions.

@gjin239
Author

gjin239 commented Dec 14, 2024

@jacobpennington I see, thank you. However, tweaking ccg_threshold doesn't really seem to change much on my end. Could that be because the clusters' features are too similar?
E.g. a threshold of 0.5 (figure 1, 15 units detected) vs. 0.05 (figure 2, 14 units detected).
[figure 1: diagnostics with ccg_threshold = 0.5]
[figure 2: diagnostics with ccg_threshold = 0.05]

Another query I have is whether KS4 has trouble with certain chunks of data: you can see clearly here that there are two sections where no spikes were detected in any part of the recordings.
[screenshot: spike raster with two empty sections]
In the KS4 GUI itself the display appears blank for these sections.
[screenshot: blank trace view in the KS4 GUI]
Since the raw data in these sections is perfectly fine, I was wondering what might be causing this.
