Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

IO function for loading ops.npy at different operation system #832

Open
Tevin-yue opened this issue Dec 12, 2024 · 6 comments
Open

IO function for loading ops.npy at different operation system #832

Tevin-yue opened this issue Dec 12, 2024 · 6 comments

Comments

@Tevin-yue
Copy link

Feature you'd like to see:

Thank you very much for your work. I have some suggestions regarding adding functionalities to the I/O functions. Here's the scenario: the analysis is completed on Linux, and the post-processing is done on Windows. In this case, there are issues with load_ops, which is related to pathlib. Considering such a scenario, I think it's necessary to add functionality to accommodate the differences between Linux and Windows. You can refer to the following code for a solution.
'''

    class PosixPathToWindowsPathUnpickler(pickle.Unpickler):  
        def find_class(self, module, name):  
            if module == "pathlib" and name == "PosixPath":  
                return pathlib.WindowsPath  
            return super().find_class(module, name)  
      
    def load_with_custom_unpickler(file_path):  
        with open(file_path, 'rb') as f:   
            version = np.lib.format.read_magic(f)  
            np.lib.format._check_version(version)  
            shape, fortran_order, dtype = np.lib.format.read_array_header_1_0(f)  
            return PosixPathToWindowsPathUnpickler(f).load() 
    
    def load_ops(ops_path, device=None):
        if device is None:
            if torch.cuda.is_available():
                device = torch.device('cuda')
            else:
                device = torch.device('cpu')
    
        try:   
            ops = np.load(ops_path, allow_pickle=True).item()
        except Exception as e:  
            ops = load_with_custom_unpickler(ops_path).item()
    
        for k, v in ops.items():
            if k in ops['is_tensor']:
                ops[k] = torch.from_numpy(v).to(device)
        # TODO: Why do we have one copy of this saved as numpy, one as tensor,
        #       at different levels?
        ops['preprocessing'] = {k: torch.from_numpy(v).to(device)
                                for k,v in ops['preprocessing'].items()}
    
        return ops

’‘’

Secondly, I suggest generating a file that maps clusters to specific channels in the Kilosort results, indicating which channel each cluster belongs to. This would be quite useful for post-processing.

Thirdly, I have a minor question regarding the results: there appear to be very similar waveforms. I would like to ask if this is because Neuropixels inherently capture many such signals (which aren't observed in practice), or if it requires adjusting different parameters and re-sorting.
cluster_1_total_999

Lastly, I hope there could be a feature to extract all waveforms, not just a subset. This would also help in assessing the quality of the sorting. Just like this.
d8c7b84c9b42b14a17e57cac75f6349

Additional Context

No response

@Tevin-yue Tevin-yue changed the title FEATURE: <Please replace this text with a comprehensive title>io improve IO function for loading ops.npy at different operation system Dec 13, 2024
@Tevin-yue
Copy link
Author

One more question: I have noticed that the corresponding cluster channel information remains consistent after using Phy2. Which one is more accurate?

@jacobpennington
Copy link
Collaborator

  1. What errors are you encountering when using load_ops?
  2. There is already a utility function for this, look at kilosort.data_tools.get_best_channel to see how the channel is determined.
  3. I don't understand your question. Which signals are you saying are not observed? Waveforms within each cluster are expected to be quite similar.
  4. There are already utility functions for this, see kilosort.data_tools, in particular: get_cluster_spikes, get_best_channel, and get_spike_waveforms. There is also an example of this in this tutorial: https://kilosort.readthedocs.io/en/latest/tutorials/plotting_example.html
  5. Your "one more question:" I'm not sure what you're asking for the accuracy of. Which two things are you comparing?

@Tevin-yue
Copy link
Author

Thank you very much for your prompt response. I apologize for not clarifying the issue earlier. Let me provide more details:

  1. The issue arises when KS is run on Linux and the results are then viewed on Windows, resulting in the following error:

       NotImplementedError                       Traceback (most recent call last)
       Cell In[4], line 1
       ----> 1 ops = np.load(r'./test/ops.npy', allow_pickle=True)
       
       File E:\miniconda\envs\analysis\lib\site-packages\numpy\lib\npyio.py:456, in load(file, mmap_mode, allow_pickle, fix_imports, encoding, max_header_size)
           453         return format.open_memmap(file, mode=mmap_mode,
           454                                   max_header_size=max_header_size)
           455     else:
       --> 456         return format.read_array(fid, allow_pickle=allow_pickle,
           457                                  pickle_kwargs=pickle_kwargs,
           458                                  max_header_size=max_header_size)
           459 else:
           460     # Try a pickle
           461     if not allow_pickle:
       
       File E:\miniconda\envs\analysis\lib\site-packages\numpy\lib\format.py:800, in read_array(fp, allow_pickle, pickle_kwargs, max_header_size)
           798     pickle_kwargs = {}
           799 try:
       --> 800     array = pickle.load(fp, **pickle_kwargs)
           801 except UnicodeError as err:
           802     # Friendlier error message
           803     raise UnicodeError("Unpickling a python object failed: %r\n"
           804                        "You may need to pass the encoding= option "
           805                        "to numpy.load" % (err,)) from err
       
       File E:\miniconda\envs\analysis\lib\pathlib.py:962, in Path.__new__(cls, *args, **kwargs)
           960 self = cls._from_parts(args)
           961 if not self._flavour.is_supported:
       --> 962     raise NotImplementedError("cannot instantiate %r on your system"
           963                               % (cls.__name__,))
           964 return self
       
       NotImplementedError: cannot instantiate 'PosixPath' on your system
    

    When operating within the same system, I have not encountered this problem. As previously mentioned, it seems to be related to the pathlib path issue.

  2. Questions Q2, Q4, and Q5 are actually about channel issues. I noticed the use of kilosort.data_tools.get_best_channel and I understand it. In Phy, I reviewed the script it uses, which differs from what is used here (Phy uses tools from phylib, which seem to determine the cluster's channel based on amplitude rather than the template). After generating results with KS and simply loading them into Phy for viewing and saving, the number of channels for a cluster changes and can differ by one or two channels from those identified by get_best_channel. An example is as follows:

    a4f53b422694baaf2a2af249a411209

    Therefore, in Q5, I would like to ask which method is more appropriate to rely on after post-processing in Phy.

  3. In Q3, when detecting spikes in NP, I noticed many small positive spikes that differ slightly from the ideal waveforms (negative followed by positive), and the proportion seems a bit high. While NP can indeed detect signals like dendritic or even intracellular signals, such cases might exist. However, should a high proportion of these signals be considered unusual?

@jacobpennington
Copy link
Collaborator

Re: 3: are you using the default values for detection thresholds? Can you please upload kilosort4.log from the results directory?

@jacobpennington
Copy link
Collaborator

Re: channel numbers sometimes being different in KS4 versus Phy, use whichever one you decide looks more appropriate for your data. You should get similar results for both methods, but they will not always be identically. Especially if the cluster contains a lot of noise, in which case the channel assignment is not reliable.

@Tevin-yue
Copy link
Author

Re 3: Thank you for your help. Attached is a KS run log file related to question 3. Additionally, I understand the issue regarding the channels, thank you for the explanation.
kilosort4.log

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants