Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

OpenEphysBinaryRawIO: Separate Neural and Non-Neural Data into Distinct Streams #1624

Open
wants to merge 15 commits into
base: master
Choose a base branch
from

Conversation

h-mayorquin
Copy link
Contributor

@h-mayorquin h-mayorquin commented Jan 16, 2025

The continuous.dat file in the Open Ephys Binary format often contains both neural and non-neural (ADC) channels. For detailed documentation, refer to the Open Ephys GUI User Manual.

An example dataset (provided by @rly) illustrating this structure is available here. Additionally, a related discussion can be found in the NeuroConv repository: Issue #1023. I think this is the module that the test set used

This pull request modifies the reader to separate non-neural channels into their own stream. This approach allows users to clearly distinguish between neural and non-neural data and will enable programmatic separation in downstream libraries like SpikeInterface and Neuroconv.

Implementation Details:

To maintain backward-compatibility of stream_ids, the non-neural data's stream_id is defined as the neural data's stream_id plus the total number of available streams. For example, if the neural data has a stream_id of "1" and there are three streams in total, the non-neural data will be assigned a stream_id of "4".

The stream_name for non-neural data is derived by appending _ADC to the neural data's stream_name. For instance, if the neural data's stream_name is Data, the non-neural data's stream_name will be DATA_ADC.

I welcome better suggestions for conventions of naming and any other type of feedback from people who are more familiar with this format @alejoe91

@alejoe91 alejoe91 self-requested a review January 16, 2025 19:35
@h-mayorquin
Copy link
Contributor Author

Testing data is here:

https://gin.g-node.org/NeuralEnsemble/ephy_testing_data/pulls/150

Once the data is merged I can add a test

@alejoe91 alejoe91 self-assigned this Jan 17, 2025
@alejoe91
Copy link
Contributor

@h-mayorquin can you fix Ryan's last name here? https://gin.g-node.org/NeuralEnsemble/ephy_testing_data/pulls/150#issuecomment-5561

Se we can merge and add a test here

Comment on lines 464 to 466
if stream_index >= len(self._sig_streams[block_index][seg_index]):
stream_index = stream_index - len(self._sig_streams[block_index][seg_index])
t_start = self._sig_streams[block_index][seg_index][stream_index]["t_start"]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I understand correctly the ptach stream_index is not anymore a stream_index but an internal variable which is not a direct mapping to stream_index.
self._sig_streams[block_index][seg_index][stream_index]["t_start"]

If this is the case I would change this internal variable to something else to avoid mistale when reading the entire code.

Maybe I am mistunderstanding or not clear at all but this writting is a bit weird to me.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are correct. This is a good idea.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed.

@h-mayorquin
Copy link
Contributor Author

I corrected Ryan's last name (my bad) on the test data repo.

Question: why do we have stream_id of the synch channel as the empty string? It is loaded with the neural data of the continuous event with a flag, shouldn't they belong to the same stream?

@alejoe91
Copy link
Contributor

Gin is merged!

@alejoe91
Copy link
Contributor

@h-mayorquin so just missing the test with the new test file and this is ready for final review, right?

@h-mayorquin
Copy link
Contributor Author

This is ready for final review now.

@h-mayorquin h-mayorquin marked this pull request as ready for review January 17, 2025 13:59
stream_id_neural = stream_id
stream_id_non_neural = str(int(stream_id) + self._num_of_signal_streams)

# Note this implementation assumes that the neural channels come before the non-neural channels
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here guys, these are the strongest assumptions that the reader makes. I wanted to check with @alejoe91 if he knows about this:

  • Is it true that the synch trace will always be the last channel even if you have non-neural channels? I could not find this in the documentation.
  • Are the neural channels and non-neural always in order and not interleaved?

If the first case is not True, we can extract the specific index from the structure.oebin. That's a bigger change but I think doable.

For the second case, if it is not true, I am afraid we will have to error somehow as the current data model does not support non-contigious streams across the buffer with the buffer slice mechanism (@samuelgarcia )

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am adding error messages for this advising people to open an issue if they have interleaved neural and non-neural or synch trace that is not at the end.

@alejoe91
Copy link
Contributor

@h-mayorquin some tests are failing. This is because the array annotations have changed now, since also they need to be split across streams

@h-mayorquin
Copy link
Contributor Author

@h-mayorquin some tests are failing. This is because the array annotations have changed now, since also they need to be split across streams

I fixed them now. The array annotations were not divided for the stream.

@h-mayorquin
Copy link
Contributor Author

@alejoe91 tests are passing but the docs are failing for some reason (they have failed intermittently through this PR) not sure if related to this. Updating the branch to check.

Copy link
Contributor

@zm711 zm711 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a few comments. I know Alessio and Sam provide better feedback for openephys so take what you like and ignore what you don't :)

# For ADC channels multiplying by the bit_volts when units are not provided converts to Volts
units = "V" if units == "" else units

gain = chan_info["bit_volts"]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this a safe float? or do we need to convert it?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do you mean safe? as in numpy?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. I mean is it a numpy scalar (I should have written that better :) ) or is it a python float?

Copy link
Contributor Author

@h-mayorquin h-mayorquin Jan 27, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For floats, there is no difference I think right? We only have to be concerned about integers. I think np.float is the same thing as float in python (from my memory).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I changed it anyway I guess null casting should have no cost.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So I looked it up mostly from stackoverflow discussion
https://stackoverflow.com/questions/40726490/overflow-error-in-pythons-numpy-exp-function
and reddit
https://www.reddit.com/r/learnpython/comments/feakn2/numpy_overflow_help_needed/?rdt=48979
And seems like it can overflow, but honestly the issue of float overflow seems to be more benign and casting to python seems to cost precision that honestly shouldn't really matter right? But float precision is so OS dependent that I think requiring that level of precision can't really be expected from floats. Appreciate the casting though, but in general your intuition/memory of this being an int thing seems to be true.

neo/rawio/openephysbinaryrawio.py Outdated Show resolved Hide resolved
for stream_index, stream_name in enumerate(sig_stream_names):
# stream_index is the index in vector sytream names
# stream_index is the index in vector stream names
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice :)

# We defined their stream_id as the stream_index of neural data plus the number of neural streams
# This is to not break backwards compatbility with the stream_id numbering
stream_id = str(stream_index + len(sig_stream_names))
# For ADC channels multiplying by the bit_volts when units are not provided converts to Volts
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do you mean here?

the way it is written sounds like if units are provided then we SHOULD NOT worry about using the gain,? So is the gain 1 in this case and our code is fine?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, that's not very clear without context. More context: When the units are not provided the units of ADC are Volts. This is in opposition to the neural channels where if the units are not provided the units are microvolts.

https://open-ephys.github.io/gui-docs/User-Manual/Recording-data/Binary-format.html#continuous

But the units are always the units of the data AFTER you use the gain. Only the unitless(units=="") case differs between neural and non-neural channels. Microvolts for the former, volts for the latter.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, yeah that makes sense in the context (and is the same with intan so ADC being volts and neural being microvolts seems common enough.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Re-wrote the whole thing to separate unit determination from stream determination and I think it reads better now.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah I agree it makes it clearer to me at least!

neo/rawio/openephysbinaryrawio.py Show resolved Hide resolved
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants