Merge pull request #25 from neuroinformatics-unit/responsiveness

Responsiveness logic
neuroinformatics-unit · Mar 30, 2023 · 10677fe
2 parents 97515c6 + b101958
commit 10677fe
Showing 14 changed files with 1,030 additions and 257 deletions.
diff --git a/.flake8 b/.flake8
@@ -1,3 +1,4 @@
 [flake8]
 max-line-length = 79
 exclude = __init__.py,build,.eggs
+ignore = E203, W503
diff --git a/README.md b/README.md
@@ -1,4 +1,3 @@
-[![Python Version](https://img.shields.io/pypi/pyversions/cellfinder.svg)](https://pypi.org/project/cellfinder)
 [![Wheel](https://img.shields.io/pypi/wheel/cellfinder.svg)](https://pypi.org/project/cellfinder)
 [![Development Status](https://img.shields.io/pypi/status/cellfinder.svg)](https://github.com/brainglobe/cellfinder)
 [![Tests](https://img.shields.io/github/workflow/status/brainglobe/cellfinder/tests)](
@@ -27,3 +26,54 @@ main()
 ```
 
 This script will call the `main()` method and ask you for the name of the folder containing the data you want to analyse, which corresponds to a portion of the name of the data file.
+
+## Data processing
+
+The original data is stored as a nested dictionary, usually exposed as the `data_raw` attribute. It contains the following keys: `day`, `imaging`, `f`, `is_cell`, `r_neu`, `stim`, `trig`. Our analysis focuses mainly on `f` and `stim`.
+
+The `f` key holds a 3D array of fluorescence traces for all cells in the recording. These cells are identified as ROIs (regions of interest). The array has dimensions (`n_sessions`, `len_session`, `n_neurons`). Each session contains `len_session` frames, which are subdivided into multiple triggers. A trigger may or may not belong to a stimulus, and the interval between consecutive triggers is constant. Each session begins and ends with "baseline triggers"; all the others are stimulus triggers. A stimulus consists of two or three parts, each signalled by a trigger, during which what is displayed to the animal changes. The last part always consists of drifting gratings. If there are two parts, the drifting gratings are preceded by static gratings. If there are three parts, the static gratings are in turn preceded by a grey screen.
+
+The total length of a session is given by the following formula:
+
+```python
+len_session = int(
+    (2 * n_baseline_triggers + n_stim / n_sessions * n_triggers_per_stim)
+    * n_frames_per_trigger
+)
+```
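+where `n_frames_per_trigger` is the length of a trigger in frames, `n_baseline_triggers` is the number of baseline triggers at each end of the session, `n_stim / n_sessions` is the number of stimuli per session, and `n_triggers_per_stim` is the number of triggers per stimulus.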
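
As a worked example of the formula above, the sketch below wraps it in a small helper and evaluates it for one set of values. The function name `session_length` and all the numbers are hypothetical, chosen only to make the arithmetic easy to follow; they are not taken from a real recording or from the package.

```python
def session_length(
    n_baseline_triggers,
    n_stim,
    n_sessions,
    n_triggers_per_stim,
    n_frames_per_trigger,
):
    """Number of frames in one session (hypothetical helper)."""
    # Stimuli are assumed to be split evenly across sessions.
    n_stim_per_session = n_stim / n_sessions
    # Baseline triggers occur at both the start and the end of a session.
    n_triggers = (
        2 * n_baseline_triggers + n_stim_per_session * n_triggers_per_stim
    )
    return int(n_triggers * n_frames_per_trigger)


# Hypothetical values: (2 * 4 + 48 / 2 * 3) * 75 = 80 * 75 frames.
len_session = session_length(
    n_baseline_triggers=4,
    n_stim=48,
    n_sessions=2,
    n_triggers_per_stim=3,
    n_frames_per_trigger=75,
)
print(len_session)
```

With these values a session holds 80 triggers of 75 frames each, so `len_session` is 6000.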
+
+The `stim` key is a dictionary containing information about the stimuli, with `stim["stimulus"]` being the most important. It contains the sequence of randomized features of the gratings. A given stimulus, composed of `sf`/`tf`/`direction` features, is repeated three times a day, and distributed across sessions.
+
+`data_raw` is reorganized into a `pandas.DataFrame` called `signal`, which is stored in the `PhotonData` object. The `pandas.DataFrame` has the following columns:
+
+```
+[
+    "day",
+    "time from beginning",
+    "frames_id",
+    "signal",
+    "roi_id",
+    "session_id",
+    "sf",
+    "tf",
+    "direction",
+    "stimulus_onset"
+]
+```
+`frames_id` is the frame number in the original data, restarting from 0 in every session. `signal` is the fluorescence trace of the cell, taken directly from the `f` matrix of `data_raw`. `roi_id` is the cell's ID, `session_id` is the session ID, and `sf`, `tf`, `direction` are the stimulus features. `stimulus_onset` is a boolean indicating whether the frame is the onset of a stimulus. The stimulus feature columns are filled only in rows where `stimulus_onset` is `True`. The `PhotonData` object joins the original `f` matrix with the `stim` dictionary in its `fill_up_with_stim_info` method, using the `stimulus_start_frames` as the join indices.
+
+Overall, the `PhotonData` object provides a more organized and accessible format for analyzing the raw data.
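
The reorganization described above can be illustrated with a minimal sketch. It is not the `PhotonData` implementation: the array sizes, the toy `stim` table, and the use of `DataFrame.merge` in place of `fill_up_with_stim_info` are all assumptions made for illustration, and the `day` and `time from beginning` columns are left out for brevity.

```python
import numpy as np
import pandas as pd

# Hypothetical toy dimensions, far smaller than a real recording.
n_sessions, len_session, n_neurons = 2, 6, 3
rng = np.random.default_rng(0)
f = rng.normal(size=(n_sessions, len_session, n_neurons))

# Build one row per (session, frame, roi), in the same order as f.ravel().
session_id, frames_id, roi_id = np.meshgrid(
    np.arange(n_sessions),
    np.arange(len_session),
    np.arange(n_neurons),
    indexing="ij",
)
signal = pd.DataFrame(
    {
        "session_id": session_id.ravel(),
        "frames_id": frames_id.ravel(),
        "roi_id": roi_id.ravel(),
        "signal": f.ravel(),
    }
)

# Hypothetical stimulus table: one row per stimulus, keyed by the frame
# at which it starts (the role played by the stimulus start frames).
stim = pd.DataFrame(
    {
        "session_id": [0, 0, 1, 1],
        "frames_id": [1, 4, 1, 4],
        "sf": [0.01, 0.02, 0.02, 0.01],
        "tf": [0.5, 1.0, 0.5, 1.0],
        "direction": [0, 90, 180, 270],
    }
)

# A left join keeps every frame; stimulus features stay NaN off-onset.
signal = signal.merge(stim, on=["session_id", "frames_id"], how="left")
signal["stimulus_onset"] = signal["sf"].notna()
```

Each of the four toy stimuli marks one onset frame per ROI, so 12 of the 36 rows have `stimulus_onset` set to `True` and carry `sf`, `tf` and `direction` values.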
+
+## Spatial and temporal frequency analysis
+
+The goal of this analysis is to identify the cells that respond to the drifting gratings.
+
+The stimulus features we care about are the spatial and temporal frequency, and in particular their combination. This is why we focus on the `sf` and `tf` columns of the `signal` dataframe. Each combination of these two features is repeated `n` times, where `n = len(directions) * len(repetitions)`.
+
+To compute the statistics, two frame windows are taken into account: the response window, in the drifting-grating part of the stimulus, and the baseline window, in the static or grey part. The signal is averaged across the frames in each window, and the difference between the two means is computed. These values are stored in the `response` `pandas.DataFrame`.
+
+Three non-parametric tests are computed to assess whether the response to `sf`/`tf` combinations is significant:
+- The Kruskal-Wallis test
+- The Wilcoxon rank-sum test
+- The signed-rank test
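
A minimal sketch of how these three tests could be run with `scipy.stats` on window means follows. The data are synthetic, the array names and sizes are hypothetical, and the way samples are grouped for each test is an assumption: the README does not specify which groups the package actually compares.

```python
import numpy as np
from scipy import stats

# Synthetic window means for one ROI: one row per sf/tf combination,
# one column per repetition. All values are made up for illustration.
rng = np.random.default_rng(0)
n_combinations, n_repetitions = 4, 12
baseline = rng.normal(loc=1.0, scale=0.1, size=(n_combinations, n_repetitions))
response = baseline + rng.normal(loc=0.5, scale=0.1, size=baseline.shape)
subtracted = response - baseline

# Kruskal-Wallis: does the baseline-subtracted response differ
# across sf/tf combinations? Each row is passed as one group.
p_kruskal = stats.kruskal(*subtracted).pvalue

# Wilcoxon rank-sum: response vs baseline as independent samples.
p_ranksum = stats.ranksums(response.ravel(), baseline.ravel()).pvalue

# Wilcoxon signed-rank: response vs baseline as paired samples.
p_signed_rank = stats.wilcoxon(response.ravel(), baseline.ravel()).pvalue

print(p_kruskal, p_ranksum, p_signed_rank)
```

The rank-sum test treats the two windows as independent samples, while the signed-rank test uses the pairing between the response and baseline windows of the same stimulus presentation.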
diff --git a/pyproject.toml b/pyproject.toml
@@ -26,7 +26,8 @@ dependencies = [
     "types-PyYAML",
     "h5py",
     "python-decouple",
-    "pandas"
+    "pandas",
+    "scipy",
 ]
 
 [project.urls]