…ical calculations (last one still buggy)
What is in this PR?
Let me know if the description in the README.md file is not sufficient to understand the PR!
Hey excellent work @lauraporta! The code is super clear and easy to review, and it runs very smoothly. It is also very easy to test now with the new features (e.g. setup_for_testing.py). I've added some suggestions to discuss but there are no hard changes requested, please let me know if anything isn't clear. Looking forward to the dashboard!
self.n_sessions = data_raw.frames.shape[0]
self.n_roi = data_raw.frames[0].shape[0]
self.n_frames_per_session = data_raw.frames[0].shape[1]
self.day_stim = data_raw.day["stimulus"]
self.day_stim = data_raw.day["stimulus"]  # seems useless
For these variables, I get an IDE-style warning that all class attributes should be defined in `__init__` (i.e. initialised as None). Unfortunately, the IDE complains even if you put a type hint there rather than initialising. It does seem worth doing this, even though it is annoying — see the sketch below.
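A minimal sketch of what this could look like, assuming only the attribute names visible in the diff above (everything else is illustrative): declare the attributes in `__init__` with placeholder values, then let `set_general_variables` fill them in.

```python
from typing import Optional


class PhotonData:
    """Sketch only: attribute names taken from the diff, the rest is hypothetical."""

    def __init__(self, data_raw):
        # Declare every attribute in __init__ so linters stop warning;
        # the real values are assigned in set_general_variables().
        self.n_sessions: Optional[int] = None
        self.n_roi: Optional[int] = None
        self.n_frames_per_session: Optional[int] = None
        self.day_stim = None

        self.set_general_variables(data_raw)

    def set_general_variables(self, data_raw):
        self.n_sessions = data_raw.frames.shape[0]
        self.n_roi = data_raw.frames[0].shape[0]
        self.n_frames_per_session = data_raw.frames[0].shape[1]
        self.day_stim = data_raw.day["stimulus"]
```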
Related to this: this function contains routines that make assumptions based on the experimental setup (e.g. baseline triggers before and after the stimulus, as in `(self.n_all_triggers - 2 * self.n_session_boundary_baseline_triggers) * self.n_sessions`). Maybe this function could be factored out somewhere, since it may change depending on the experimental setup (e.g. there could be multiple versions). It could then return all these variables to `__init__`, where they are saved as class attributes — see the sketch below.
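One hedged way this could look (the function name is invented; only the quoted expression comes from the code): a free function that encodes the setup-specific rule and returns the derived count, so the class just stores the result. An alternative setup could then ship its own version of this function without touching the class.

```python
def compute_n_stimulus_triggers(
    n_all_triggers: int,
    n_session_boundary_baseline_triggers: int,
    n_sessions: int,
) -> int:
    # Setup-specific assumption taken from the current code: each session has
    # baseline triggers before and after the stimulus, so they are removed twice.
    return (
        n_all_triggers - 2 * n_session_boundary_baseline_triggers
    ) * n_sessions
```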
I see. This is a complex refactor of the class PhotonData. However, to do it I need more information on how to handle new types of raw data that come from multiple days as well. I will do it in a separate PR.
New issue opened #37
@@ -72,13 +78,10 @@ def set_general_variables(self, data_raw):
The raw data object from which the data will be extracted
"""
self.screen_size = data_raw.stim[0]["screen_size"]
self.is_cell = data_raw.is_cell
self.day_roi = data_raw.day["roi"]
self.day_roi_label = data_raw.day["roi_label"]
self.n_sessions = data_raw.frames.shape[0]
self.n_roi = data_raw.frames[0].shape[0]
self.n_frames_per_session = data_raw.frames[0].shape[1]
(If my understanding is correct, and each element of frames is a session) — just checking, can it always be assumed that the number of frames will be the same across sessions? (e.g. if one day things go wrong halfway through a session, etc.)
Actually yes, the number of frames is the same across sessions if everything turned out well. I know this a priori, actually, so I could just add a check.
#37
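A minimal sketch of such a check, assuming `data_raw.frames` is an iterable of per-session 2D arrays (ROIs × frames) as suggested by the diff above; the helper name is hypothetical.

```python
import numpy as np


def check_n_frames_consistent(data_raw):
    # Every session should have the same number of frames; fail early with an
    # informative message if one session was cut short.
    n_frames = [np.asarray(session).shape[1] for session in data_raw.frames]
    if len(set(n_frames)) > 1:
        raise ValueError(
            f"Number of frames differs across sessions: {n_frames}"
        )
```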
@pytest.fixture
def get_sf_tf_instance(get_config, get_data_raw_object):
This is a small point, but I believe the tests are all currently interlinked: e.g. create the raw data object (tested) -> from this make PhotonData (tested) -> from this make the frequency results. This makes a lot of sense, but it creates some interdependency between tests (e.g. if something breaks in PhotonData, it will break all tests after this). You could consider removing this interdependency by creating a completely new fake photondata.signal table to feed into FrequencyAnalysis, as in the sketch below. However, this would be more work, and in practice if PhotonData does break you would just fix it and re-run the tests. But just something to note.
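If the decoupling ever becomes worth the extra work, the idea could look roughly like this: a fixture that builds a small synthetic signal table directly and hands it to FrequencyAnalysis without going through PhotonData. The column names here are purely illustrative; a real fixture would mirror the columns of photondata.signal exactly.

```python
import numpy as np
import pandas as pd
import pytest


@pytest.fixture
def fake_signal_table():
    # Small, deterministic synthetic table standing in for photon_data.signal.
    rng = np.random.default_rng(101)
    n_rows = 500
    return pd.DataFrame(
        {
            "session_id": rng.integers(0, 3, n_rows),
            "roi_id": rng.integers(0, 5, n_rows),
            "signal": rng.normal(size=n_rows),
        }
    )
```

A frequency-analysis test built on this fixture would then keep passing (or failing) independently of whether PhotonData itself is broken.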
I see what you mean. Another idea is to abstract the analysis methods further, making them static, so that they work independently of which kind of object they handle under the hood. This is a bit hard and I haven't figured out a good way to do it yet, but I think you are right. It is worth keeping this as an objective — something along the lines of the sketch below.
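A rough sketch of that direction, with hypothetical names throughout (the actual statistic behind nonparam_anova_over_rois is not shown in this PR, so Kruskal–Wallis here is only a placeholder): the statistics become static methods that accept plain arrays, and the instance methods are thin wrappers that gather data from the object.

```python
from scipy.stats import kruskal


class FrequencyAnalysis:
    @staticmethod
    def nonparam_anova(groups):
        # Pure function: operates on a list of plain arrays, with no knowledge
        # of the object (PhotonData or otherwise) that produced them.
        return kruskal(*groups).pvalue

    def nonparam_anova_over_rois(self):
        # Thin instance wrapper: self.n_roi and self._samples_for_roi are
        # hypothetical; it only gathers data and delegates to the static method.
        return {
            roi: self.nonparam_anova(self._samples_for_roi(roi))
            for roi in range(self.n_roi)
        }
```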
window_mask_baseline,
) = sf_tf.get_response_and_baseline_windows()

assert (
Would it also be possible / does it also make sense to check all variables, and all arrays element by element? Is it possible that the len() might be the same but the actual values are somehow different?
It is possible, and you are right 🥹 But I haven't come up with an elegant way to do this without hard-coding the numbers.
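One hedged option that checks values element by element without hard-coding them in the test body: generate the expected arrays once (with the fixed random seed) and store them next to the tests, then compare against them. The file path and the unpacking below are illustrative only, since the full return signature of get_response_and_baseline_windows is not shown in this diff.

```python
import numpy as np


def test_windows_match_saved_arrays(get_sf_tf_instance):
    sf_tf = get_sf_tf_instance
    # Illustrative unpacking: take only the last returned value for this sketch.
    *_, window_mask_baseline = sf_tf.get_response_and_baseline_windows()

    # Expected array generated once with the fixed seed and saved alongside the tests.
    expected = np.load("tests/snapshots/window_mask_baseline.npy")
    np.testing.assert_array_equal(np.asarray(window_mask_baseline), expected)
```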
sf_tf.calculate_mean_response_and_baseline()
p_values = sf_tf.nonparam_anova_over_rois()

decimal_points = 3
I think from this point onwards this could be refactored to the below and be equivalent (in other tests too):
p_values = list(p_values.values())
p_values_seed_101 = np.array([0.055, 0.473, 0.324, 0.127, 0.653])
assert np.allclose(p_values, p_values_seed_101, rtol=0, atol=1e-03)
This np.fromiter is very cool, I didn't know about it.
np.fromiter(p_values.values(), dtype=float), decimal_points
)
# based on random seed = 101
p_values_seed_101 = np.array([0.055, 0.473, 0.324, 0.127, 0.653])
A quick doc on how these test values were generated might be useful. It might also be worth considering generating test data and results on the fly, as well as having fixed values. The benefit is that you are not tied to the random seed and can generate random values / arbitrary ranges of input data to interrogate the software a little more and possibly detect edge cases. The downside is that you have to code the same thing twice (maybe in a different way) and are always liable to make the same error twice. Maybe a mix of hard-coded test values and some randomly generated inputs is the best of both worlds — a sketch of the randomised half follows.
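A hedged sketch of the randomised half: keep the seed-pinned assertions as they are, and add a property-style test that only checks invariants that must hold for any seed. `make_sf_tf_instance` is a hypothetical factory fixture that builds the analysis object from data generated with a given seed; the method names are the ones used in the tests above.

```python
import numpy as np
import pytest


@pytest.mark.parametrize("seed", [0, 1, 42, 101])
def test_p_values_are_valid_probabilities(seed, make_sf_tf_instance):
    # Hypothetical factory fixture: builds the analysis object for this seed.
    sf_tf = make_sf_tf_instance(seed=seed)
    sf_tf.calculate_mean_response_and_baseline()
    p_values = np.fromiter(
        sf_tf.nonparam_anova_over_rois().values(), dtype=float
    )

    # Invariant checks: no expected values are hard-coded, so the test is
    # not tied to a single random seed.
    assert np.all((p_values >= 0) & (p_values <= 1))
    assert not np.any(np.isnan(p_values))
```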
sf_tf = get_sf_tf_instance
sf_tf.responsiveness()

assert len(sf_tf.responsive_rois) == 0
Overall these tests are very nice and robust!! Is it also feasible to somehow test against the original analysis pipeline? E.g. run their pipeline with some .mat file input data, get their p-values etc., then make sure all results from this pipeline match. That way you can be 100% sure the implementations yield identical outputs (or at least very similar, to within some known tolerance). A sketch of what such a regression test could look like is below.
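If the original pipeline's outputs can be exported, a hedged sketch of such a regression test could look like this. The file name and the `"p_values"` key are invented for illustration; the fixture and method names are the ones used in the tests above.

```python
import numpy as np
from scipy.io import loadmat


def test_matches_original_pipeline(get_sf_tf_instance):
    # Hypothetical reference file: p-values exported from the original
    # pipeline for the same .mat input data used by the fixture.
    reference = loadmat("tests/data/original_pipeline_output.mat")

    sf_tf = get_sf_tf_instance
    sf_tf.calculate_mean_response_and_baseline()
    p_values = np.fromiter(
        sf_tf.nonparam_anova_over_rois().values(), dtype=float
    )

    # Compare within a known tolerance rather than requiring bit-identical results.
    np.testing.assert_allclose(
        p_values, np.ravel(reference["p_values"]), rtol=0, atol=1e-3
    )
```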
Joe, thanks for your comments and suggestions.
@lauraporta all sounds great, let me know if there's anything you'd like to discuss!