Hi, I ran into some unclear behavior of coolpup.py. I eventually managed to overcome the problem, but this issue might be helpful for other users and might help improve this awesome tool.
TLDR:
AssertionError: domain_score not in dict2 raised by pu.pileupsWithControl means that some of the regions you want to pile up may intersect, or lie close to, bad bins.
Story
I tried to get a rescaled pileup of TADs with calculation of TAD scores (strengths). I did everything according to the "Distribution of TAD strength scores" part of the tutorial. For your example HFF_MicroC data everything worked fine, but for my data the behavior was strange:
pu.pileupsWithControl raises AssertionError: domain_score not in dict2 from the accumulate_values function.
I found out that there were too many NaNs in the expected df. (Yes, I should pay more attention to my data, but back to coolpup.py.) I recalculated the expected without respect to the centromeres, and the AssertionError disappeared.
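A quick way to spot this kind of problem is to check the fraction of NaNs in the averaged expected column before running the pileup. The sketch below uses a toy stand-in for the expected table (in the real workflow it comes from cooltools.expected_cis, which does have a "balanced.avg" column):

```python
import numpy as np
import pandas as pd

# toy stand-in for the expected table produced by cooltools.expected_cis;
# the real table has one "balanced.avg" value per (region, diagonal)
expected = pd.DataFrame({
    "region1": ["chr1"] * 5,
    "dist": range(5),
    "balanced.avg": [0.5, 0.3, np.nan, np.nan, np.nan],
})

nan_frac = expected["balanced.avg"].isna().mean()
print(f"{nan_frac:.0%} of expected values are NaN")
if nan_frac > 0.5:
    print("warning: expected is mostly NaN; check the view / centromere handling")
```

If a large fraction of the expected values are NaN, the pileup normalization will propagate those NaNs into the snippets, which is where the confusing downstream errors begin.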
But then, after the pileup ran without any errors, I found that the length of pup.loc[0].domain_score is less than len(tads). I tried to record all the TADs processed by add_domain_score and found that some of the TADs were missing:
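A sanity check of this kind is simple to add after any pileup run. The sketch below uses toy stand-ins for the real objects (the scores list mimics pup.loc[0].domain_score, which in the real workflow comes from pileupsWithControl):

```python
import pandas as pd

# toy stand-ins: tads is the input regions table, scores mimics the
# per-TAD domain_score list returned in the pileup result
tads = pd.DataFrame({"chrom": ["chr1"] * 4,
                     "start": [0, 100, 200, 300],
                     "end":   [100, 200, 300, 400]})
scores = [1.2, 0.9, 1.1]  # one TAD was silently dropped

n_skipped = len(tads) - len(scores)
if n_skipped != 0:
    print(f"{n_skipped} TADs were skipped during the pileup")
```

If the counts disagree, some regions never reached the postprocessing step and any per-TAD statistics will be misaligned with the input table.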
So these unprocessed TADs were skipped silently, without any warning. When I tried to run pu.pileupsWithControl specifically on the unprocessed TADs, it gave me AssertionError: domain_score not in dict2 again.
These unprocessed TADs were the first or last TADs on their chromosomes and, apparently, intersected the black stripes of the Hi-C map ("bad bins"). Their snippets contained only NaNs. After I applied stricter TAD filtering, the pileup finished fine on my data.
So, the conclusions for users:
Always check the shapes of the input and output data
Pay attention to the handling of bad bins prior to the post-hoc analysis (e.g. filter out TADs that intersect or lie close to bad bins)
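A minimal sketch of such filtering, assuming bad bins are taken from the cooler bin table (rows whose "weight" column is NaN) and using plain pandas interval logic on toy data:

```python
import pandas as pd

# toy TADs and bad bins; in a real workflow bad_bins would come from
# clr.bins()[:] restricted to rows whose "weight" is NaN
tads = pd.DataFrame({"chrom": ["chr1", "chr1", "chr1"],
                     "start": [0, 100_000, 200_000],
                     "end":   [100_000, 200_000, 300_000]})
bad_bins = pd.DataFrame({"chrom": ["chr1"], "start": [0], "end": [10_000]})

def overlaps_bad(tad):
    # half-open interval overlap test against every bad bin
    return bool(((bad_bins["chrom"] == tad["chrom"]) &
                 (bad_bins["start"] < tad["end"]) &
                 (bad_bins["end"] > tad["start"])).any())

tads_clean = tads[~tads.apply(overlaps_bad, axis=1)]
print(len(tads_clean))  # the first TAD touches a bad bin and is dropped
```

For real genome-scale tables, bioframe's overlap utilities (e.g. bioframe.count_overlaps) do the same job much faster than a row-wise apply; the logic above is just the smallest self-contained version of the idea.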
Conclusions for the coolpuppy team
If there are bad bins influencing the expected values or intersecting the TADs df, then coolpuppy behaves somewhat opaquely:
It raises an AssertionError: domain_score not in dict2 error, which points very far away from the real problem.
In some cases it even just skips bad TADs without any warning.
It may be worth adding a check for whether the pileup of some particular TAD contains only NaNs. I didn't find out why in some cases such TADs are just dropped (they don't even reach the postprocess_func).
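The proposed check could be as simple as testing the snippet matrix before postprocessing and emitting a warning instead of dropping silently. This is a sketch with a synthetic snippet, not coolpuppy's actual internals:

```python
import warnings
import numpy as np

def check_snippet(matrix, region):
    # warn, rather than silently drop, when a snippet has no valid pixels
    if np.isnan(matrix).all():
        warnings.warn(f"snippet for {region} contains only NaNs; "
                      "it likely overlaps bad bins and will be skipped")
        return False
    return True

# a snippet that covers only bad bins is all-NaN
bad_snippet = np.full((10, 10), np.nan)
ok = check_snippet(bad_snippet, "chr1:0-100000")
```

Even just the warning would have pointed directly at the affected regions instead of leaving a length mismatch to be discovered downstream.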
Again, in my case this was solved by stricter filtering; however, there may be cases where such silent TAD skipping spoils the results.
I don't know whether this issue will be useful, especially since I am not able to provide my data right now. You can close it if you want. If you decide to investigate this case, I can assist you.
System
OS: Linux Ubuntu 20.04.6 LTS
coolpuppy 1.1.0
Python 3.10.8
Sincerely,
Nikita
Thank you for reporting it Nikita! This is an important issue, at the very least regions should not be silently skipped for no clear reason... It would be more useful if you could provide some reproducible example even with some random public data... But we'll try to look into it when we find the time.