`pu.pileupsWithControl` might skip some pile-up regions or raise vague `domain_score not in dict2` AssertionError #146

nvaulin · 2024-05-25T11:38:59Z

Hi, I ran into some unclear behavior in coolpup.py. I managed to work around the problem in the end, but this issue may be helpful for other users and may help improve this awesome tool.

TLDR:

AssertionError: domain_score not in dict2 from pu.pileupsWithControl means that some of the regions you want to pile up may intersect, or lie close to, bad bins.

Story

I tried to get rescaled pileup of TADs with calculation of TADs scores (strengths). I've done everything according to the Distribution of TAD strength scores part of the tutorial. For your example HFF_MicroC data everything worked fine, but for my data the behavior was strange:

pu.pileupsWithControl raises AssertionError: domain_score not in dict2 from the accumulate_values function

I found out that there were too many NaNs in the expected df. (Yes, I should pay more attention to my data, but back to coolpup.py.) I recalculated expected without taking the centromeres into account and the AssertionError disappeared.
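A quick way to spot this kind of problem is to look at the fraction of NaNs in the expected table per region before running the pileup. The sketch below is only an illustration: it assumes the expected df has `region1` and `balanced.avg` columns (the names and the toy values are assumptions, adjust them to your table), and `nan_fraction_by_region` is a hypothetical helper, not part of coolpuppy.

```python
import numpy as np
import pandas as pd


def nan_fraction_by_region(expected, value_col="balanced.avg", region_col="region1"):
    """Return the fraction of NaN values in `value_col` for each region.

    Column names are assumptions; change them to match your expected table.
    """
    return expected[value_col].isna().groupby(expected[region_col]).mean()


# Toy expected table (made-up values, for illustration only).
expected = pd.DataFrame(
    {
        "region1": ["chr1_p"] * 4 + ["chr1_q"] * 4,
        "dist": [0, 1, 2, 3] * 2,
        "balanced.avg": [0.1, np.nan, np.nan, np.nan, 0.1, 0.05, 0.02, np.nan],
    }
)

frac = nan_fraction_by_region(expected)
print(frac)
```

Regions with a high NaN fraction are the ones worth inspecting before they reach the pileup.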

But then, after the pileup ran without any errors, I found that the length of pup.loc[0].domain_score was less than len(tads). I recorded all the TADs processed by add_domain_score and found that some of the TADs were missing.

So, these unprocessed TADs were skipped silently, without any warning. When I tried to run pu.pileupsWithControl specifically on the unprocessed TADs, it gave me AssertionError: domain_score not in dict2 again.

These unprocessed TADs were the first or last TADs on their chromosomes and apparently intersected the black stripes of the Hi-C map ("bad bins"). The data in their snippets contained only NaNs. After I applied stricter TAD filtering, the pileup worked fine on my data.

So the conclusions for the users:

- Always check the shapes of the input and output data.
- Pay attention to the handling of bad bins prior to post-hoc analysis (e.g. filter out TADs that intersect or lie close to bad bins).
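The filtering step could look roughly like the sketch below. It is not the exact filter I used: it assumes a bins table with `chrom`, `start`, `end` and `weight` columns where bad bins have a NaN weight, and a TADs df with `chrom`, `start`, `end`. The helper name `drop_tads_near_bad_bins`, the `pad` parameter and the toy data are all hypothetical.

```python
import numpy as np
import pandas as pd


def drop_tads_near_bad_bins(tads, bins, pad=0):
    """Drop TADs that overlap a bad bin, or lie within `pad` bp of one.

    A bin is treated as bad when its balancing weight is NaN (an assumption
    about how bad bins are marked in the bins table).
    """
    bad = bins[bins["weight"].isna()]
    keep = []
    for row in tads.itertuples(index=False):
        b = bad[bad["chrom"] == row.chrom]
        # Half-open interval overlap against the TAD extended by `pad`.
        overlaps = (b["start"] < row.end + pad) & (b["end"] > row.start - pad)
        keep.append(not overlaps.any())
    return tads[np.array(keep, dtype=bool)].reset_index(drop=True)


# Toy data: ten 10-bp bins on chr1; the first and last bins are bad.
bins = pd.DataFrame(
    {
        "chrom": ["chr1"] * 10,
        "start": np.arange(0, 100, 10),
        "end": np.arange(10, 110, 10),
        "weight": [np.nan] + [1.0] * 8 + [np.nan],
    }
)
tads = pd.DataFrame(
    {"chrom": ["chr1"] * 3, "start": [0, 30, 60], "end": [30, 60, 100]}
)

filtered = drop_tads_near_bad_bins(tads, bins)
print(f"kept {len(filtered)} of {len(tads)} TADs")
```

After the pileup, comparing `len(pup.loc[0].domain_score)` with `len(filtered)` is the shape check that would have revealed the skipped TADs in my case.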

Conclusions for the coolpuppy team

If there are bad bins influencing the expected values or intersecting the TADs df, then coolpuppy behaves a bit vaguely:
- It raises the AssertionError: domain_score not in dict2 error, which is very far from the real problem.
- In some cases it even just skips bad TADs without any warning.
Maybe it is worth adding some additional checks for whether the pileup of a particular TAD contains only NaNs. I didn't find out why in some cases such TADs are just dropped (they are not even passed to postprocess_func).
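To illustrate the kind of check I mean, here is a minimal sketch that flags snippets containing only NaNs and warns about them instead of dropping them silently. It is not based on coolpuppy internals: it assumes the snippets are available as a stacked array of shape (n_snippets, height, width), and `find_all_nan_snippets` is a hypothetical helper.

```python
import warnings

import numpy as np


def find_all_nan_snippets(snippets):
    """Return indices of snippets whose values are all NaN, with a warning."""
    snippets = np.asarray(snippets, dtype=float)
    # Reduce over every axis except the first (the snippet axis).
    all_nan = np.isnan(snippets).all(axis=tuple(range(1, snippets.ndim)))
    bad = np.flatnonzero(all_nan)
    if bad.size:
        warnings.warn(
            f"{bad.size} of {len(snippets)} snippets contain only NaN "
            f"(indices {bad.tolist()})"
        )
    return bad


# Toy stack: snippet 1 is entirely NaN, snippet 2 has a single NaN pixel.
snippets = np.ones((3, 4, 4))
snippets[1] = np.nan
snippets[2, 0, 0] = np.nan

bad = find_all_nan_snippets(snippets)
print(bad)
```

A warning like this, raised where regions are currently skipped, would point much closer to the real problem than the AssertionError does.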

Again, in my case it was solved simply by stricter filtering; however, there may be cases where such skipped TADs spoil the results.

I don't know whether this issue will be useful, especially as I am not able to provide my data right now. You can close it if you want. If you decide to investigate this case, I can assist you.

System

OS: Linux Ubuntu 20.04.6 LTS
coolpuppy 1.1.0
Python 3.10.8

Sincerely,
Nikita

Phlya · 2024-05-30T12:53:29Z

Thank you for reporting it Nikita! This is an important issue, at the very least regions should not be silently skipped for no clear reason... It would be more useful if you could provide some reproducible example even with some random public data... But we'll try to look into it when we find the time.
