Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Speed enhancement: Make Removing general concepts from query graph faster? #2388

Closed
edeutsch opened this issue Sep 25, 2024 · 8 comments
Closed
Assignees

Comments

@edeutsch
Copy link
Collaborator

Based on profiling tests in #2372 and #2373, it seems that the step that removes general concepts from the knowledge graph seems to take much longer than seems naively expected. In #2373, it look 2.6 seconds. I'm thinking it should take <0.1 seconds, so I wonder if there is some easily addressable gross inefficiency that could be remedied?

Anyone willing to put this up on the lift and isolate where it is taking the most time and computation?

@dkoslicki
Copy link
Member

Is this the same as the blocklist filter? If so, I think @kvnthomas98 could take a look at speeding it up

@edeutsch
Copy link
Collaborator Author

I don't actually know what it does exactly. I'm just thinking it seems very slow compared to other parts of the system for what I imagine it might be doing.

@amykglen
Copy link
Member

I'm pretty sure it is indeed the blocklist filter step

@dkoslicki
Copy link
Member

@kvnthomas98 Do you mind adding this to the Translator GH project page so you (or another one of us PSU people) has it on their docket?

@kvnthomas98
Copy link
Collaborator

Sure, I've added it to the backlog

kvnthomas98 added a commit that referenced this issue Oct 7, 2024
@kvnthomas98
Copy link
Collaborator

Regular Expression Matching was slowing this process down. Not a lot of our text matching for node names are regular expressions. So I created a separate field in the general_concepts.json file to specify regular expressions. And we do regular expression matching only for these fields and for the others we do a string comparision. In the below query, the time taken reduces from 0.5 to 0.15.
Before Changes:
image

After changes:
image

@kvnthomas98
Copy link
Collaborator

query given above now takes 0.18 seconds at the removing general concepts step. Good to close @edeutsch ?

@edeutsch
Copy link
Collaborator Author

Sure, fine to close.

I will say I am a bit surprised that we're doing string matching and regular expression matching. I would have expected that our graph was NodeSynonymized and our blocklist would be a set of NodeSynonymized CURIEs, and thus the matching should all be very fast dictionary lookups of CURIEs. And thus the process should take < 0.018 seconds, rather than 0.18 seconds. But anyway, 0.18 seconds is sufficient I think, thanks.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants