refactor semmeddb SmartAPI annotation to better represent text snippets #833
Comments
I believe this can be done with a JQ wrap template applied on SemMedDB. |
Great idea, @rjawesome. Hold off on working on this for a moment, though: @colleenXu had a chat earlier today, and they are getting some further clarifications on that structure and how the UI consumes it. But the jq templates do seem like a good option when it comes down to implementation! |
For BioThings SEMMEDDB, we want to post-process some of the sub-query response data into a special TRAPI format (sentence/publication info). Example SEMMEDDB data
https://biothings.ci.transltr.io/semmeddb/association/C0043481-STIMULATES-4780 (where this comes from) We want to post-process each element in the Note:
For testing, this TRAPI query should only return the example data as 1 TRAPI edge
Then we'll want modified x-bte annotation
I have modifications stored on this branch.
Example:
Then we want to format the SEMMEDDB
|
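A minimal sketch of the kind of post-processing described above, written in Python purely for illustration (it is not BTE code). The input field names (`predication`, `pmid`, `sentence`) and the attribute type IDs used here are assumptions, not the format agreed on in this issue; only the general TRAPI attribute shape (`attribute_type_id`, `value`, nested `attributes`) is taken as given.

```python
from typing import Any, Dict, List


def snippets_to_attributes(record: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Turn each sentence/publication element into one nested TRAPI-style attribute."""
    attributes = []
    for item in record.get("predication", []):  # assumed field name
        pmid = item.get("pmid")
        sentence = item.get("sentence")
        if pmid is None or not sentence:
            # Skip elements that lack a publication or a text snippet.
            continue
        attributes.append({
            # Assumed attribute type IDs, chosen for illustration only.
            "attribute_type_id": "biolink:has_supporting_study_result",
            "value": f"PMID:{pmid}",
            "attributes": [
                {"attribute_type_id": "biolink:supporting_text", "value": sentence},
                {"attribute_type_id": "biolink:publications", "value": f"PMID:{pmid}"},
            ],
        })
    return attributes


# Hypothetical record, not real SEMMEDDB data.
example = {
    "predication": [
        {"pmid": 111, "sentence": "A stimulates B."},
        {"pmid": 222, "sentence": ""},
    ]
}
attrs = snippets_to_attributes(example)
```

In this sketch each snippet stays paired with its own publication inside one attribute, so a consumer does not have to match sentences to PMIDs by position.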
I notice some edges where the evidence_count (>50) doesn't match the number of text-snippet edge-attributes (29). Maybe it's worth double-checking? I imagine it could be accurate if different records had overlapping sets of publications, were merged into 1 KG edge, and had their evidence_counts added together. But there might also be some overwriting/loss of data?
Example 1
Send this query through your local instance (semmeddb-only):
There should be 1 edge in the response. Console logs say there are 3 records involved (merged into 1 edge?). The evidence count is 58, but there are only 29 text-snippet edge-attributes (32 edge-attributes total). I would have expected the max of 50 if those 58 were 58 unique PMIDs...
Example 2
Send this query through your local instance (semmeddb-only):
There should be 1 edge in the response. Console logs say there are 5 records involved (merged into 1 edge?). The evidence count is 83, but there are only 29 text-snippet edge-attributes (32 edge-attributes total). |
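To illustrate the overlap hypothesis from the comment above, here is a toy sketch in Python (not BTE code). The record shape (`evidence_count`, `pmids`) and the values are made up for the example; it only shows how summing per-record counts can exceed the number of unique PMIDs once records are merged into one edge.

```python
from typing import Any, Dict, List, Tuple


def compare_counts(records: List[Dict[str, Any]]) -> Tuple[int, int]:
    """Return (sum of per-record evidence counts, number of unique PMIDs)."""
    summed = sum(r["evidence_count"] for r in records)
    unique_pmids = {pmid for r in records for pmid in r["pmids"]}
    return summed, len(unique_pmids)


# Hypothetical records that would be merged into a single KG edge.
toy_records = [
    {"evidence_count": 3, "pmids": ["1", "2", "3"]},
    {"evidence_count": 2, "pmids": ["2", "3"]},
    {"evidence_count": 2, "pmids": ["3", "4"]},
]
summed_count, unique_count = compare_counts(toy_records)
```

Here the summed count is 7 while only 4 unique PMIDs exist, so a mismatch alone does not prove data loss. It does not rule it out either.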
@colleenXu @tokebe Changes that I pushed (biothings/bte_trapi_query_graph_handler#219):
|
@rjawesome Decision from a meeting between myself and @colleenXu: Can you ignore the existing evidence count (@colleenXu will be removing evidence count from the response mappings) and then add special behavior to generate evidence count for Semmeddb? This would just be a straight count of PMIDs after your deduplication code. |
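A rough sketch of the behavior decided above, in Python for illustration only (not the actual handler code). It assumes deduplication keys on the PMID and keeps the first snippet seen for each one; the `pmid` and `sentence` field names are hypothetical.

```python
from typing import Dict, List, Tuple


def dedupe_and_count(snippets: List[Dict[str, str]]) -> Tuple[List[Dict[str, str]], int]:
    """Drop snippets with an already-seen PMID, then count what remains."""
    seen = set()
    unique = []
    for snippet in snippets:
        pmid = snippet["pmid"]
        if pmid in seen:
            continue  # duplicate publication, keep only the first snippet
        seen.add(pmid)
        unique.append(snippet)
    # The evidence count is a straight count of PMIDs after deduplication.
    return unique, len(unique)


# Hypothetical snippets collected from several merged records.
merged = [
    {"pmid": "PMID:1", "sentence": "first"},
    {"pmid": "PMID:2", "sentence": "second"},
    {"pmid": "PMID:1", "sentence": "first"},
]
unique_snippets, evidence_count = dedupe_and_count(merged)
```

Deriving the count from the deduplicated list means it always agrees with the number of publications actually attached to the edge.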
I've just pushed updates to the override yaml to remove the If you still want to see the old behavior, you can adjust the override to use the older commit's version. |
|
Going to revert all PRs related to this due to TRAPI edge-attribute problems found (see notes starting here). Needs a rethink.
(Don't revert biothings/bte_trapi_query_graph_handler#220 from #880. It isn't closely related to this, so it's fine to keep.) |
Messy notes:
|
Requirements note: We may want to rewrite the requirements anyways to make clear what we want the record-merging / |
TMKP represents its text snippets in a way that the UI is able to display them. In contrast, for SemMedDB, the UI only displays the first sentence of the abstract. More analysis on what BTE is doing is in NCATSTranslator/Feedback#625 (comment), and the TMKP solution is described in NCATSTranslator/Feedback#625 (comment).