CTD processing 2: batch-queries #584

Closed
colleenXu opened this issue Mar 14, 2023 · 16 comments
Labels
data source jq / jmespath On Test Related changes are deployed to Test server x-bte

Comments

@colleenXu
Collaborator

colleenXu commented Mar 14, 2023

Intro: see intro section of #583 (comment). Originally noted in #558 (comment)

2. processing batch-queries correctly

The current x-bte-kgs-operations aren't written as batch-queries, even though the CTD API does allow batch-querying.

The problem is how BTE handles the batch-query responses. The API response is an array of associations (objects), and each association is matched to one of the input IDs. Each association has an "Input" field whose value is the matched input ID (all lowercase, with an ID prefix for diseases (MESH or OMIM) and pathways (REACT or KEGG)).

However, BTE's default api-response-transform isn't correctly handling this - instead, it's linking the first input ID to every possible output ID.

Example:

Edit SmartAPI and run BTE locally

In a local copy of the SmartAPI yaml, copy-paste the following into the chemical2gene operation. It's changing the supportBatch and queryInputs info.

    - supportBatch: true
      useTemplating: true
      inputs:
      - id: MESH
        semantic: SmallMolecule
      outputs:
      - id: NCBIGene
        semantic: Gene
      parameters:
        inputType: chem
        inputTerms: "{{ queryInputs | joinSafe('|') }}"
        inputTermSearchType: directAssociations
        report: genes_curated
        format: json
      predicate: related_to
      response_mapping:
        "$ref": "#/components/x-bte-response-mapping/chemical2gene"

Set up a local instance of BTE to override and use your local copy of the CTD yaml. Then POST to that specific api (v1/smartapi/{id}/query endpoint):

{
    "message": {
        "query_graph": {
            "edges": {
                "e01": {
                    "subject": "n0",
                    "object": "n1",
                    "predicates": ["biolink:related_to"]
                }
            },
            "nodes": {
                "n0": {
                    "ids": ["MESH:C006303", "MESH:D015250"],
                    "categories": ["biolink:SmallMolecule"]
                },
                "n1": {
                    "categories": ["biolink:Gene"]
                }
            }
        }
    }
}
CTD's raw response

During execution, BTE should generate this query with two input IDs to CTD.
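
As a rough illustration of what that batch request looks like, here is a minimal sketch that assembles the GET URL from the chemical2gene parameters above. `buildCtdBatchUrl` is a hypothetical helper, not BTE's actual query builder, and it assumes the chemical inputs are sent as bare MESH IDs (prefix removed) and joined with `|`.

```typescript
// Hypothetical helper for illustration; not BTE's actual query builder.
const CTD_BATCH_URL = "http://ctdbase.org/tools/batchQuery.go";

function buildCtdBatchUrl(queryInputs: string[]): string {
  // Parameters copied from the chemical2gene operation above.
  const params: Array<[string, string]> = [
    ["inputType", "chem"],
    ["inputTerms", queryInputs.join("|")], // pipe-delimited batch of IDs
    ["inputTermSearchType", "directAssociations"],
    ["report", "genes_curated"],
    ["format", "json"],
  ];
  const query = params.map(([key, value]) => `${key}=${value}`).join("&");
  return `${CTD_BATCH_URL}?${query}`;
}

// Assumption: chemical inputs are sent as bare MESH IDs.
const exampleBatchUrl = buildCtdBatchUrl(["C006303", "D015250"]);
console.log(exampleBatchUrl);
```

The sketch leaves the pipe unencoded for readability; a real HTTP client may percent-encode it.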

In CTD's raw response, some genes are only linked to the second ID D015250 / Aclarubicin, like PARP1.

    {
        "CasRN": "57576-44-0",
        "ChemicalId": "D015250",
        "ChemicalName": "Aclarubicin",
        "GeneId": "142",
        "GeneSymbol": "PARP1",
        "Input": "d015250",
        "Organism": "Homo sapiens",
        "OrganismId": "9606",
        "PubMedIds": "20399885"
    },
BTE's current flawed response

BTE links every output gene with only the first ID C006303 / acivicin / PUBCHEM.COMPOUND:294641. It's easier to see through the console log:

  bte:biothings-explorer-trapi:QueryResult result ID: n0-PUBCHEM.COMPOUND:294641_&_n1-NCBIGene:836 has 4 +1ms
  bte:biothings-explorer-trapi:QueryResult result ID: n0-PUBCHEM.COMPOUND:294641_&_n1-NCBIGene:1080 has 1 +0ms
  bte:biothings-explorer-trapi:QueryResult result ID: n0-PUBCHEM.COMPOUND:294641_&_n1-NCBIGene:10800 has 1 +1ms
  bte:biothings-explorer-trapi:QueryResult result ID: n0-PUBCHEM.COMPOUND:294641_&_n1-NCBIGene:2678 has 3 +0ms
  bte:biothings-explorer-trapi:QueryResult result ID: n0-PUBCHEM.COMPOUND:294641_&_n1-NCBIGene:834 has 1 +0ms
  bte:biothings-explorer-trapi:QueryResult result ID: n0-PUBCHEM.COMPOUND:294641_&_n1-NCBIGene:841 has 1 +0ms
  bte:biothings-explorer-trapi:QueryResult result ID: n0-PUBCHEM.COMPOUND:294641_&_n1-NCBIGene:1676 has 1 +0ms
  bte:biothings-explorer-trapi:QueryResult result ID: n0-PUBCHEM.COMPOUND:294641_&_n1-NCBIGene:2623 has 1 +0ms
  bte:biothings-explorer-trapi:QueryResult result ID: n0-PUBCHEM.COMPOUND:294641_&_n1-NCBIGene:2950 has 1 +0ms
  bte:biothings-explorer-trapi:QueryResult result ID: n0-PUBCHEM.COMPOUND:294641_&_n1-NCBIGene:3145 has 1 +0ms
  bte:biothings-explorer-trapi:QueryResult result ID: n0-PUBCHEM.COMPOUND:294641_&_n1-NCBIGene:4778 has 1 +0ms
  bte:biothings-explorer-trapi:QueryResult result ID: n0-PUBCHEM.COMPOUND:294641_&_n1-NCBIGene:2908 has 1 +0ms
  bte:biothings-explorer-trapi:QueryResult result ID: n0-PUBCHEM.COMPOUND:294641_&_n1-NCBIGene:142 has 1 +0ms
  bte:biothings-explorer-trapi:QueryResult result ID: n0-PUBCHEM.COMPOUND:294641_&_n1-NCBIGene:6582 has 1 +0ms
  bte:biothings-explorer-trapi:QueryResult result ID: n0-PUBCHEM.COMPOUND:294641_&_n1-NCBIGene:6607 has 1 +0ms
  bte:biothings-explorer-trapi:QueryResult result ID: n0-PUBCHEM.COMPOUND:294641_&_n1-NCBIGene:6647 has 1 +0ms
  bte:biothings-explorer-trapi:QueryResult result ID: n0-PUBCHEM.COMPOUND:294641_&_n1-NCBIGene:331 has 1 +0ms
desired format for BTE's response

Instead, BTE should correctly link each input ID / entity with its associations. The console log should look like this:

  • some results have the first input ID C006303 / acivicin / PUBCHEM.COMPOUND:294641
  • other results have the second input ID D015250 / Aclarubicin / PUBCHEM.COMPOUND:451415
  • PARP1 (NCBIGene:142) is only linked to the second ID: PUBCHEM.COMPOUND:451415_&_n1-NCBIGene:142. Most genes are linked to only one of the input IDs.
  bte:biothings-explorer-trapi:QueryResult result ID: n0-PUBCHEM.COMPOUND:294641_&_n1-NCBIGene:836 has 1 +1ms
  bte:biothings-explorer-trapi:QueryResult result ID: n0-PUBCHEM.COMPOUND:294641_&_n1-NCBIGene:1080 has 1 +0ms
  bte:biothings-explorer-trapi:QueryResult result ID: n0-PUBCHEM.COMPOUND:294641_&_n1-NCBIGene:10800 has 1 +0ms
  bte:biothings-explorer-trapi:QueryResult result ID: n0-PUBCHEM.COMPOUND:294641_&_n1-NCBIGene:2678 has 3 +0ms
  bte:biothings-explorer-trapi:QueryResult result ID: n0-PUBCHEM.COMPOUND:451415_&_n1-NCBIGene:834 has 1 +0ms
  bte:biothings-explorer-trapi:QueryResult result ID: n0-PUBCHEM.COMPOUND:451415_&_n1-NCBIGene:836 has 3 +0ms
  bte:biothings-explorer-trapi:QueryResult result ID: n0-PUBCHEM.COMPOUND:451415_&_n1-NCBIGene:841 has 1 +0ms
  bte:biothings-explorer-trapi:QueryResult result ID: n0-PUBCHEM.COMPOUND:451415_&_n1-NCBIGene:1676 has 1 +0ms
  bte:biothings-explorer-trapi:QueryResult result ID: n0-PUBCHEM.COMPOUND:451415_&_n1-NCBIGene:2623 has 1 +0ms
  bte:biothings-explorer-trapi:QueryResult result ID: n0-PUBCHEM.COMPOUND:451415_&_n1-NCBIGene:2950 has 1 +0ms
  bte:biothings-explorer-trapi:QueryResult result ID: n0-PUBCHEM.COMPOUND:451415_&_n1-NCBIGene:3145 has 1 +1ms
  bte:biothings-explorer-trapi:QueryResult result ID: n0-PUBCHEM.COMPOUND:451415_&_n1-NCBIGene:4778 has 1 +0ms
  bte:biothings-explorer-trapi:QueryResult result ID: n0-PUBCHEM.COMPOUND:451415_&_n1-NCBIGene:2908 has 1 +0ms
  bte:biothings-explorer-trapi:QueryResult result ID: n0-PUBCHEM.COMPOUND:451415_&_n1-NCBIGene:142 has 1 +0ms
  bte:biothings-explorer-trapi:QueryResult result ID: n0-PUBCHEM.COMPOUND:451415_&_n1-NCBIGene:6582 has 1 +0ms
  bte:biothings-explorer-trapi:QueryResult result ID: n0-PUBCHEM.COMPOUND:451415_&_n1-NCBIGene:6607 has 1 +0ms
  bte:biothings-explorer-trapi:QueryResult result ID: n0-PUBCHEM.COMPOUND:451415_&_n1-NCBIGene:6647 has 1 +0ms
  bte:biothings-explorer-trapi:QueryResult result ID: n0-PUBCHEM.COMPOUND:451415_&_n1-NCBIGene:331 has 1 +0ms
@rjawesome
Contributor

This should be solvable with a custom pairCurieWithAPIResponse function. I can work on this in the JQ and/or javascript transformer for CTD.

@rjawesome
Contributor

rjawesome commented Mar 16, 2023

Here is the pairCurieWithAPIResponse JQ that solves this problem:
reduce (.response | .[]) as $item ({}; .[generateCurie($edge.association.input_id; $item.Input | ascii_upcase)] = [] + .[generateCurie($edge.association.input_id; $item.Input | ascii_upcase)] + [$item]) | map_values([.])
Will push shortly to the JQ branch, but I need to double-check that the "Input" field is present in all queries to CTD
(this pair function could also be set in the yaml for an operation via transformer.pair_jq)
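
For readers less familiar with jq, the grouping step of that expression can be sketched in TypeScript as below. This is only an approximation: `toCurie` is a hypothetical stand-in for `generateCurie` (assumed here to upper-case the value and add the prefix only when it is missing), the sketch does not reproduce the final `map_values([.])` wrapping, and the sample records are trimmed to two fields taken from the desired output above.

```typescript
// Minimal shape of one CTD association; only "Input" matters for pairing.
interface CtdAssociation {
  Input: string;
  [field: string]: string;
}

// Hypothetical stand-in for generateCurie: upper-case the value and add the
// prefix only if it is not already present.
function toCurie(prefix: string, input: string): string {
  const upper = input.toUpperCase();
  return upper.startsWith(`${prefix}:`) ? upper : `${prefix}:${upper}`;
}

// Group associations under the input CURIE that each one was matched to.
function pairByInput(
  prefix: string,
  response: CtdAssociation[],
): Map<string, CtdAssociation[]> {
  const paired = new Map<string, CtdAssociation[]>();
  for (const item of response) {
    const curie = toCurie(prefix, item.Input);
    const group = paired.get(curie) ?? [];
    group.push(item);
    paired.set(curie, group);
  }
  return paired;
}

// Trimmed sample records: gene 836 is linked to both inputs, gene 142 (PARP1)
// only to the second one.
const sampleResponse: CtdAssociation[] = [
  { Input: "c006303", GeneId: "836" },
  { Input: "d015250", GeneId: "836" },
  { Input: "d015250", GeneId: "142" },
];
const pairedExample = pairByInput("MESH", sampleResponse);
console.log(pairedExample);
```

The point of the grouping is that each output record stays attached to the input ID named in its own "Input" field, rather than to the first ID in the batch.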

@colleenXu
Collaborator Author

@tokebe

It's not clear to me how BTE will construct large batch-queries to CTD, and whether we'll need to make adjustments to BTE. I'm specifically thinking about:

  • url character-limits: the batch-queries are GET requests w/ inputTerms as a parameter. Will BTE construct these properly (aka not exceed the character limit)?
  • max batch size: I think CTD has a batch-size-limit of 4000 IDs. I'm not sure if putting this limit in BTE's query-handler here will ensure that BTE doesn't exceed this batch-size limit, since the comment on that line implies that it's just for pending BioThings APIs that do POST queries (also with useTemplating: true, but all x-bte operations do that now - including the CTD ones).

Notes:

  • we don't do batch-querying for any of the other external APIs
  • this endpoint also accepts POST queries, but I haven't figured out a way to do a POST query AND put the inputTerms in the requestBody rather than the parameters of the request. It seems that POST queries only allow the inputTerms to be in the parameters OR an uploaded file (tsv-only? queryFile and queryFileColumn parameter described here and here)

@tokebe
Member

tokebe commented Oct 17, 2023

  • I don't believe BTE presently controls batch size with respect to URL character limit. This is an enhancement we should probably add. For now, it should be possible to reason about the maximum we could fit in a URL and set a conservative batch size by that.
  • That batch size limit works for everything: you can set it for any SmartAPI ID. The comment (AFAIK) was a comment you added to explain the purpose of the current entries.
  • I did some playing around trying to figure out the intended method for POST batch queries, but the documentation is rather unclear. I think it's meant to be a multipart/form-data encoded file, but how exactly queryFile is meant to work with that is beyond me at the moment.

@colleenXu
Collaborator Author

Replying to @tokebe (thanks for the quick reply!) with my thoughts:

  • I think setting the batch-size by the minimum we can fit in a URL makes sense? Aka take the operations with the longest IDs (ones that keep the OMIM or MESH prefix probably) and do some rough calculations on how many of those can fit in the limit...
  • I'm still not clear on what would be BTE or CTD's URL character limit....do we know?
  • I'm not sure how to easily test the batch-size-limit after we set it up...
  • on the POST method...I don't think BTE is set up to generate those kinds of requests (aka send a file), right?

@colleenXu
Collaborator Author

colleenXu commented Nov 30, 2023

I think a safe batch-size is 80 IDs, assuming a 2048 character-max for the GET url.

Rough calculations

2048 = a*x + (x-1) + b = (a+1)*x + (b-1)
Where:

  • x is the max number of IDs (round down to nearest integer)
  • a is the number of characters in each ID (in API's required format)
  • b is the number of characters in the rest of the url, which depends on the dataset/relationship and input ID namespaces
  • a*x is for all the ID characters, (x-1) is for all the pipe-delimiters

The most crucial number is a. The max number of characters for 1 input ID is 21 for REACT (Pathway) IDs.

click to see character num for all input IDs

  • 10
    • MESH IDs without prefix: 1 (C or D) plus 9 characters max according to bioregistry
    • NCBIGene IDs without prefix, estimated: the longest ID I found in my browser history is 9 characters (106099062). I'm estimating because bioregistry doesn't give a character limit
  • 11
    • OMIM IDs with prefix, estimated: 5 (OMIM:) + 6 characters, based on looking at new entries like 620637. I'm estimating because bioregistry doesn't give a character limit
  • 14
    • KEGG.PATHWAY IDs with custom prefix: 5 (KEGG:) + 9 characters max, based on bioregistry
  • 15
    • MESH IDs with prefix: 5 (MESH:) + 10 (explained above)
  • 21
    • REACT IDs with prefix, estimated: 6 (REACT:) + 15 characters, based on looking at the v86 (latest) new/updated topics and pathways like REACT:R-HSA-9836573.1 (Mitochondrial RNA degradation)

b = 140 for the 1 x-bte operation that uses REACT IDs as input. (An example GET url with 2 input IDs is: http://ctdbase.org/tools/batchQuery.go?inputType=pathway&inputTermSearchType=directAssociations&report=genes_curated&format=json&inputTerms=REACT:R-HSA-5669034|REACT:R-HSA-5668541)

So the equation for this situation is: 2048 = (a+1)*x + (b-1) = (21+1)*x + (140-1) = 22*x + 139, x ~ 86

Rounding down to the nearest ten gets 80.
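
The same arithmetic as a small sketch, in case the limit or ID lengths change later. `maxIdsPerUrl` is a hypothetical helper name; the inputs are the 2048-character assumption, a = 21, and the URL prefix from the example above (everything up to and including `inputTerms=`).

```typescript
// Largest x that satisfies (a + 1) * x + (b - 1) <= urlLimit.
function maxIdsPerUrl(urlLimit: number, idLength: number, baseLength: number): number {
  return Math.floor((urlLimit - (baseLength - 1)) / (idLength + 1));
}

// Everything in the GET url before the first input ID.
const reactBaseUrl =
  "http://ctdbase.org/tools/batchQuery.go?inputType=pathway" +
  "&inputTermSearchType=directAssociations&report=genes_curated" +
  "&format=json&inputTerms=";

const reactMax = maxIdsPerUrl(2048, 21, reactBaseUrl.length); // floor(1909 / 22)
const safeBatchSize = Math.floor(reactMax / 10) * 10; // round down to nearest ten
console.log(reactMax, safeBatchSize); // 86 80
```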

@colleenXu
Collaborator Author

colleenXu commented Nov 30, 2023

@tokebe

I'm getting JQ-related errors when I try to test the batch-size limit, using the process in the next section.

  1. If I start with the main branches, things seem to work okay. 1 of the 4 sub-queries fails, but that kind of error seems to be happening on dev/ci when I'm not testing the batch-size limit too.
Recreating the error with a simpler example, not testing the batch-size-limit

Noticed on ci/dev instances, but not test/prod. No overrides, no batch-size-limit-testing adjustments done.

TRAPI query:

{
    "message": {
        "query_graph": {
            "edges": {
                "e01": {
                    "subject": "n0",
                    "object": "n1",
                    "predicates": ["biolink:related_to"]
                }
            },
            "nodes": {
                "n0": {
                    "ids": ["MESH:D020138"],
                    "categories": ["biolink:Disease"]
                },
                "n1": {
                    "categories": ["biolink:Gene"]
                }
            }
        }
    }
}

2/3 subqueries fail with Error: jq: error (at <stdin>:0): Cannot iterate over null (null): see full console logs ctd-error-1.txt

Interestingly, I think those two sub-queries are returning 0 hits: this and this, vs the 3rd sub-query that has hits

  2. If I start with the dev branches, I encounter errors after doing the SmartAPI override (see step 6 in the next section). However, I also encounter this kind of error when I don't set the batch-size-limit (step 2) and when I use a simpler 2-ID query that normally works in dev (w/o the override).
recreating the problem with a simple query

Follow the steps in the next section, but don't set the batch-size-limit (step 2 in the next section)

Then do the simple query that works in dev without the override:

{
    "message": {
        "query_graph": {
            "edges": {
                "e01": {
                    "subject": "n0",
                    "object": "n1",
                    "predicates": ["biolink:related_to"]
                }
            },
            "nodes": {
                "n0": {
                    "ids": ["REACT:R-HSA-5669034", "REACT:R-HSA-5668541"],
                    "categories": ["biolink:Pathway"]
                },
                "n1": {
                    "categories": ["biolink:Gene"]
                }
            }
        }
    }
}

I'd normally get 134 results, but instead I get 0 results. In the console logs, the sub-query fails with Error: jq: error (at <stdin>:0): explode input must be a string. The full console logs are: simple-ctd-error-dev.txt


My full process to test the batch-size-limit

  1. Setup: Check out the right branches (either main or dev), pnpm i.
2. Adding the batch-size limit to the query-handler's config

To API_BATCH_SIZE, add:

  {
    id: '0212611d1c670f9107baf00b77f0889a',
    name: 'CTD API',
    max: 80,
  },

3. Setting an override to use CTD x-bte annotation for batch-querying

I actually override to my local file with the branch checked out, but this should do the same thing.

Paste into BTE's smartapi_overrides file, so it'll use this x-bte annotation:

{
  "conf": {
    "only_overrides": true
  },
  "apis": {
    "0212611d1c670f9107baf00b77f0889a": "https://raw.githubusercontent.com/NCATS-Tangerine/translator-api-registry/ctd-batch-query/CTD/smartapi.yaml"
  }
}

  4. pnpm build, then API_OVERRIDE=true pnpm run smartapi_sync to set up BTE with the changes and get the x-bte info

  5. Run BTE, then query CTD through BTE (http://localhost:3000/v1/smartapi/0212611d1c670f9107baf00b77f0889a/query) with this request body trapi_300react.txt. It's a TRAPI query for 300 REACT IDs (Pathway) -> Gene.
    BTE then runs 4 sub-queries, which is correct (3*80 + 60).

    1. Note: All the IDs are real IDs for human pathways (from Reactome's Complete List of Pathways), but CTD may not have data for them.
  6. If I start with the dev branches and run that query, all the sub-queries fail with Error: jq: error (at <stdin>:0): explode input must be a string
    Full console logs: console-300react.txt

Console log of a sub-query

  bte:call-apis:query using template builder +0ms
  bte:call-apis:query query success, transforming hits->records... +0ms
  bte:api-response-transform:index api name CTD API +0ms
  bte:api-response-transform:index api tags: translator,ctd +0ms
  bte:call-apis:query Failed to make to following query: {"url":"http://ctdbase.org/tools/batchQuery.go","params":{"inputType":"pathway","inputTerms":"REACT:R-HSA-446193|REACT:R-HSA-196780|REACT:R-HSA-9636467|REACT:R-HSA-9033658|REACT:R-HSA-70895|REACT:R-HSA-352238|REACT:R-HSA-168302|REACT:R-HSA-162588|REACT:R-HSA-450385|REACT:R-HSA-8851680|REACT:R-HSA-5621481|REACT:R-HSA-75102|REACT:R-HSA-5218900|REACT:R-HSA-9662834|REACT:R-HSA-5621575|REACT:R-HSA-5690714|REACT:R-HSA-389356|REACT:R-HSA-389357|REACT:R-HSA-389359|REACT:R-HSA-9013148|REACT:R-HSA-68689|REACT:R-HSA-9833576|REACT:R-HSA-69017|REACT:R-HSA-447041|REACT:R-HSA-5607763|REACT:R-HSA-5607764|REACT:R-HSA-5660668|REACT:R-HSA-6811434|REACT:R-HSA-6811436|REACT:R-HSA-6807878|REACT:R-HSA-204005|REACT:R-HSA-140180|REACT:R-HSA-199920|REACT:R-HSA-442742|REACT:R-HSA-442720|REACT:R-HSA-442729|REACT:R-HSA-8874211|REACT:R-HSA-399956|REACT:R-HSA-2024101|REACT:R-HSA-389513|REACT:R-HSA-5358747|REACT:R-HSA-5358749|REACT:R-HSA-5358751|REACT:R-HSA-5358752|REACT:R-HSA-211999|REACT:R-HSA-111996|REACT:R-HSA-1296052|REACT:R-HSA-4086398|REACT:R-HSA-111997|REACT:R-HSA-111932|REACT:R-HSA-2025928|REACT:R-HSA-419812|REACT:R-HSA-111933|REACT:R-HSA-901042|REACT:R-HSA-111957|REACT:R-HSA-72737|REACT:R-HSA-8955332|REACT:R-HSA-5576891|REACT:R-HSA-9733709|REACT:R-HSA-5694530","inputTermSearchType":"directAssociations","report":"genes_curated","format":"json"},"method":"get","timeout":50000,"headers":{"User-Agent":"BTE/dev Node/v18.16.1 darwin"}}. The error is Error: jq: error (at <stdin>:0): explode input must be a string
  bte:call-apis:query  with Error: jq: error (at <stdin>:0): explode input must be a string
  bte:call-apis:query 
  bte:call-apis:query     at ChildProcess.<anonymous> (/Users/colleenxu/Desktop/BTE_typescript_pnpm/biothings_explorer/node_modules/.pnpm/[email protected]/node_modules/node-jq/lib/exec.js:31:35)
  bte:call-apis:query     at ChildProcess.emit (node:events:513:28)
  bte:call-apis:query     at ChildProcess.emit (node:domain:489:12)
  bte:call-apis:query     at maybeClose (node:internal/child_process:1091:16)
  bte:call-apis:query     at ChildProcess._handle.onexit (node:internal/child_process:302:5)
  bte:call-apis:query     at Process.callbackTrampoline (node:internal/async_hooks:130:17) +24ms
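
For reference, the expected split of 300 IDs into 4 sub-queries with max: 80 can be sketched as a plain chunking function. This is an illustration of the expected behavior only, not BTE's actual batching code; `chunkIds` and the placeholder IDs are hypothetical.

```typescript
// Split a list of IDs into consecutive batches of at most `max` items.
function chunkIds(ids: string[], max: number): string[][] {
  const batches: string[][] = [];
  for (let start = 0; start < ids.length; start += max) {
    batches.push(ids.slice(start, start + max));
  }
  return batches;
}

// Placeholder IDs, only used to count batches.
const placeholderIds = Array.from({ length: 300 }, (_, i) => `ID-${i}`);
const batchSizes = chunkIds(placeholderIds, 80).map((batch) => batch.length);
console.log(batchSizes); // [ 80, 80, 80, 60 ]
```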

@tokebe
Member

tokebe commented Nov 30, 2023

Looks like this is a problem in the JQ string, largely due to CTD's inconsistent response structure depending on if anything was found or not. Working on a fix...

@tokebe
Member

tokebe commented Dec 4, 2023

Ok, turns out this was less CTD's inconsistencies and more JQ's inconsistencies (and my lack of familiarity...). I've pushed a fix to dev which should address this.
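
The fix itself isn't shown in this thread. Purely as an illustration, the two jq errors reported above ("Cannot iterate over null" and "explode input must be a string") are consistent with iterating over a missing response and upper-casing a non-string "Input". A guard against both cases could look like the sketch below; `usableHits` and `CtdHit` are hypothetical names, and the shape of CTD's empty response is an assumption.

```typescript
interface CtdHit {
  Input: string;
  [field: string]: unknown;
}

// Keep only array entries that are objects with a string "Input" field.
function usableHits(response: unknown): CtdHit[] {
  // Assumption: a query with no hits may not yield an array at all.
  if (!Array.isArray(response)) return [];
  return response.filter(
    (item): item is CtdHit =>
      typeof item === "object" &&
      item !== null &&
      typeof (item as { Input?: unknown }).Input === "string",
  );
}

console.log(usableHits(null).length); // 0
console.log(usableHits([{ Input: "d015250" }, { GeneId: "142" }]).length); // 1
```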

@colleenXu
Collaborator Author

The fix worked!

I tested all 3 example queries in my previous post in both dev and main (CI) branches. Everything worked as-intended without any errors.

The PRs to deploy are:

@colleenXu
Collaborator Author

colleenXu commented Dec 6, 2023

Update!

I've included the CTD x-bte changes in the overrides biothings/bte-server#4 - so it'll deploy alongside the orphanet changes. I think the override will end up deploying with or after the code changes (JQ / batch-size-limit), so I don't anticipate any issues. (aka I think NodeNorm will deploy the orphanet changes at the same pace or slower than our deployments to instances).

@colleenXu colleenXu added the On Dev Related changes are deployed to Dev server label Dec 13, 2023
@colleenXu
Collaborator Author

I think we can close this issue once:

  • the code changes (JQ/batch-size-limit) + overrides are deployed to Prod
  • I merge the yaml PR

We'll then have a separate process to remove the overrides (not needed once the yaml PRs are all merged / registrations refreshed).

@colleenXu
Collaborator Author

@tokebe

I double-checked and it's not working on CI, probably because of the larger cache-update issues (recent lab Slack convo)

My test

POST to CTD through BTE CI https://bte.ci.transltr.io/v1/smartapi/0212611d1c670f9107baf00b77f0889a/query

{
    "message": {
        "query_graph": {
            "edges": {
                "e01": {
                    "subject": "n0",
                    "object": "n1",
                    "predicates": ["biolink:related_to"]
                }
            },
            "nodes": {
                "n0": {
                    "ids": ["KEGG.PATHWAY:hsa05323", "KEGG.PATHWAY:hsa04917"],
                    "categories": ["biolink:Pathway"]
                },
                "n1": {
                    "categories": ["biolink:Gene"]
                }
            }
        }
    }
}

Based on the logs in the TRAPI response, I can tell that 2 sub-queries were sent (1 ID each). But if batch-querying were working, only 1 sub-query should have been sent. This may mean BTE CI didn't successfully use the override.

        {
            "timestamp": "2023-12-16T06:08:27.395Z",
            "level": "DEBUG",
            "message": "call-apis: 2 planned queries for edge e01",
            "code": null
        },
        {
            "timestamp": "2023-12-16T06:08:27.792Z",
            "level": "DEBUG",
            "message": "Successful GET http://ctdbase.org (1 ID): Pathway > has_participant > Gene (obtained 70 records, took 121ms)",
            "code": null
        },
        {
            "timestamp": "2023-12-16T06:08:27.808Z",
            "level": "DEBUG",
            "message": "Successful GET http://ctdbase.org (1 ID): Pathway > has_participant > Gene (obtained 89 records, took 178ms)",
            "code": null
        },
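
This check can also be done programmatically. Here is a minimal sketch that reads the planned-query count out of the TRAPI logs, assuming the log message keeps the wording shown above; `plannedQueryCount` and the `TrapiLog` interface are hypothetical names covering just the fields in these entries.

```typescript
interface TrapiLog {
  timestamp: string;
  level: string;
  message: string;
  code: string | null;
}

// Read N from the first "call-apis: N planned queries for edge ..." message.
function plannedQueryCount(logs: TrapiLog[]): number | null {
  for (const log of logs) {
    const match = /call-apis: (\d+) planned queries for edge/.exec(log.message);
    if (match) return Number(match[1]);
  }
  return null; // no matching log entry found
}

const ciLogs: TrapiLog[] = [
  {
    timestamp: "2023-12-16T06:08:27.395Z",
    level: "DEBUG",
    message: "call-apis: 2 planned queries for edge e01",
    code: null,
  },
];
console.log(plannedQueryCount(ciLogs)); // 2
```

For a 2-ID query like this one, a count of 1 would indicate that the IDs went out as a single batch.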

@tokebe
Member

tokebe commented Dec 18, 2023

Issue should now be addressed by 3019cec, please test again

@colleenXu
Collaborator Author

Now it's working on BTE CI! Yay!

The previous test now works as-intended - with 1 planned batch-query. Logs:

        {
            "timestamp": "2023-12-18T21:40:08.965Z",
            "level": "DEBUG",
            "message": "call-apis: 1 planned queries for edge e01",
            "code": null
        },
        {
            "timestamp": "2023-12-18T21:40:09.492Z",
            "level": "DEBUG",
            "message": "Successful GET http://ctdbase.org (2 IDs): Pathway > has_participant > Gene (obtained 159 records, took 181ms)",
            "code": null
        },

I also tested the batch-size-limit=80 with a 150-QNode-IDs query (current max, see #762), and it worked too. Two sub-queries were sent (80 + 70)

Batch-size-limit test

POST to CTD through BTE CI https://bte.ci.transltr.io/v1/smartapi/0212611d1c670f9107baf00b77f0889a/query using the attached JSON as the request body: CTD-150ReactIDs.txt

Logs show that two sub-queries were sent (80 + 70), so the batch-size-limit of 80 was respected

       {
            "timestamp": "2023-12-18T21:41:52.878Z",
            "level": "DEBUG",
            "message": "call-apis: 2 planned queries for edge e01",
            "code": null
        },
        {
            "timestamp": "2023-12-18T21:42:02.309Z",
            "level": "DEBUG",
            "message": "Successful GET http://ctdbase.org (80 IDs): Pathway > has_participant > Gene (obtained 1703 records, took 195ms)",
            "code": null
        },
        {
            "timestamp": "2023-12-18T21:42:02.344Z",
            "level": "DEBUG",
            "message": "Successful GET http://ctdbase.org (70 IDs): Pathway > has_participant > Gene (obtained 2603 records, took 290ms)",
            "code": null
        },

@colleenXu colleenXu added On CI Related changes are deployed to CI server and removed On Dev Related changes are deployed to Dev server labels Dec 18, 2023
@tokebe tokebe added On Test Related changes are deployed to Test server and removed On CI Related changes are deployed to CI server labels Dec 21, 2023
@colleenXu
Collaborator Author

I've confirmed that things work as-expected after the Prod deployment. Closing issue, updating the registered yamls and registrations, and opening another issue for removing the overrides.

Example: a POST to https://bte.transltr.io/v1/smartapi/0212611d1c670f9107baf00b77f0889a/query with the body below returns a response with results and a log saying Successful GET http://ctdbase.org (2 IDs): Pathway > has_participant > Gene (obtained 159 records, took 215ms). This shows that the batch-query occurred.

{
    "message": {
        "query_graph": {
            "edges": {
                "e01": {
                    "subject": "n0",
                    "object": "n1",
                    "predicates": ["biolink:related_to"]
                }
            },
            "nodes": {
                "n0": {
                    "ids": ["KEGG.PATHWAY:hsa05323", "KEGG.PATHWAY:hsa04917"],
                    "categories": ["biolink:Pathway"]
                },
                "n1": {
                    "categories": ["biolink:Gene"]
                }
            }
        }
    }
}
