Getting Neptune schema fails #14

hpiili · 2024-03-14T06:59:50Z

neptune-for-graphql --input-graphdb-schema-neptune-endpoint db-neptune-my-instance-1-read-replica.xxxx.eu-west-1.neptune.amazonaws.com:8182 --output-aws-pipeline-cdk --output-aws-pipeline-cdk-name MY --output-resolver-query-sdk --output-aws-pipeline-cdk-region eu-west-1

Fetching the schema causes 98% CPU load on the reader or writer, and the schema fetch then starts failing with these errors:
"Http query request failed: Request failed with status code 500
Trying with the AWS SDK"

and finally
"SDK query request failed: Request rejected because there are already too many concurrent requests being processed."

log.txt

My schema has 23 node types and 36 edge types.

How can I get the schema extraction to pass?

triggan · 2024-03-15T19:36:38Z

Hi @hpiili - what version of Neptune are you running on your cluster? This utility will use the Statistics Summary to infer schema if using engine version 1.2.1.0 or newer. If you're using an older version, it will run a set of queries (that could be performance intensive) to fetch the full list of nodes, edges, and property keys within your graph.
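To check what the engine itself reports, the property-graph summary can be fetched directly. The following is a minimal Python sketch, assuming the endpoint is reachable from your machine and IAM authentication is disabled (request signing is not shown). `summary_url` and `fetch_summary` are hypothetical helper names used only for illustration; they are not part of this utility.

```python
import json
import urllib.request


def summary_url(endpoint: str, mode: str = "detailed") -> str:
    """Build the property-graph summary URL for a host:port endpoint."""
    if mode not in ("basic", "detailed"):
        raise ValueError("mode must be 'basic' or 'detailed'")
    return f"https://{endpoint}/propertygraph/statistics/summary?mode={mode}"


def fetch_summary(endpoint: str, mode: str = "detailed", timeout: float = 30.0) -> dict:
    """GET the summary and return the decoded JSON body (no IAM signing)."""
    with urllib.request.urlopen(summary_url(endpoint, mode), timeout=timeout) as resp:
        return json.load(resp)
```

If the summary comes back quickly but the utility still issues heavy queries, that suggests the load comes from a later step rather than from the summary lookup.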

hpiili · 2024-03-16T06:37:18Z

I am using Neptune 1.3.0.0. In the audit logs I can see something like what you described: a set of queries that look for combinations of relations between nodes, etc.
auditlogs.zip

Cole-Greer · 2024-03-27T01:22:27Z

Hi @hpiili I have identified the portion of the code responsible for overwhelming your instance with queries and am working on a solution. Would you mind sharing the instance size you are currently using?

hpiili · 2024-03-27T12:01:59Z

I am using db.x2g.xlarge, with one writer and one reader in the cluster.

hpiili · 2024-04-04T14:16:04Z

Hi @Cole-Greer
I tested your PR version from the "updateGetEdgesDirections" branch. It is not a complete success yet; it still fails with these errors:
"Http query request failed: Request failed with status code 500
Trying with the AWS SDK"

I have attached the execution log from the console and the Neptune audit logs.
auditlogs_2024_04_04.zip

Cole-Greer · 2024-04-04T19:59:30Z

Hi @hpiili, that's interesting. That PR changes the way that edge directions are queried such that it will run 1 larger query per edge type, to find all source and destination node types for that edge.

I set up a test graph with a similar number of edge and node types to yours, and in my tests the updated queries performed much better than the old ones. My test graph likely has far fewer edges than yours. I see in the slow query log that many of these queries are running for almost exactly 2 minutes. Notably, the default Neptune query timeout is 2 minutes.

Could I ask roughly how many edges you have for some of the largest edge types, so I can better replicate your issue? Also, if you are willing to try again, I suspect that raising the Neptune query timeout (in the parameter group) will allow these queries to complete.
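As a rough illustration of raising the timeout, here is a Python sketch using boto3. It assumes the timeout is set through the `neptune_query_timeout` parameter (in milliseconds) of a custom cluster parameter group; on some engine versions it may be an instance-level parameter instead. The helper names and the group name are hypothetical, and `pending-reboot` is chosen here only as a conservative apply method.

```python
def timeout_parameter_update(group_name: str, timeout_ms: int) -> dict:
    """Build keyword arguments for modify_db_cluster_parameter_group."""
    if timeout_ms <= 0:
        raise ValueError("timeout_ms must be positive")
    return {
        "DBClusterParameterGroupName": group_name,
        "Parameters": [
            {
                "ParameterName": "neptune_query_timeout",
                "ParameterValue": str(timeout_ms),
                # Assumption: applied at the next reboot; adjust if your
                # engine version treats this parameter as dynamic.
                "ApplyMethod": "pending-reboot",
            }
        ],
    }


def apply_timeout(group_name: str, timeout_ms: int, region: str) -> dict:
    """Send the update to AWS (needs boto3 and valid credentials)."""
    import boto3  # imported lazily so the builder above has no dependencies

    client = boto3.client("neptune", region_name=region)
    return client.modify_db_cluster_parameter_group(
        **timeout_parameter_update(group_name, timeout_ms)
    )
```

For example, `apply_timeout("my-cluster-params", 1_200_000, "eu-west-1")` would request ten times the 2-minute default, where `my-cluster-params` is a placeholder name.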

hpiili · 2024-04-05T09:10:31Z

The two largest node types are DeliveredPart (22035281 vertices) and AssetAssembly (3809142 vertices).
From AssetAssembly nodes we have edges to DeliveredPart nodes (22035281 edges).
Each DeliveredPart node has about 5 outgoing edges.

After increasing the query timeout tenfold, the schema extraction gets further, but there are still two 500 errors.

log_2024_04_05.txt

At the end, creating the CDK output also fails.

The command that I executed was: neptune-for-graphql --input-graphdb-schema-neptune-endpoint db-neptune-pelm-lcd-dev-instance-1-read-replica.c2hkwv1gpquj.eu-west-1.neptune.amazonaws.com:8182 --output-aws-pipeline-cdk --output-aws-pipeline-cdk-name LCD --output-resolver-query-sdk --output-aws-pipeline-cdk-region eu-west-1 2>&1 | tee log_2024_04_05.txt

hpiili · 2024-04-05T09:12:54Z

I did not find a good way of listing the edges. I tried to use:
MATCH (o:DeliveredPart)-[r]->(n)
with distinct type(r) as cr,r
return count(r), collect(distinct type(r))

but that fails first with a query timeout and then with an out-of-memory error. Even a db.r5.12xlarge reader was not able to finish this query.

Is there a better way of finding the edge counts that you asked for?
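One option that may be lighter is to let the engine aggregate by edge type instead of carrying every relationship through a `WITH` clause. The Python sketch below builds such a query for one node label and posts it to the openCypher HTTPS endpoint. It assumes IAM authentication is disabled, and there is no guarantee it completes on a graph of this size. `edge_count_query` and `run_query` are hypothetical helper names.

```python
import json
import urllib.parse
import urllib.request


def edge_count_query(label: str) -> str:
    """openCypher that counts outgoing edges of one node label per edge type."""
    if not label.isidentifier():
        raise ValueError(f"unexpected label: {label!r}")
    return (
        f"MATCH (:{label})-[r]->() "
        "RETURN type(r) AS edgeType, count(r) AS edges"
    )


def run_query(endpoint: str, query: str, timeout: float = 1300.0) -> dict:
    """POST the query as form data to /openCypher and decode the JSON reply."""
    body = urllib.parse.urlencode({"query": query}).encode()
    req = urllib.request.Request(f"https://{endpoint}/openCypher", data=body)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.load(resp)
```

Running `run_query(endpoint, edge_count_query("DeliveredPart"))` once per node label keeps each request small, at the cost of more round trips.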

hpiili · 2024-04-06T08:05:02Z

I have now deleted most of my data from the database in order to be able to run this command. With that, I am able to create the schema for a limited scope.

A second topic comes from my running this against a read replica.

Based on my limited coding knowledge, the code does not contain proper try/catch and error handling. For example, the DescribeDBClustersCommand failed because I was running against a read replica, but the script reported nothing other than that it had failed. When I added a catch and error output, the cause of my mistake became a bit more obvious.
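The kind of change described above can be sketched as follows. The utility itself is JavaScript; this Python version only illustrates the pattern of attaching the failing step to the original error, and `call_with_context` is a hypothetical helper name.

```python
def call_with_context(step: str, fn, *args, **kwargs):
    """Run fn and re-raise any failure with the step name attached."""
    try:
        return fn(*args, **kwargs)
    except Exception as err:
        # Keep the original exception as the cause so the traceback survives.
        raise RuntimeError(f"{step} failed: {type(err).__name__}: {err}") from err
```

Wrapping each AWS call this way means the console output names both the step and the underlying cause, instead of only reporting a generic failure.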

Cole-Greer · 2024-04-09T23:40:51Z

I'm glad you were able to complete the setup on a smaller dataset. I will work on additional query modifications to improve schema fetching for your original size of graph. I will additionally review the error handling to ensure error messages are being surfaced effectively.

Cole-Greer · 2024-04-25T20:23:11Z

Hi @hpiili, I'm sorry that I have been unable to return to this issue for the last few weeks. I expect to have time to continue investigating this in mid-May.
