SQL validation is running too long, how to cut it into different pieces #764

Folux · 2024-02-05T15:30:19Z

Hey,

I'm running the SQL checks on a huge project and after 3 hours my process runs into a timeout. I'm trying to understand how I can make the SQL validation smaller because it should not take multiple hours to run, even if the project is bigger.

I'm running only on incremental changes on a branch. The branch did not change anything related to the LookML so it should not run any SQL checks in my understanding.

This is my command:
spectacles content --project huge_project --branch my-spectacles-test --incremental --remote-reset -v

I can see that it is validating hundreds of queries in the logs:

Waiting to acquire a query slot
Creating async query for delivery/provider_provider/*
Attempting to get results for 3 query tasks
Query for delivery/provider_provider/* created as query 38363719
Running query Query(explore=provider_provider n=4) [qid=38363719]
Starting query 38363719
Query task 07be71e976c618a783d7b6dd99f85c92 status is: running
Query task 2d4c7f836991a24f5dfcf6caab45f496 status is: running
Query task bbb856146bb5cc5f6b48f73f8df12b45 status is: error
Query 38363719 is running under query task fb2889c3e7e866bc58ae21806fbdd2b9
Waiting to acquire a query slot
Creating async query for delivery/provider_provider/*
Attempting to get results for 3 query tasks
Query for delivery/provider_provider/* created as query 38363673
Running query Query(explore=provider_provider n=4) [qid=38363673]

In other projects, the SQL check runs for a few minutes. I can't really understand why it runs for several hours in this project, and why it checks so many things if nothing has changed.

I could not find much in the documentation, so can you how to configure this huge SQL validation, so it does not want to check everything and runs faster?

The text was updated successfully, but these errors were encountered:

DylanBaker · 2024-02-08T09:25:58Z

Hi @Folux. Thanks for opening the issue.

Do you know if this project has any SQL errors in production? I ask as this can significantly adversely affect run times.

Our docs could probably be clearer on this, but the way we check for errors incrementally is as follows:

Generate a single SQL query for each explore with all dimensions selected in the branch/commit being tested.
Generate a single SQL query for each explore with all dimensions selected in the branch being merged into (typically production)
Find the explores where the SQL has been changed to identify those that need to have the queries run for them.
Run the single large SQL query for all the changed explores
If the explore has an error drill down to the dimension level.
For each dimension with an error, generate the SQL for that dimension in the branch being merged into (typically production) to understand if the error to the dimension is "new".

We do it this way because it is typically quicker to run the queries down to the dimension level than generating the SQL for all dimensions (though @joshtemple would should validate if this is still the case with async).

As a result of this, even if an explore doesn't have new errors, we will need to drill down to the dimension level in the explore to ascertain whether that is the case or not. I could be wrong (and please correct me if I am) but I expect there might be production errors on some large explores that are affected by your changes which is causing the long run time.

If I'm wrong, I'm not sure what the other reasons could be for long runtimes, but would be happy to dig in.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

SQL validation is running too long, how to cut it into different pieces #764

SQL validation is running too long, how to cut it into different pieces #764

Folux commented Feb 5, 2024

DylanBaker commented Feb 8, 2024

SQL validation is running too long, how to cut it into different pieces #764

SQL validation is running too long, how to cut it into different pieces #764

Comments

Folux commented Feb 5, 2024

DylanBaker commented Feb 8, 2024