New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

Sign up for GitHub

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Jump to bottom

Local results and perf comparisons for queries #6206

Draft

clee2000 wants to merge 2 commits into main from csl/ch_query_perf_local

Contributor

clee2000 commented Jan 22, 2025

Second commit onwards are the actual changes. The first commit just changes all the formatting for the existing files.

This changes the formatting of the params.json file to allow for example inputs for the queries that can be used to compare results and performance to be used in the script mentioned below. The new format can be seen in the README.

This also adds a script to get compare time and memory a query uses between local and a specific commit. It can also compare what the query returns.

vercel bot commented Jan 22, 2025 •

edited

Loading

The latest updates on your projects. Learn more about Vercel for Git ↗︎

Name	Status	Preview	Updated (UTC)
torchci	✅ Ready (Inspect)	Visit Preview	Jan 22, 2025 9:45pm

facebook-github-bot added the CLA Signed label

clee2000 force-pushed the csl/ch_query_perf_local branch from 5abc109 to 3d18cc9 Compare

January 22, 2025 21:42

clee2000 added 2 commits

January 22, 2025 13:43

tc

4ddb431

tc

e93971e

clee2000 force-pushed the csl/ch_query_perf_local branch from 3d18cc9 to e93971e Compare

January 22, 2025 21:43

vercel bot deployed to Preview

January 22, 2025 21:45

View deployment

huydhn reviewed

View reviewed changes

torchci/clickhouse_queries/commit_failed_jobs/params.json

+                "params": {
+                  "shas": "String"
+                },
+                "tests": []

Contributor

huydhn Jan 24, 2025 •

edited

Loading

We will probably need a linter for this later to enforce this structure I guess, maybe put up a warning about having no test case

huydhn reviewed

View reviewed changes

tools/torchci/clickhouse_query_perf.py

+                  for i, test in enumerate(tests):
+                      new_results = query_clickhouse(query, test)
+                      base_results = query_clickhouse(base_query, test)
+                      if new_results != base_results:

Contributor

huydhn Jan 24, 2025

This comparison might be brittle if the results returned by the query is not sorted. But I think we could keep it simple with the current implementation.

A more complex approach would be to provide an option to compare the results in strict mode (same order) and relax mode (in any order)

huydhn reviewed

View reviewed changes

tools/torchci/clickhouse_query_perf.py

+                      tests = json.load(f).get("tests", [])
+                  with open(REPO_ROOT / "torchci" / "clickhouse_queries" / query / "query.sql") as f:
+                      query = f.read()
+                  for test in tests:

Contributor

huydhn Jan 24, 2025 •

edited

Loading

Why isn't this from_now part in get_base_query too? I think we should be able to have just one get_query function that accepts the query name and the SHA, the use it to get both the current and the base queries.

huydhn reviewed

View reviewed changes

tools/torchci/clickhouse_query_perf.py

		print(table)


		def results_compare(args: argparse.Namespace) -> None:

Contributor

huydhn Jan 24, 2025

For the comparison, an edge case might be when adding a new query or, in rarer case, to delete one.

huydhn reviewed

View reviewed changes

tools/torchci/clickhouse_query_perf.py

		return args


		def get_query_id(query: str, params: dict) -> Optional[str]:

Contributor

huydhn Jan 24, 2025

Nit: It might be cleaner to have this function declared inside get_query_ids as it's the only place uses it while get_query_ids is used everywhere else

huydhn reviewed

View reviewed changes

tools/torchci/clickhouse_query_perf.py

		]


		def format_comparision_string(new: float, old: float) -> str:

Contributor

huydhn Jan 24, 2025

Is this function used anywhere?

huydhn reviewed

View reviewed changes

tools/torchci/clickhouse_query_perf.py

+                      help="Number of times to run the query. Only relevant if --perf is used",
+                      default=10,
+                  )
+                  parser.add_argument(

Contributor

huydhn Jan 24, 2025

IMO, it seems like a better interface to have this script accepts 2 arguments like -a and -b and compares the two sides, which can be set to any SHA or branch. We could have a special syntax like current to denote the current SHA/branch

huydhn mentioned this pull request

Sqlfluff linted queries #6210

Open

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels