Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Local results and perf comparisons for queries #6206

Draft
wants to merge 2 commits into
base: main
Choose a base branch
from

Conversation

clee2000
Copy link
Contributor

Second commit onwards are the actual changes. The first commit just changes all the formatting for the existing files.

This changes the formatting of the params.json file to allow for example inputs for the queries that can be used to compare results and performance to be used in the script mentioned below. The new format can be seen in the README.

This also adds a script to get compare time and memory a query uses between local and a specific commit. It can also compare what the query returns.

Copy link

vercel bot commented Jan 22, 2025

The latest updates on your projects. Learn more about Vercel for Git ↗︎

Name Status Preview Updated (UTC)
torchci ✅ Ready (Inspect) Visit Preview Jan 22, 2025 9:45pm

@facebook-github-bot facebook-github-bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Jan 22, 2025
@clee2000 clee2000 force-pushed the csl/ch_query_perf_local branch from 5abc109 to 3d18cc9 Compare January 22, 2025 21:42
"params": {
"shas": "String"
},
"tests": []
Copy link
Contributor

@huydhn huydhn Jan 24, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We will probably need a linter for this later to enforce this structure I guess, maybe put up a warning about having no test case

for i, test in enumerate(tests):
new_results = query_clickhouse(query, test)
base_results = query_clickhouse(base_query, test)
if new_results != base_results:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This comparison might be brittle if the results returned by the query is not sorted. But I think we could keep it simple with the current implementation.

A more complex approach would be to provide an option to compare the results in strict mode (same order) and relax mode (in any order)

tests = json.load(f).get("tests", [])
with open(REPO_ROOT / "torchci" / "clickhouse_queries" / query / "query.sql") as f:
query = f.read()
for test in tests:
Copy link
Contributor

@huydhn huydhn Jan 24, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why isn't this from_now part in get_base_query too? I think we should be able to have just one get_query function that accepts the query name and the SHA, the use it to get both the current and the base queries.

print(table)


def results_compare(args: argparse.Namespace) -> None:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For the comparison, an edge case might be when adding a new query or, in rarer case, to delete one.

return args


def get_query_id(query: str, params: dict) -> Optional[str]:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: It might be cleaner to have this function declared inside get_query_ids as it's the only place uses it while get_query_ids is used everywhere else

]


def format_comparision_string(new: float, old: float) -> str:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this function used anywhere?

help="Number of times to run the query. Only relevant if --perf is used",
default=10,
)
parser.add_argument(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMO, it seems like a better interface to have this script accepts 2 arguments like -a and -b and compares the two sides, which can be set to any SHA or branch. We could have a special syntax like current to denote the current SHA/branch

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants