-
Notifications
You must be signed in to change notification settings - Fork 87
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Local results and perf comparisons for queries #6206
base: main
Are you sure you want to change the base?
Conversation
The latest updates on your projects. Learn more about Vercel for Git ↗︎
|
5abc109
to
3d18cc9
Compare
3d18cc9
to
e93971e
Compare
"params": { | ||
"shas": "String" | ||
}, | ||
"tests": [] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We will probably need a linter for this later to enforce this structure I guess, maybe put up a warning about having no test case
for i, test in enumerate(tests): | ||
new_results = query_clickhouse(query, test) | ||
base_results = query_clickhouse(base_query, test) | ||
if new_results != base_results: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This comparison might be brittle if the results returned by the query is not sorted. But I think we could keep it simple with the current implementation.
A more complex approach would be to provide an option to compare the results in strict mode (same order) and relax mode (in any order)
tests = json.load(f).get("tests", []) | ||
with open(REPO_ROOT / "torchci" / "clickhouse_queries" / query / "query.sql") as f: | ||
query = f.read() | ||
for test in tests: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why isn't this from_now
part in get_base_query
too? I think we should be able to have just one get_query
function that accepts the query name and the SHA, the use it to get both the current and the base queries.
print(table) | ||
|
||
|
||
def results_compare(args: argparse.Namespace) -> None: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For the comparison, an edge case might be when adding a new query or, in rarer case, to delete one.
return args | ||
|
||
|
||
def get_query_id(query: str, params: dict) -> Optional[str]: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: It might be cleaner to have this function declared inside get_query_ids
as it's the only place uses it while get_query_ids
is used everywhere else
] | ||
|
||
|
||
def format_comparision_string(new: float, old: float) -> str: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this function used anywhere?
help="Number of times to run the query. Only relevant if --perf is used", | ||
default=10, | ||
) | ||
parser.add_argument( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IMO, it seems like a better interface to have this script accepts 2 arguments like -a
and -b
and compares the two sides, which can be set to any SHA or branch. We could have a special syntax like current
to denote the current SHA/branch
Second commit onwards are the actual changes. The first commit just changes all the formatting for the existing files.
This changes the formatting of the params.json file to allow for example inputs for the queries that can be used to compare results and performance to be used in the script mentioned below. The new format can be seen in the README.
This also adds a script to get compare time and memory a query uses between local and a specific commit. It can also compare what the query returns.