-
Notifications
You must be signed in to change notification settings - Fork 11
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
scripts to convert tests from old format to new format and vice-versa #97
base: main
Are you sure you want to change the base?
scripts to convert tests from old format to new format and vice-versa #97
Conversation
e905d1a
to
9f11e37
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
New and old are ephemeral terms. Five years from now they'll both be old. One is YAML-based, the other is Substrait test-based (or antlr4 based). Perhaps use those as names instead?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this run on every PR?
return not diff | ||
|
||
|
||
def compare_directories(original_dir, roundtrip_dir): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Document that this compares either format to a Substrait test format directory.
intermediate_dir = f"{temp_dir}/substrait_cases" | ||
roundtrip_dir = f"{temp_dir}/cases" | ||
uri_prefix = ( | ||
"https://github.com/substrait-io/substrait/blob/main/extensions/substrait" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
By referring to the GitHub directory this test will fail when the test cases and functions change breaking the CI here. Probably better to refer to the tests in the local submodule (when the tests are available).
"https://github.com/substrait-io/substrait/blob/main/extensions/substrait" | ||
) | ||
|
||
# Step 1: Convert from `../../cases` to `./temp/substrait_cases/` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You're converting from initial cases, the variable may change especially if you end up pulling those locations from the command line (so CI can control where they are located).
if value_type != "str": | ||
return None | ||
pattern = r"^(?:(\d+)\s+days?,\s*)?(\d+):(\d+):(\d+)$" | ||
return re.match(pattern, value) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Returning a match object breaks the encapsulation -- usually the caller doesn't need to know how a function did its work. If the function later changes to not use regular expressions to parse the caller would need to be updated because the contract would have to change even though it would essentially be doing the same work -- breaking a time into parts.
Check if a string is in the format of 'X days, HH:MM:SS'. | ||
Returns a match object if successful, or None if it doesn't match. | ||
""" | ||
if value_type != "str": |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The docstring says it matches strings -- should the caller be making that determination. If so, value_type isn't needed in the signature.
|
||
def format_timestamp_tz(timestamp_with_tz): | ||
"""Convert a timestamp with timezone abbreviation to ISO 8601 with offset.""" | ||
if "-" in timestamp_with_tz and not has_last_colon_after_last_dash( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This line appears to be wanting a function to test whether the provided timestamp has a particular property such as being a timestamp with an explicit timezone in it. Perhaps create a function for that purpose?
iso_interval = "".join(iso_parts) | ||
|
||
# Determine whether it's a 'iyear' or 'iday' based on units | ||
if iyear: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This won't hold for compound intervals, but we won't have those in YAML tests. Add a comment to state that we only will see these two types?
return f"{day_str}, {time_str}" if day_str else time_str | ||
|
||
|
||
def convert_to_old_value(value, value_type, level=0): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
old is yaml?
No description provided.