-
Notifications
You must be signed in to change notification settings - Fork 92
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Convenience: allow specifying the corpus description json file #172
base: main
Are you sure you want to change the base?
Conversation
I found it useful sometimes to have variations of the corpus description - e.g. remove swaths of modules. This patch allows multiple descriptions to co-exist under a corpus directory, and select the desired one transparently. Renamed the argument since "data path" is more of a directory than a file, "location" covers both.
@@ -238,7 +240,10 @@ def __init__(self, | |||
output) and validated. | |||
|
|||
Args: | |||
data_path: corpus directory. | |||
location: either the path to the corpus description json file in a corpus |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it better that we only allow the input being the corpus_description path? The only concern is that it adds some migration cost...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that'd be better, but to avoid migration pain, we could stage it: this patch first; then warn about deprecation; then deprecate
we can skip the 2nd step because we don't have a release model and so when folks would notice the change is out of our control; but can work with e.g. Fuchsia to update their scripts.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the migration plan sgtm!
@@ -32,6 +32,8 @@ | |||
# command line, where all the flags reference existing, local files. | |||
FullyQualifiedCmdLine = Tuple[str, ...] | |||
|
|||
DEFAULT_CORPUS_DESCRIPTION_FILENAME = 'corpus_description.json' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
DEFAULT_CORPUS...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
DEFAULT...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I mean adding underscore('') before DEFAULT...
github believes that ''DEFAULT... means italic the word following the '_'
@@ -225,7 +227,7 @@ class ReplaceContext: | |||
|
|||
def __init__(self, | |||
*, | |||
data_path: str, | |||
location: str, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
location is a bit confusing, seems that data_path is a more informative name
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm fine to change to anything, but did you read the commit message (because I'm making the opposite argument there).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
oh i missed that part.
data_path sounds to me that can represents both file path or dir path. My major concern about the naming of location is: 1) it does not describe it is the location of what so can be a bit confusing, data_location may be more descriptive in that sense; 2) the flag name is data_path now, having different names increases the burden a little bit.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about corpus_description
, since we'd subsequently go with deprecating dir-based value anyway? (probably subsequently worth updating the flag, too)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The issue with corpus_description is that it is a bit too specific so if in the future we have more/different corpus formatting we have to do some migration work again. Meanwhile, data_path is a general and reasonable name imo.
I'm fine with either as long as the name is the same as the flag name, so up to you :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
corpus_path
(and ack on the flag). Less general than "data", reasonably ambiguous-ish to mean either dir or file, and says nothing about description or anything like that. wdyt?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
corpus_path sounds great!
@@ -238,7 +240,10 @@ def __init__(self, | |||
output) and validated. | |||
|
|||
Args: | |||
data_path: corpus directory. | |||
location: either the path to the corpus description json file in a corpus |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the migration plan sgtm!
I found it useful sometimes to have variations of the corpus description - e.g. remove swaths of modules. This patch allows multiple descriptions to co-exist under a corpus directory, and select the desired one transparently.
Renamed the ctor argument since "data path" is more of a directory than a file, "location" covers both cases.