-
Notifications
You must be signed in to change notification settings - Fork 92
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Convenience: allow specifying the corpus description json file #172
base: main
Are you sure you want to change the base?
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -32,6 +32,8 @@ | |
# command line, where all the flags reference existing, local files. | ||
FullyQualifiedCmdLine = Tuple[str, ...] | ||
|
||
DEFAULT_CORPUS_DESCRIPTION_FILENAME = 'corpus_description.json' | ||
|
||
|
||
def _apply_cmdline_filters( | ||
orig_options: Tuple[str, ...], | ||
|
@@ -185,7 +187,7 @@ def __call__(self, | |
class Corpus: | ||
"""Represents a corpus. | ||
|
||
A corpus is created from a corpus_description.json file, produced by | ||
A corpus is created from a corpus description json file, produced by | ||
extract_ir.py (for example). | ||
|
||
To use the corpus: | ||
|
@@ -225,7 +227,7 @@ class ReplaceContext: | |
|
||
def __init__(self, | ||
*, | ||
data_path: str, | ||
location: str, | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. location is a bit confusing, seems that data_path is a more informative name There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I'm fine to change to anything, but did you read the commit message (because I'm making the opposite argument there). There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. oh i missed that part. data_path sounds to me that can represents both file path or dir path. My major concern about the naming of location is: 1) it does not describe it is the location of what so can be a bit confusing, data_location may be more descriptive in that sense; 2) the flag name is data_path now, having different names increases the burden a little bit. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. How about There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. The issue with corpus_description is that it is a bit too specific so if in the future we have more/different corpus formatting we have to do some migration work again. Meanwhile, data_path is a general and reasonable name imo. I'm fine with either as long as the name is the same as the flag name, so up to you :) There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. corpus_path sounds great! |
||
module_filter: Optional[Callable[[str], bool]] = None, | ||
additional_flags: Tuple[str, ...] = (), | ||
delete_flags: Tuple[str, ...] = (), | ||
|
@@ -238,7 +240,10 @@ def __init__(self, | |
output) and validated. | ||
|
||
Args: | ||
data_path: corpus directory. | ||
location: either the path to the corpus description json file in a corpus | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Is it better that we only allow the input being the corpus_description path? The only concern is that it adds some migration cost... There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I think that'd be better, but to avoid migration pain, we could stage it: this patch first; then warn about deprecation; then deprecate we can skip the 2nd step because we don't have a release model and so when folks would notice the change is out of our control; but can work with e.g. Fuchsia to update their scripts. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. the migration plan sgtm! |
||
directory, or just the corpus directory, case in which | ||
`DEFAULT_CORPUS_DESCRIPTION_FILENAME` will be assumed to be available | ||
in that directory and used as corpus description. | ||
additional_flags: list of flags to append to the command line | ||
delete_flags: list of flags to remove (both `-flag=<value` and | ||
`-flag <value>` are supported). | ||
|
@@ -251,18 +256,24 @@ def __init__(self, | |
module_filter: a regular expression used to filter 'in' modules with names | ||
matching it. None to include everything. | ||
""" | ||
self._base_dir = data_path | ||
if tf.io.gfile.isdir(location): | ||
self._base_dir = location | ||
corpus_description = os.path.join(location, | ||
DEFAULT_CORPUS_DESCRIPTION_FILENAME) | ||
else: | ||
self._base_dir = os.path.dirname(location) | ||
corpus_description = location | ||
|
||
self._sampler = sampler | ||
# TODO: (b/233935329) Per-corpus *fdo profile paths can be read into | ||
# {additional|delete}_flags here | ||
with tf.io.gfile.GFile( | ||
os.path.join(data_path, 'corpus_description.json'), 'r') as f: | ||
with tf.io.gfile.GFile(corpus_description, 'r') as f: | ||
corpus_description: Dict[str, Any] = json.load(f) | ||
|
||
module_paths = corpus_description['modules'] | ||
if len(module_paths) == 0: | ||
raise ValueError( | ||
f'{data_path}\'s corpus_description contains no modules.') | ||
f'{corpus_description} corpus description contains no modules.') | ||
|
||
has_thinlto: bool = corpus_description['has_thinlto'] | ||
|
||
|
@@ -273,7 +284,7 @@ def __init__(self, | |
if corpus_description[ | ||
'global_command_override'] == constant.UNSPECIFIED_OVERRIDE: | ||
raise ValueError( | ||
'global_command_override in corpus_description.json not filled.') | ||
f'global_command_override in {corpus_description} not filled.') | ||
cmd_override = tuple(corpus_description['global_command_override']) | ||
if len(additional_flags) > 0: | ||
logging.warning( | ||
|
@@ -314,7 +325,8 @@ def get_cmdline(name: str): | |
if cmd_override_was_specified: | ||
ret = cmd_override | ||
else: | ||
with tf.io.gfile.GFile(os.path.join(data_path, name + '.cmd')) as f: | ||
with tf.io.gfile.GFile(os.path.join(self._base_dir, | ||
name + '.cmd')) as f: | ||
ret = tuple(f.read().replace(r'{', r'{{').replace(r'}', | ||
r'}}').split('\0')) | ||
# The options read from a .cmd file must be run with -cc1 | ||
|
@@ -331,8 +343,8 @@ def get_cmdline(name: str): | |
contents = tp.map( | ||
lambda name: ModuleSpec( | ||
name=name, | ||
size=tf.io.gfile.GFile(os.path.join(data_path, name + '.bc')). | ||
size(), | ||
size=tf.io.gfile.GFile( | ||
os.path.join(self._base_dir, name + '.bc')).size(), | ||
command_line=get_cmdline(name), | ||
has_thinlto=has_thinlto), module_paths) | ||
self._module_specs = tuple( | ||
|
@@ -405,6 +417,6 @@ def create_corpus_for_testing(location: str, | |
if cmdline_is_override: | ||
corpus_description['global_command_override'] = cmdline | ||
with tf.io.gfile.GFile( | ||
os.path.join(location, 'corpus_description.json'), 'w') as f: | ||
os.path.join(location, DEFAULT_CORPUS_DESCRIPTION_FILENAME), 'w') as f: | ||
f.write(json.dumps(corpus_description)) | ||
return Corpus(data_path=location, **kwargs) | ||
return Corpus(location=location, **kwargs) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
DEFAULT_CORPUS...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
DEFAULT...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I mean adding underscore('') before DEFAULT...
github believes that ''DEFAULT... means italic the word following the '_'