Training descriptions serialization #154

Open
shuttle1987 opened this issue May 26, 2018 · 1 comment

Comments

@shuttle1987
Member

shuttle1987 commented May 26, 2018

Related to #153

As it stands the training descriptions are made in an ad-hoc format:

#Get information about training for the names of output files.
frame = inspect.currentframe()
# pylint: disable=deprecated-method
# It was a mistake to deprecate this in Python 3.5
args, _, _, values = inspect.getargvalues(frame)
with open(os.path.join(self.exp_dir, "train_description.txt"), "w") as desc_f:
    for arg in args:
        if type(values[arg]) in [str, int, float] or isinstance(
                values[arg], type(None)):
            print("%s=%s" % (arg, values[arg]), file=desc_f)
        else:
            print("%s=%s" % (arg, values[arg].__dict__), file=desc_f)
    print("num_train=%s" % (self.corpus_reader.num_train), file=desc_f)
    print("batch_size=%s" % (self.corpus_reader.batch_size), file=desc_f)

There are a few concerning things here, especially pulling information directly out of the stack frame. I think this was a really smart way of dealing with the problem initially, but I'd like to move to a more specific method from TensorFlow for extracting the relevant variables, especially if more than one model is available at a time.

I propose that we make this a more easily parseable format when we write this to a file.

I'm thinking something like YAML or JSON works well here. Is there a preference for serialization format?

JSON support comes with the standard library, but the format is a bit less flexible than YAML. That said, flexibility isn't necessarily a benefit here.
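
As a rough sketch of what the JSON option could look like: the helper names `describe_value` and `write_train_description`, the `train_args` dict, and the `train_description.json` filename are all hypothetical, not existing code in this project. It assumes the training arguments have already been gathered into a dict, and it keeps the current fallback of dumping an object's `__dict__` for anything that isn't a plain value.

```python
import json
import os


def describe_value(value):
    """Return a JSON-friendly version of a training argument (hypothetical helper)."""
    if value is None or isinstance(value, (str, int, float, bool)):
        return value
    if isinstance(value, (list, tuple)):
        return [describe_value(v) for v in value]
    if isinstance(value, dict):
        return {str(k): describe_value(v) for k, v in value.items()}
    # Mirrors the existing fallback of writing out the object's __dict__.
    # Note: this does not guard against objects that reference each other in a cycle.
    if hasattr(value, "__dict__"):
        return {k: describe_value(v) for k, v in vars(value).items()}
    return repr(value)


def write_train_description(exp_dir, train_args, num_train, batch_size):
    """Write the training description as JSON and return the file path."""
    description = dict(describe_value(train_args))
    description["num_train"] = num_train
    description["batch_size"] = batch_size
    # The filename is an assumption; the current code writes train_description.txt.
    path = os.path.join(exp_dir, "train_description.json")
    with open(path, "w") as desc_f:
        json.dump(description, desc_f, indent=2, sort_keys=True)
    return path
```

Reading the file back is then just `json.load`, so nothing needs to parse `key=value` lines by hand.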

@oadams
Collaborator

oadams commented May 27, 2018

I don't have a format preference. But whatever we do here, it's important that the output file gives a complete picture of the experiment so that it can be reproduced. This should include, but may not be limited to, the model information and the corpus information (and perhaps a hash of the data as well).
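
To illustrate the data hash idea, here is a minimal sketch. The function names, the record keys (`model`, `corpus`, `data_sha256`), and the choice of SHA-256 are assumptions made for the example, not anything the project currently defines. It hashes file contents only, in sorted path order, so the digest changes if any file's contents change or if a rename alters the sort order.

```python
import hashlib


def hash_files(paths, chunk_size=1 << 20):
    """Return one SHA-256 hex digest over the contents of all files (hypothetical helper)."""
    digest = hashlib.sha256()
    # Sort so the result does not depend on the order the paths were listed in.
    for path in sorted(paths):
        with open(path, "rb") as data_f:
            # Read in chunks so large corpus files are not loaded into memory at once.
            for chunk in iter(lambda: data_f.read(chunk_size), b""):
                digest.update(chunk)
    return digest.hexdigest()


def build_experiment_record(model_info, corpus_info, data_paths):
    """Bundle model info, corpus info and a data hash into one dict (keys are assumptions)."""
    return {
        "model": model_info,
        "corpus": corpus_info,
        "data_sha256": hash_files(data_paths),
    }
```

The resulting dict contains only plain values as long as `model_info` and `corpus_info` do, so it could be written out in whichever serialization format is chosen.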
