Training descriptions serialization #154

shuttle1987 · 2018-05-26T08:16:53Z

Related to #153

As it stands the training descriptions are made in an ad-hoc format:

Lines 202 to 215 in aa69575

    
    #Get information about training for the names of output files.
    frame = inspect.currentframe()
    # pylint: disable=deprecated-method
    # It was a mistake to deprecate this in Python 3.5
    args, _, _, values = inspect.getargvalues(frame)
    with open(os.path.join(self.exp_dir, "train_description.txt"), "w") as desc_f:
        for arg in args:
            if type(values[arg]) in [str, int, float] or isinstance(
                    values[arg], type(None)):
                print("%s=%s" % (arg, values[arg]), file=desc_f)
            else:
                print("%s=%s" % (arg, values[arg].__dict__), file=desc_f)
        print("num_train=%s" % (self.corpus_reader.num_train), file=desc_f)
        print("batch_size=%s" % (self.corpus_reader.batch_size), file=desc_f)

There are a few concerning things here, especially reading the argument values directly out of the stack frame. I think this was a really smart way of dealing with this initially, but I'd like to move over to a more specific method from TensorFlow for extracting the relevant variables, especially if more than one model is available at a time.

I propose that we make this a more easily parseable format when we write this to a file.

I'm thinking something like YAML or JSON works well here. Is there a preference for serialization format?

JSON comes in the standard library but has a bit less flexibility. That said, flexibility isn't necessarily a benefit here.
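To make the JSON option concrete, here is a minimal sketch using only the standard library. The function name `write_train_description`, the output filename `train_description.json`, and the idea of passing the parameters in as an explicit dict (rather than pulling them from the frame) are all assumptions for illustration, not existing code in the repository.

```python
import json
import os


def _to_serializable(obj):
    """Fallback for values the json module cannot encode natively."""
    # Mirrors the current code's use of __dict__ for non-primitive arguments.
    # Note: json.dump raises ValueError if the object graph is circular.
    if hasattr(obj, "__dict__"):
        return vars(obj)
    return repr(obj)


def write_train_description(exp_dir, params, num_train, batch_size):
    """Write training parameters as JSON and return the path of the file.

    `params` is a plain dict of argument names to values, passed in
    explicitly by the caller (hypothetical interface).
    """
    description = dict(params)
    description["num_train"] = num_train
    description["batch_size"] = batch_size
    path = os.path.join(exp_dir, "train_description.json")
    with open(path, "w") as desc_f:
        # sort_keys gives a stable layout, which makes files easy to diff.
        json.dump(description, desc_f, indent=2, sort_keys=True,
                  default=_to_serializable)
    return path
```

Reading the file back is then just `json.load`, so other tools don't need a custom parser for the `key=value` lines.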

oadams · 2018-05-27T22:32:25Z

I don't have a format preference. But whatever we do here, it's important that the output file gives a complete picture of the experiment so it can be reproduced. This should include, but may not be limited to, the model information and the corpus information (and perhaps a hash of the data too).
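As a rough idea of what the data hash could look like, here is a sketch that computes one SHA-256 digest over a set of files with `hashlib`. The helper name `hash_files`, the choice of SHA-256, and the decision to hash file basenames along with contents are assumptions; how the corpus files would actually be enumerated is left open.

```python
import hashlib
import os


def hash_files(paths, chunk_size=1 << 20):
    """Return a single SHA-256 hex digest covering all the given files.

    Paths are sorted first so the result does not depend on the order
    in which the caller lists them.
    """
    digest = hashlib.sha256()
    for path in sorted(paths):
        # Include the basename so that renaming a file changes the hash.
        digest.update(os.path.basename(path).encode("utf-8"))
        with open(path, "rb") as f:
            # Read in chunks so large audio files need not fit in memory.
            for chunk in iter(lambda: f.read(chunk_size), b""):
                digest.update(chunk)
    return digest.hexdigest()
```

The resulting string could be stored alongside the other experiment details in whichever description file gets written, so that two runs can be checked for having used the same data.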

shuttle1987 mentioned this issue May 26, 2018

Model descriptions serialization #153

Closed

shuttle1987 added enhancement question refactor labels May 26, 2018

This was referenced May 26, 2018

[MRG] Clean up Model #147

Merged

Write corpus information to the experiment directory. #122

Open

shuttle1987 mentioned this issue Jun 2, 2018

Model.train() needs a refactor #79

Open

shuttle1987 removed the question label Dec 27, 2018
