
csv file time history #115

Open · wants to merge 15 commits into main

Conversation

@jsantner jsantner commented Jun 14, 2018

  • Fixes #
  • Tests added
  • Added entry into CHANGELOG.md
  • Documentation updated

Changes proposed in this pull request:

  • Add a test .yaml file that uses a .csv file for its time history
  • Alter the validation module so that reading a time history from a .csv file does not raise errors.

@pr-omethe-us/chemked

@codecov codecov bot commented Jun 14, 2018

Codecov Report

Merging #115 into master will not change coverage.
The diff coverage is 100%.

Impacted file tree graph

@@          Coverage Diff          @@
##           master   #115   +/-   ##
=====================================
  Coverage     100%   100%           
=====================================
  Files           4      4           
  Lines         966    987   +21     
  Branches      226    231    +5     
=====================================
+ Hits          966    987   +21
| Impacted Files      | Coverage Δ          |
|---------------------|---------------------|
| pyked/validation.py | 100% <100%> (ø) ⬆️ |
| pyked/chemked.py    | 100% <100%> (ø) ⬆️ |

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data

@bryanwweber (Member)

@jsantner Why are these changes needed?

@jsantner (Author)

@bryanwweber
Previously, when loading a yaml file with a time history defined in a csv file, the validator gave this error:

  File "c:\users\jsantne\documents\github\pyked\pyked\validation.py", line 255, in _validate_isvalid_history
    n_cols = len(value['values'][0])

KeyError: 0

You can see this on the Travis report after my second commit on this branch, where I had only added a test yaml file with time history defined in a csv file.
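For illustration, a minimal sketch of the failure mode (the filename here is hypothetical): when the time history points at a csv file, value['values'] is a dict, so indexing it with 0 raises KeyError instead of returning the first row.

```python
value = {'values': {'filename': 'pressure_history.csv'}}  # csv-style entry (hypothetical name)
n_cols = len(value['values'][0])  # KeyError: 0 -- a dict is indexed by its keys, not by position
```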

@bryanwweber bryanwweber left a comment (Member)

@jsantner Thanks for submitting this! A few suggestions:

  1. I think the assumption is that the CSV filename will be specified relative to the directory of the YAML file, hence the directory argument should be unnecessary. If that's not working, I'd be interested to see a failing test case.
  2. If possible, I think we should load and check the CSV file as well. However, I think that will fit better when we refactor all of the validation, so we can skip it for now.

# If reading from a file, the file will not be validated.
# A file can have an arbitrary number of columns, and the columns
# to be used are specified.
if type(value['values']) is list:
Member

Why not 'filename' not in value['values'].keys()?

Author

I agree that we can assume the CSV file is specified relative to the yaml file, but DataPoint doesn't know where the yaml file directory is unless it's specified as an argument to __init__, right? Is there a simpler way to deal with a relative path?

I tried using 'filename' not in value['values'].keys() in a previous commit, but this fails when the values are given as a list. In that case, value['values'] is a list, not a dict, so calling value['values'].keys() raises an error.
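A quick sketch of that failure mode:

```python
values = [[0.0, 1.0], [0.1, 1.2]]  # time history given inline as a list of rows
values.keys()  # AttributeError: 'list' object has no attribute 'keys'
```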

Member

As far as I know, just trying to open the file will try to open it relative to the working directory of the Python process. Are you saying that if I have a file structure like

|- database
|---butanol
|------file_1.yaml
|------file_1.csv

and I start Python in the database directory, and load file_1.yaml like

>>> ChemKED('butanol/file_1.yaml')

it won't work, because Python will assume the file_1.csv is relative to database, not butanol?

As on your other PR, I think the simpler/more "pythonic" way to do this is a try...except block, rather than checking the type of the value.

Author

Yes, that's exactly what I'm saying. I was using a script to read multiple yaml files in a complex directory structure within a database directory, and Python was looking for the csv file in database, not in the folder with the yaml file.
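A sketch of the lookup problem with the layout above (paths hypothetical):

```python
from pathlib import Path

yaml_file = Path('butanol/file_1.yaml')  # loaded while the process runs in 'database/'
csv_name = 'file_1.csv'                  # as written inside the yaml file

# open() resolves relative names against the process CWD ('database'),
# so open(csv_name) looks for database/file_1.csv -> FileNotFoundError.
# Resolving against the yaml file's directory finds it:
# open(yaml_file.parent / csv_name)  -> database/butanol/file_1.csv
```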

I'm not sure that a try...except block would work well here. Are you thinking of something like this? It seems more complex and confusing this way, and it puts a lot of code between the try and except:

try:
    if 'filename' in value['values'].keys():
        # Don't do anything because csv files aren't validated
        pass
    else:
        # This should never happen. If value['values'] is a dictionary with keys, 'filename' should be a key
        self._error(field, 'must include filename or list of values')
except KeyError:
    # value['values'] is probably a list.
    # Code from earlier that checks the number of columns

Since this PR is related to time histories, I have a somewhat related question for you that I just stumbled on. Let's say somebody has a csv file with three columns - time, pressure, and volume. Right now, these must be implemented as two separate time-histories and the csv will be loaded twice, right? Is there interest in allowing the user to specify multiple time-histories using a single file?

Member

Just to keep this focused on the code here, I moved the other discussion to the main comment thread. Anyhow, two things:

  1. I want to validate csv files in the future, so we might as well set up for that here
  2. The code could be
try:
    if 'filename' not in value['values'].keys():
        self._error(field, 'must include filename or list of values')
except AttributeError:
    # Code from earlier that checks the number of columns

or

try:
    filename = value['values'].get('filename')
    if filename is None:
        self._error(field...
except AttributeError:
    # Code from earlier

which isn't all that confusing to me. The reason (to me) to avoid the type function is that it doesn't always handle inheritance in a straightforward way, so we'd be relying on the underlying YAML library to always return something whose type is exactly list, never a subclass. On the other hand, with the try-except, we're using the duck-typing in Python to try something that we expect to be the case, and catch the resulting errors.
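A sketch of that inheritance pitfall (the subclass name is hypothetical):

```python
class YAMLList(list):
    """Stand-in for a list subclass a YAML library might return."""

values = YAMLList([[0.0, 1.0]])
type(values) is list        # False: the exact-type check misses subclasses
isinstance(values, list)    # True: respects inheritance
```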

@bryanwweber bryanwweber (Member) commented Jun 26, 2018

> Yes, that's exactly what I'm saying. I was using a script to read multiple yaml files in a complex directory structure within a database directory, and Python was looking for the csv file in database, not in the folder with the yaml file.

OK, that's a case that we missed for sure. What if, rather than passing the directory name around, we turn the yaml_file into an instance of a Path object? Then we can find the path of the CSV file by doing yaml_file.parent/csv_filename. As of Python 3.6, the built-in open functions (which NumPy relies on) all accept instances of Path (see https://docs.python.org/3/library/functions.html#open), so we only really have to handle the Python 3.5 case, for which we can write a custom open function that just converts the filename to a string, something like

import sys

if sys.version_info < (3, 6):
    oldopen = open
    def open(file, mode='r', buffering=-1, encoding=None, errors=None, newline=None, closefd=True, opener=None):
        return oldopen(str(file), mode, buffering, encoding, errors, newline, closefd, opener)

I should note that this is just off the top of my head, and there may be a better way to handle this backwards compatibility. Also, this doesn't help in the case of using a dictionary as the input. I'm not sure there's a good way to handle that, though.

> Since this PR is related to time histories, I have a somewhat related question for you that I just stumbled on. Let's say somebody has a csv file with three columns - time, pressure, and volume. Right now, these must be implemented as two separate time-histories and the csv will be loaded twice, right? Is there interest in allowing the user to specify multiple time-histories using a single file?

The only problem I see with loading twice is that it might take some time to load the file from disk. I think it makes more sense to keep the specifications of the time histories separate, and try to cache the data in the csv file somehow, rather than loading it from disk twice.
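One possible caching approach (a sketch, not PyKED's current API) is to memoize the csv reader so that two time histories naming the same file only hit the disk once:

```python
from functools import lru_cache

import numpy as np

@lru_cache(maxsize=None)
def _load_csv(filename):
    # Cached by filename string; the returned array is shared between
    # callers, so treat it as read-only (or copy before mutating).
    return np.genfromtxt(filename, delimiter=',')
```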

pyked/chemked.py Outdated
-values = np.genfromtxt(hist['values']['filename'], delimiter=',')
+filename = hist['values']['filename']
+if not isabs(filename):
+    filename = join(directory, filename)
Member

I think this will fail if the input is a dict and therefore directory is None

Author

Changing the default directory to '' would fix that. But, if the input is a dict, then the filename must be specified as an absolute path, right? Since there's no yaml file, a path relative to a yaml file wouldn't make sense. So, line 693 won't be run anyway.

Member

I don't think that change will fix the problem on this line, because the directory argument gets set to None if yaml_file is None, so this line will join None and filename, which won't work. Actually, I think that if people provide a dictionary input, the CSV file has to be specified relative to the PWD of the Python process, or as an absolute file name, but we don't need to check for that case.

Author

Good catch. I just pushed another commit so that directory will never be set to None. Now, if yaml_file is None, then directory will be ''.
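For reference, a sketch of why '' is a safe default where None is not:

```python
from os.path import join

join('', 'file_1.csv')      # -> 'file_1.csv', still relative to the CWD
# join(None, 'file_1.csv')  # raises TypeError
```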

@jsantner (Author)

I've never used Path objects before; that's an interesting idea. How would the Path object be sent from ChemKED to DataPoint, where it's used to read the csv file, though? It still must be passed as an argument, right? I can add that functionality if you think it's better than passing the directory.

Using a dictionary input, I think the filename would have to be specified as an absolute path. If it were a relative path, what would it be relative to? There's no yaml file.

@bryanwweber (Member)

What if we do the path munging in the loop that creates the DataPoints? Something like

for point in self._properties['datapoints']:
    if 'time-histories' in point:
        for th in point['time-histories']:
            try:
                filename = Path(th['values']['filename'])
            except TypeError:
                pass
            else:
                if yaml_file is not None:
                    th['values']['filename'] = (yaml_file.parent/filename).resolve()
                else:
                    th['values']['filename'] = filename.resolve()
    self.datapoints.append(DataPoint(point))

(please correct any indentation errors, writing code with proportional fonts is hard...) Then we don't have to pass anything around. This will resolve the path into an absolute path relative to the yaml file (if given) or relative to the CWD of the Python process, which should handle the dictionary and the yaml file cases gracefully.

BTW, one of the reasons I'm pushing back here is that I don't want to change how DataPoint is called, if at all possible.

Can you please make sure to add tests for all these code branches? The diff coverage should be 100%, you can see the lines that haven't been run here: https://codecov.io/gh/pr-omethe-us/PyKED/pull/115/diff where the lines that are in the brighter red in the left column weren't executed during testing.

@jsantner (Author)

That's a smart way to do it; I'll add it in and test it.
I wasn't sure how to make a test for an error message (line 256 in validation.py); that's why the coverage isn't 100%. Can you point me to an example in the tests where an error message is tested?

@bryanwweber bryanwweber (Member) commented Jun 27, 2018

https://github.com/pr-omethe-us/PyKED/blob/master/pyked/tests/test_chemked.py#L59

By the way, you'll also have to turn the yaml_file into a Path when that gets processed in the __init__ method.

if yaml_file is not None:
    yaml_file = Path(yaml_file)
    with open...
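For the error-message test, a sketch of the usual pytest pattern; the fixture data, the dict_input call, and the exact exception type are assumptions for illustration, not PyKED's confirmed API:

```python
import pytest

from pyked.chemked import ChemKED

def test_history_must_include_filename_or_values(bad_properties):
    # bad_properties is a hypothetical dict fixture whose time history
    # has neither a 'filename' nor a list of values
    with pytest.raises(ValueError) as excinfo:
        ChemKED(dict_input=bad_properties)
    assert 'must include filename or list of values' in str(excinfo.value)
```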

@kyleniemeyer kyleniemeyer left a comment (Member)

this is a test PR review

@@ -11,7 +11,7 @@ A BibTeX entry for LaTeX users is
 ```TeX
 @misc{PyKED,
 author = {Kyle E Niemeyer and Bryan W Weber},
-year = 2017,
+year = 2018,
Member

It is now 2019

Member

Suggested change:
-year = 2018,
+year = 2020,
