Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

csv file time history #115

Open
wants to merge 15 commits into
base: main
Choose a base branch
from
Open
Show file tree
Hide file tree
Changes from 8 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
2 changes: 1 addition & 1 deletion CITATION.md
Original file line number Diff line number Diff line change
Expand Up @@ -11,7 +11,7 @@ A BibTeX entry for LaTeX users is
```TeX
@misc{PyKED,
author = {Kyle E Niemeyer and Bryan W Weber},
year = 2017,
year = 2018,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is now 2019

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
year = 2018,
year = 2020,

title = {PyKED v0.4.1},
doi = {10.5281/zenodo.597935},
url = {https://github.com/pr-omethe-us/PyKED},
Expand Down
14 changes: 10 additions & 4 deletions pyked/chemked.py
Original file line number Diff line number Diff line change
Expand Up @@ -2,7 +2,7 @@
Main ChemKED module
"""
# Standard libraries
from os.path import exists
from os.path import exists, isabs, dirname, join
from collections import namedtuple
from warnings import warn
from copy import deepcopy
Expand Down Expand Up @@ -111,8 +111,10 @@ def __init__(self, yaml_file=None, dict_input=None, *, skip_validation=False):
if yaml_file is not None:
with open(yaml_file, 'r') as f:
self._properties = yaml.safe_load(f)
directory = dirname(yaml_file)
elif dict_input is not None:
self._properties = dict_input
directory = None
else:
raise NameError("ChemKED needs either a YAML filename or dictionary as input.")

Expand All @@ -121,7 +123,7 @@ def __init__(self, yaml_file=None, dict_input=None, *, skip_validation=False):

self.datapoints = []
for point in self._properties['datapoints']:
self.datapoints.append(DataPoint(point))
self.datapoints.append(DataPoint(point, directory))

self.reference = Reference(
volume=self._properties['reference'].get('volume'),
Expand Down Expand Up @@ -589,6 +591,7 @@ class DataPoint(object):

Arguments:
properties (`dict`): Dictionary adhering to the ChemKED format for ``datapoints``
directory (`str`, optional): Directory to look for auxiliary files

Attributes:
composition (`list`): List of dictionaries representing the species and their quantities
Expand Down Expand Up @@ -631,7 +634,7 @@ class DataPoint(object):
'compression-ratio'
]

def __init__(self, properties):
def __init__(self, properties, directory=None):
for prop in self.value_unit_props:
if prop in properties:
quant = self.process_quantity(properties[prop])
Expand Down Expand Up @@ -685,7 +688,10 @@ def __init__(self, properties):
values = np.array(hist['values'])
else:
# Load the values from a file
values = np.genfromtxt(hist['values']['filename'], delimiter=',')
filename = hist['values']['filename']
if not isabs(filename):
filename = join(directory, filename)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this will fail if the input is a dict and therefore directory is None

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changing the default directory to '' would fix that. But, if the input is a dict, then the filename must be specified as an absolute path, right? Since there's no yaml file, a path relative to a yaml file wouldn't make sense. So, line 693 won't be run anyway.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think that change will fix the problem on this line, because the directory argument gets set to None if yaml_file is None, so this line will join None and filename, which won't work. Actually, I think that if people provide a dictionary input, the CSV file has to be specified relative to the PWD of the Python process, or as an absolute file name, but we don't need to check for that case.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch. I just pushed another commit so that directory will never be set to None. Now, if yaml_file is None, then directory will be ''

values = np.genfromtxt(filename, delimiter=',')

time_history = TimeHistory(
time=Q_(values[:, time_col], time_units),
Expand Down
2 changes: 1 addition & 1 deletion pyked/tests/test_validation.py
Original file line number Diff line number Diff line change
Expand Up @@ -363,7 +363,7 @@ def properties(self, request):

@pytest.mark.parametrize("properties", [
'testfile_st.yaml', 'testfile_st2.yaml', 'testfile_rcm.yaml', 'testfile_required.yaml',
'testfile_uncertainty.yaml', 'testfile_rcm2.yaml',
'testfile_uncertainty.yaml', 'testfile_rcm2.yaml', 'testfile_rcm3.yaml'
], indirect=['properties'])
def test_valid_yaml(self, properties):
"""Ensure ChemKED YAML is validated
Expand Down
66 changes: 66 additions & 0 deletions pyked/tests/testfile_rcm3.yaml
Original file line number Diff line number Diff line change
@@ -0,0 +1,66 @@
---
file-authors:
- name: Kyle E Niemeyer
ORCID: 0000-0003-4425-7097
file-version: 0
chemked-version: 0.0.1
reference:
doi: 10.1002/kin.20180
authors:
- name: Gaurav Mittal
- name: Chih-Jen Sung
ORCID: 0000-0003-2046-8076
- name: Richard A Yetter
journal: International Journal of Chemical Kinetics
year: 2006
volume: 38
pages: 516-529
detail: Fig. 6, open circle
experiment-type: ignition delay
apparatus:
kind: rapid compression machine
institution: Case Western Reserve University
facility: CWRU RCM
datapoints:
- temperature:
- 297.4 kelvin
ignition-delay:
- 1.0 ms
pressure:
- 958.0 torr
composition:
kind: mole fraction
species:
- species-name: H2
InChI: 1S/H2/h1H
amount:
- 0.12500
- species-name: O2
InChI: 1S/O2/c1-2
amount:
- 0.06250
- species-name: N2
InChI: 1S/N2/c1-2
amount:
- 0.18125
- species-name: Ar
InChI: 1S/Ar
amount:
- 0.63125
ignition-type:
target: pressure
type: d/dt max
rcm-data:
compression-time:
- 38.0 ms
time-histories:
- type: volume
time:
units: s
column: 0
quantity:
units: cm3
column: 1
values:
filename: rcm_history.csv
...
22 changes: 14 additions & 8 deletions pyked/validation.py
Original file line number Diff line number Diff line change
Expand Up @@ -252,14 +252,20 @@ def _validate_isvalid_history(self, isvalid_history, field, value):
'with ' + property_units['time'])

# Check that the values have the right number of columns
n_cols = len(value['values'][0])
max_cols = max(value['time']['column'],
value['quantity']['column'],
value.get('uncertainty', {}).get('column', 0)) + 1
if n_cols > max_cols:
self._error(field, 'too many columns in the values')
elif n_cols < max_cols:
self._error(field, 'not enough columns in the values')
# If reading from a file, the file will not be validated.
# A file can have an arbitrary number of columns, and the columns
# to be used are specified.
if type(value['values']) is list:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not 'filename' not in value['values'].keys()?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that we can assume the CSV file is specified relative to the yaml file, but DataPoint doesn't know where the yaml file directory is unless it's specified as an argument to __init__, right? Is there a simpler way to deal with a relative path?

I tried using 'filename' not in value['values'].keys() in a previous commit but this fails when the values are given as a list. In that case, value['values'] is a list, not a dict, so if 'filename' not in value['values'].keys() raises an error.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As far as I know, just trying to open the file will try to open it relative to the working directory of the Python process. Are you saying that if I have a file structure like

|- database
|---butanol
|------file_1.yaml
|------file_1.csv

and I start Python in the database directory, and load file_1.yaml like

>>> ChemKED('butanol/file_1.yaml')

it won't work, because Python will assume the file_1.csv is relative to database, not butanol?

As on your other PR, I think the simpler/more "pythonic" way to do this is a try...except block, rather than checking the type of the value.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, that's exactly what I'm saying. I was using a script to read multiple yaml files in a complex directory structure within a database directory, and python was looking for the csv file in database, not in the folder with the yaml file.

I'm not sure that a try...except block would work well here. Are you thinking of something like? It seems more complex and confusing this way, and it puts a lot of code between the try and except

try:
    if 'filename' in value['values'].keys():
        # Don't do anything because csv files aren't validated
        pass
    else:
        # This should never happen. If vale['values'] is a dictionary with keys, 'filename' should be a key
        self._error(field, 'must include filename or list of values')
except KeyError:
        # value['values'] is probably a list.
        # Code from earlier that checks the number of columns

Since this PR is related to time histories, I have a somewhat related question for you that I just stumbled on. Let's say somebody has a csv file with three columns - time, pressure, and volume. Right now, these must be implemented as two separate time-histories and the csv will be loaded twice, right? Is there interest in allowing the user to specify multiple time-histories using a single file?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just to keep this focused on the code here, I moved the other discussion to the main comment thread. Anyhow, two things:

  1. I want to validate csv files in the future, so we might as well set up for that here
  2. The code could be
try:
    if 'filename' not in value['values'].keys():
        self._error(field, 'must include filename or list of values')
except TypeError:
    # Code from earlier that checks the number of columns

or

try:
    filename = value['values'].get('filename')
    if filename is None:
        self._error(field...
except AttributeError:
    # Code from earlier

which isn't all that confusing to me. The reason (to me) to avoid the type function is that it doesn't always handle inheritance in a straightforward way, so we'd be relying on the underlying YAML library to always return something that's a subclass of list. On the other hand, with the try-except, we're using the duck-typing in Python to try something that we expect to be the case, and catch the resulting errors.

n_cols = len(value['values'][0])
max_cols = max(value['time']['column'],
value['quantity']['column'],
value.get('uncertainty', {}).get('column', 0)) + 1
if n_cols > max_cols:
self._error(field, 'too many columns in the values')
elif n_cols < max_cols:
self._error(field, 'not enough columns in the values')
elif 'filename' not in value['values'].keys():
self._error(field, 'must include filename or list of values')

def _validate_isvalid_quantity(self, isvalid_quantity, field, value):
"""Checks for valid given value and appropriate units.
Expand Down