WIP: schemas #66

Merged
merged 29 commits into from
Jan 3, 2018

CJ-Wright
Member

No description provided.

@CJ-Wright CJ-Wright requested a review from scopatz December 12, 2017 22:26
@CJ-Wright CJ-Wright self-assigned this Dec 13, 2017
@CJ-Wright
Member Author

from regolith.validators import validate_schema
from regolith.schemas import schemas
from regolith.exemplars import exemplars
import pytest
Collaborator

PEP8 Third party imports go before package imports

python
Collaborator

Is this really a valid requirement?

Member Author

I took inspiration for this from conda forge recipes.

Collaborator

That is crazy if it is.

Member Author

We usually include python as a run requirement in conda recipes, right?

Collaborator

yes, that is true. I wonder about whether this affects pip, though maybe we don't care 🤷‍♂️

The sequential keys for the data being accessed, used for
printing/debugging errors
"""
if isinstance(record, dict):
Collaborator

check collections.abc.Mapping, not dict. Same goes for below.
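
A minimal sketch of why the abstract base class matters here. `is_mapping` is a hypothetical helper, not part of regolith; `ChainMap` stands in for any mapping type that does not inherit from `dict`.

```python
from collections import ChainMap
from collections.abc import Mapping


def is_mapping(record):
    """Return True for any mapping type, not only dict and its subclasses."""
    return isinstance(record, Mapping)


# ChainMap implements the mapping protocol but is not a dict subclass.
layered = ChainMap({'_id': 'Mouginot.Model'}, {'title': 'untitled'})

print(isinstance(layered, dict))  # False
print(is_mapping(layered))        # True
print(is_mapping({'_id': 'x'}))   # True
```

A check against `dict` would silently route `layered` to the non-mapping branch of the validator.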

raise ValueError('{} is required in {}'.format(k, keys))
elif k in record and k in schema:
validate_schema(record[k], schema[k], keys + (k, ))
elif isinstance(record, collections.Iterable) and not isinstance(record,
Collaborator

Does this really include set and generators? Or is this supposed to be collections.abc.Sequence?
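
For reference, a short sketch of how the two abstract base classes differ. `classify` is a hypothetical helper used only for illustration. The names are imported from `collections.abc` because the bare `collections.Iterable` alias used in the diff was removed in Python 3.10.

```python
from collections.abc import Iterable, Sequence


def classify(obj):
    """Return (is_iterable, is_sequence) for obj."""
    return isinstance(obj, Iterable), isinstance(obj, Sequence)


print(classify([1, 2]))                # (True, True)
print(classify({1, 2}))                # (True, False)
print(classify(x for x in range(2)))   # (True, False)
print(classify('abc'))                 # (True, True)
```

Sets and generators pass the `Iterable` check but not the `Sequence` check, and strings pass both, so a separate exclusion for `str` is needed either way.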

    for r in record:
        validate_schema(r, schema, keys)
else:
    if not isinstance(record, schema['type']):
Collaborator

This should probably be an elif, rather than an if inside of an else.
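
A sketch of the flattened branch layout, assuming the `else` block contains nothing besides the nested `if`. `describe` is a hypothetical stand-in for the real validator and its return values are invented for the example.

```python
def describe(record, schema):
    """Hypothetical dispatcher that mirrors the branch layout under review."""
    if isinstance(record, dict):
        return 'mapping'
    elif isinstance(record, (list, tuple)):
        return 'sequence'
    elif not isinstance(record, schema['type']):
        # Was: "else:" followed by a nested "if not isinstance(...)".
        raise ValueError('{!r} is not of type {}'.format(
            record, schema['type'].__name__))
    return 'scalar'


print(describe('Mouginot.Model', {'type': str}))  # scalar
```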

@@ -0,0 +1,11 @@
from regolith.validators import validate_schema
from regolith.schemas import schemas
from regolith.exemplars import exemplars
Collaborator

If the exemplars are only used in the tests, then they should be in the test suite, not the package.

@@ -0,0 +1,400 @@
"""Database schemas"""
schemas = {
Collaborator

PEP8 global constants should be in UPPERCASE

The database record to be tracked
schema : dict
The schema to validate the record against
keys : tuple
Collaborator

Should be listed as `keys : tuple, optional`


for k in total_keys:
    if k not in schema:
        pass
Collaborator

Isn't it an error if we get a key that we don't expect?
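
One way to surface unexpected keys instead of passing over them, sketched under the assumption that the schema is a flat dict keyed by field name. `unexpected_keys` and the sample `schema` and `record` are hypothetical.

```python
def unexpected_keys(record, schema):
    """Return the keys present in record but absent from schema, sorted."""
    return sorted(set(record) - set(schema))


schema = {'_id': {'type': str}, 'title': {'type': str}}
record = {'_id': 'Mouginot.Model', 'title': 'A talk', 'titel': 'typo'}

extra = unexpected_keys(record, schema)
if extra:
    print('unexpected keys: {}'.format(extra))  # unexpected keys: ['titel']
```

Whether an unknown key should be a hard error or a warning is a policy choice for the call site.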

The schema to validate the record against
keys : tuple
The sequential keys for the data being accessed, used for
printing/debugging errors
Collaborator

Broadly speaking, I guess I'd prefer if this returned (bool, str message) rather than raising an error. That way the call site can decide whether or not to raise the error. Also, it allows the message to be appended to so that all of the ways that the record doesn't adhere to the schema can be known with one call to this function.

Member Author

I think this is done with cerberus. (validator.errors)
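
For comparison, a hand-rolled sketch of the `(bool, message)` return shape requested above. This is not the regolith or cerberus implementation: `validate_record` and the `{'type': ..., 'required': ...}` rule layout are assumptions made for the example.

```python
def validate_record(record, schema, keys=()):
    """Return (is_valid, message) instead of raising.

    Assumed schema shape: {key: {'type': <type>, 'required': <bool>}}.
    """
    problems = []
    for k, rule in schema.items():
        path = '.'.join(keys + (k,))
        if k not in record:
            if rule.get('required', False):
                problems.append('{} is required'.format(path))
        elif not isinstance(record[k], rule['type']):
            problems.append('{} should be of type {}'.format(
                path, rule['type'].__name__))
    # Every problem is collected, so one call reports all of them.
    return not problems, '\n'.join(problems)


schema = {'_id': {'type': str, 'required': True},
          'year': {'type': int, 'required': True}}
ok, message = validate_record({'year': '2018'}, schema)
print(ok)
print(message)
```

The caller decides whether a `False` result becomes an exception, a log line, or a form error.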

@scopatz
Collaborator

scopatz commented Dec 17, 2017

This is a really great first cut @CJ-Wright! Other than the parts listed inline, I think the thing that makes sense (so that info isn't duplicated) is to have the docs in conf.py generate the schema section of the collections pages from the schema itself. I think that this is needed so there aren't two, possibly diverging sources of truth.
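
A rough sketch of generating the docs from the schema so there is a single source of truth. `schema_to_rst`, the field-list layout, and the flat `{field: {'type': ..., 'description': ...}}` shape are all assumptions, not the code that ended up in `conf.py`.

```python
def schema_to_rst(name, schema):
    """Render one collection's schema as an rst section (hypothetical layout)."""
    lines = [name, '=' * len(name), '']
    for field in sorted(schema):
        rule = schema[field]
        lines.append(':{}: {}, {}'.format(
            field,
            rule.get('type', 'unspecified'),
            rule.get('description', 'no description')))
    # The generator adds the trailing blank line, so the descriptions
    # themselves do not need to end in newlines.
    return '\n'.join(lines) + '\n\n'


abstracts = {
    '_id': {'type': 'string', 'description': 'Unique identifier'},
    'title': {'type': 'string', 'description': 'Title of the abstract'},
}
print(schema_to_rst('abstracts', abstracts))
```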

@scopatz
Collaborator

scopatz commented Dec 17, 2017

Also, cerebus is interesting, but it seems to not be on conda-forge.

@CJ-Wright
Member Author

@scopatz
Collaborator

scopatz commented Dec 17, 2017

Ahh, mistyped it!

@CJ-Wright
Member Author

Sorry, I just pushed up a cerberus driven version

import os
import re

from cerberus import Validator
from getpass import getpass
Collaborator

PEP8 cerberus is a 3rd party library and so should go below getpass, which is from the standard library and should be grouped with those above.

if isinstance(schema, dict):
    schema.pop('description', None)
    for v in schema.values():
        pop_description(v)
Collaborator

You shouldn't need to do this. Instead, subclass Validator and make it skip the custom keys we want to add to the schema.

Member Author

I spent some time trying to do that without much headway, but I will try again.

Collaborator

There has to be a way. Popping stuff out of a global, mutable object is wrong.

Collaborator

In this case at least. The alternative is to have a second dict which has the same structure and only contains the descriptions, but that feels wrong too.

Member Author

Btw there is an open issue for description pyeve/cerberus#254
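
If subclassing proves awkward, another option is to strip the descriptions from a copy, which leaves the global object untouched. This is a sketch only: `strip_descriptions` is a hypothetical replacement for `pop_description`, and the sample `SCHEMA` is made up.

```python
def strip_descriptions(schema):
    """Return a copy of schema without 'description' keys, at any depth."""
    if isinstance(schema, dict):
        return {k: strip_descriptions(v)
                for k, v in schema.items() if k != 'description'}
    if isinstance(schema, list):
        return [strip_descriptions(v) for v in schema]
    return schema


SCHEMA = {'title': {'type': 'string', 'description': 'Title of the talk'}}
clean = strip_descriptions(SCHEMA)

print(clean)   # {'title': {'type': 'string'}}
print(SCHEMA)  # unchanged, the description is still present
```

The cost is one extra traversal per validator construction, in exchange for keeping descriptions and rules in a single dict.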

from regolith.validators import validate_schema
from regolith.schemas import SCHEMAS
from regolith.exemplars import EXEMPLARS
import pytest
Collaborator

PEP8 import order

class MyValidator(Validator):
    def _validate_description(self, description, field, value):
        if False:
            self._error(field, "Shouldn't be here")
Collaborator

Can't this function just be a pass?

return v.validate(record), v.errors


EXEMPLARS = {'abstracts': {'_id': 'Mouginot.Model',
Collaborator

PEP8 Constants should be at the top of the file.

@@ -48,10 +49,14 @@ def dump_yaml(filename, docs, inst=None):
    """Dumps a dict of documents into a file."""
    inst = YAML() if inst is None else inst
    inst.indent(mapping=2, sequence=4, offset=2)
    for doc in docs.values():
        my_sorted_dict = ruamel.yaml.comments.CommentedMap()
Collaborator

please remove the my_ from this variable.

auto/people
auto/projects
auto/proposals
auto/students
Collaborator

I'd really prefer these not go into an auto folder. It doesn't seem necessary.

grades
grants
auto/grades
auto/grants
jobs
news
Collaborator

Why are these not generated?

Member Author

I haven't made schemas for them yet

Collaborator

haha OK - should that be a requirement for this PR? Or do you want to delay it?

Member Author

I was hoping to delay it. I'm expecting a couple more takes at this as I start using more of the dbs so I might come around to it.

Collaborator

Sounds good to me!

'email': '[email protected]',
'university_id': 'HAP42'}}

SCHEMAS = {'abstracts': {'_description': {'description': 'Abstracts for a conference or workshop. This is generally public information\n\n'},
Collaborator

Many lines are too long here. They also probably shouldn't have \n\n at the end of them; that can be added in the doc generator.

@@ -6,6 +6,8 @@

from flask import Flask, abort, request, render_template, redirect, url_for

from regolith.schemas import validate

Collaborator

PEP8 two newlines required after import lines

regolith/app.py Outdated
n = os.path.join(td.name, 'regolith.txt')
print(errors)
print('Writing text file to {}. '
      'Please try again.'.format(n))
Collaborator

I think these prints should go into the body of the error message below. At the very least, they should be printed to stderr....
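
A small sketch of both suggestions: build a single message and send it to stderr, returning it so it can also go into the error body. `report_invalid` and the sample arguments are hypothetical.

```python
import sys


def report_invalid(errors, path):
    """Build one message, print it to stderr, and return it for reuse."""
    message = ('Validation failed: {}\n'
               'Writing text file to {}. Please try again.'.format(errors, path))
    print(message, file=sys.stderr)
    return message


msg = report_invalid({'title': ['required field']}, '/tmp/regolith.txt')
```

Returning the message lets the caller pass the same text to whatever error it raises.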

regolith/app.py Outdated
print(errors)
print('Writing text file to {}. '
      'Please try again.'.format(n))
with open(n) as f:
Collaborator

This file isn't opened in write mode. Do you have any tests for the invalid case?

Member Author

I don't think we have any tests for the app.

Collaborator

Fair point!

Member Author

Made an issue #69
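
For the write-mode point raised in this thread, a minimal sketch of the fix. `save_rejected_body` is a hypothetical helper, and `tempfile.mkdtemp` is used here in place of the `td` temporary directory from the diff.

```python
import os
import tempfile


def save_rejected_body(body, filename='regolith.txt'):
    """Write body into a fresh temporary directory and return the path."""
    directory = tempfile.mkdtemp()
    path = os.path.join(directory, filename)
    # 'w' is required: open(path) defaults to read mode and would raise
    # FileNotFoundError here, because the file does not exist yet.
    with open(path, 'w') as f:
        f.write(body)
    return path


saved = save_rejected_body('title: untitled\n')
print(saved)
```

A test for the invalid case could call a helper like this and assert that the file contents match the submitted body.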

regolith/app.py Outdated
print('Writing text file to {}. '
      'Please try again.'.format(n))
with open(n) as f:
    f.write(form['body'])
Collaborator

Same as above

@@ -0,0 +1,8 @@
import pytest
Collaborator

Can you please move the tests files to a top-level tests/ dir, rather than embedding them in the project dir? Thanks!

@scopatz
Collaborator

scopatz commented Jan 2, 2018

Thanks! Some more minor comments.

@CJ-Wright
Member Author

@scopatz ready for another round!

@scopatz
Collaborator

scopatz commented Jan 2, 2018

Looks good to go! Can you please provide a news entry?

news/schema.rst Outdated
@@ -0,0 +1,16 @@
**Added:**

* Schemas for the tables
Collaborator

This is not particularly informative. Can you please expand on it? Also, aren't we calling them collections, not tables?

required fields are filled and the values are the same type(s) listed in the
schema. The schema also includes descriptions of the data to be included.
The exemplars are examples which have all the specified fields and are
used to check the validation.
Collaborator

This is not valid rst, which requires the paragraph to be indented after the bullet point.

Collaborator

I'll merge this and fix it though.

@scopatz scopatz merged commit 5063ba7 into regro:master Jan 3, 2018
@CJ-Wright CJ-Wright deleted the schemas branch January 3, 2018 15:34