Validate input file metadata against a schema #8

Open
aidanheerdegen opened this issue Apr 18, 2024 · 5 comments

Comments

@aidanheerdegen
Member

The Axiom tool would allow us to validate input file metadata against a schema:

https://axiom.readthedocs.io/en/latest/schemas/schemas.html

Clearly this is a fraught proposal, as many input files would fail. That shouldn't be a reason not to do it, but it should inform the process by which it is done.

Opinions/thoughts welcome

ping @bschroeter @dougiesquire @kdruken

@aidanheerdegen aidanheerdegen transferred this issue from ACCESS-NRI/access-om2-configs May 3, 2024
@bschroeter

@aidanheerdegen, Axiom is capable of validating against an arbitrary schema. It is a lesser-known component of the DRS routine that is exposed through public methods.

I'm happy to help out with this, in the case that Axiom needs any adjustment for our needs - I am still the administrator of the project.

@dougiesquire
Collaborator

I like the idea in principle, though we might end up with a very large number of schemas, since (depending on how prescriptive they are) they will be quite different for different inputs.

@aidanheerdegen
Member Author

> I'm happy to help out with this, in the case that Axiom needs any adjustment for our needs - I am still the administrator of the project.

Thanks @bschroeter. I assume there aren't too many competing tools out there, otherwise you'd not have needed to write that functionality. Basically I don't want to use it just because someone in the org wrote it, but that is obviously a compelling reason.

> I like the idea in principle, though we might end up with a very large number of schemas, since (depending on how prescriptive they are) they will be quite different for different inputs.

Maybe. As you say, it might depend on how prescriptive the schemas are.

I think we could get a fair bit of value out of just quantifying what level of compliance we have. So have some minimal schema standards and work up from there.
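As a rough illustration of what "quantifying compliance" could look like, here is a minimal sketch in plain Python. It does not use Axiom's API. The attribute names in `MINIMAL_SCHEMA`, the helper names, and the example file metadata are all hypothetical, and the metadata is assumed to have already been read into dictionaries.

```python
# Hypothetical minimal schema: attribute names every input file should carry.
MINIMAL_SCHEMA = ("title", "source", "history")


def missing_attributes(metadata, required=MINIMAL_SCHEMA):
    """Return the required attribute names that are absent or empty."""
    missing = []
    for name in required:
        value = metadata.get(name)
        if value is None or not str(value).strip():
            missing.append(name)
    return missing


def compliance_report(files, required=MINIMAL_SCHEMA):
    """Return (fraction of compliant files, {file name: missing attributes})."""
    failures = {}
    for name, metadata in files.items():
        missing = missing_attributes(metadata, required)
        if missing:
            failures[name] = missing
    fraction = 1.0 - len(failures) / len(files) if files else 1.0
    return fraction, failures


# Made-up metadata standing in for the global attributes of two input files.
example = {
    "a.nc": {"title": "Bathymetry", "source": "survey", "history": "created 2020"},
    "b.nc": {"title": "Forcing", "source": ""},
}
fraction, failures = compliance_report(example)
print(f"{fraction:.0%} of files comply")
for name, missing in failures.items():
    print(f"{name}: missing {', '.join(missing)}")
```

Starting from a short required list like this keeps the first report readable, and the list can be tightened as compliance improves.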

@bschroeter

> Thanks @bschroeter. I assume there aren't too many competing tools out there, otherwise you'd not have needed to write that functionality. Basically I don't want to use it just because someone in the org wrote it, but that is obviously a compelling reason.

From memory, there were a few, but in order to get them to the level of flexibility that we needed for CCAM it was less dev effort to write something from scratch. Document validation (which is basically what this is) is a reasonably old problem - you could carbon-date me once I start talking about xmlschema etc!

> Maybe. As you say, it might depend on how prescriptive the schemas are. I think we could get a fair bit of value out of just quantifying what level of compliance we have. So have some minimal schema standards and work up from there.

Spot on. Axiom can be as strict as you like: you can specify very exacting standards or just a minimum set, depending on the use case.

I'd start with looking at the commonalities of the things you want to validate - that will be the base ruleset, then we can work from there.
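One way to find those commonalities is to count how often each metadata attribute appears across the existing files and take the widely shared ones as the base ruleset. The sketch below assumes the metadata is already available as dictionaries. The function names, the `threshold` parameter, and the example data are hypothetical, and nothing here uses Axiom itself.

```python
from collections import Counter


def attribute_coverage(files):
    """Count how many files carry each attribute name."""
    counts = Counter()
    for metadata in files.values():
        counts.update(set(metadata))
    return counts


def base_ruleset(files, threshold=1.0):
    """Return attribute names present in at least `threshold` of the files."""
    if not files:
        return []
    counts = attribute_coverage(files)
    total = len(files)
    return sorted(name for name, n in counts.items() if n / total >= threshold)


# Made-up metadata for three input files.
example = {
    "a.nc": {"title": "A", "source": "model"},
    "b.nc": {"title": "B", "history": "created 2021"},
    "c.nc": {"title": "C", "source": "obs", "history": "created 2022"},
}
print(base_ruleset(example))                 # attributes shared by every file
print(base_ruleset(example, threshold=0.6))  # attributes shared by most files
```

Lowering `threshold` shows which attributes are close to universal and so might be candidates for the next, stricter schema.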

@aidanheerdegen
Member Author

ARDC has a FAIR self-assessment tool:

https://ardc.edu.au/resource/fair-data-self-assessment-tool/

It has pretty broad guidelines, but we could use it as a starting point to design some criteria, see what level we're achieving, and work out how to climb the rungs to improve, e.g. local identifier -> URL -> DOI.
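To make the identifier rungs concrete, a sketch like the following could score each file's identifier. The numeric levels, the function name, and the regular expressions are assumptions made for illustration; they are not taken from the ARDC tool, and the DOI pattern is only a loose match for common DOI forms.

```python
import re

# Loose, illustrative patterns; not a complete validation of DOIs or URLs.
DOI_PATTERN = re.compile(
    r"^(?:https?://(?:dx\.)?doi\.org/|doi:)?10\.\d{4,9}/\S+$", re.IGNORECASE
)
URL_PATTERN = re.compile(r"^https?://\S+$", re.IGNORECASE)


def identifier_rung(identifier):
    """Score an identifier: 0 none, 1 local identifier, 2 URL, 3 DOI."""
    value = (identifier or "").strip()
    if not value:
        return 0
    if DOI_PATTERN.match(value):
        return 3
    if URL_PATTERN.match(value):
        return 2
    return 1


# Made-up identifiers covering each rung.
for candidate in ["", "om2-bathy-v1", "https://example.org/data/x", "doi:10.1234/abcd"]:
    print(repr(candidate), "->", identifier_rung(candidate))
```

Tallying these scores across all input files would give a simple picture of where the collection currently sits on the ladder.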
